From fyang at openjdk.org Fri Nov 1 00:15:31 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 00:15:31 GMT
Subject: RFR: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub
In-Reply-To:
References:
Message-ID:

On Thu, 31 Oct 2024 08:08:05 GMT, Robbin Ehn wrote:

>> Hi, please review this small change.
>>
>> The current max size of these two stubs is a bit overestimated and thus is more than needed.
>> Since the `la`, `far_call` and `far_jump` assembler routines used by these two stubs will always
>> emit 2 instructions for an address inside the code cache, we can make the max size more accurate.
>>
>> Testing on linux-riscv64 platform:
>> - [x] tier1-tier3 (release)
>> - [x] hotspot:tier1 (fastdebug)
>
> Seems fine, thanks.

@robehn @feilongjiang : Thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21732#issuecomment-2451051688

From fyang at openjdk.org Fri Nov 1 00:15:32 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 00:15:32 GMT
Subject: Integrated: 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub
In-Reply-To:
References:
Message-ID:

On Mon, 28 Oct 2024 04:09:57 GMT, Fei Yang wrote:

> Hi, please review this small change.
>
> The current max size of these two stubs is a bit overestimated and thus is more than needed.
> Since the `la`, `far_call` and `far_jump` assembler routines used by these two stubs will always
> emit 2 instructions for an address inside the code cache, we can make the max size more accurate.
>
> Testing on linux-riscv64 platform:
> - [x] tier1-tier3 (release)
> - [x] hotspot:tier1 (fastdebug)

This pull request has now been integrated.

Changeset: 803612ee Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/803612ee9377f7875d1b3ceb6f055048703e148c Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod 8343121: RISC-V: More accurate max size for C2SafepointPollStub and C2EntryBarrierStub Reviewed-by: rehn, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/21732 From mdoerr at openjdk.org Fri Nov 1 00:22:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Nov 2024 00:22:42 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: > This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Minor improvements (review feedback). ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21812/files - new: https://git.openjdk.org/jdk/pull/21812/files/ea2fa546..c229422b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21812&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21812&range=00-01 Stats: 12 lines in 2 files changed: 2 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21812/head:pull/21812 PR: https://git.openjdk.org/jdk/pull/21812 From mdoerr at openjdk.org Fri Nov 1 00:27:31 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 1 Nov 2024 00:27:31 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 17:46:17 GMT, Vladimir Kozlov wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor improvements (review feedback). 
> > src/hotspot/share/compiler/compileBroker.cpp line 1027: > >> 1025: >> 1026: int old_c2_count = 0, new_c2_count = 0, old_c1_count = 0, new_c1_count = 0; >> 1027: const int c2_tasks_per_thread = 2, c1_tasks_per_thread = 4; > > Any reason to have such numbers (2 and 4)? Any experiments were done to select the best numbers? Please note that these constants are not new. I have only given them names. I had done some experiments when implementing [JDK-8198756](https://bugs.openjdk.org/browse/JDK-8198756) for JDK11. C1 is faster than C2. Therefore, we can have more C1 tasks per C1 thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1825304691 From fyang at openjdk.org Fri Nov 1 00:57:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 1 Nov 2024 00:57:35 GMT Subject: RFR: 8343122: RISC-V: C2: Small improvement for real runtime callouts In-Reply-To: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> References: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> Message-ID: On Mon, 28 Oct 2024 04:39:17 GMT, Fei Yang wrote: > Hi, please review this small improvement. > > Currently, we do 11 instructions for real C2 runtime callouts (See riscv_enc_java_to_runtime). > Seems we can materialize the pointer faster with `movptr2`, which will help reduce 2 instructions. > But we will need to reorder the original calling sequence a bit to make `t0` available for `movptr2`. > > Testing on linux-riscv64 platform: > - [x] tier1-tier3 (release) > - [x] hotspot:tier1 (fastdebug) Thanks all for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21733#issuecomment-2451095523 From fyang at openjdk.org Fri Nov 1 00:57:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 1 Nov 2024 00:57:35 GMT Subject: Integrated: 8343122: RISC-V: C2: Small improvement for real runtime callouts In-Reply-To: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> References: <3AbaT2SwHVxQcQRu82L8CWzKBhhAxukYOMT5Bjgjt_c=.197639f3-dc6b-46ec-9ecd-82569e7eb074@github.com> Message-ID: On Mon, 28 Oct 2024 04:39:17 GMT, Fei Yang wrote: > Hi, please review this small improvement. > > Currently, we do 11 instructions for real C2 runtime callouts (See riscv_enc_java_to_runtime). > Seems we can materialize the pointer faster with `movptr2`, which will help reduce 2 instructions. > But we will need to reorder the original calling sequence a bit to make `t0` available for `movptr2`. > > Testing on linux-riscv64 platform: > - [x] tier1-tier3 (release) > - [x] hotspot:tier1 (fastdebug) This pull request has now been integrated. 
Changeset: cbda7580
Author: Fei Yang
URL: https://git.openjdk.org/jdk/commit/cbda758010c22b0c1b9aec16004d4bfd24ab5c81
Stats: 11 lines in 1 file changed: 4 ins; 2 del; 5 mod

8343122: RISC-V: C2: Small improvement for real runtime callouts

Reviewed-by: rehn, fjiang

-------------

PR: https://git.openjdk.org/jdk/pull/21733

From jbhateja at openjdk.org Fri Nov 1 01:42:31 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 1 Nov 2024 01:42:31 GMT
Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v3]
In-Reply-To:
References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com>
Message-ID:

On Wed, 30 Oct 2024 22:11:31 GMT, Srinivas Vamsi Parasa wrote:

>> src/hotspot/cpu/x86/assembler_x86.cpp line 2632:
>>
>>> 2630: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false);
>>> 2631: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit);
>>> 2632: evex_prefix_nf(src, 0, dst->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags);
>>
>> Could you also replace VEX_OPCODE_0F_3C with the standard naming convention of VEX_OPCODE_MAP4?
>> I added /*MAP4*/ in the comments after the prefix for the setzuCC instruction, but it's better to make this change consistently in all places.
>
> Hi Jatin,
>
> If I understand correctly, are you suggesting that I add a comment in the front like `/* MAP4 */VEX_OPCODE_0F_3C` for all occurrences of VEX_OPCODE_0F_3C in this PR?

I would prefer using VEX_OPCODE_MAP4 directly, as it's a standard naming convention used by the [APX specifications](https://cdrdv2.intel.com/v1/dl/getContent/784266)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1825338330

From fyang at openjdk.org Fri Nov 1 02:35:02 2024
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 1 Nov 2024 02:35:02 GMT
Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by one
Message-ID:

Hi, please consider this small change.

There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1].
The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2].
So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jalr` pair for this jump.

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 Testing on linux-riscv64: - [x] tier1 (fastdebug build) ------------- Commit messages: - 8343415: RISC-V: Increased maximum size of C2EntryBarrierStub by one Changes: https://git.openjdk.org/jdk/pull/21818/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21818&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343415 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21818/head:pull/21818 PR: https://git.openjdk.org/jdk/pull/21818 From fjiang at openjdk.org Fri Nov 1 02:35:02 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 1 Nov 2024 02:35:02 GMT Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by one In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 02:13:16 GMT, Fei Yang wrote: > Hi, please consider this small change. > > There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. > The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. > So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jalr` pair for this jump. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 > > Testing on linux-riscv64: > - [x] tier1 (fastdebug build) Looks reasonable, thanks. ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21818#pullrequestreview-2409366643 From fyang at openjdk.org Fri Nov 1 02:43:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 1 Nov 2024 02:43:06 GMT Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by one [v2] In-Reply-To: References: Message-ID: > Hi, please consider this small change. > > There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. > The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. > So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jalr` pair for this jump. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 > > Testing on linux-riscv64: > - [x] tier1 (fastdebug build) Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Comment typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21818/files - new: https://git.openjdk.org/jdk/pull/21818/files/b133d081..e07f6e37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21818&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21818&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21818/head:pull/21818 PR: https://git.openjdk.org/jdk/pull/21818 From jbhateja at openjdk.org Fri Nov 1 03:50:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 1 Nov 2024 03:50:37 GMT Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp Message-ID: This bugfix 
patch fixes the incorrect predicated UMinV/UMaxV pattern. All existing VectorAPI jtreg regressions are now passing with -Xcomp. Best Regards, Jatin ------------- Commit messages: - 8343297: Vector unsigned min/max test are failing with -Xcomp Changes: https://git.openjdk.org/jdk/pull/21819/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21819&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343297 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21819.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21819/head:pull/21819 PR: https://git.openjdk.org/jdk/pull/21819 From swen at openjdk.org Fri Nov 1 04:59:32 2024 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 1 Nov 2024 04:59:32 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Tue, 29 Oct 2024 18:29:04 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>>
>> **What this change enables**
>>
>> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>>
>> Now we can do:
>> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
>> - Merging `Unsafe` stores to native memory.
>> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
>> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>>
>> **Dealing with Overflows**
>>
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
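
The linear form `pointer = con + sum_i(scale_i * variable_i)` and the overflow-tracking integer from the quoted description can be illustrated with a minimal Java sketch. This is not the HotSpot code (that is C++ and documented in `mempointer.hpp`); the class `MemPointerSketch`, the record `LinearPointer`, the use of string names for variables, and the offset 16 in the example are all assumptions made only for illustration. The sketch treats two pointers as being at a constant distance when their variable terms are identical and the difference of their constants does not overflow.

```java
import java.util.Map;
import java.util.OptionalInt;

public class MemPointerSketch {

    /** An int wrapper that remembers whether any operation producing it overflowed. */
    public static final class NoOverflowInt {
        private static final NoOverflowInt NAN = new NoOverflowInt(0, true);
        private final int value;
        private final boolean overflowed;

        private NoOverflowInt(int value, boolean overflowed) {
            this.value = value;
            this.overflowed = overflowed;
        }

        public static NoOverflowInt of(int value) {
            return new NoOverflowInt(value, false);
        }

        public boolean isNaN() {
            return overflowed;
        }

        public int value() {
            if (overflowed) {
                throw new IllegalStateException("value overflowed");
            }
            return value;
        }

        public NoOverflowInt add(NoOverflowInt other) {
            return fromLong((long) value + other.value, other);
        }

        public NoOverflowInt sub(NoOverflowInt other) {
            return fromLong((long) value - other.value, other);
        }

        public NoOverflowInt mul(NoOverflowInt other) {
            return fromLong((long) value * other.value, other);
        }

        // The exact result is computed in 64 bits; it is valid only if both
        // inputs were valid and the result still fits into 32 bits.
        private NoOverflowInt fromLong(long exact, NoOverflowInt other) {
            if (overflowed || other.overflowed || exact != (int) exact) {
                return NAN;
            }
            return of((int) exact);
        }
    }

    /** pointer = con + sum_i(scale_i * variable_i); variables are identified by name (an assumption). */
    public record LinearPointer(int con, Map<String, Integer> scales) { }

    /** Returns second - first if that distance is provably constant, otherwise empty. */
    public static OptionalInt constantDistance(LinearPointer first, LinearPointer second) {
        if (!first.scales().equals(second.scales())) {
            return OptionalInt.empty(); // different variable terms: distance unknown
        }
        NoOverflowInt distance = NoOverflowInt.of(second.con()).sub(NoOverflowInt.of(first.con()));
        return distance.isNaN() ? OptionalInt.empty() : OptionalInt.of(distance.value());
    }

    /** Two accesses are adjacent if the second starts exactly where the first ends. */
    public static boolean isAdjacent(LinearPointer first, int firstSizeInBytes, LinearPointer second) {
        OptionalInt distance = constantDistance(first, second);
        return distance.isPresent() && distance.getAsInt() == firstSizeInBytes;
    }

    public static void main(String[] args) {
        Map<String, Integer> terms = Map.of("base", 1, "i", 1);
        LinearPointer store0 = new LinearPointer(16, terms);
        LinearPointer store1 = new LinearPointer(17, terms);
        LinearPointer other = new LinearPointer(17, Map.of("base", 1, "j", 1));
        System.out.println(isAdjacent(store0, 1, store1)); // prints true
        System.out.println(isAdjacent(store0, 1, other));  // prints false
    }
}
```

In this sketch an overflow anywhere in the arithmetic simply yields "unknown distance", which mirrors the idea of doing the overflow checks implicitly instead of at every call site.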
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix distance assert In the toString scenario of Integer/Long and the StringBuilder.appendNull/appendBoolean scenario, we can refactor the code to optimize based on unsafe mergestore. I am waiting for this PR to be merged, and then continue to complete PR #19626 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2451293546 From thartmann at openjdk.org Fri Nov 1 06:15:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:15:27 GMT Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 03:45:27 GMT, Jatin Bhateja wrote: > This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern. > All existing VectorAPI jtreg regressions are now passing with -Xcomp. > > Best Regards, > Jatin src/hotspot/cpu/x86/x86.ad line 6567: > 6565: %} > 6566: > 6567: instruct vector_uminmax_reg_masked(vec dst, vec src2, kReg mask) %{ Should `src2` be renamed to `src`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21819#discussion_r1825463517 From jbhateja at openjdk.org Fri Nov 1 06:26:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 1 Nov 2024 06:26:27 GMT Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 06:13:02 GMT, Tobias Hartmann wrote: >> This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern. >> All existing VectorAPI jtreg regressions are now passing with -Xcomp. >> >> Best Regards, >> Jatin > > src/hotspot/cpu/x86/x86.ad line 6567: > >> 6565: %} >> 6566: >> 6567: instruct vector_uminmax_reg_masked(vec dst, vec src2, kReg mask) %{ > > Should `src2` be renamed to `src`? 
For predicated vector operations, we either populate destination vector lane with the result of the operation if the corresponding mask bit is true or else retain the original contents of lanes. `vec1.lanewise(VectorOperators.UMIN, vec2) ` Here, UMinVNode (vec1, vec2) IR has two source inputs, and two addr matcher pattern alias the first source and destination operand. So src2 looks appropriate and is inline with other predicated operation patterns. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21819#discussion_r1825468346 From thartmann at openjdk.org Fri Nov 1 06:34:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:34:00 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v5] In-Reply-To: References: Message-ID: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. > > I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Use is_encodable instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21784/files - new: https://git.openjdk.org/jdk/pull/21784/files/3da09500..bab7c5df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784 PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Fri Nov 1 06:34:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:34:00 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 10:10:47 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. 
>> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert Thanks for the review, Vladimir and Coleen. I updated the assert according to Coleen's suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21784#issuecomment-2451375945 From thartmann at openjdk.org Fri Nov 1 06:34:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:34:00 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 18:33:38 GMT, Coleen Phillimore wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Moved assert > > src/hotspot/share/opto/compile.cpp line 3789: > >> 3787: const TypePtr* tp = n->as_Type()->type()->make_ptr(); >> 3788: ciKlass* klass = tp->is_klassptr()->exact_klass(); >> 3789: assert(!klass->is_interface() && !klass->is_abstract(), "Interface or abstract class pointers should not be compressed"); > > Can you make this assert be instead: > > #include "oops/compressedKlass.hpp" > ... > if debug > Klass* k = klass->metadata(); // get the real klass > assert(CompressedKlassPointers::is_encodable(k), "should be encodable"); > endif // debug Sure, good point. I updated the code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1825473202 From thartmann at openjdk.org Fri Nov 1 06:40:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:40:03 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v6] In-Reply-To: References: Message-ID: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. > > I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Now using the right method .. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21784/files - new: https://git.openjdk.org/jdk/pull/21784/files/bab7c5df..b4f98bde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21784&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21784/head:pull/21784 PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Fri Nov 1 06:43:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:43:28 GMT Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp In-Reply-To: References: Message-ID: <7NebkcjNRckiqhWwo7Mpjucuf4nzi2KYE4bYL_WIMmM=.e3c7877d-1dc4-4434-a6b9-aabb5fddd86f@github.com> On Fri, 1 Nov 2024 03:45:27 GMT, Jatin Bhateja wrote: > This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern. > All existing VectorAPI jtreg regressions are now passing with -Xcomp. > > Best Regards, > Jatin The fix looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21819#pullrequestreview-2409549191 From thartmann at openjdk.org Fri Nov 1 06:43:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:43:29 GMT Subject: RFR: 8343297: Vector unsigned min/max test are failing with -Xcomp In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 06:22:30 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 6567: >> >>> 6565: %} >>> 6566: >>> 6567: instruct vector_uminmax_reg_masked(vec dst, vec src2, kReg mask) %{ >> >> Should `src2` be renamed to `src`? > > For predicated vector operations, we either populate destination vector lane with the result of the operation if the corresponding mask bit is true or else retain the original contents of lanes. 
> `vec1.lanewise(VectorOperators.UMIN, vec2) > ` > Here, UMinVNode (vec1, vec2) IR has two source inputs, and two addr matcher pattern alias the first source and destination operand. So src2 looks appropriate and is inline with other predicated operation patterns. Thanks for the clarification, makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21819#discussion_r1825479298 From chagedorn at openjdk.org Fri Nov 1 06:54:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 06:54:34 GMT Subject: RFR: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote: > The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong as shown with the test cases. I was unsure about that in the first place when I added it here: > > https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859 > > The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases where we have once an `OuterStripMinedLoopEnd` node and once a `ParsePredicate` in `ConnectionGraph::can_reduce_check_users()` which trigger the assert. How we end up with such a graph is explained in the comments at the test cases. > > I don't think it's worth to tweak the assert as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again. > > Thanks, > Christian Thanks Vladimir for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21805#issuecomment-2451395078 From chagedorn at openjdk.org Fri Nov 1 06:54:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 06:54:35 GMT Subject: Integrated: 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 12:27:45 GMT, Christian Hagedorn wrote: > The assert added in [JDK-8342043](https://bugs.openjdk.org/browse/JDK-8342043) turns out to be too strong as shown with the test cases. I was unsure about that in the first place when I added it here: > > https://github.com/openjdk/jdk/pull/21608#discussion_r1808732859 > > The assert was more of a best guess and just an additional guarantee that does not provide any benefit. I've found two cases where we have once an `OuterStripMinedLoopEnd` node and once a `ParsePredicate` in `ConnectionGraph::can_reduce_check_users()` which trigger the assert. How we end up with such a graph is explained in the comments at the test cases. > > I don't think it's worth to tweak the assert as we simply bail out afterwards anyway. I therefore propose to simply get rid of the assert again. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 6f6cfe64 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/6f6cfe643b48c21c9b7349b584d31b813c025abd Stats: 111 lines in 2 files changed: 108 ins; 2 del; 1 mod 8343380: C2: assert(iff->in(1)->is_OpaqueNotNull()) failed: must be OpaqueNotNull Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21805 From thartmann at openjdk.org Fri Nov 1 06:58:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 06:58:29 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 00:22:42 GMT, Martin Doerr wrote: >> This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements (review feedback). That looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21812#pullrequestreview-2409562451 From epeter at openjdk.org Fri Nov 1 07:13:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 07:13:31 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v3] In-Reply-To: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com> References: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com> Message-ID: On Thu, 31 Oct 2024 16:53:57 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. 
These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                       Baseline                       Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units      Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op     19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op    370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op    810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op   1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op    743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op    321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
>  - Add platform checks to IR
>  - Merge branch 'master' into minmax_identities
>  - Suggestions from review
>  - Min/Max identities

The IR rules look ok to me. Nice progress :)

test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 120:

> 118:
> 119:     @Test
> 120:     // @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" })

I would say you should make them negative for now, i.e. make them `failOn`.
Otherwise we won't catch these cases when JDK-8307513 gets integrated ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2451413223 PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1825495196 From epeter at openjdk.org Fri Nov 1 07:14:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 07:14:34 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Fri, 1 Nov 2024 04:56:46 GMT, Shaojin Wen wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix distance assert > > In the toString scenario of Integer/Long and the StringBuilder.appendNull/appendBoolean scenario, we can refactor the code to optimize based on unsafe mergestore. I am waiting for this PR to be merged, and then continue to complete PR #19626 @wenshao Thanks for your patience. @chhagedorn is doing a thorough review right now, so I hope we are only a few days away from integration ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2451414734 From jbhateja at openjdk.org Fri Nov 1 07:37:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 1 Nov 2024 07:37:34 GMT Subject: Integrated: 8343297: Vector unsigned min/max test are failing with -Xcomp In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 03:45:27 GMT, Jatin Bhateja wrote: > This bugfix patch fixes the incorrect predicated UMinV/UMaxV pattern. > All existing VectorAPI jtreg regressions are now passing with -Xcomp. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: 8d4d589f Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/8d4d589fc5895f328c7db93bae72048e8711d727 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod 8343297: Vector unsigned min/max test are failing with -Xcomp Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21819 From dlong at openjdk.org Fri Nov 1 08:26:27 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Nov 2024 08:26:27 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Thu, 31 Oct 2024 10:01:17 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/compile.cpp line 3498: >> >>> 3496: assert(false, "Interface or abstract class pointers should not be compressed"); >>> 3497: } else { >>> 3498: new_in2 = ConNode::make(t->make_narrowklass()); >> >> When I was looking through this code, I was hoping there'd be some sort of assert in the make_narrowklass function so any caller would assert but maybe you don't have that info? > > Right, I was hoping for that too and tried to move the assert into `TypeNarrowKlass::make`. We do have all the information there but we hit false positives in rare cases like this when `MyAbstract` does not have any subtypes at compile time (mostly with `-Xcomp`): > > MyAbstract obj = ...; > obj.getClass(); > > C2 will add a dependency that will invalidate the code once a subclass is loaded and then optimizes the narrow class load from `obj` to be of constant narrow class type `MyAbstract`. The assert will trigger but we will never emit a compressed class pointer because the narrow class load + decode is folded to a non-narrow constant. > > We could move the assert to a later stage though. I'll give that a try. Do we actually generate an nmethod for the above example? 
It seems like it could never execute the getClass() because the line above setting `obj` would have to throw an exception if there can be no concrete instances. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1825573872 From aph at openjdk.org Fri Nov 1 09:07:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Nov 2024 09:07:29 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: <8gAEXgKSgry8yzUkhw9c3sNC1FjKkBxUNBUaKe6RgS4=.d622b381-2be4-4944-b4ca-2d860fd93379@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). 
>> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Do we really have to wait for JMH tests? PRs should be reasonably reviewable, and this doesn't help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2451553100 From chagedorn at openjdk.org Fri Nov 1 10:04:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 10:04:35 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Tue, 29 Oct 2024 18:29:04 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. 
>> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. 
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix distance assert src/hotspot/share/opto/mempointer.cpp line 44: > 42: int traversal_count = 0; > 43: while (_worklist.is_nonempty()) { > 44: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); } Maybe also add a comment as below that we bail out if the graph is too complex. src/hotspot/share/opto/mempointer.cpp line 48: > 46: } > 47: > 48: // Check for constant overflow. To match bail out message below for scale: Suggestion: // Bail out if there is a constant overflow. src/hotspot/share/opto/mempointer.cpp line 52: > 50: > 51: // Sort summands by variable->_idx > 52: _summands.sort(MemPointerSummand::cmp_for_sort); When you name the method something like `cmp_by_variable_idx`, then you could remove the comment. src/hotspot/share/opto/mempointer.cpp line 58: > 56: int pos_get = 0; > 57: while (pos_get < _summands.length()) { > 58: MemPointerSummand summand = _summands.at(pos_get++); Won't this create a new local object? So, if you were to change `summand`, then the `MemPointerSummand` inside `_summand` won't be updated (not the case here, though). Since you only are about to read from the object, I suggest to use a reference instead to avoid creation of a new local object. Suggestion: const MemPointerSummand& summand = _summands.at(pos_get++); src/hotspot/share/opto/mempointer.cpp line 304: > 302: // Pre-Condition: > 303: // We assume that both pointers are in-bounds of their respective memory object. > 304: // Suggestion: // Pre-Condition: // We assume that both pointers are in-bounds of their respective memory object. If this does // not hold, for example, with the use of Unsafe, then we would already have undefined behavior, // and we are allowed to do anything. 
src/hotspot/share/opto/mempointer.hpp line 39:

> 37: // We parse / decompose pointers into a linear form:
> 38: //
> 39: //   pointer = sum_i(scale_i * variable_i) + con

Maybe also change this to `SUM()` with a short explanation. Something like that:

Suggestion:

    // We parse / decompose pointers into a linear form:
    //
    //   pointer = SUM(scale_i * variable_i) + con
    //
    // where SUM() adds all "scale_i * variable_i" for each i together.

src/hotspot/share/opto/mempointer.hpp line 403:

> 401: //
> 402: //   summand = scale * variable
> 403: //

For completeness:

Suggestion:

    // Summand of a MemPointerDecomposedForm:
    //
    //   summand = scale * variable
    //
    // where variable is a C2 node.

src/hotspot/share/opto/mempointer.hpp line 458:

> 456: // Decomposed form of the pointer sub-expression of "pointer".
> 457: //
> 458: //   pointer = sum(summands) + con

Suggestion:

    //   pointer = SUM(summands) + con

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825550519
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825551652
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825556318
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825570996
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825650181
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1824667743
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825554063
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825543969

From chagedorn at openjdk.org Fri Nov 1 10:04:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 1 Nov 2024 10:04:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Fri, 1 Nov
2024 07:59:32 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix distance assert > > src/hotspot/share/opto/mempointer.cpp line 52: > >> 50: >> 51: // Sort summands by variable->_idx >> 52: _summands.sort(MemPointerSummand::cmp_for_sort); > > When you name the method something like `cmp_by_variable_idx`, then you could remove the comment. Can you also add a comment that sorting it like that enables walking over the summands and combining the scales for the same nodes below? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825556736 From epeter at openjdk.org Fri Nov 1 10:22:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 10:22:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v12] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. 
`MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>
> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>
> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>
> **What this change enables**
>
> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>
> Now we can do:
> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
> - Merging `Unsafe` stores to native memory.
> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>
> **Dealing with Overflows**
>
> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks.
>
> **Bench...
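
The overflow-tracking idea described above can be illustrated with a small Java sketch. This is not HotSpot's C++ `NoOverflowInt`; the class name `CheckedInt` and all of its methods are hypothetical, and the real class may differ in detail. The sketch only shows the principle: once any operation overflows, the value becomes "not a number" and stays that way, so a single check at the end covers the whole computation.

```java
// Hypothetical illustration only; HotSpot's NoOverflowInt is C++ and may differ.
final class CheckedInt {
    private static final CheckedInt NAN = new CheckedInt(0, true);

    private final int value;
    private final boolean overflowed; // sticky flag: true once any step overflowed

    private CheckedInt(int value, boolean overflowed) {
        this.value = value;
        this.overflowed = overflowed;
    }

    static CheckedInt of(int v) {
        return new CheckedInt(v, false);
    }

    boolean isNaN() {
        return overflowed;
    }

    int value() {
        if (overflowed) {
            throw new IllegalStateException("overflow occurred earlier");
        }
        return value;
    }

    CheckedInt add(CheckedInt other) {
        if (overflowed || other.overflowed) {
            return NAN;
        }
        // Compute in 64 bits, where the sum of two ints cannot overflow.
        return fromLong((long) value + (long) other.value);
    }

    CheckedInt mul(CheckedInt other) {
        if (overflowed || other.overflowed) {
            return NAN;
        }
        // The product of two ints always fits in a long.
        return fromLong((long) value * (long) other.value);
    }

    private static CheckedInt fromLong(long r) {
        // The result is valid only if it round-trips through int unchanged.
        return (r == (int) r) ? of((int) r) : NAN;
    }
}
```

With such a type, an expression like `scale.mul(index).add(con)` needs no intermediate checks: the caller inspects `isNaN()` once and bails out if it is set.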
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/9f442d27..63496f33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=10-11 Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 10:29:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 10:29:10 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. 
This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>
> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>
> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>
> **What this change enables**
>
> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>
> Now we can do:
> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
> - Merging `Unsafe` stores to native memory.
> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>
> **Dealing with Overflows**
>
> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks.
>
> **Bench...
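
As a rough illustration of the linear form `pointer = con + sum_i(scale_i * variable_i)` and the constant-offset check described above, here is a minimal Java sketch. It is an assumption-laden toy, not the C++ code in `mempointer.hpp`: the class `LinearPointer`, its use of string variable names, and the methods `distanceTo` and `isAdjacentTo` are all hypothetical, and it ignores the overflow and in-bounds concerns that the real implementation has to handle.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

// Hypothetical toy model of a decomposed pointer; not HotSpot's MemPointer.
final class LinearPointer {
    final long con;                   // constant offset
    final Map<String, Long> summands; // variable name -> scale (zero scales dropped)

    LinearPointer(long con, Map<String, Long> terms) {
        this.con = con;
        TreeMap<String, Long> normalized = new TreeMap<>();
        terms.forEach((variable, scale) -> {
            if (scale != 0) {
                normalized.put(variable, scale);
            }
        });
        this.summands = normalized;
    }

    // If both pointers have identical summands, they differ only by a constant.
    OptionalLong distanceTo(LinearPointer other) {
        if (!summands.equals(other.summands)) {
            return OptionalLong.empty(); // distance is not a known constant
        }
        return OptionalLong.of(other.con - con);
    }

    // True if "other" starts exactly where an access of sizeInBytes at "this" ends.
    boolean isAdjacentTo(LinearPointer other, int sizeInBytes) {
        OptionalLong distance = distanceTo(other);
        return distance.isPresent() && distance.getAsLong() == sizeInBytes;
    }
}
```

For example, `16 + base + 4*i` and `20 + base + 4*i` have the same summands and a distance of 4, so two 4-byte stores at these addresses would be adjacent in this model; a pointer using a different variable `j` yields no constant distance.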
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: apply more suggestions from Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/63496f33..3ca647e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=11-12 Stats: 73 lines in 2 files changed: 24 ins; 9 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 10:29:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 10:29:10 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v11] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <6wnkmrDfLXJaccB6xsKXP2sgAG7tXVdEKrqE4vlTsdI=.4af0716c-ad1c-4148-8119-41722c5996d7@github.com> Message-ID: On Fri, 1 Nov 2024 08:00:06 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/mempointer.cpp line 52: >> >>> 50: >>> 51: // Sort summands by variable->_idx >>> 52: _summands.sort(MemPointerSummand::cmp_for_sort); >> >> When you name the method something like `cmp_by_variable_idx`, then you could remove the comment. > > Can you also add a comment that sorting it like that enables walking over the summands and combining the scales for the same nodes below? good idea! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825674083 From aph at openjdk.org Fri Nov 1 11:04:28 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Nov 2024 11:04:28 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: <1CBxrIcc1nOhl-xlgLDw2qjDt4JFIlOC1kbWXJSTt5w=.cd18419f-40b6-44d6-bce0-5a06e494d9eb@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). 
>>
>> ### Options
>> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
>> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
>>
>> ### Performance data
>> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   add comment for tanh

Here are my results, Apple M1. Pretty similar to what we've seen, but no SVE. Looks good.

                                               Stubs   no Stubs
Benchmark              (size)  Mode  Cnt        Score (us)       relative performance
DoubleMaxVector.ACOS     1024  avgt    5       3.962      5.523  1.39
DoubleMaxVector.ASIN     1024  avgt    5       3.236      5.460  1.69
DoubleMaxVector.ATAN     1024  avgt    5       4.856     10.117  2.08
DoubleMaxVector.ATAN2    1024  avgt    5       7.144     18.977  2.66
DoubleMaxVector.CBRT     1024  avgt    5       8.802      9.837  1.12
DoubleMaxVector.COS      1024  avgt    5       6.281      8.789  1.40
DoubleMaxVector.COSH     1024  avgt    5       6.431      8.044  1.25
DoubleMaxVector.EXP      1024  avgt    5       1.939      6.417  3.31
DoubleMaxVector.EXPM1    1024  avgt    5       5.412      9.002  1.66
DoubleMaxVector.HYPOT    1024  avgt    5       4.269     12.323  2.89
DoubleMaxVector.LOG      1024  avgt    5       4.165      8.533  2.05
DoubleMaxVector.LOG10    1024  avgt    5       4.381     11.738  2.68
DoubleMaxVector.LOG1P    1024  avgt    5       4.383     12.135  2.77
DoubleMaxVector.POW      1024  avgt    5      14.060     22.053  1.57
DoubleMaxVector.SIN      1024  avgt    5       5.423      8.652  1.60
DoubleMaxVector.SINH     1024  avgt    5       6.251      8.168  1.31
DoubleMaxVector.TAN      1024  avgt    5       9.271     22.238  2.40
DoubleMaxVector.TANH     1024  avgt    5       4.515      4.499  1.00
Float64Vector.ACOS       1024  avgt    5       3.600      5.472  1.52
Float64Vector.ASIN       1024  avgt    5       2.776      5.547  2.00
Float64Vector.ATAN       1024  avgt    5       3.932     10.129  2.58
Float64Vector.ATAN2      1024  avgt    5       5.913     15.960  2.70
Float64Vector.CBRT       1024  avgt    5       7.464     10.078  1.35
Float64Vector.COS        1024  avgt    5      10.620      9.058  0.85
Float64Vector.COSH       1024  avgt    5       5.899      8.268  1.40
Float64Vector.EXP        1024  avgt    5       1.444      6.642  4.60
Float64Vector.EXPM1      1024  avgt    5       5.467      9.108  1.67
Float64Vector.HYPOT      1024  avgt    5       4.133      9.833  2.38
Float64Vector.LOG        1024  avgt    5       3.172      8.820  2.78
Float64Vector.LOG10      1024  avgt    5       3.346     12.142  3.63
Float64Vector.LOG1P      1024  avgt    5       3.216     12.507  3.89
Float64Vector.POW        1024  avgt    5      13.841     22.105  1.60
Float64Vector.SIN        1024  avgt    5      10.464      8.796  0.84
Float64Vector.SINH       1024  avgt    5       6.680      8.243  1.23
Float64Vector.TAN        1024  avgt    5      10.967     26.275  2.40
Float64Vector.TANH       1024  avgt    5       4.516      4.561  1.01
FloatMaxVector.ACOS      1024  avgt    5       1.819      3.752  2.06
FloatMaxVector.ASIN      1024  avgt    5       1.395      3.682  2.64
FloatMaxVector.ATAN      1024  avgt    5       1.970      7.003  3.55
FloatMaxVector.ATAN2     1024  avgt    5       2.951     12.313  4.17
FloatMaxVector.CBRT      1024  avgt    5       3.733      6.510  1.74
FloatMaxVector.COS       1024  avgt    5       5.405      7.363  1.36
FloatMaxVector.COSH      1024  avgt    5       2.951      5.741  1.95
FloatMaxVector.EXP       1024  avgt    5       0.725      4.745  6.54
FloatMaxVector.EXPM1     1024  avgt    5       2.732      6.490  2.38
FloatMaxVector.HYPOT     1024  avgt    5       2.062      6.328  3.07
FloatMaxVector.LOG       1024  avgt    5       1.587      6.847  4.31
FloatMaxVector.LOG10     1024  avgt    5       1.679     10.035  5.98
FloatMaxVector.LOG1P     1024  avgt    5       1.608      8.616  5.36
FloatMaxVector.POW       1024  avgt    5       6.916     19.432  2.81
FloatMaxVector.SIN       1024  avgt    5       5.239      7.202  1.37
FloatMaxVector.SINH      1024  avgt    5       2.992      5.681  1.90
FloatMaxVector.TAN       1024  avgt    5       5.562     17.419  3.13
FloatMaxVector.TANH      1024  avgt    5       2.788      2.791  1.00

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2451695886

From ihse at openjdk.org Fri Nov 1 11:50:32 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 11:50:32 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID:
<2q-xho4lerOP-u38nkEG0T62NXtjQ8iM0b3AnVf_mPU=.df4c5282-cc36-4fd1-ab9c-f7fbc4208b95@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! >> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Marked as reviewed by ihse (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2409931030 From coleenp at openjdk.org Fri Nov 1 11:52:29 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 1 Nov 2024 11:52:29 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v6] In-Reply-To: References: Message-ID: <3erXIe3rpiqZ1E2ScCZU7JHkutKydhdaroKfGM_vlFQ=.dd8a6bf4-e030-4161-8a4b-78499936e985@github.com> On Fri, 1 Nov 2024 06:40:03 GMT, Tobias Hartmann wrote: >> @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. >> >> I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 >> >> Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: >> https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 >> >> And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. >> >> I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Now using the right method .. Yes, this looks great. I didn't realize you had a nice function for this in ci. Thank you! ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21784#pullrequestreview-2409933132

From dfenacci at openjdk.org Fri Nov 1 12:07:04 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 1 Nov 2024 12:07:04 GMT
Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v2]
In-Reply-To:
References:
Message-ID:

> # Issue
>
> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages.
>
> On Linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`).
>
> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB).
>
> # Solution
>
> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically.
> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases:
> * when 1GB huge pages are supported and can be allocated correctly
> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all).
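The "relaxed" check described above can be sketched as follows. This is only an illustration: the real test is written in Java, and `is_acceptable_codecache_line` is a hypothetical helper name. The 1GB pattern is the one quoted in the description; accepting `page_size=2[mM]` as the fallback is an assumption based on the sample output shown there.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of the JDK test: returns true if a
// "CodeCache:" log line reports a 1GB code cache backed either by
// 1GB pages or by the 2MB pages the VM falls back to.
inline bool is_acceptable_codecache_line(const std::string& line) {
  static const std::regex pattern(
      "CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] "
      "page_size=(1[gG]|2[mM])");
  return std::regex_search(line, pattern);
}

// Compile-time sanity check that the helper has the intended signature.
static_assert(sizeof(&is_acceptable_codecache_line) > 0, "helper must exist");
```

As a later review comment in this thread points out, a check relaxed in this way would also pass if the VM failed to use available 1GB pages because of a bug, which is why the test was eventually changed to look at `nr_hugepages` instead.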
Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision:

 - JDK-8343153: add missing import
 - JDK-8343153: check number of huge pages from file

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21757/files
  - new: https://git.openjdk.org/jdk/pull/21757/files/5cdd78dc..9670eef6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=00-01

  Stats: 31 lines in 1 file changed: 17 ins; 11 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21757.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757

PR: https://git.openjdk.org/jdk/pull/21757

From jbhateja at openjdk.org Fri Nov 1 12:11:01 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 1 Nov 2024 12:11:01 GMT
Subject: RFR: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting
Message-ID:

KNL supports AVX512F but not the AVX512VL feature, thus vector operations with a vector size less than or equal to 256 bits are generally emulated using AVX2 instructions.

This bugfix patch covers the following scenarios for LongVector unsigned min/max over KNL targets:

1. Long species < 512 bits and non-predicated operation.
   - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions.
2. Long species < 512 bits with memory operands and non-predicated operations.
   - Load memory into exactly matching vector size.
   - Operate at full vector width of 512 bits.
3. Long species < 512 bits and predicated operation.
   - Emulate operation using AVX2 instructions.
   - Blend the result with the first source vector using the predication mask.
   - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target.
4. Long species == 512 bits, both predicated and non-predicated operation.
   - Directly use 512 bits VPMINUQ/VPMAXUQ instructions.
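Scenario 3 above can be illustrated with a scalar sketch. AVX2 only offers a signed 64-bit compare (`VPCMPGTQ`), so one common way to emulate an unsigned min/max is to flip the sign bit of both operands before comparing. Whether the patch uses exactly this sequence is an assumption and is not stated in this mail; the helper names below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned 64-bit "less than" built from a signed comparison only.
// XOR-ing with the sign bit maps unsigned order onto signed order.
// (The unsigned-to-signed cast is modular on all mainstream compilers
// and guaranteed to be so since C++20.)
inline bool ult64_via_signed(std::uint64_t a, std::uint64_t b) {
  const std::uint64_t bias = 0x8000000000000000ULL;
  return static_cast<std::int64_t>(a ^ bias) < static_cast<std::int64_t>(b ^ bias);
}

inline std::uint64_t umin64_via_signed(std::uint64_t a, std::uint64_t b) {
  return ult64_via_signed(a, b) ? a : b;
}

inline std::uint64_t umax64_via_signed(std::uint64_t a, std::uint64_t b) {
  return ult64_via_signed(a, b) ? b : a;
}

// Predicated lane-wise unsigned min: lanes whose mask bit is unset keep
// the value of the first source vector, as described in scenario 3.
template <std::size_t N>
std::array<std::uint64_t, N> masked_umin(const std::array<std::uint64_t, N>& src1,
                                         const std::array<std::uint64_t, N>& src2,
                                         const std::array<bool, N>& mask) {
  std::array<std::uint64_t, N> result{};
  for (std::size_t i = 0; i < N; i++) {
    const std::uint64_t m = umin64_via_signed(src1[i], src2[i]);
    result[i] = mask[i] ? m : src1[i];  // blend with the first source
  }
  return result;
}
```

The blend step is what makes the emulation usable for predicated operations without relying on an opmask register.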
All existing jtreg regression tests pass with the -XX:+UseKNLSetting and -Xcomp flags.

Kindly review.

Best Regards,
Jatin

-------------

Commit messages:
 - 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting

Changes: https://git.openjdk.org/jdk/pull/21821/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21821&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8343419
  Stats: 37 lines in 2 files changed: 20 ins; 6 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/21821.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21821/head:pull/21821

PR: https://git.openjdk.org/jdk/pull/21821

From dfenacci at openjdk.org Fri Nov 1 12:14:06 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 1 Nov 2024 12:14:06 GMT
Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v3]
In-Reply-To:
References:
Message-ID:

> # Issue
>
> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages.
>
> On Linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`).
>
> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB).
>
> # Solution
>
> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are.
Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). Damon Fenacci has updated the pull request incrementally with three additional commits since the last revision: - JDK-8343153: add missing import - JDK-8343153: add missing brackets - JDK-8343153: add missing try-catch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21757/files - new: https://git.openjdk.org/jdk/pull/21757/files/9670eef6..989ef945 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=01-02 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757 PR: https://git.openjdk.org/jdk/pull/21757 From aph at openjdk.org Fri Nov 1 12:39:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 1 Nov 2024 12:39:29 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: <0UYENq3WDrMtFHJtLQzV8wo7SHVsgyAqKh7JPewdB7w=.5402fe2d-cd2b-49d4-8219-48d639fbaa16@github.com> On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! 
>> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Marked as reviewed by aph (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/21502#pullrequestreview-2409994503

From epeter at openjdk.org Fri Nov 1 12:54:13 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 1 Nov 2024 12:54:13 GMT
Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v14]
In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com>
References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com>
Message-ID:

> **Background**
> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>
> **Details**
>
> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>
> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>
> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>
> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>
> **What this change enables**
>
> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 92 commits: - Merge branch 'master' into JDK-8335392-MemPointer - apply more suggestions from Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix distance assert - whitespace - more updates for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn - ... 
and 82 more: https://git.openjdk.org/jdk/compare/f77a5144...e8ad2757

-------------

Changes: https://git.openjdk.org/jdk/pull/19970/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=13
  Stats: 2682 lines in 16 files changed: 2415 ins; 213 del; 54 mod
  Patch: https://git.openjdk.org/jdk/pull/19970.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970

PR: https://git.openjdk.org/jdk/pull/19970

From duke at openjdk.org Fri Nov 1 12:59:32 2024
From: duke at openjdk.org (duke)
Date: Fri, 1 Nov 2024 12:59:32 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6]
In-Reply-To:
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
Message-ID:

On Thu, 31 Oct 2024 12:38:17 GMT, Tomáš Zezula wrote:

>> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>>
>> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>>
>> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
>
> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision:
>
>   Improved a comment in CompilerThread.

@tzezula
Your change (at version 7e0f1a4227f388dc8e22e6200dc026f056d26eed) is now ready to be sponsored by a Committer.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21285#issuecomment-2451829766

From dfenacci at openjdk.org Fri Nov 1 13:03:42 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 1 Nov 2024 13:03:42 GMT
Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4]
In-Reply-To:
References:
Message-ID:

> # Issue
>
> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages.
>
> On Linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`).
>
> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB).
>
> # Solution
>
> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically.
> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases:
> * when 1GB huge pages are supported and can be allocated correctly
> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all).
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8343153: use >= 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21757/files - new: https://git.openjdk.org/jdk/pull/21757/files/989ef945..70dfa263 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21757&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21757/head:pull/21757 PR: https://git.openjdk.org/jdk/pull/21757 From dfenacci at openjdk.org Fri Nov 1 13:08:29 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 13:08:29 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> References: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> Message-ID: On Thu, 31 Oct 2024 13:16:08 GMT, Evgeny Astigeevich wrote: >>> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. >> >> https://bugs.openjdk.org/browse/JDK-8321526 > >> @eastig I noticed that you are the author of the original `testNonSegmented1GbCodeCacheWith1GbLargePages` test. Could I ask you to have a look at this change? Thanks a lot! > > `testDefaultCodeCacheWith1GbLargePages` and `testNonSegmented1GbCodeCacheWith1GbLargePages` should only be run if a system provides 1Gb pages. This is mentioned in their names: `...With1GbLargePages`. If there are no 1Gb pages available, the test should not be run. > > I suggest to check `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1`. 
If not, output "Skipping testDefaultCodeCacheWith1GbLargePages and testNonSegmented1GbCodeCacheWith1GbLargePages, no 1Gb pages available".
>
> With your change, if a system provides 1Gb pages but the JVM fails to use them because of a bug, the tests will pass and the bug will go unnoticed.

Thanks for looking into it @eastig. I've changed the test to check for the content of `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1` as you suggested.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2451841004

From eastigeevich at openjdk.org Fri Nov 1 13:12:33 2024
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Fri, 1 Nov 2024 13:12:33 GMT
Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 1 Nov 2024 13:03:42 GMT, Damon Fenacci wrote:

>> # Issue
>>
>> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages.
>>
>> On Linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`).
>>
>> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB).
>>
>> # Solution
>>
>> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are.
Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. >> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: >> * when 1GB huge pages are supported and can be allocated correctly >> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8343153: use >= 1 Looks good to me. ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/21757#pullrequestreview-2410039933 From dfenacci at openjdk.org Fri Nov 1 13:12:34 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 1 Nov 2024 13:12:34 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 07:40:27 GMT, Tobias Hartmann wrote: >> test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 120: >> >>> 118: // 1GB large pages configured but none available >>> 119: "Failed to reserve and commit memory with given page size\\. " + >>> 120: "req_addr: [^ ]+ size: 1[gG], page size: 1[gG], \\(errno = 12\\)"); >> >> Took me a while to figure that these are `OR` matches due to the `|` hiding at the end of the first line. Would it make sense to update the comment to something like this? >> >> // 1GB large pages configured and available >> "CodeCache:\\s+min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]|" + >> // or 1GB large pages configured but none available > > Also, isn't there a `CodeCache:\` line in the output in the failing case as well that should be added here in the OR part? Thanks @TobiHartmann for looking at it. 
I've actually changed the test to follow @eastig's suggestion below and reverted these lines to their original state.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21757#discussion_r1825800157

From chagedorn at openjdk.org Fri Nov 1 13:21:47 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 1 Nov 2024 13:21:47 GMT
Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13]
In-Reply-To:
References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com>
Message-ID:

On Fri, 1 Nov 2024 10:29:10 GMT, Emanuel Peter wrote:

>> **Background**
>> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>>
>> **Details**
>>
>> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > apply more suggestions from Christian src/hotspot/share/opto/mempointer.hpp line 343: > 341: // This shows that p1 and p2 have a distance greater than the array size, and hence at least one of the two > 342: // pointers must be out of bounds. This contradicts our assumption (S1) and we are done. > 343: // Maybe add some separation here since this comment does not belong to `TraceMemPointer` but is rather a file header comment. 
Suggestion:


src/hotspot/share/opto/mempointer.hpp line 385:

> 383:       _distance(distance)
> 384:     {
> 385:       assert(_distance != min_jint, "given by condition S3 of MemPointer Lemma");

Suggestion:

      assert(_distance != min_jint, "given by condition (S3) of MemPointer Lemma");

src/hotspot/share/opto/mempointer.hpp line 389:

> 387:
> 388:   public:
> 389:     MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {}

Does not look like you call this constructor directly. You can therefore make it private as well:

Suggestion:

    MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {}

  public:

src/hotspot/share/opto/mempointer.hpp line 393:

> 391:     static MemPointerAliasing make_unknown() {
> 392:       return MemPointerAliasing();
> 393:     }

Thinking about the comment above again, you can probably just remove the no-arg-constructor and simply do the following which I think is expressive enough:

Suggestion:

    static MemPointerAliasing make_unknown() {
      return MemPointerAliasing(Unknown, 0);
    }

src/hotspot/share/opto/mempointer.hpp line 400:

> 398:
> 399:     // Use case: exact aliasing and adjacency.
> 400:     bool is_always_at_distance(const jint distance) const {

The "always" seems to refer to the `Always` but it reads like we are just curious about the distance. Is `is_always_and_at_distance()` clearer?

src/hotspot/share/opto/mempointer.hpp line 429:

> 427:       _variable(nullptr),
> 428:       _scale(NoOverflowInt::make_NaN()) {}
> 429:     MemPointerSummand(Node* variable, const NoOverflowInt scale) :

Can `scale` be passed as const reference? You will make a copy anyway when assigning it to `_scale`. The compiler would probably optimize this anyway but I guess it does not hurt to use a reference here directly.

src/hotspot/share/opto/mempointer.hpp line 438:

> 436:
> 437:     Node* variable() const { return _variable; }
> 438:     NoOverflowInt scale() const { return _scale; }

Not sure if you really need to create a new object here or if you could just pass it by const reference.
The usages are only in `parse_decomposed_form()`. There you either add it together, from which you create a new `NoOverflowInt` anyway, or you use it to create a new `MemPointerSummand` which will create its own `scale` copy anyway. But maybe I'm also missing something here.

src/hotspot/share/opto/mempointer.hpp line 480:

> 478:     // We limit the number of summands to 10. Usually, a pointer contains a base pointer
> 479:     // (e.g. array pointer or null for native memory) and a few variables.
> 480:     static const int SUMMANDS_SIZE = 10;

Looks like a best guess. Maybe you can also explicitly mention that here. Otherwise, it's unclear how you came up with the value 10.

src/hotspot/share/opto/mempointer.hpp line 497:

> 495:
> 496:   private:
> 497:     MemPointerDecomposedForm(Node* pointer, const GrowableArray& summands, const NoOverflowInt con)

Same here, could `con` be passed by const reference since you create a copy from it anyway?

src/hotspot/share/opto/mempointer.hpp line 498:

> 496:   private:
> 497:     MemPointerDecomposedForm(Node* pointer, const GrowableArray& summands, const NoOverflowInt con)
> 498:       :_pointer(pointer), _con(con) {

Suggestion:

      : _pointer(pointer), _con(con) {

src/hotspot/share/opto/noOverflowInt.hpp line 28:

> 26: #define SHARE_OPTO_NOOVERFLOWINT_HPP
> 27:
> 28: #include "utilities/globalDefinitions.hpp"

You do not seem to need this, so it could be removed.

Suggestion:


src/hotspot/share/opto/noOverflowInt.hpp line 57:

> 55:     bool is_zero() const { return !is_NaN() && value() == 0; }
> 56:
> 57:     friend NoOverflowInt operator+(const NoOverflowInt a, const NoOverflowInt b) {

Is it required to pass the arguments by value for the overloaded operators or would it be sufficient to pass them by reference (i.e. `const NoOverflowInt& a, const NoOverflowInt& b`)?

src/hotspot/share/opto/noOverflowInt.hpp line 90:

> 88:
> 89:     NoOverflowInt abs() const {
> 90:       if (is_NaN()) { return make_NaN(); }

Why do you require a new `NaN` here and not simply return `*this`?
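For readers following the `NoOverflowInt` comments above, here is a minimal, self-contained sketch of the idea: an int wrapper that turns into a sticky "NaN" once any operation overflows. It is not the code from the PR. The class name `NoOverflowIntSketch`, the 64-bit widening in `from_wide`, and the exact behaviour of `is_multiple_of` are assumptions made for illustration; only the method names (`make_NaN`, `is_NaN`, `value`, `is_zero`, `abs`, `is_multiple_of`) are taken from the quoted lines. The sketch passes operands by const reference and returns `*this` for a NaN in `abs()`, as the review suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustrative stand-in for HotSpot's NoOverflowInt (assumed semantics).
class NoOverflowIntSketch {
 private:
  bool _is_NaN;          // true once any operation has overflowed
  std::int32_t _value;   // only meaningful if !_is_NaN

  NoOverflowIntSketch() : _is_NaN(true), _value(0) {}

  // Narrow a 64-bit intermediate result; out-of-range values become NaN.
  static NoOverflowIntSketch from_wide(std::int64_t v) {
    if (v < std::numeric_limits<std::int32_t>::min() ||
        v > std::numeric_limits<std::int32_t>::max()) {
      return make_NaN();
    }
    return NoOverflowIntSketch(static_cast<std::int32_t>(v));
  }

 public:
  explicit NoOverflowIntSketch(std::int32_t value) : _is_NaN(false), _value(value) {}
  static NoOverflowIntSketch make_NaN() { return NoOverflowIntSketch(); }

  bool is_NaN() const { return _is_NaN; }
  std::int32_t value() const { assert(!is_NaN()); return _value; }
  bool is_zero() const { return !is_NaN() && value() == 0; }

  // NaN is sticky: any NaN operand yields a NaN result.
  friend NoOverflowIntSketch operator+(const NoOverflowIntSketch& a,
                                       const NoOverflowIntSketch& b) {
    if (a.is_NaN() || b.is_NaN()) { return make_NaN(); }
    return from_wide(static_cast<std::int64_t>(a.value()) + b.value());
  }

  friend NoOverflowIntSketch operator*(const NoOverflowIntSketch& a,
                                       const NoOverflowIntSketch& b) {
    if (a.is_NaN() || b.is_NaN()) { return make_NaN(); }
    return from_wide(static_cast<std::int64_t>(a.value()) * b.value());
  }

  NoOverflowIntSketch abs() const {
    if (is_NaN()) { return *this; }
    const std::int64_t v = _value;
    return from_wide(v < 0 ? -v : v);  // abs(INT32_MIN) does not fit: NaN
  }

  bool is_multiple_of(const NoOverflowIntSketch& other) const {
    const NoOverflowIntSketch a = abs();
    const NoOverflowIntSketch b = other.abs();
    if (a.is_NaN() || b.is_NaN() || b.is_zero()) { return false; }
    return a.value() % b.value() == 0;
  }
};
```

Because the NaN state propagates through every operation, a caller only needs a single `is_NaN()` check at the end of a computation instead of an overflow check after each step.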
src/hotspot/share/opto/noOverflowInt.hpp line 95:

> 93:     }
> 94:
> 95:     bool is_multiple_of(const NoOverflowInt other) const {

I think you can also pass `other` here by reference since you only query it:

Suggestion:

    bool is_multiple_of(const NoOverflowInt& other) const {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825759524
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825760604
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825763396
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825765294
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825768349
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825774196
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825778656
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825779509
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825781796
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825781191
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825758486
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825746670
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825749382
PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825747742

From duke at openjdk.org Fri Nov 1 13:39:36 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Fri, 1 Nov 2024 13:39:36 GMT
Subject: Integrated: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
Message-ID:

On Tue, 1 Oct 2024 10:57:58 GMT, Tomáš
Zezula wrote: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. This pull request has now been integrated. Changeset: 751a914b Author: Tomas Zezula URL: https://git.openjdk.org/jdk/commit/751a914b0a377d4e1dd30d2501f0ab4e327dea34 Stats: 124 lines in 6 files changed: 108 ins; 4 del; 12 mod 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21285 From epeter at openjdk.org Fri Nov 1 13:56:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 13:56:53 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v15] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. 
I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`.
It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/e8ad2757..e2550c9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=13-14 Stats: 6 lines in 2 files changed: 1 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 13:56:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 13:56:53 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 12:31:50 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply more suggestions from Christian > > src/hotspot/share/opto/mempointer.hpp line 400: > >> 398: >> 399: // Use case: exact aliasing and adjacency. >> 400: bool is_always_at_distance(const jint distance) const { > > The "always" seems to refer to the `Always` but it reads like we are just curious about the distance. Is `is_always_and_at_distance()` more clear? Hmm. Maybe I can call it `is_always_with_distance`? Because this would imply that the two pointers always have an aliasing with this exact distance.... 
so that would be fitting in its **meaning**. But yours is more exactly what it **does**.... hmm.. what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825844577 From epeter at openjdk.org Fri Nov 1 14:38:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:38:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <7i07uECc-y3b1y4bxbl8OvxmYxgvj0VUnonJNbU22RY=.7bdcb41c-abe4-40fc-a83d-19c4966de4d9@github.com> On Fri, 1 Nov 2024 12:25:43 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply more suggestions from Christian > > src/hotspot/share/opto/mempointer.hpp line 389: > >> 387: >> 388: public: >> 389: MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {} > > Does not look like you call this constructor directly. You can therefore make it private as well: > Suggestion: > > MemPointerAliasing() : MemPointerAliasing(Unknown, 0) {} > > public: I just removed this constructor! > src/hotspot/share/opto/mempointer.hpp line 393: > >> 391: static MemPointerAliasing make_unknown() { >> 392: return MemPointerAliasing(); >> 393: } > > Thinking about the comment above again, you can probably just remove the no-arg-constructor and simply do the following which I think is expressive enough: > Suggestion: > > static MemPointerAliasing make_unknown() { > return MemPointerAliasing(Unknown, 0); > } Yes, this seems better, I'm doing this :) > src/hotspot/share/opto/mempointer.hpp line 429: > >> 427: _variable(nullptr), >> 428: _scale(NoOverflowInt::make_NaN()) {} >> 429: MemPointerSummand(Node* variable, const NoOverflowInt scale) : > > Can `scale` be passed as const reference? You will make a copy anyway when assigning it to `_scale`.
The compiler would probably optimize this anyway but I guess it does not hurt to use a reference here directly. Will do that, and similarly elsewhere! > src/hotspot/share/opto/mempointer.hpp line 438: > >> 436: >> 437: Node* variable() const { return _variable; } >> 438: NoOverflowInt scale() const { return _scale; } > > Not sure if you really require to create a new object here or if you could just pass it by const reference. The usages are only in `parse_decomposed_form()`. There you either add it together, from which you create a new `NoOverFlowInt` anyway, or you use it to create a new `MemPointerSummand` which will create it's own `scale` copy anyway. But maybe I'm also missing something here. Passing out constant references makes me a little nervous, honestly. What if the MemPointer does not outlive the use of the reference outside? I think a creation of a `NoOverflowInt` is very very cheap, and not really worth that risk... > src/hotspot/share/opto/noOverflowInt.hpp line 57: > >> 55: bool is_zero() const { return !is_NaN() && value() == 0; } >> 56: >> 57: friend NoOverflowInt operator+(const NoOverflowInt a, const NoOverflowInt b) { > > Is it required to pass the arguments by value for the overloaded operators or would it be sufficient to pass them by reference (i.e. `const NoOverflowInt& a, const NoOverflowInt& b`)? Good idea! > src/hotspot/share/opto/noOverflowInt.hpp line 90: > >> 88: >> 89: NoOverflowInt abs() const { >> 90: if (is_NaN()) { return make_NaN(); } > > Why do you require a new `NaN` here and not simply return `*this`? Yes, I changed it! 
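The lifetime concern raised above for `scale()` can be sketched as follows. This is a hypothetical, stripped-down illustration and not the HotSpot classes: `Scale` and `Summand` are invented names, and the point is only that a getter returning by value hands out a copy that stays valid after the owning object is gone, whereas a getter returning a const reference would leave the caller with a dangling reference in the same situation.

```cpp
#include <cstdint>

// Hypothetical small value type, standing in for a cheap-to-copy NoOverflowInt.
struct Scale {
  int32_t v;
};

// Hypothetical owner type, standing in for MemPointerSummand.
class Summand {
  Scale _scale;

public:
  explicit Summand(Scale scale) : _scale(scale) {}

  // Returns a copy: the caller never holds a reference into this object.
  Scale scale() const { return _scale; }
};

// The Summand is a local that dies at the end of this function. Because
// scale() returns by value, the result is an independent copy and is safe
// to use afterwards. Had scale() returned `const Scale&`, binding the result
// to a reference that outlives `local` would be undefined behaviour.
Scale scale_of_short_lived_summand() {
  Summand local(Scale{4});
  return local.scale();
}
```

For a type that is only a few bytes large, the copy is cheap, which is the trade-off argued for in the reply above.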
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825890781 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825891236 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825892269 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825894476 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825889312 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825890329 From dnsimon at openjdk.org Fri Nov 1 14:40:57 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 14:40:57 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties Message-ID: The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: /** * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. * The latter two are forced to be the real OS and architecture. That is, values * for these two properties set on the command line are ignored. */ The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. 
------------- Commit messages: - separate out HotSpot specific semantics of getSavedProperties Changes: https://git.openjdk.org/jdk/pull/21832/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21832&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343439 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21832/head:pull/21832 PR: https://git.openjdk.org/jdk/pull/21832 From epeter at openjdk.org Fri Nov 1 14:42:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:42:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <9MSWBGyO2BisLPYCBiz2clMeEPgA4f3lUrUNHjJ41Tg=.f8adee1b-266c-4611-86cb-5e18287ec820@github.com> On Fri, 1 Nov 2024 12:45:56 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply more suggestions from Christian > > src/hotspot/share/opto/mempointer.hpp line 480: > >> 478: // We limit the number of summands to 10. Usually, a pointer contains a base pointer >> 479: // (e.g. array pointer or null for native memory) and a few variables. >> 480: static const int SUMMANDS_SIZE = 10; > > Looks like a best guess. Maybe you can also explicitly mention that here. Otherwise, it's unclear how you came up with the value 10. Ok, will do > src/hotspot/share/opto/mempointer.hpp line 497: > >> 495: >> 496: private: >> 497: MemPointerDecomposedForm(Node* pointer, const GrowableArray& summands, const NoOverflowInt con) > > Same here, could `con` be passed by const reference since you create a copy from it anyway? 
did that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825897645 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825898587 From epeter at openjdk.org Fri Nov 1 14:49:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:49:07 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
> > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more review applications ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/e2550c9b..d10b76ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=14-15 Stats: 20 lines in 3 files changed: 2 ins; 2 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Fri Nov 1 14:49:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 1 Nov 2024 14:49:08 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v13] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 13:52:48 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/mempointer.hpp line 400: >> >>> 398: >>> 399: // Use case: exact aliasing and adjacency. >>> 400: bool is_always_at_distance(const jint distance) const { >> >> The "always" seems to refer to the `Always` but it reads like we are just curious about the distance. Is `is_always_and_at_distance()` more clear? > > Hmm. Maybe I can call it `is_always_with_distance`? Because this would imply that the two pointers always have an aliasing with this exact distance.... so that would be fitting in its **meaning**. But yours is more exactly what id **does**.... hmm.. what do you think? Hmm. No I think I really like the original, because it reads like this: `aliasing.is_always_at_distance(d)`. 
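The naming discussion above is easier to follow with the call site in view. The following is a minimal sketch under stated assumptions, not the class from the PR: it keeps the `Unknown`/`Always` states, `make_unknown()` and `is_always_at_distance()` mentioned in the review, while `make_always()` and the free function `is_adjacent()` are hypothetical helpers added here for illustration.

```cpp
#include <cstdint>

// Hypothetical, simplified version of the aliasing result discussed above.
class MemPointerAliasing {
public:
  enum Aliasing {
    Unknown,  // distance between the two pointers is not known
    Always    // the two pointers are always exactly _distance bytes apart
  };

private:
  Aliasing _aliasing;
  int32_t _distance;

  MemPointerAliasing(Aliasing aliasing, int32_t distance)
      : _aliasing(aliasing), _distance(distance) {}

public:
  static MemPointerAliasing make_unknown() {
    return MemPointerAliasing(Unknown, 0);
  }

  // Assumption: a factory for the Always case, named by analogy.
  static MemPointerAliasing make_always(int32_t distance) {
    return MemPointerAliasing(Always, distance);
  }

  // Use case: exact aliasing and adjacency.
  bool is_always_at_distance(int32_t distance) const {
    return _aliasing == Always && _distance == distance;
  }
};

// Hypothetical helper: two stores are adjacent if the second one always
// starts exactly where the first one ends.
bool is_adjacent(const MemPointerAliasing& aliasing, int32_t first_store_size) {
  return aliasing.is_always_at_distance(first_store_size);
}
```

Read at the call site, `aliasing.is_always_at_distance(d)` combines both conditions: the relation is known to always hold, and the distance equals `d`.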
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1825905560 From thartmann at openjdk.org Fri Nov 1 15:12:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 1 Nov 2024 15:12:30 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 [v4] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:03:42 GMT, Damon Fenacci wrote: >> # Issue >> >> The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. >> >> On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). >> >> If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). >> >> # Solution >> >> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically.
>> So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: >> * when 1GB huge pages are supported and can be allocated correctly >> * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8343153: use >= 1 Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21757#pullrequestreview-2410261375 From never at openjdk.org Fri Nov 1 17:03:27 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 1 Nov 2024 17:03:27 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. > */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21832#pullrequestreview-2410497098 From dnsimon at openjdk.org Fri Nov 1 17:07:31 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 17:07:31 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: <3ysaTjj1gA2FAlTBZ74Z3NREDdsOrkjCoxiJMA8Tzmk=.313ae18d-565e-41a0-83f4-7df3a2c1746b@github.com> On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. > */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. Thanks for the review Tom. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21832#issuecomment-2452244006 From dnsimon at openjdk.org Fri Nov 1 17:07:32 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 17:07:32 GMT Subject: Integrated: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: <19Lx0iaxn_59ty9sWRMKM7ftO8MX-ZHlbfr33jARKQY=.64cdeefd-d155-4b5d-9aeb-4abd6a0de49a@github.com> On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. 
> */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. This pull request has now been integrated. Changeset: 1eccdfc6 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/1eccdfc62288b8baff950b7293ee931eab896298 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21832 From kvn at openjdk.org Fri Nov 1 17:30:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 1 Nov 2024 17:30:29 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 00:22:42 GMT, Martin Doerr wrote: >> This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements (review feedback). Looks good. src/hotspot/share/compiler/compileBroker.hpp line 90: > 88: CompileTask* _first_stale; > 89: > 90: volatile int _size; Right. I was concerned about concurrent access to this field. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21812#pullrequestreview-2410562706 PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1826100677 From kvn at openjdk.org Fri Nov 1 17:30:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 1 Nov 2024 17:30:30 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: <9yYTrS8f-0f1Bi_YUaCFEb3JhwLEhxk1mX8_3nvIv98=.d152b2e5-aae5-4d95-803d-91995ca3367e@github.com> On Fri, 1 Nov 2024 00:24:47 GMT, Martin Doerr wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 1027: >> >>> 1025: >>> 1026: int old_c2_count = 0, new_c2_count = 0, old_c1_count = 0, new_c1_count = 0; >>> 1027: const int c2_tasks_per_thread = 2, c1_tasks_per_thread = 4; >> >> Any reason to have such numbers (2 and 4)? Any experiments were done to select the best numbers? > > Please note that these constants are not new. I have only given them names. I had done some experiments when implementing [JDK-8198756](https://bugs.openjdk.org/browse/JDK-8198756) for JDK11. C1 is faster than C2. Therefore, we can have more C1 tasks per C1 thread. Good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21812#discussion_r1826098757 From sviswanathan at openjdk.org Sat Nov 2 00:10:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 2 Nov 2024 00:10:27 GMT Subject: RFR: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting In-Reply-To: References: Message-ID: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> On Fri, 1 Nov 2024 12:06:27 GMT, Jatin Bhateja wrote: > KNL only supports AVX512F but not AVX512VL feature, thus vector operations with vector size less than or equal to 256 bits are generally emulated using AVX2 instructions. > > This bugfix patch covers the following scenarios for LongVector unsigned min/ max over KNL targets:- > 1. 
Long species < 512 bits and non-predicated operation. > - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. > 2. Long species < 512 bits with memory operands and non-predicated operations. > - Load memory into exactly matching vector size. > - Operate at full vector width of 512 bits > 3. Long species < 512 bits and predicated operation. > - Emulate operation using AVX2 instructions > - Blend the result with the first source vector using the predication mask. > - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target. > 4. Long species == 512 bits, both predicated and non-predicated operations. > - Directly uses 512 bits VPMINUQ/VPMAXUQ instructions. > > All existing jtreg regressions are passing with -XX:+UseKNLSetting and -Xcomp flags. > > Kindly review. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21821#pullrequestreview-2411015230 From jbhateja at openjdk.org Sat Nov 2 01:10:47 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 2 Nov 2024 01:10:47 GMT Subject: RFR: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting In-Reply-To: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> References: <3Mk5nm6pbSQzZECZnGXurTdkTDHOe99zGcwfRgr-ec0=.66858eec-9c11-4e67-b5d4-cb683816f231@github.com> Message-ID: On Sat, 2 Nov 2024 00:08:21 GMT, Sandhya Viswanathan wrote: >> KNL only supports AVX512F but not AVX512VL feature, thus vector operations with vector size less than or equal to 256 bits are generally emulated using AVX2 instructions. >> >> This bugfix patch covers the following scenarios for LongVector unsigned min/ max over KNL targets:- >> 1. Long species < 512 bits and non-predicated operation. >> - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. >> 2. 
Long species < 512 bits with memory operands and non-predicated operations. >> - Load memory into exactly matching vector size. >> - Operate at full vector width of 512 bits >> 3. Long species < 512 bits and predicated operation. >> - Emulate operation using AVX2 instructions >> - Blend the result with the first source vector using the predication mask. >> - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target. >> 4. Long species == 512 bits, both predicated and non-predicated operations. >> - Directly uses 512 bits VPMINUQ/VPMAXUQ instructions. >> >> All existing jtreg regressions are passing with -XX:+UseKNLSetting and -Xcomp flags. >> >> Kindly review. >> >> Best Regards, >> Jatin > > Looks good to me. Thanks @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21821#issuecomment-2452777794 From jbhateja at openjdk.org Sat Nov 2 01:10:47 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 2 Nov 2024 01:10:47 GMT Subject: Integrated: 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 12:06:27 GMT, Jatin Bhateja wrote: > KNL only supports AVX512F but not AVX512VL feature, thus vector operations with vector size less than or equal to 256 bits are generally emulated using AVX2 instructions. > > This bugfix patch covers the following scenarios for LongVector unsigned min/ max over KNL targets:- > 1. Long species < 512 bits and non-predicated operation. > - Operate at full vector width of 512 bits using VPMINUQ/VPMAXUQ instructions. > 2. Long species < 512 bits with memory operands and non-predicated operations. > - Load memory into exactly matching vector size. > - Operate at full vector width of 512 bits > 3. Long species < 512 bits and predicated operation. > - Emulate operation using AVX2 instructions > - Blend the result with the first source vector using the predication mask. 
> - Existing opmask population mechanism expects the existence of AVX512BW/DQ features missing on KNL target. > 4. Long species == 512 bits, both predicated and non-predicated operations. > - Directly uses 512 bits VPMINUQ/VPMAXUQ instructions. > > All existing jtreg regressions are passing with -XX:+UseKNLSetting and -Xcomp flags. > > Kindly review. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 3c7082a6 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3c7082a633037c19066c36be2520487b0bed4e79 Stats: 37 lines in 2 files changed: 20 ins; 6 del; 11 mod 8343419: Assertion failure in long vector unsigned min/max with -XX:+UseKNLSetting Reviewed-by: sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/21821 From syan at openjdk.org Sat Nov 2 13:37:02 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 2 Nov 2024 13:37:02 GMT Subject: RFR: 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails Message-ID: Hi all, Test `test/hotspot/jtreg/compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java` fails on linux-riscv64, the expected output is: `warning: AES instructions are not available on this CPU` or: `warning: AES intrinsics are not available on this CPU` But the actual output on linux-riscv64 both is: `warning: AES intrinsics require Zvkn extension (not available on this CPU).` This PR adopt the output for linux-riscv64. The change has been verified locally, test-fix only, no risk. 
------------- Commit messages: - 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails Changes: https://git.openjdk.org/jdk/pull/21849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343475 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21849/head:pull/21849 PR: https://git.openjdk.org/jdk/pull/21849 From syan at openjdk.org Sat Nov 2 14:28:32 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 2 Nov 2024 14:28:32 GMT Subject: RFR: 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails In-Reply-To: References: Message-ID: On Sat, 2 Nov 2024 13:31:35 GMT, SendaoYan wrote: > Hi all, > Test `test/hotspot/jtreg/compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java` fails on linux-riscv64, the expected output is: > `warning: AES instructions are not available on this CPU` > or: > `warning: AES intrinsics are not available on this CPU` > But the actual output on linux-riscv64 both is: > `warning: AES intrinsics require Zvkn extension (not available on this CPU).` > > This PR adopt the output for linux-riscv64. The change has been verified locally, test-fix only, no risk. Duplicate of https://github.com/openjdk/jdk/pull/21847, closing this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21849#issuecomment-2453007058 From syan at openjdk.org Sat Nov 2 14:28:32 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 2 Nov 2024 14:28:32 GMT Subject: Withdrawn: 8343475: RISC-V: Test TestAESIntrinsicsOnUnsupportedConfig.java fails In-Reply-To: References: Message-ID: <7wsRnK2bL7Tae0F_D1jJXTEGbMFVlCp3mVK-UG421OY=.658ed2f7-9c72-4a6a-8fd0-02a940b541e7@github.com> On Sat, 2 Nov 2024 13:31:35 GMT, SendaoYan wrote: > Hi all, > Test `test/hotspot/jtreg/compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java` fails on linux-riscv64, the expected output is: > `warning: AES instructions are not available on this CPU` > or: > `warning: AES intrinsics are not available on this CPU` > But the actual output on linux-riscv64 both is: > `warning: AES intrinsics require Zvkn extension (not available on this CPU).` > > This PR adopt the output for linux-riscv64. The change has been verified locally, test-fix only, no risk. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21849 From acobbs at openjdk.org Sat Nov 2 15:55:57 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Sat, 2 Nov 2024 15:55:57 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) Message-ID: Please review this patch which removes unnecessary `@SuppressWarnings` annotations. ------------- Commit messages: - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
Changes: https://git.openjdk.org/jdk/pull/21853/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343479 Stats: 6 lines in 3 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From acobbs at openjdk.org Sun Nov 3 03:10:24 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Sun, 3 Nov 2024 03:10:24 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/8eab41ca..21c83e93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=00-01 Stats: 592 lines in 18 files changed: 420 ins; 93 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From sparasa at openjdk.org Mon Nov 4 01:53:26 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 01:53:26 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v4] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: updated opcode 0F_3C to MAP4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/5049d3aa..0f404dbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=02-03 Stats: 24 lines in 2 files changed: 1 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From sparasa at openjdk.org Mon Nov 4 01:59:34 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 01:59:34 GMT Subject: 
RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v4] In-Reply-To: References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 21:55:11 GMT, Srinivas Vamsi Parasa wrote: > I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch. Once this PR is integrated, the immediate next step is to integrate Hank's extended verification tool https://github.com/openjdk/jdk/pull/21795. Those tests won't pass without the changes in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2453699811 From jkarthikeyan at openjdk.org Mon Nov 4 03:36:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 03:36:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v4] In-Reply-To: References: Message-ID: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Re-use optimize() and add backend-specific should_lower() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21599/files - new: https://git.openjdk.org/jdk/pull/21599/files/c7ceec71..fc8fa245 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=02-03 Stats: 49 lines in 8 files changed: 36 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From jkarthikeyan at openjdk.org Mon Nov 4 03:36:12 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 03:36:12 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <3kQ-4gSCJWVed41_y2EvHcqxX1tDLYSTGeBL_QTfPn8=.55f7ce6e-e209-465a-97af-257770e13a65@github.com> References: <6ABTGpRWisFfAgR9R6gCqxJMasj8pEYnMRsXCIes9Tc=.b3495a73-aacc-4b7e-9f3a-1e0428cc539a@github.com> <3kQ-4gSCJWVed41_y2EvHcqxX1tDLYSTGeBL_QTfPn8=.55f7ce6e-e209-465a-97af-257770e13a65@github.com> Message-ID: On Thu, 31 Oct 2024 02:48:03 GMT, Quan Anh Mai wrote: >> I would prefer to keep it as-is because `PhaseIterGVN::optimize` does a lot of logic that may not be relevant here (such as IGVN verification and IGV printing). This way we can avoid changes to IGVN in the future accidentally impacting lowering in unexpected ways. > > I actually think it is a good idea to have verification and printing. Since Lowering does IGVN-like transformations, they should behave in generally the same way. If it turns out that we actually need a separate entry then we can create it then. I see, I can understand the benefit there. 
I think we'll still need to have a custom entry to collect the nodes to place on the worklist and filter based on platform, but we can use `optimize()` to replace the main loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1827154474 From amitkumar at openjdk.org Mon Nov 4 03:38:38 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 03:38:38 GMT Subject: RFR: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long [v3] In-Reply-To: References: Message-ID: <-4CHTIyNpEsG27Y57OXnNbJqKuFf-BMxfV45xU6QtCw=.3b6a45f6-a288-40bd-a10d-e91f2c1d85d7@github.com> On Mon, 21 Oct 2024 07:45:27 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> removes extra whitespaces > > Still looks good. Thanks @RealLucy @theRealAph for the suggestions & approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21559#issuecomment-2453768936 From amitkumar at openjdk.org Mon Nov 4 03:38:39 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 03:38:39 GMT Subject: Integrated: 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 09:45:19 GMT, Amit Kumar wrote: > Add match rules for UDivI, UModI, UDivL, UModL. Also adds the `dlr` and `dlgr` instructions. > > Tier1 tests are clean for the fastdebug VM. > > Before this patch, `compiler/c2/TestDivModNodes.java` was failing (see the JBS issue), but with this patch the test passes. > > Without Patch: > > > Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units > IntegerDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 1935.176 ± 2.191 ns/op > IntegerDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 1934.915 ±
3.207 ns/op > IntegerDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 1934.325 ± 1.108 ns/op > IntegerDivMod.testDivideUnsigned 1024 mixed avgt 15 1809.782 ± 49.341 ns/op > IntegerDivMod.testDivideUnsigned 1024 positive avgt 15 1769.326 ± 2.607 ns/op > IntegerDivMod.testDivideUnsigned 1024 negative avgt 15 1784.053 ± 71.190 ns/op > IntegerDivMod.testRemainderUnsigned 1024 mixed avgt 15 2026.978 ± 1.534 ns/op > IntegerDivMod.testRemainderUnsigned 1024 positive avgt 15 2028.039 ± 3.812 ns/op > IntegerDivMod.testRemainderUnsigned 1024 negative avgt 15 2437.843 ± 636.808 ns/op > Finished running test 'micro:java.lang.IntegerDivMod' > > > Benchmark (BUFFER_SIZE) (divisorType) Mode Cnt Score Error Units > LongDivMod.testDivideRemainderUnsigned 1024 mixed avgt 15 4524.897 ± 16.566 ns/op > LongDivMod.testDivideRemainderUnsigned 1024 positive avgt 15 4373.714 ± 9.514 ns/op > LongDivMod.testDivideRemainderUnsigned 1024 negative avgt 15 2018.309 ± 1.788 ns/op > LongDivMod.testDivideUnsigned 1024 mixed avgt 15 4320.382 ± 19.055 ns/op > LongDivMod.testDivideUnsigned 1024 positive avgt 15 3988.953 ± 8.770 ns/op > LongDivMod.testDivideUnsigned 1024 negative avgt 15 1069.703 ± 1.525 ns/op > LongDivMod.testRemainderUnsigned 1024 mixed avgt 15 5589.319 ± 4.247 ns/op > LongDivMod.testRemainderUnsigned 1024 positive avgt 15 3904.555 ± 3.191 ns/op > LongDivMod.testRemainderUnsigned 1024 negative avgt 15 1765.761 ± 1.539 ns/op > Finished ... This pull request has now been integrated.
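For readers less familiar with the Java methods named in the subject of this change, here is a small sketch of what they compute at the Java level. It only uses the public `java.lang.Integer` and `java.lang.Long` methods; the class name `UnsignedDivModExample` is made up for illustration, and nothing here shows how the s390x match rules themselves are written.

```java
final class UnsignedDivModExample {
    public static void main(String[] args) {
        int a = -1; // bit pattern 0xFFFFFFFF, i.e. 4294967295 when read as unsigned

        // Unsigned view: 4294967295 / 2 and 4294967295 % 2.
        System.out.println(Integer.divideUnsigned(a, 2));    // 2147483647
        System.out.println(Integer.remainderUnsigned(a, 2)); // 1

        // Signed view of the same bits, for contrast: -1 / 2 truncates toward zero.
        System.out.println(a / 2);                           // 0

        long b = -1L; // 18446744073709551615 when read as unsigned
        System.out.println(Long.toUnsignedString(Long.divideUnsigned(b, 10))); // 1844674407370955161
        System.out.println(Long.remainderUnsigned(b, 10));                     // 5
    }
}
```

The "negative" divisor type in the benchmark tables presumably exercises values like these, where the signed and unsigned interpretations differ.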
Changeset: c1251780 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/c125178065664fdf96c42dfc6dcfa2431e6011a4 Stats: 101 lines in 3 files changed: 99 ins; 0 del; 2 mod 8341068: [s390x] intrinsics for divideUnsigned and remainderUnsigned methods in java.lang.Integer and java.lang.Long Reviewed-by: lucy, aph ------------- PR: https://git.openjdk.org/jdk/pull/21559 From jkarthikeyan at openjdk.org Mon Nov 4 04:25:07 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 04:25:07 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v4] In-Reply-To: References: Message-ID: > Hi all, > This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements: > > Baseline Patch > Benchmark Mode Cnt Score Error Units Score Error Units Improvement > BigIntegers.testAdd avgt 15 25.096 ± 3.936 ns/op 19.214 ± 0.521 ns/op (+ 26.5%) > PatternBench.charPatternCompile avgt 8 453.727 ± 117.265 ns/op 370.054 ± 26.106 ns/op (+ 20.3%) > PatternBench.charPatternMatch avgt 8 917.604 ± 121.766 ns/op 810.560 ± 38.437 ns/op (+ 12.3%) > PatternBench.charPatternMatchWithCompile avgt 8 1477.703 ± 255.783 ns/op 1224.460 ± 28.220 ns/op (+ 18.7%) > PatternBench.longStringGraphemeMatches avgt 8 860.909 ± 124.661 ns/op 743.729 ± 22.877 ns/op (+ 14.6%) > PatternBench.splitFlags avgt 8 420.506 ± 76.252 ns/op 321.911 ±
11.661 ns/op (+ 26.6%) > > I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Make long tests check IR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21439/files - new: https://git.openjdk.org/jdk/pull/21439/files/39f7d047..fc484f6e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21439&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21439.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21439/head:pull/21439 PR: https://git.openjdk.org/jdk/pull/21439 From jkarthikeyan at openjdk.org Mon Nov 4 04:25:10 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 4 Nov 2024 04:25:10 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v3] In-Reply-To: References: <3A-W4pcQj_I0QNWlUU3qibf6SQbNnZyO1JxeH1ym9Lw=.d343a0a6-10f4-4a3a-89fc-06e4cef04d02@github.com> Message-ID: <9UXnatwgwzVK3JhV2nBG4qaIFg1aBJTP9Ti9vFbKHuY=.aa113a75-30df-4629-8f3f-47e5266e882f@github.com> On Fri, 1 Nov 2024 07:08:31 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Add platform checks to IR >> - Merge branch 'master' into minmax_identities >> - Suggestions from review >> - Min/Max identities > > test/hotspot/jtreg/compiler/c2/irTests/TestMinMaxIdentities.java line 120: > >> 118: >> 119: @Test >> 120: // @IR(applyIfPlatform = { "riscv64", "false" }, phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) > > I would say you should make them negative for now, i.e. make them `failOn`. 
Otherwise we won't catch these cases when JDK-8307513 gets integrated ;) Sounds good, I've pushed a commit that makes the tests pass now but fail when 8307513 is integrated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21439#discussion_r1827178087 From thartmann at openjdk.org Mon Nov 4 06:30:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 06:30:32 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Fri, 1 Nov 2024 08:23:42 GMT, Dean Long wrote: >> Right, I was hoping for that too and tried to move the assert into `TypeNarrowKlass::make`. We do have all the information there but we hit false positives in rare cases like this when `MyAbstract` does not have any subtypes at compile time (mostly with `-Xcomp`): >> >> MyAbstract obj = ...; >> obj.getClass(); >> >> C2 will add a dependency that will invalidate the code once a subclass is loaded and then optimizes the narrow class load from `obj` to be of constant narrow class type `MyAbstract`. The assert will trigger but we will never emit a compressed class pointer because the narrow class load + decode is folded to a non-narrow constant. >> >> We could move the assert to a later stage though. I'll give that a try. > > Do we actually generate an nmethod for the above example? It seems like it could never execute the getClass() because the line above setting `obj` would have to throw an exception if there can be no concrete instances. Right, this was an oversimplified example. I used this code: Class test(MyAbstract obj, boolean b) { if (b) { return obj.getClass(); } return null; } We pass `null` for `obj` and `false` for `b`. Usually, the branch is then only compiled with Xcomp. 
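A self-contained version of the snippet above might look like the following sketch. The wrapper class `AbstractGetClassExample`, the empty `MyAbstract` definition, and the loop count are assumptions added for illustration. Running it normally only exercises the Java semantics; per the mail, the branch containing `getClass()` is usually only compiled with `-Xcomp`, and this sketch makes no claim to trigger any assert.

```java
final class AbstractGetClassExample {
    // Abstract class with no subclasses loaded, as in the scenario described in the mail.
    static abstract class MyAbstract {}

    // Same shape as the method in the mail: getClass() is only reached when b is true.
    static Class<?> test(MyAbstract obj, boolean b) {
        if (b) {
            return obj.getClass();
        }
        return null;
    }

    public static void main(String[] args) {
        // Always pass null and false, so the getClass() branch is never executed.
        for (int i = 0; i < 10_000; i++) {
            if (test(null, false) != null) {
                throw new AssertionError("expected null");
            }
        }
        System.out.println("done");
    }
}
```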
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1827238258 From thartmann at openjdk.org Mon Nov 4 06:30:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 06:30:33 GMT Subject: Integrated: 8343206: Final graph reshaping should not compress abstract or interface class pointers In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 11:38:53 GMT, Tobias Hartmann wrote: > @fisk found this problematic optimization in final graph reshaping where we would convert a `CmpP` into a `CmpN` by converting a constant class pointer operand to a narrow class pointer. After [JDK-8338526](https://bugs.openjdk.org/browse/JDK-8338526), this is not valid if the class pointer refers to an interface or abstract class. > > I think it's not an issue in current code though. The only way we can get a dynamic narrow class is when loading from an object at `oopDesc::klass_offset_in_bytes()`: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/type.cpp#L3521-L3522 > > Comparisons of such loads with a constant class pointer of interface or abstract class type are always folded during GVN: > https://github.com/openjdk/jdk/blob/7131f053b0d26b62cbf0d8376ec117d6e8d79f9e/src/hotspot/share/opto/subnode.cpp#L1164-L1171 > > And therefore, the code in `Compile::final_graph_reshaping_main_switch` will never trigger. > > I added a corresponding assert and bailout in product to be on the safe side. The assert never triggered in my testing. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 2432c4f8 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/2432c4f862e66e91c60e75ccc43b376020d80a1f Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod 8343206: Final graph reshaping should not compress abstract or interface class pointers Reviewed-by: coleenp, eosterlund, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21784 From thartmann at openjdk.org Mon Nov 4 06:37:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 06:37:30 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 21:53:45 GMT, Cesar Soares Lucas wrote: >> Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. The overall situation that causes the problem is like so: >> >> - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 >> >> - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. >> >> - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. >> >> After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. >> >> The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. 
>> >> --------- >> >> ### Tests >> >> Win, Mac & Linux tier1-4 on x64 & Aarch64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: include test execution options. Marked as reviewed by thartmann (Reviewer). Thanks Cesar, that looks good to me. I'll run a final round of testing and report back once it passed. ------------- PR Review: https://git.openjdk.org/jdk/pull/21778#pullrequestreview-2412246708 PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2453917034 From amitkumar at openjdk.org Mon Nov 4 07:04:57 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 07:04:57 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan Message-ID: This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21864/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21864&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343506 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21864/head:pull/21864 PR: https://git.openjdk.org/jdk/pull/21864 From dfenacci at openjdk.org Mon Nov 4 07:36:35 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 4 Nov 2024 07:36:35 GMT Subject: RFR: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> References: <0tK_3KUqMNg0R5YdFFUlxsSeYZvF57UP_U0b6wdDhG8=.084bfe90-911f-4de2-aa5f-19ed208657b4@github.com> Message-ID: On Thu, 31 Oct 2024 13:16:08 GMT, Evgeny Astigeevich wrote: >>> The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not 
how many there currently are. >> >> https://bugs.openjdk.org/browse/JDK-8321526 > >> @eastig I noticed that you are the author of the original `testNonSegmented1GbCodeCacheWith1GbLargePages` test. Could I ask you to have a look at this change? Thanks a lot! > > `testDefaultCodeCacheWith1GbLargePages` and `testNonSegmented1GbCodeCacheWith1GbLargePages` should only be run if a system provides 1Gb pages. This is mentioned in their names: `...With1GbLargePages`. If there are no 1Gb pages available, the test should not be run. > > I suggest to check `/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages >= 1`. If not, output "Skipping testDefaultCodeCacheWith1GbLargePages and testDefaultCodeCacheWith1GbLargePages, no 1Gb pages available" . > > With your change, if a system provides 1Gb pages but JVM fails to use them because of a bug, the tests will pass and the bug will be unknown. Thank you for your reviews @eastig @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21757#issuecomment-2453985227 From dfenacci at openjdk.org Mon Nov 4 07:36:36 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 4 Nov 2024 07:36:36 GMT Subject: Integrated: 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 In-Reply-To: References: Message-ID: <5Uqdx6inJiDqBconhAcsyD1QCVTGhjQEEI3BvDmqVew=.e4af7332-c171-40fb-9875-055f13cb00c9@github.com> On Tue, 29 Oct 2024 10:54:31 GMT, Damon Fenacci wrote: > # Issue > > The third test of `compiler/codecache/CheckLargePages.java` checks that non-segmented 1GB code cache can be allocated with 1GB large pages. > > On linux (the only supported platform) in order to allocate them, 1GB huge pages have to be enabled (checkable from `/proc/meminfo` and `/sys/kernel/mm/hugepages/hugepages-xxxx`) and their number has to be set to >0 (checkable from `/sys/kernel/mm/hugepages/hugepages-xxxx/nr_hugepages`). 
> > If 1GB huge pages are enabled but their number is 0, the test fails because it looks for a string that matches `CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]` but the actual output is `CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M`. This happens because the VM tries to allocate 1GB huge pages but it fails because the number of allocatable ones is 0 and the VM reverts to smaller large page sizes (2MB). > > # Solution > > The problem might be attributed to the VM only checking for 1GB huge pages to be supported, not how many there currently are. Nevertheless, this seems to be the correct behaviour, not least because their number can be changed dynamically. > So, the correct thing to do seems to be to "relax" the check made by the test to include both cases: > * when 1GB huge pages are supported and can be allocated correctly > * when 1GB huge pages are supported but cannot be allocated correctly (because there are none available) and the VM reverts to 2MB huge pages (if there are no 2MB pages available the test doesn't run at all). This pull request has now been integrated.
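The two ideas discussed in this thread (reading the configured number of 1GB pages, and relaxing the expected output) can be sketched as below. This is a hedged illustration, not the integrated change: the class `LargePageCheckSketch`, the method `oneGbPagesConfigured`, and the `RELAXED` pattern are hypothetical. Only the strict pattern, the sample output line, and the sysfs path are taken from the mails above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

final class LargePageCheckSketch {
    // Strict pattern quoted in the issue description.
    static final Pattern STRICT = Pattern.compile(
        "CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=1[gG]");

    // Hypothetical relaxed pattern that also accepts the 2MB fallback.
    static final Pattern RELAXED = Pattern.compile(
        "CodeCache: min=1[gG] max=1[gG] base=[^ ]+ size=1[gG] page_size=(1[gG]|2[mM])");

    // Number of configured 1GB huge pages, or 0 if the sysfs file is missing or unreadable.
    static long oneGbPagesConfigured() {
        Path p = Path.of("/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages");
        try {
            return Long.parseLong(Files.readString(p).trim());
        } catch (IOException | NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        // Output line reported in the description for the "zero 1GB pages" case.
        String sample = "CodeCache: min=1G max=1G base=0x00007f4040000000 size=1G page_size=2M";
        System.out.println("1GB pages configured: " + oneGbPagesConfigured());
        System.out.println("strict match:  " + STRICT.matcher(sample).find());
        System.out.println("relaxed match: " + RELAXED.matcher(sample).find());
    }
}
```

As noted in the review, a relaxed pattern alone could hide a VM bug on systems that do provide 1GB pages, which is why combining it with a check of `nr_hugepages` was suggested.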
Changeset: e7f0bf11 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/e7f0bf11ff0e89b6b156d5e88ca3771c706aa46a Stats: 24 lines in 2 files changed: 20 ins; 1 del; 3 mod 8343153: compiler/codecache/CheckLargePages.java fails on linux with huge pages configured but its number set to 0 Reviewed-by: eastigeevich, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21757 From mli at openjdk.org Mon Nov 4 09:22:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 09:22:37 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <1CBxrIcc1nOhl-xlgLDw2qjDt4JFIlOC1kbWXJSTt5w=.cd18419f-40b6-44d6-bce0-5a06e494d9eb@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> <1CBxrIcc1nOhl-xlgLDw2qjDt4JFIlOC1kbWXJSTt5w=.cd18419f-40b6-44d6-bce0-5a06e494d9eb@github.com> Message-ID: On Fri, 1 Nov 2024 11:01:24 GMT, Andrew Haley wrote: > Here are my results, Apple M1. Pretty similar to what we've seen, but no SVE. > > Looks good. Thank you so much for testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2454181484 From mli at openjdk.org Mon Nov 4 09:22:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 09:22:38 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> References: <033UQ-960JvpQbpMGHPKyoIv43XWj2bGzdnmCSoZrac=.3ffd7d94-2eb4-44e3-b91b-967d58aa01f7@github.com> Message-ID: On Tue, 22 Oct 2024 09:28:36 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. >> This pr is based on https://github.com/openjdk/jdk/pull/20781. >> >> Thanks! 
>> >> ## Test >> ### tests: >> * test/jdk/jdk/incubator/vector/ >> * test/hotspot/jtreg/compiler/vectorapi/ >> >> ### options: >> * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs >> * -XX:+EnableVectorSupport -XX:-UseVectorStubs >> >> ## Performance >> >> ### Tests >> jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). >> >> ### Options >> * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' >> * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' >> >> ### Performance data >> I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment for tanh Thanks all for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21502#issuecomment-2454182427 From mli at openjdk.org Mon Nov 4 09:22:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 4 Nov 2024 09:22:40 GMT Subject: Integrated: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:57:46 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? Previously it's https://github.com/openjdk/jdk/pull/18605. 
> This pr is based on https://github.com/openjdk/jdk/pull/20781. > > Thanks! > > ## Test > ### tests: > * test/jdk/jdk/incubator/vector/ > * test/hotspot/jtreg/compiler/vectorapi/ > > ### options: > * -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs > * -XX:+EnableVectorSupport -XX:-UseVectorStubs > > ## Performance > > ### Tests > jmh tests are test/micro/org/openjdk/bench/jdk/incubator/vector/operation/ from vectorIntrinsics branch in panama-vector. It's good to have these tests in jdk main stream, I will do it in a separate pr later. (These tests are auto-generated tests from some script&template, it's good to also have those scrip&template in jdk main stream, but those scrip&template generates more other tests than what we need here, so better to add these tests and script&template in another pr). > > ### Options > * +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs' > * -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs' > > ### Performance data > I have re-tested, there is no much difference from https://github.com/openjdk/jdk/pull/18605, so please check performance data in that pr. This pull request has now been integrated. 
Changeset: df08a9ec Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/df08a9ec0d813fcd4ea88a3773c230af6d65e045 Stats: 343 lines in 8 files changed: 338 ins; 1 del; 4 mod 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Co-authored-by: Xiaohong Gong Reviewed-by: ihse, fgao, aph ------------- PR: https://git.openjdk.org/jdk/pull/21502 From rcastanedalo at openjdk.org Mon Nov 4 09:44:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 4 Nov 2024 09:44:52 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression Message-ID: This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets.
While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. #### Testing ##### Functionality - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). ##### Performance - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. ------------- Commit messages: - Remove fix condition - Add regression test - Verify that there are no out references to dead nodes after matching - Do not clone two pointer adds using each an immediate on x86 Changes: https://git.openjdk.org/jdk/pull/21829/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21829&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339303 Stats: 73 lines in 3 files changed: 67 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21829.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21829/head:pull/21829 PR: https://git.openjdk.org/jdk/pull/21829 From mdoerr at openjdk.org Mon Nov 4 10:01:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 10:01:37 GMT Subject: RFR: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory [v2] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 00:22:42 GMT, Martin Doerr wrote: >> This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Minor improvements (review feedback). Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21812#issuecomment-2454262340 From mdoerr at openjdk.org Mon Nov 4 10:01:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 10:01:38 GMT Subject: Integrated: 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 17:03:33 GMT, Martin Doerr wrote: > This PR adds a quick check + bail out in order to avoid excessive usage of slow checks. Especially, it avoids querying the available memory so often. This pull request has now been integrated. Changeset: 75801992 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/75801992a7c626d409f66e2491082dba84c6fe45 Stats: 27 lines in 2 files changed: 16 ins; 0 del; 11 mod 8343205: CompileBroker::possibly_add_compiler_threads excessively polls available memory Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21812 From chagedorn at openjdk.org Mon Nov 4 10:17:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 10:17:45 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 14:49:07 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. 
>> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. 
This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more review applications Some final last mostly minor comments but otherwise, it looks good to me now! I like the summaries and how you worked out the proofs. They are now easy to understand. src/hotspot/share/opto/memnode.cpp line 2943: > 2941: return false; > 2942: } > 2943: return true; You could directly return (I cannot create a code suggestion as it says "Applying suggestions on deleted lines is not supported"): return pointer_def.is_adjacent_to_and_before(pointer_use); src/hotspot/share/opto/mempointer.cpp line 45: > 43: while (_worklist.is_nonempty()) { > 44: // Bail out if the graph is too complex. > 45: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); } Might be easier to read/understand when we also have `MemPointerDecomposedForm::make_trivial(pointer)` method. What do you think? Then `MemPointerDecomposedForm(pointer)` can also be made private. src/hotspot/share/opto/mempointer.cpp line 199: > 197: #else > 198: > 199: switch(opc) { Suggestion: switch (opc) { src/hotspot/share/opto/mempointer.cpp line 265: > 263: // > 264: // Thus, for AddI and SubI, we get: > 265: // summand = new_summand1 + new_summand2 + scale * y * 2^32 Took me a moment to understand `new_summands`. Maybe we can give a hint like that? 
Suggestion: // scale * ConvI2L(a << con) = scale * (1 << con) * ConvI2L(a) + scale * y * 2^32 // _______________________/ _____________________________________/ ______________/ // before decomposition after decomposition ("new_summands") overflow correction // // Thus, for AddI and SubI, we get: // summand = new_summand1 + new_summand2 + scale * y * 2^32 src/hotspot/share/opto/mempointer.cpp line 283: > 281: // z * array_element_size_in_bytes = scale > 282: // > 283: // And hence, with "x = y * z": Maybe add here: Suggestion: // And hence, with "x = y * z", the decomposition is (SAFE2) under assumed condition: src/hotspot/share/opto/mempointer.cpp line 318: > 316: #endif > 317: > 318: // "MemPointer Lemma" condition S2: check if all summands are the same: Suggestion: // "MemPointer Lemma" condition (S2): check if all summands are the same: src/hotspot/share/opto/mempointer.cpp line 332: > 330: } > 331: > 332: // "MemPointer Lemma" condition S3: check that the constants do not differ too much: Suggestion: // "MemPointer Lemma" condition (S3): check that the constants do not differ too much: src/hotspot/share/opto/mempointer.cpp line 347: > 345: } > 346: > 347: // "MemPointer Lemma" condition S1: Suggestion: // "MemPointer Lemma" condition (S1): src/hotspot/share/opto/mempointer.cpp line 352: > 350: // bounds of that same memory object. > 351: > 352: // Hence, all 3 conditions of the "MemoryPointer Lemma" are established, and hence Since we also have added `(S0)` recently, we might need to add a word here about it and then update this to "all 4 conditions". src/hotspot/share/opto/mempointer.cpp line 382: > 380: return is_adjacent; > 381: } > 382: Two new lines: Suggestion: src/hotspot/share/opto/mempointer.hpp line 46: > 44: // compile-time variables (C2 nodes). > 45: // > 46: // For the MemPointer, we do not explicitly track base address. For Java heap pointers, the Suggestion: // For the MemPointer, we do not explicitly track the base address. 
For Java heap pointers, the src/hotspot/share/opto/mempointer.hpp line 232: > 230: // We decompose summand in: > 231: // mp_i = con + summand + SUM(other_summands) > 232: // Resulting in: +-------------------------+ Suggestion: // resulting in: +-------------------------+ src/hotspot/share/opto/mempointer.hpp line 258: > 256: // (S3) All summands of mp1 and mp2 are identical (i.e. only the constants are possibly different). > 257: // > 258: // Then the pointer difference between p1 and p2 is identical to the difference between Suggestion: // then the pointer difference between p1 and p2 is identical to the difference between src/hotspot/share/opto/mempointer.hpp line 332: > 330: // -- apply x != 0 -- > 331: // >= array_element_size_in_bytes * 2^32 - abs(mp1 - mp2) > 332: // -- apply (S3) -- Suggestion: // -- apply (S3) -- src/hotspot/share/opto/mempointer.hpp line 334: > 332: // -- apply (S3) -- > 333: // = array_element_size_in_bytes * 2^32 - abs(mp1.con - mp2.con) > 334: // -- apply (S2) -- Suggestion: // -- apply (S2) -- test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 114: > 112: */ > 113: > 114: // FAILS: mixed providers currently do not merge stores. Maybe there is some inlining issue. Is there a tracking bug/RFE to make this work? test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 196: > 194: Map tests = new HashMap<>(); > 195: > 196: // List of gold, the results from the first run before compilation Sounds funny :-) Maybe: Suggestion: // List of golden values, the results from the first run before compilation test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 375: > 373: applyIf = {"UseUnalignedAccesses", "true"}) > 374: static Object[] test_xxx(MemorySegment a, int xI, int yI, int zI) { > 375: // All RangeChecks remain -> RC smearing not good enough? Is there a tracking bug to further investigate at some point? 
------------- PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2412305011 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827276306 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827447237 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827412672 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827463435 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827469254 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827482695 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827483621 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827484278 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827487031 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827488991 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827338243 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827381411 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827382642 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827390691 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827390901 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827300978 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827302506 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827310975 From epeter at openjdk.org Mon Nov 4 10:20:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:20:39 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: On Sun, 20 Oct 2024 16:41:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. 
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge branch 'master' into unsignedbounds > - address reviews > - comment adjust_lo empty case > - formality > - address reviews > - add comments, refactor functions to helper class > - refine comments > - remove leftover code > - add doc to TypeInt, rename parameters, remove unused methods > - change (~v & ones) == 0 to (v & ones) == ones > - ... and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa Sorry, I've been very slow on this. A few more comments before lunch. 
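As a reading aid for the constraint set quoted above (`x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`), here is a minimal standalone sketch of the membership test. The struct `IntConstraints` and the function `contains` are hypothetical names invented for illustration; they are not the HotSpot `TypeInt` layout or API, and the sketch does no canonicalization.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the constraints described in the PR text.
// Field names follow the description (lo, hi, ulo, uhi, zeros, ones);
// this is NOT the actual TypeInt class.
struct IntConstraints {
  int32_t  lo, hi;       // signed bounds
  uint32_t ulo, uhi;     // unsigned bounds
  uint32_t zeros, ones;  // bits known to be 0 / bits known to be 1
};

// Returns true if x satisfies all six constraints at once.
inline bool contains(const IntConstraints& t, int32_t x) {
  assert(t.lo <= t.hi && t.ulo <= t.uhi);  // assumption: non-empty bounds
  const uint32_t ux = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&
         ux >= t.ulo && ux <= t.uhi &&
         (ux & t.zeros) == 0 &&
         (ux & t.ones) == t.ones;
}
```

For the example from the description, an int in [0, 3] in the signed domain, the tightened form would be `IntConstraints{0, 3, 0u, 3u, ~3u, 0u}`: the unsigned bounds are also [0, 3] and every bit except the lowest two is known to be zero.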
src/hotspot/share/opto/rangeinference.hpp line 159: > 157: > 158: template > 159: static bool int_type_subset(const CT* super, const CT* sub) { Suggestion: static bool is_int_type_equal(const CT* t1, const CT* t2) { return t1->_lo == t2->_lo && t1->_hi == t2->_hi && t1->_ulo == t2->_ulo && t1->_uhi == t2->_uhi && t1->_bits._zeros == t2->_bits._zeros && t1->_bits._ones == t2->_bits._ones; } template static bool is_int_type_subset(const CT* super, const CT* sub) { I think these should be `is_...` names. src/hotspot/share/opto/type.hpp line 616: > 614: * > 615: * 1. Since every TypeInt instance is canonicalized, all the bounds must also > 616: * be elements of such TypeInt. Or else, we can tighted the bounds by narrowing Suggestion: * be elements of such TypeInt. Or else, we can tighten the bounds by narrowing src/hotspot/share/opto/type.hpp line 620: > 618: * > 619: * 2. Either _lo == jint(_ulo) and _hi == jint(_uhi), or all elements of a > 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] The `[_lo, jint(_uhi)] or [jint(_ulo), _hi]` in English is not precise enough. - Is it a mathematical `OR`: the element can also be in both? In that case I would add "or both". - Is it a mathematical `XOR`? Then I would write "either ... or ... but not both" src/hotspot/share/opto/type.hpp line 622: > 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] > 621: * > 622: * Proof: For 2 jint value x, y such that they are both >= 0 or < 0. Then: Suggestion: * Proof: For 2 jint value x, y such that they are both >= 0 or both < 0. Then: Or are you allowing one to be positive and one negative? src/hotspot/share/opto/type.hpp line 645: > 643: * can be seen that _lo and jint(_uhi) are both < 0 or >= 0, and the same > 644: * applies to jint(_ulo) and _hi. > 645: */ I would appreciate some indentation: it would make it easier to see points 1, 2, ... 
And to see what is part of the proof, and what is part of a case distinction and each case in it. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2412481589 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827379995 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827388140 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827391449 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827393271 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827397025 From rehn at openjdk.org Mon Nov 4 10:27:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 4 Nov 2024 10:27:31 GMT Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four [v2] In-Reply-To: References: Message-ID: <29u1mOEK-Bw7KZZscik5rrpmZAjreO6JK4IQ2JA0mUg=.47443722-b683-4112-90ec-8473915f560d@github.com> On Fri, 1 Nov 2024 02:43:06 GMT, Fei Yang wrote: >> Hi, please consider this small change. >> >> There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. >> The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. >> So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jalr` pair for this jump. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 >> >> Testing on linux-riscv64: >> - [x] tier1 (fastdebug build) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment typo Thanks! ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21818#pullrequestreview-2412682449 From epeter at openjdk.org Mon Nov 4 10:35:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:35:15 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). 
> > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... 
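The "Dealing with Overflows" paragraph above describes an integer wrapper that remembers whether any operation overflowed. The following is a minimal sketch of that idea only, assuming 32-bit values computed in 64-bit intermediates. The class name `CheckedInt` and all of its members are hypothetical and are not taken from the HotSpot `NoOverflowInt` sources.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration of a "sticky overflow" int; not HotSpot code.
class CheckedInt {
  bool    _overflowed;  // set once, never cleared by later operations
  int32_t _value;       // only meaningful while !_overflowed

  static CheckedInt make_overflowed() {
    CheckedInt r(0);
    r._overflowed = true;
    return r;
  }

public:
  // Construct from a 64-bit value; anything outside the int32 range
  // is recorded as an overflow instead of being truncated.
  explicit CheckedInt(int64_t v)
    : _overflowed(v < std::numeric_limits<int32_t>::min() ||
                  v > std::numeric_limits<int32_t>::max()),
      _value(_overflowed ? 0 : static_cast<int32_t>(v)) {}

  bool overflowed() const { return _overflowed; }
  int32_t value() const { assert(!_overflowed); return _value; }

  // Each operator computes in int64_t (which cannot overflow for two
  // int32 operands) and lets the constructor do the range check.
  CheckedInt operator+(const CheckedInt& o) const {
    if (_overflowed || o._overflowed) { return make_overflowed(); }
    return CheckedInt(int64_t(_value) + int64_t(o._value));
  }
  CheckedInt operator-(const CheckedInt& o) const {
    if (_overflowed || o._overflowed) { return make_overflowed(); }
    return CheckedInt(int64_t(_value) - int64_t(o._value));
  }
  CheckedInt operator*(const CheckedInt& o) const {
    if (_overflowed || o._overflowed) { return make_overflowed(); }
    return CheckedInt(int64_t(_value) * int64_t(o._value));
  }
};
```

With this shape, a caller can chain several arithmetic steps and query `overflowed()` once at the end, which is the "implicit overflow check" benefit the description mentions.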
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/d10b76ff..823bed75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=15-16 Stats: 13 lines in 3 files changed: 0 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Mon Nov 4 10:35:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:35:15 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Fri, 1 Nov 2024 14:49:07 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. 
>> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
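To make the linear form `pointer = con + sum_i(scale_i * variable_i)` from the quoted description more concrete, here is a rough sketch of how an adjacency query could work on such a form: if two pointers have identical summands, their difference is just the difference of the constants. The struct `LinearPointer`, the integer variable ids, and the free function below are assumptions made for illustration. The function borrows the name `is_adjacent_to_and_before` from the review comment, but its signature is invented here, and the sketch deliberately ignores the overflow conditions of the "MemPointer Lemma" that the real code has to establish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical decomposed pointer: con + sum(scale * variable).
// Variables are represented by plain integer ids for this sketch.
struct LinearPointer {
  int64_t con;
  std::vector<std::pair<int, int64_t>> summands;  // (variable id, scale)
};

// Two forms have the same summands if they match after sorting,
// i.e. the order in which summands were collected does not matter.
inline bool same_summands(LinearPointer a, LinearPointer b) {
  std::sort(a.summands.begin(), a.summands.end());
  std::sort(b.summands.begin(), b.summands.end());
  return a.summands == b.summands;
}

// True if an access of a_size bytes at 'a' ends exactly where 'b' begins.
// Assumption: no overflow anywhere; the real analysis must prove that.
inline bool is_adjacent_to_and_before(const LinearPointer& a,
                                      int64_t a_size,
                                      const LinearPointer& b) {
  assert(a_size > 0);
  return same_summands(a, b) && a.con + a_size == b.con;
}
```

For instance, two 4-byte stores at `16 + 4 * i` and `20 + 4 * i` share the summand `4 * i` and differ by exactly 4 in the constant, so the first is adjacent to and before the second.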
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more review applications You spent enough time on this already ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454341438 From epeter at openjdk.org Mon Nov 4 10:35:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 10:35:16 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 07:20:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > src/hotspot/share/opto/memnode.cpp line 2943: > >> 2941: return false; >> 2942: } >> 2943: return true; > > You could directly return (I cannot create a code suggestion as it says "Applying suggestions on deleted lines is not supported"): > > return pointer_def.is_adjacent_to_and_before(pointer_use); Ah good idea ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827512031 From fyang at openjdk.org Mon Nov 4 10:56:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 4 Nov 2024 10:56:34 GMT Subject: RFR: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four [v2] In-Reply-To: References: Message-ID: <-y_-jeLxQY8hfcK95gjgIWKgra1f0vbgJ1QE2mz5UDs=.092f442d-e324-4e11-9f30-a296f3ede949@github.com> On Fri, 1 Nov 2024 02:43:06 GMT, Fei Yang wrote: >> Hi, please consider this small change. >> >> There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. >> The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. >> So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). 
This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jr` pair for this jump. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 >> >> Testing on linux-riscv64: >> - [x] tier1 (fastdebug build) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment typo Thanks all for the review! Moving on ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/21818#issuecomment-2454396776 From fyang at openjdk.org Mon Nov 4 10:56:34 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 4 Nov 2024 10:56:34 GMT Subject: Integrated: 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 02:13:16 GMT, Fei Yang wrote: > Hi, please consider this small change. > > There is one jump to continuation (after nmethod entry barriers) in C2EntryBarrierStub [1]. > The current max_size setting assumes the distance is within 1MB, which means a simple `jal` instruction [2]. > So I just count one for this jump in [JDK-8343121](https://bugs.openjdk.org/browse/JDK-8343121). This doesn't seem to break for various tests. But I don't think there is a good reason for that assumption to stand. Instead, we should remove this constraint assuming a `auipc+jr` pair for this jump. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_CodeStubs_riscv.cpp#L66 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L965 > > Testing on linux-riscv64: > - [x] tier1 (fastdebug build) This pull request has now been integrated. 
Changeset: 7f131a9e Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/7f131a9e1eb96d905a57f6e1e6fec2b7c7f725a4 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8343415: RISC-V: Increase maximum size of C2EntryBarrierStub by four Reviewed-by: rehn, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/21818 From chagedorn at openjdk.org Mon Nov 4 11:31:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 11:31:37 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 10:35:15 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. 
>> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... 
> > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java > > Co-authored-by: Christian Hagedorn > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > /contributor add chhagedorn > > You spent enough time on this already ;) Thanks Emanuel, I highly appreciate that :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454464312 From epeter at openjdk.org Mon Nov 4 11:31:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:31:39 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 09:41:40 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > src/hotspot/share/opto/mempointer.cpp line 45: > >> 43: while (_worklist.is_nonempty()) { >> 44: // Bail out if the graph is too complex. >> 45: if (traversal_count++ > 1000) { return MemPointerDecomposedForm(pointer); } > > Might be easier to read/understand when we also have `MemPointerDecomposedForm::make_trivial(pointer)` method. What do you think? Then `MemPointerDecomposedForm(pointer)` can also be made private. Good idea! 
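The `make_trivial` suggestion above is the named-factory pattern: the constructor becomes private and the only way to build the fallback object is through a method whose name says what it is. A rough sketch follows, using hypothetical stand-in types (`Node`, `DecomposedForm`) rather than the real declarations in `mempointer.hpp`.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR node; not the HotSpot Node class.
struct Node {};

// Hypothetical stand-in for MemPointerDecomposedForm.
class DecomposedForm {
  const Node* _pointer;
  bool _is_trivial;

  // Private: callers cannot construct the fallback form by accident.
  DecomposedForm(const Node* pointer, bool is_trivial)
      : _pointer(pointer), _is_trivial(is_trivial) {}

public:
  // Named factory: at the call site it is obvious that the result is the
  // "give up, pointer = 1 * pointer" form.
  static DecomposedForm make_trivial(const Node* pointer) {
    assert(pointer != nullptr);
    return DecomposedForm(pointer, true);
  }

  const Node* pointer() const { return _pointer; }
  bool is_trivial() const { return _is_trivial; }
};
```

A bail-out such as the quoted `traversal_count` check would then read `return DecomposedForm::make_trivial(pointer);`, which documents the intent without a comment.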
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827584414 From epeter at openjdk.org Mon Nov 4 11:35:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:35:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 10:12:25 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > src/hotspot/share/opto/mempointer.cpp line 352: > >> 350: // bounds of that same memory object. >> 351: >> 352: // Hence, all 3 conditions of the "MemoryPointer Lemma" are established, and hence > > Since we also have added `(S0)` recently, we might need to add a word here about it and then update this to "all 4 conditions". Good idea ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827588447 From epeter at openjdk.org Mon Nov 4 11:48:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:48:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <_bo_FK7zqp8oBdlZdDWdKHvU-rwhCbeqK9ga7qs9Fas=.6ce66b49-42a6-4d8d-9b56-e616899afc48@github.com> On Mon, 4 Nov 2024 11:27:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java >> >> Co-authored-by: Christian Hagedorn >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > >> /contributor add chhagedorn >> >> You spent enough time on this already ;) > > Thanks Emanuel, I highly appreciate that :-) 
@chhagedorn I filed https://bugs.openjdk.org/browse/JDK-8343536 to track the cases in `TestMergeStoresMemorySegment.java` that do not optimize as hoped for. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454498317 From epeter at openjdk.org Mon Nov 4 11:48:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:48:49 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. 
> > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more changes for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19970/files - new: https://git.openjdk.org/jdk/pull/19970/files/823bed75..c1f274f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19970&range=16-17 Stats: 19 lines in 3 files changed: 8 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19970.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19970/head:pull/19970 PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Mon Nov 4 11:48:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:48:50 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v16] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 07:41:49 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more review applications > > test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 114: > >> 112: */ >> 113: >> 114: // FAILS: mixed providers currently do not merge stores. Maybe there is some inlining issue. > > Is there a tracking bug/RFE to make this work? https://bugs.openjdk.org/browse/JDK-8343536 > test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java line 375: > >> 373: applyIf = {"UseUnalignedAccesses", "true"}) >> 374: static Object[] test_xxx(MemorySegment a, int xI, int yI, int zI) { >> 375: // All RangeChecks remain -> RC smearing not good enough? > > Is there a tracking bug to further investigate at some point? 
https://bugs.openjdk.org/browse/JDK-8343536 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827601925 PR Review Comment: https://git.openjdk.org/jdk/pull/19970#discussion_r1827601955 From epeter at openjdk.org Mon Nov 4 11:51:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 11:51:36 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <-qjDdk1O1LApPy16cdRihBCrNUFM-K0URHazl1pZuac=.f8bc6861-31c8-4cc2-96cb-3896222030df@github.com> On Mon, 4 Nov 2024 11:27:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java >> >> Co-authored-by: Christian Hagedorn >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > >> /contributor add chhagedorn >> >> You spent enough time on this already ;) > > Thanks Emanuel, I highly appreciate that :-) @chhagedorn I addressed all your review suggestions. Thank you very much for the in-depth review :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454505103 From chagedorn at openjdk.org Mon Nov 4 12:12:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 12:12:35 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <53qOJR2DhdN02s1Y64fiuPD7ckg_Hr9mhpQWCCENQvk=.241e763a-268d-49f3-ba4a-d568ac95827b@github.com> On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. 
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. 
Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian That looks good, thanks for the patience to work through all the suggestions and also for the offline discussions! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2412899131 From duke at openjdk.org Mon Nov 4 12:23:00 2024 From: duke at openjdk.org (Sorna Sarathi) Date: Mon, 4 Nov 2024 12:23:00 GMT Subject: RFR: JDK-8251926: [PPC] Removed an unused variable in assembler_ppc.cpp Message-ID: This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. 
JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) ------------- Commit messages: - Removed an unused variable in assembler_ppc.cpp file Changes: https://git.openjdk.org/jdk/pull/21874/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21874&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8251926 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21874/head:pull/21874 PR: https://git.openjdk.org/jdk/pull/21874 From epeter at openjdk.org Mon Nov 4 12:23:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 12:23:38 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Thu, 17 Oct 2024 21:42:33 GMT, Vladimir Kozlov wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. 
Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegement`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduce `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > For me it is confusing to call `pointer = con + sum_i(scale_i * variable_i)` as "pointer" unless it is Unsafe address which has base address as constant. It misses base address. All out pointer types are correspond to an address of some object in Java heap, out of heap, VM's object or some native (C heap) VM object. > This looks like `address_offset`, `displacement`, ... @vnkozlov Would you like to re-review? If I don't hear anything then I'll integrate tomorrow. 
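As an aside on the linear form `con + sum_i(scale_i * variable_i)` discussed above, the following sketch shows how such a form can answer the adjacency question. It is a simplification under stated assumptions: the types `Summand` and `LinearForm` are hypothetical, variables are plain integer ids instead of IR nodes, each variable appears at most once per form, overflow is ignored, and a base address is simply treated as one more variable with scale 1. None of this is claimed to match `mempointer.hpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Summand {
  int variable;   // hypothetical id standing in for an IR node
  int64_t scale;
};

// pointer = con + sum_i(scale_i * variable_i)
struct LinearForm {
  int64_t con;
  std::vector<Summand> summands;
};

static std::vector<Summand> sorted_summands(const LinearForm& form) {
  std::vector<Summand> result = form.summands;
  std::sort(result.begin(), result.end(),
            [](const Summand& x, const Summand& y) {
              return x.variable < y.variable;
            });
  return result;
}

// If both forms have identical summands, the variable parts cancel and
// b - a is the constant b.con - a.con. Otherwise nothing is known.
bool constant_distance(const LinearForm& a, const LinearForm& b,
                       int64_t* distance) {
  assert(distance != nullptr);
  std::vector<Summand> sa = sorted_summands(a);
  std::vector<Summand> sb = sorted_summands(b);
  if (sa.size() != sb.size()) { return false; }
  for (std::size_t i = 0; i < sa.size(); i++) {
    if (sa[i].variable != sb[i].variable || sa[i].scale != sb[i].scale) {
      return false;
    }
  }
  *distance = b.con - a.con;
  return true;
}

// b is adjacent to a if it starts exactly where the access at a ends.
bool is_adjacent(const LinearForm& a, int64_t a_size_in_bytes,
                 const LinearForm& b) {
  int64_t distance = 0;
  return constant_distance(a, b, &distance) && distance == a_size_in_bytes;
}
```

For example, `16 + 4 * i` and `20 + 4 * i` are adjacent for a 4-byte store, while `16 + 4 * i` and `20 + 8 * i` are not comparable because their difference depends on `i`.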
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2454574537 From roland at openjdk.org Mon Nov 4 12:27:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 4 Nov 2024 12:27:33 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: <04t-3boaqTZQNmXF3jXEvPgY-zS8oerxa99wbhvCdBg=.0ce7b0a6-d1d4-4cc0-948d-216bdc9ffbfa@github.com> On Mon, 28 Oct 2024 06:20:59 GMT, Tobias Hartmann wrote: > Shouldn't this be caught by `VerifyIterativeGVN` after [JDK-8298952 ](https://bugs.openjdk.org/browse/JDK-8298952)? That one only checks `Value`. In this case, the transformation that's not applied is performed by `Ideal`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21714#issuecomment-2454582028 From epeter at openjdk.org Mon Nov 4 13:00:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 13:00:43 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: On Sun, 20 Oct 2024 16:41:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. 
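The membership condition quoted above can be written down directly. The sketch below is only an illustration of that predicate and of the `[0, 3]` example; the struct `IntConstraints` and both functions are hypothetical names, and no canonicalization algorithm is implemented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the six constraints named in the description.
struct IntConstraints {
  int32_t lo, hi;        // signed bounds
  uint32_t ulo, uhi;     // unsigned bounds
  uint32_t zeros, ones;  // bits known to be 0 / known to be 1
};

// x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
//   && (x & zeros) == 0 && (x & ones) == ones
bool contains(const IntConstraints& t, int32_t x) {
  uint32_t ux = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&
         ux >= t.ulo && ux <= t.uhi &&
         (ux & t.zeros) == 0 && (ux & t.ones) == t.ones;
}

// Brute-force check that two constraint sets accept the same values
// within [from, to]. Only practical for small test ranges.
bool same_set_on_range(const IntConstraints& a, const IntConstraints& b,
                       int32_t from, int32_t to) {
  assert(from <= to);
  for (int64_t x = from; x <= to; x++) {
    int32_t v = static_cast<int32_t>(x);
    if (contains(a, v) != contains(b, v)) { return false; }
  }
  return true;
}
```

Taking the example from the text: the loose constraints `{lo=0, hi=3, ulo=0, uhi=0xFFFFFFFF, zeros=0, ones=0}` and the tightened constraints `{lo=0, hi=3, ulo=0, uhi=3, zeros=~3u, ones=0}` accept the same values, which is why the second can serve as the canonical representation of the first.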
>> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge branch 'master' into unsignedbounds > - address reviews > - comment adjust_lo empty case > - formality > - address reviews > - add comments, refactor functions to helper class > - refine comments > - remove leftover code > - add doc to TypeInt, rename parameters, remove unused methods > - change (~v & ones) == 0 to (v & ones) == ones > - ... and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa src/hotspot/share/opto/rangeinference.cpp line 30: > 28: #include "utilities/tuple.hpp" > 29: > 30: constexpr juint SMALLINT = 3; // a value too insignificant to consider widening If you are already refactoring this code, I'd suggest giving it a better name. Seems to have to do with cardinality...? src/hotspot/share/opto/type.cpp line 4690: > 4688: const Type* tm = _ary->meet_speculative(tap->_ary); > 4689: const TypeAry* tary = tm->isa_ary(); > 4690: if (tary == nullptr) { Can you add a comment why this might happen? src/hotspot/share/opto/type.hpp line 630: > 628: * For a TypeInt t, there are 3 possible cases: > 629: * > 630: * a. t._lo >= 0. Since 0 <= t._lo <= jint(t._ulo), we have: I think you should say why `t._lo <= jint(t._ulo)` ... it seems intuitively true... hmm src/hotspot/share/opto/type.hpp line 632: > 630: * a. t._lo >= 0. Since 0 <= t._lo <= jint(t._ulo), we have: > 631: * > 632: * juint(t._lo) <= juint(jint(t._ulo)) == t._ulo <= juint(t._lo) You should say what steps you are applying here... otherwise the reader has a lot to do. Lemma, return-cast, `t._lo <= jint(t._ulo)` (maybe its own Lemma2?) 
src/hotspot/share/opto/type.hpp line 634: > 632: * juint(t._lo) <= juint(jint(t._ulo)) == t._ulo <= juint(t._lo) > 633: * > 634: * Which means that t._lo == jint(t._ulo). Similarly, t._hi == jint(t._uhi). Hmm. I feel like I don't immediately see the "Similarly" here.... too many hidden steps. test/hotspot/gtest/opto/test_rangeinference.cpp line 33: > 31: #include > 32: > 33: #ifdef ASSERT Why do you have this here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827691915 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827665859 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827673619 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827676703 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827680285 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827684121 From epeter at openjdk.org Mon Nov 4 13:00:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 13:00:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: <-d-R7jGoZ1OUrfIP23mumrC1L-WDQd3ylYoTf7TX6vs=.d83a9c38-c11b-4173-a1a9-bba2d691207a@github.com> On Mon, 4 Nov 2024 08:58:47 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: >> >> - Merge branch 'master' into unsignedbounds >> - address reviews >> - comment adjust_lo empty case >> - formality >> - address reviews >> - add comments, refactor functions to helper class >> - refine comments >> - remove leftover code >> - add doc to TypeInt, rename parameters, remove unused methods >> - change (~v & ones) == 0 to (v & ones) == ones >> - ... 
and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa > > src/hotspot/share/opto/type.hpp line 622: > >> 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] >> 621: * >> 622: * Proof: For 2 jint value x, y such that they are both >= 0 or < 0. Then: > > Suggestion: > > * Proof: For 2 jint value x, y such that they are both >= 0 or both < 0. Then: > > Or are you allowing them to one be positive and one negative? Also: this is more of a "Lemma", and could be stated before the "Proof" of you property 2... it is property 2 that you are trying to prove here, right? The indentation would help for that as well. > src/hotspot/share/opto/type.hpp line 634: > >> 632: * juint(t._lo) <= juint(jint(t._ulo)) == t._ulo <= juint(t._lo) >> 633: * >> 634: * Which means that t._lo == jint(t._ulo). Similarly, t._hi == jint(t._uhi). > > Hmm. I feel like I don't immediately see the "Similarly" here.... too many hidden steps. I'm going to stop reading the proof below.... I'll read again once you respond to the comments above ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827671624 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1827680976 From thartmann at openjdk.org Mon Nov 4 13:08:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 4 Nov 2024 13:08:29 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v2] In-Reply-To: References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 15:09:50 GMT, Roland Westrelin wrote: >> The transformation: >> >> >> (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) >> >> >> when i fits in an int is not always applied: when the type of `i` is >> narrowed so it fits in an int, the `CastX2P` is not enqueued for >> igvn. This can get in the way of vectorization as shown by test case >> `test2`. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > fix test Ah right, I missed that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21714#issuecomment-2454667516 From mdoerr at openjdk.org Mon Nov 4 13:28:28 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 13:28:28 GMT Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: <8azWlriUnVJwl6jZPUNBYlXz7GQVoWivFjf57lgDJuA=.0c9ed3af-ee54-4cd4-8740-02700a54737f@github.com> On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) Looks good and trivial. Thanks for resolving this old issue. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21874#pullrequestreview-2413061019 From epeter at openjdk.org Mon Nov 4 13:28:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 4 Nov 2024 13:28:41 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord Message-ID: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> There used to be a bug where this happens: - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer. - Later, all field loads disappear, and the Allocation of the object is eliminated. - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index.
This code was backported and so should on its own fix the issue everywhere. But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test. // We did not find the int_index. Just to be safe, reject this VPointer. if (!_has_int_index_after_convI2L) { return false; } - Recently, we started to only allow memory accesses to be alignment references if they are actually vectorized... which cannot happen with field stores. - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok. **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. ------------- Commit messages: - JDK-8342498 Changes: https://git.openjdk.org/jdk/pull/21875/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21875&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342498 Stats: 182 lines in 1 file changed: 182 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21875.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21875/head:pull/21875 PR: https://git.openjdk.org/jdk/pull/21875 From mdoerr at openjdk.org Mon Nov 4 13:35:30 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 4 Nov 2024 13:35:30 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is a trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. I think the is_uimm* checks should take a `uint64_t`. See assembler_riscv.inline.hpp.
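For readers unfamiliar with the point about the parameter type, here is one way an unsigned-immediate range check can be written when the argument is a `uint64_t`. This is a sketch under assumptions: the helper name `fits_unsigned` is made up, it is not copied from `assembler_riscv.inline.hpp` or from the s390 assembler, and it does not claim to reproduce what ubsan reported in the JBS issue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does x fit into an unsigned immediate of nbits bits?
// Taking uint64_t keeps every operation in unsigned arithmetic, where
// shifts by less than the type width are fully defined.
inline bool fits_unsigned(uint64_t x, unsigned nbits) {
  assert(nbits > 0 && nbits < 64);
  return (x >> nbits) == 0;
}
```

Because the parameter is unsigned, a negative value passed by the caller converts to a large `uint64_t` and is rejected by the same comparison, so no separate sign check is needed.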
------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454727814 From chagedorn at openjdk.org Mon Nov 4 13:37:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 13:37:34 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v6] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 04:14:47 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > fix trailing whitespace src/hotspot/share/opto/escape.hpp line 680: > 678: bool add_final_edges_unsafe_access(Node* n, uint opcode); > 679: > 680: int invocation() { return _invocation; } Can be made `const`: Suggestion: int invocation() const { return _invocation; } test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 33: > 31: */ > 32: > 33: package compiler.loopopts.superword; I suggest to move this test to `compiler/escapeAnalysis` and update the package accordingly to `compiler.escapeAnalysis`. 
test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 35: > 33: package compiler.loopopts.superword; > 34: > 35: public class TestScalarize_Bailout { You should not use underlines in class names test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 37: > 35: public class TestScalarize_Bailout { > 36: > 37: static Object var1; Indentation seems to be off. test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 43: > 41: try { > 42: Class Class37 = Class.forName("compiler.loopopts.superword.TestScalarize_Bailout"); > 43: synchronized (compiler.loopopts.superword.TestScalarize_Bailout.class) { I guess you do not need the fully qualified name: Suggestion: synchronized (TestScalarize_Bailout.class) { test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 43: > 41: try { > 42: Class Class37 = Class.forName("compiler.loopopts.superword.TestScalarize_Bailout"); > 43: synchronized (compiler.loopopts.superword.TestScalarize_Bailout.class) { Is `forName()` and `synchronized` really required to trigger this with mainline? If so, you should add a comment to explain why this is required. 
test/hotspot/jtreg/compiler/loopopts/superword/TestScalarize_Bailout.java line 48: > 46: } > 47: } > 48: } catch (Exception eeeeeeee){throw new RuntimeException(eeeeeeee);} Suggestion: } catch (Exception e) { throw new RuntimeException(e); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827660522 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827664200 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827662890 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827739836 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827665705 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827741280 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827666392 From chagedorn at openjdk.org Mon Nov 4 13:37:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 4 Nov 2024 13:37:35 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> References: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> Message-ID: On Thu, 24 Oct 2024 04:07:33 GMT, Dhamoder Nalla wrote: >> src/hotspot/share/opto/macro.cpp line 821: >> >>> 819: // If scalarize operation is adding too many nodes, bail out >>> 820: if (C->check_node_count(300, "out of nodes while scalarizing object")) { >>> 821: return nullptr; >> >> Would a bailout from this scalarization be enough or do we really require to record the method as non-compilable (which is done with `check_node_count()`? In the latter case, we could also try something like "recompilation without EA" as done, for example, here (i.e. 
`retry_no_escape_analysis`): >> >> https://github.com/openjdk/jdk/blob/37cfaa8deb4cc15864bb6dc2c8a87fc97cff2f0d/src/hotspot/share/opto/escape.cpp#L3858-L3866 >> >> I also suggest to use the `NodeLimitFudgeFactor` instead of `300` to have it controllable. > > Thank you for your suggestion @chhagedorn. I agree that 'recompilation without EA' makes more sense, and I have made the necessary changes. Okay thanks for investigating again. A bailout makes sense for this edge case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1827742596 From amitkumar at openjdk.org Mon Nov 4 13:44:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 4 Nov 2024 13:44:32 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 13:32:30 GMT, Martin Doerr wrote: > I think the is_uimm* checks should take an `uint64_t`. See assembler_riscv.inline.hpp. But aren't `julong` same as `uint64_t` ? I saw this in `globalDefinitions.hpp` // Additional Java basic types typedef uint8_t jubyte; typedef uint16_t jushort; typedef uint32_t juint; typedef uint64_t julong; ------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454749491 From roland at openjdk.org Mon Nov 4 13:44:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 4 Nov 2024 13:44:33 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v2] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 06:54:43 GMT, Emanuel Peter wrote: > Do you know what JDK versions are affected? The failure doesn't reproduce with jdk21u. But that seems to be because we need JDK-8326139 (and JDK-8331575) for the bug to show up. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2454748705

From mdoerr at openjdk.org Mon Nov 4 13:50:27 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 4 Nov 2024 13:50:27 GMT
Subject: RFR: 8343506: [s390x] multiple test failures with ubsan
In-Reply-To:
References:
Message-ID:

On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote:

> This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details.

My point is that I think that the riscv solution is better. See assembler_riscv.inline.hpp.

Your cast is correct, though.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454756323
PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454762183

From epeter at openjdk.org Mon Nov 4 14:00:14 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 4 Nov 2024 14:00:14 GMT
Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2]
In-Reply-To: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com>
References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com>
Message-ID:

> There used to be a bug where this happens:
> - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer.
> - Later, all field loads disappear, and the Allocation of the object is eliminated.
> - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR.
>
> We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue:
> - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere.
But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test.
>
>     // We did not find the int_index. Just to be safe, reject this VPointer.
>     if (!_has_int_index_after_convI2L) {
>       return false;
>     }
>
> - Recently, we now only allow memory accesses to be alignment references if they are actually vectorized... which cannot happen with field stores.
> - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok.
>
> **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway.

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  unlock diagnostics

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21875/files
  - new: https://git.openjdk.org/jdk/pull/21875/files/4ddc14cc..4ff0aa27

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=21875&range=01
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21875&range=00-01

Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/21875.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21875/head:pull/21875

PR: https://git.openjdk.org/jdk/pull/21875

From thartmann at openjdk.org Mon Nov 4 14:18:30 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 4 Nov 2024 14:18:30 GMT
Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2]
In-Reply-To:
References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com>
Message-ID: <9r7WrmtNlOWqKxm3tPgUVprgW28KAuNx0cBc3mYVspY=.3ad5a8b7-1b44-45ec-b30c-3ce41b8e4d73@github.com>

On Mon, 4 Nov 2024 14:00:14 GMT, Emanuel Peter wrote:

>> There used to be a bug where this happens:
>> - SuperWord vectorizes, and picks a field-store as the alignment reference, using a
CastP2X on the object pointer.
>> - Later, all field loads disappear, and the Allocation of the object is eliminated.
>> - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR.
>>
>> We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue:
>> - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere. But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test.
>>
>>     // We did not find the int_index. Just to be safe, reject this VPointer.
>>     if (!_has_int_index_after_convI2L) {
>>       return false;
>>     }
>>
>> - Recently, we now only allow memory accesses to be alignment references if they are actually vectorized... which cannot happen with field stores.
>> - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok.
>>
>> **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   unlock diagnostics

Great job extracting this test, Emanuel. Looks good to me!

-------------

Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21875#pullrequestreview-2413188915

From amitkumar at openjdk.org Mon Nov 4 14:34:28 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 4 Nov 2024 14:34:28 GMT
Subject: RFR: 8343506: [s390x] multiple test failures with ubsan
In-Reply-To:
References:
Message-ID:

On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote:

> This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details.

Oh, got it. I will add that change in the PR and run the tests again.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2454870939

From kvn at openjdk.org Mon Nov 4 16:12:30 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 4 Nov 2024 16:12:30 GMT
Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2]
In-Reply-To:
References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com>
Message-ID:

On Mon, 4 Nov 2024 14:00:14 GMT, Emanuel Peter wrote:

>> There used to be a bug where this happens:
>> - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer.
>> - Later, all field loads disappear, and the Allocation of the object is eliminated.
>> - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR.
>>
>> We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue:
>> - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere. But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test.
>>
>>     // We did not find the int_index. Just to be safe, reject this VPointer.
>>     if (!_has_int_index_after_convI2L) {
>>       return false;
>>     }
>>
>> - Recently, we now only allow memory accesses to be alignment references if they are actually vectorized... which cannot happen with field stores.
>> - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok.
>>
>> **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   unlock diagnostics

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21875#pullrequestreview-2413492594

From kvn at openjdk.org Mon Nov 4 16:42:28 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 4 Nov 2024 16:42:28 GMT
Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression
In-Reply-To:
References:
Message-ID:

On Fri, 1 Nov 2024 13:53:33 GMT, Roberto Castañeda Lozano wrote:

> This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example:
>
> ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73)
>
> Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303).
>
> The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs.
This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase.
>
> Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness.
>
> #### Testing
>
> ##### Functionality
>
> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode).
>
> ##### Performance
>
> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed.

Looks good. Yes, it looks like the code expects LShift here instead of a constant.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21829#pullrequestreview-2413569797

From kvn at openjdk.org Mon Nov 4 16:20:31 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 4 Nov 2024 16:20:31 GMT
Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4]
In-Reply-To:
References: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com>
Message-ID: <4t14KRimdrYG3dPJ4FgeeX0oz1xwGDNfuParbwVIL68=.ea5c0137-ffad-41f7-9ac0-e95daecf09ea@github.com>

On Mon, 4 Nov 2024 13:34:56 GMT, Christian Hagedorn wrote:

>> Thank you for your suggestion @chhagedorn. I agree that 'recompilation without EA' makes more sense, and I have made the necessary changes.
>
> Okay thanks for investigating again. A bailout makes sense for this edge case.

Yes, bailout with recompilation is preferable.
The graph could already be partially modified with some field access nodes for the scalarized object.

If the bailout check and code are the same as in `escape.cpp`, consider factoring them into one function to use in both places.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1828003943

From kvn at openjdk.org Mon Nov 4 15:54:35 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 4 Nov 2024 15:54:35 GMT
Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18]
In-Reply-To:
References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com>
Message-ID:

On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote:

>> **Background**
>> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather.
>>
>> **Details**
>>
>> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset.
>>
>> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them.
>>
>> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing!
>>
>> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`.
>>
>> **What this change enables**
>>
>> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`).
>>
>> Now we can do:
>> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`.
>> - Merging `Unsafe` stores to native memory.
>> - Merging `MemorySegment`: with array, native, ByteBuffer backing types.
>> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better.
>>
>> **Dealing with Overflows**
>>
>> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   more changes for Christian

Update is good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19970#pullrequestreview-2413445139

From dfenacci at openjdk.org Mon Nov 4 15:08:47 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 4 Nov 2024 15:08:47 GMT
Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure
Message-ID:

# Issue

The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none.
# Cause

The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type.
The graph that leads to the issue looks like this:
![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a)
The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds:
![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec)
The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`.

This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.

# Solution

In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`.
Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) ------------- Commit messages: - Merge branch 'master' into JDK-8302459-new - JDK-8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure - Revert "JDK-8302459: compiler/vectorapi/VectorLogicalOpIdentityTest.java failed with "IRViolationException: There were one or multiple IR rule failures"" - Revert "JDK-8302459: remove unused vector inline queue" - Revert "JDK-8302459: remove unneeded changes" - Revert "JDK-8302459: remove unneeded function declaration" - Revert "JDK-8302459: add explicit -TieredCompilation to tests" - Revert "JDK-8302459: add bug numbers to tests" - Revert "JDK-8302459: update copyright year" - JDK-8302459: update copyright year - ... 
and 6 more: https://git.openjdk.org/jdk/compare/388d44fb...bd488a96 Changes: https://git.openjdk.org/jdk/pull/21682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302459 Stats: 13 lines in 4 files changed: 6 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From sparasa at openjdk.org Mon Nov 4 18:10:47 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 18:10:47 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v5] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove map4 enum; replace with comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/0f404dbd..1563aa2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=03-04 Stats: 22 lines in 2 files changed: 0 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From sparasa at openjdk.org Mon Nov 4 18:14:29 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 18:14:29 GMT Subject: RFR: 8343214: Fix encoding errors in APX New 
Data Destination Instructions Support [v5]
In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com>
References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com>
Message-ID:

On Wed, 30 Oct 2024 09:47:05 GMT, Jatin Bhateja wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   remove map4 enum; replace with comment
>
> I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch.

Hi @jatin-bhateja, please see the updated code indicating the MAP4 comment next to the VEX_OPCODE.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2455389911

From mli at openjdk.org Mon Nov 4 18:37:37 2024
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 4 Nov 2024 18:37:37 GMT
Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option
Message-ID:

Hi,
Can you help to review this simple patch?
Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users.
Thanks ------------- Commit messages: - Initial commit - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Merge branch 'openjdk:master' into master - Revert "initial commit" - initial commit Changes: https://git.openjdk.org/jdk/pull/21885/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21885&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343555 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21885/head:pull/21885 PR: https://git.openjdk.org/jdk/pull/21885 From dlong at openjdk.org Mon Nov 4 20:52:28 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Nov 2024 20:52:28 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. 
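
The kind of range check discussed in this thread can be sketched in isolation. The following is a hypothetical stand-in, not HotSpot's `Immediate::is_uimm8`; the names `is_uimm` and `length_minus_one_fits_uimm8` are assumptions made for this example. It shows a check that takes an unsigned 64-bit value, with the subtraction done in unsigned arithmetic (where wrap-around is well defined) and non-positive inputs rejected up front.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: true if 'x' fits into 'nbits'
// unsigned bits. Only meaningful for 0 < nbits < 64.
inline bool is_uimm(uint64_t x, unsigned nbits) {
  assert(nbits > 0 && nbits < 64);
  return (x >> nbits) == 0;
}

// Models a predicate of the form "len - 1 fits into 8 unsigned bits".
inline bool length_minus_one_fits_uimm8(int64_t len) {
  // Reject non-positive lengths explicitly, then subtract on uint64_t so
  // that no signed overflow can occur, even for INT64_MIN.
  return len >= 1 && is_uimm(static_cast<uint64_t>(len) - 1, 8);
}
```

With this shape, lengths 1 through 256 are accepted, and 0, negative values and anything above 256 are rejected without relying on wrap-around to fail the range check.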
src/hotspot/cpu/s390/s390.ad line 2550:

> 2548: // Unsigned Integer Immediate: 9-bit
> 2549: operand SSlenDW() %{
> 2550:   predicate(Immediate::is_uimm8((julong)n->get_long()-1));

Suggestion:

  predicate(n->get_long() >= 1 && Immediate::is_uimm8((julong)n->get_long()-1));

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1828368759

From dlong at openjdk.org Mon Nov 4 21:06:27 2024
From: dlong at openjdk.org (Dean Long)
Date: Mon, 4 Nov 2024 21:06:27 GMT
Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote:

> # Issue
>
> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none.
>
> # Cause
>
> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type.
> The graph that leads to the issue looks like this:
> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a)
> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds:
> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec)
> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead.
> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`.
> > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. > > # Solution > > In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) Would it be better to trigger cleanup based on the presence of nodes like CastPP/CheckCastPP instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2455697290 From sviswanathan at openjdk.org Mon Nov 4 21:33:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 4 Nov 2024 21:33:38 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v5] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:10:47 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > remove map4 enum; replace with comment src/hotspot/cpu/x86/assembler_x86.cpp line 2637: > 2635: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); > 2636: // NDD shares its 
encoding bits with NDS bits for regular EVEX instruction. > 2637: // Therefore, DST is passed as the second argument to minimize changes in the leaf level routine. dst is not the second argument here so the comment can be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14858: > 14856: InstructionAttr attributes(AVX_128bit, /* vex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 14857: // NDD shares its encoding bits with NDS bits for regular EVEX instruction. > 14858: // Therefore, DST is passed as the second argument to minimize changes in the leaf level routine. dst is not the second argument here so the comment can be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 14880: > 14878: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > 14879: // NDD shares its encoding bits with NDS bits for regular EVEX instruction. > 14880: // Therefore, DST is passed as the second argument to minimize changes in the leaf level routine. dst is not the second argument here so the comment can be removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1828415495 PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1828414929 PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1828414685 From sparasa at openjdk.org Mon Nov 4 21:59:05 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 4 Nov 2024 21:59:05 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v6] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove comment where not required ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/1563aa2c..fcc782b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From dlong at openjdk.org Mon Nov 4 22:36:34 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 4 Nov 2024 22:36:34 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Mon, 4 Nov 2024 06:26:04 GMT, Tobias 
Hartmann wrote:

>> Do we actually generate an nmethod for the above example? It seems like it could never execute the getClass() because the line above setting `obj` would have to throw an exception if there can be no concrete instances.
>
> Right, this was an oversimplified example. I used this code:
>
>     Class test(MyAbstract obj, boolean b) {
>         if (b) {
>             return obj.getClass();
>         }
>         return null;
>     }
>
>
> We pass `null` for `obj` and `false` for `b`. Usually, the branch is then only compiled with Xcomp.

I think there is still hope for moving the assert into `TypeNarrowKlass::make` in a future RFE. In the example above, if we are generating code for obj.getClass() based on the assumption that the type is a leaf, we could also notice that the type is abstract and deduce that obj must be null.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1828483501

From vlivanov at openjdk.org Mon Nov 4 23:01:28 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Mon, 4 Nov 2024 23:01:28 GMT
Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression
In-Reply-To:
References:
Message-ID:

On Fri, 1 Nov 2024 13:53:33 GMT, Roberto Castañeda Lozano wrote:

> This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example:
>
> ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73)
>
> Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303).
> > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should in general be optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Looks good. src/hotspot/share/opto/matcher.cpp line 183: > 181: } > 182: } > 183: for (uint j = 0; j < n->outcnt(); j++) { Why don't you use a DU iterator instead (e.g., `DUIterator_Fast`)? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21829#pullrequestreview-2414301481 PR Review Comment: https://git.openjdk.org/jdk/pull/21829#discussion_r1828505551 From fyang at openjdk.org Tue Nov 5 00:45:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 00:45:27 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch?
> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Thanks. That makes sense to me. Since we are having more and more RISC-V extensions, we should rely on the Linux hwprobe syscall to auto-detect and enable them in the long run. It seems that we should also handle other ones like `UseRVC`, `UseRVV`, etc. similarly. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21885#pullrequestreview-2414414743 From amitkumar at openjdk.org Tue Nov 5 06:08:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 5 Nov 2024 06:08:30 GMT Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: <9xZUzHxNV1awugjBCXBaH0NZUXC37yJhHDt6yNohaBM=.dfec0e3e-6d08-4dca-a529-9939c2d5aaf2@github.com> On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) I think commands in an `edited` section do not work with the bots. You can pass the integrate command again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21874#issuecomment-2456308779 From duke at openjdk.org Tue Nov 5 06:08:31 2024 From: duke at openjdk.org (duke) Date: Tue, 5 Nov 2024 06:08:31 GMT Subject: RFR: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) @Sorna-Sarathi Your change (at version 8e16c9eeae76e306490dbbe389e0c6ccba64f5b3) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21874#issuecomment-2456310012 From duke at openjdk.org Tue Nov 5 06:11:34 2024 From: duke at openjdk.org (Sorna Sarathi) Date: Tue, 5 Nov 2024 06:11:34 GMT Subject: Integrated: 8251926: PPC: Remove an unused variable in assembler_ppc.cpp In-Reply-To: References: Message-ID: <3xeUYA5NCN38addWkH65IEGodG5CXl9lNyiRyvJ2Mt4=.6ca7e78a-8303-4a87-8c7f-f08142ccbe8d@github.com> On Mon, 4 Nov 2024 12:17:34 GMT, Sorna Sarathi wrote: > This PR removes an unused variable from load_const_optimized function in assembler_ppc.cpp file. > > JBS Issue: [JDK-8251926](https://bugs.openjdk.org/browse/JDK-8251926) This pull request has now been integrated. Changeset: 0f7dd98d Author: Sorna Sarathi Committer: Amit Kumar URL: https://git.openjdk.org/jdk/commit/0f7dd98d9d546e0fc2c7b1df779cef35e5b5852c Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8251926: PPC: Remove an unused variable in assembler_ppc.cpp Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/21874 From jbhateja at openjdk.org Tue Nov 5 07:05:28 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 5 Nov 2024 07:05:28 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v6] In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 09:47:05 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> remove comment where not required > > I think we should first check-in extended gtest asm validation script detecting these issues either before or along with this patch. > Hi @jatin-bhateja, please see the updated code indicating the MAP4 comment next to the VEX_OPCODE. 
The specification does not mention using 0F_3C for MAP4. I guess we are trying to be compatible with the GCC encoding scheme here. Adding MAP4 in the comments is still better. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2456386026 From rehn at openjdk.org Tue Nov 5 08:19:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 5 Nov 2024 08:19:31 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Does it really make sense to have instruction set selection diagnostic: https://github.com/openjdk/jdk/blob/dafa2e55adb6b054c342d5e723e51087d771e6d6/src/hotspot/share/runtime/globals.hpp#L59 ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456509266 From thartmann at openjdk.org Tue Nov 5 08:59:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 5 Nov 2024 08:59:40 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 21:53:45 GMT, Cesar Soares Lucas wrote: >> Please, consider this patch to fix an issue that happens when a Phi previously considered reducible later becomes irreducible.
The overall situation that causes the problem is like so: >> >> - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 >> >> - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. >> >> - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. >> >> After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. >> >> The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. >> >> --------- >> >> ### Tests >> >> Win, Mac & Linux tier1-4 on x64 & Aarch64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: include test execution options. All green. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2456591564 From roland at openjdk.org Tue Nov 5 09:07:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 09:07:30 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patches, I will apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Looks good to me. ------------- Marked as reviewed by roland (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21790#pullrequestreview-2415019248 From chagedorn at openjdk.org Tue Nov 5 09:19:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 09:19:34 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patches, I will apply similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics, which include only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21790#issuecomment-2456636768 From mli at openjdk.org Tue Nov 5 09:57:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 09:57:27 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: <3XREX0qVwN4xX_REogr_hGZNjlp_VVeov9uAhrf_9Bg=.f5ab6614-16b4-4271-9f15-19a731f0385e@github.com> On Tue, 5 Nov 2024 08:16:59 GMT, Robbin Ehn wrote: > Does it really make sense to have instruction set selection diagnostic: Do you suggest keeping it as Product or Experimental?
The full sentences are as below: // DIAGNOSTIC options are not meant for VM tuning or for product modes. // They are to be used for VM quality assurance or field diagnosis // of VM bugs. They are hidden so that users will not be encouraged to // try them as if they were VM ordinary execution options. However, they // are available in the product version of the VM. Under instruction // from support engineers, VM customers can turn them on to collect // diagnostic information about VM problems. I think it should not be Experimental anymore, and seems it's better than Product, and can be used in product (`However, they are available in the product version of the VM`). But I'm not quite sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456723371 From mdoerr at openjdk.org Tue Nov 5 10:07:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 5 Nov 2024 10:07:35 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> References: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> Message-ID: On Mon, 4 Nov 2024 20:49:39 GMT, Dean Long wrote: >> This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. > > src/hotspot/cpu/s390/s390.ad line 2550: > >> 2548: // Unsigned Integer Immediate: 9-bit >> 2549: operand SSlenDW() %{ >> 2550: predicate(Immediate::is_uimm8((julong)n->get_long()-1)); > > Suggestion: > > predicate(n->get_long() >= 1 && Immediate::is_uimm8((julong)n->get_long()-1)); I don't think this is necessary. Unsigned subtraction with wrap-around is not undefined behavior. 
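
To illustrate the wrap-around point made in the s390 review comment above, here is a minimal, self-contained sketch. It is not the s390 code: `is_uimm8` below is a hypothetical stand-in for `Immediate::is_uimm8` and is assumed to accept exactly the values 0..255, and `length_fits` is an illustrative name. The sketch only shows that converting to an unsigned 64-bit type and subtracting 1 is well-defined modular arithmetic, so an input of 0 wraps to the largest unsigned value and is rejected by the range check without any extra guard.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for Immediate::is_uimm8 (assumption: accepts 0..255).
static bool is_uimm8(uint64_t x) {
  return x <= 0xFF;
}

// Same shape as the predicate under discussion: cast to unsigned, subtract 1.
// Unsigned arithmetic is performed modulo 2^64, so for value == 0 the
// subtraction wraps to UINT64_MAX. That is well-defined behavior, and the
// wrapped value simply fails the range check.
static bool length_fits(int64_t value) {
  return is_uimm8(static_cast<uint64_t>(value) - 1);
}
```

Under these assumptions the check accepts exactly the lengths 1..256; zero and negative inputs wrap to very large unsigned values and are rejected.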
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1829071884 From fyang at openjdk.org Tue Nov 5 10:08:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 10:08:27 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: <3XREX0qVwN4xX_REogr_hGZNjlp_VVeov9uAhrf_9Bg=.f5ab6614-16b4-4271-9f15-19a731f0385e@github.com> References: <3XREX0qVwN4xX_REogr_hGZNjlp_VVeov9uAhrf_9Bg=.f5ab6614-16b4-4271-9f15-19a731f0385e@github.com> Message-ID: On Tue, 5 Nov 2024 09:54:42 GMT, Hamlin Li wrote: > Does it really make sense to have instruction set selection diagnostic: This was once discussed somewhere else before. Again, here is what I am thinking. First of all, we might not want to expose these options to our end users. You will need to add to the release note for newly-added product options. There are quite a few for now and I suppose there will be more and more to come. So it's more reasonable to me to delegate to hwprobe. But if we do that, we still need a way to diagnose or disable them when issues come up (whether performance or functionality related). I don't see a better solution than making them DIAGNOSTIC ones. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456745978 From thartmann at openjdk.org Tue Nov 5 10:44:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 5 Nov 2024 10:44:33 GMT Subject: RFR: 8343206: Final graph reshaping should not compress abstract or interface class pointers [v3] In-Reply-To: References: <4_84pZqk5-pV1iTUdpf5wmVczTdHq-9-Re1qjbGU7Eo=.0fb46e18-883f-45f8-827d-567602373431@github.com> Message-ID: On Mon, 4 Nov 2024 22:34:04 GMT, Dean Long wrote: >> Right, this was an oversimplified example. I used this code: >> >> Class test(MyAbstract obj, boolean b) { >> if (b) { >> return obj.getClass(); >> } >> return null; >> } >> >> >> We pass `null` for `obj` and `false` for `b`. Usually, the branch is then only compiled with Xcomp.
> > I think there is still hope for moving the assert into `TypeNarrowKlass::make` in a future RFE. In the example above, if we are generating code for obj.getClass() based on the assumption that the type is a leaf, we could also notice that the type is abstract and deduce that obj must be null. Right, we could do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21784#discussion_r1829127985 From rehn at openjdk.org Tue Nov 5 10:51:28 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 5 Nov 2024 10:51:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Now we have normal and experimental. Changing this one, we would also have diagnostic. And some we get from hwprobe and some not, so it's not easy to know which ones to manually turn on, etc... Wouldn't it make more sense to turn all options which may be enabled by hwprobe to diagnostic instead?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456842731 From epeter at openjdk.org Tue Nov 5 11:49:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:49:45 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 11:27:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/c2/TestMergeStoresMemorySegment.java >> >> Co-authored-by: Christian Hagedorn >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > >> /contributor add chhagedorn >> >> You spent enough time on this already ;) > > Thanks Emanuel, I highly appreciate that :-) Thanks @chhagedorn for the extensive reviews and collaboration on improving the proofs. Thanks @vnkozlov for the approval. I did an offline merge and testing (to avoid requiring a re-approval) - all looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2456955481 From epeter at openjdk.org Tue Nov 5 11:49:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:49:47 GMT Subject: Integrated: 8335392: C2 MergeStores: enhanced pointer parsing In-Reply-To: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 1 Jul 2024 13:32:01 GMT, Emanuel Peter wrote: > **Background** > I am introducing the `MemPointer`, for enhanced pointer parsing.
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. > > **Details** > > The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. > > This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. > > More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! > > `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. > > **What this change enables** > > Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). > > Now we can do: > - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. > - Merging `Unsafe` stores to native memory. > - Merging `MemorySegment`: with array, native, ByteBuffer backing types. > - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck.
Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. > > **Dealing with Overflows** > > We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-checks or - God forbid - forget overflow-checks. > > **Bench... This pull request has now been integrated. Changeset: f3671bee Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f3671beefb3ff07441a905e25619f0d1a0a2fe15 Stats: 2687 lines in 16 files changed: 2417 ins; 212 del; 58 mod 8335392: C2 MergeStores: enhanced pointer parsing Co-authored-by: Christian Hagedorn Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/19970 From epeter at openjdk.org Tue Nov 5 11:50:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:50:36 GMT Subject: RFR: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord [v2] In-Reply-To: References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: On Mon, 4 Nov 2024 16:10:00 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> unlock diagnostics > > Good. Thanks @vnkozlov for the approval. Thanks @TobiHartmann for the review and all the helpful suggestions along the way! 
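
As a rough illustration of the `NoOverflowInt` idea described in the 8335392 integration notice above, here is a minimal sketch. It is not the HotSpot implementation: the class name `CheckedInt` and its methods are hypothetical, and only addition and multiplication are shown. The assumption is simply that the value carries a sticky flag, so that a chain of "normal" int operations needs a single overflow check at the end.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch: a 32-bit int that remembers whether any operation
// in its history overflowed. Once set, the flag is never cleared.
class CheckedInt {
 public:
  // Construct from a 64-bit value; anything outside the int32 range
  // is recorded as an overflow.
  explicit CheckedInt(int64_t v)
      : _overflow(v < std::numeric_limits<int32_t>::min() ||
                  v > std::numeric_limits<int32_t>::max()),
        _value(_overflow ? 0 : static_cast<int32_t>(v)) {}

  bool overflowed() const { return _overflow; }
  int32_t value() const { return _value; }  // only meaningful if !overflowed()

  // The exact result is computed in 64 bits, where a sum or product of two
  // int32 values always fits, and the constructor re-checks the range.
  friend CheckedInt operator+(const CheckedInt& a, const CheckedInt& b) {
    if (a._overflow || b._overflow) return CheckedInt::poisoned();
    return CheckedInt(static_cast<int64_t>(a._value) + b._value);
  }

  friend CheckedInt operator*(const CheckedInt& a, const CheckedInt& b) {
    if (a._overflow || b._overflow) return CheckedInt::poisoned();
    return CheckedInt(static_cast<int64_t>(a._value) * b._value);
  }

 private:
  // Any out-of-range value yields an instance with the overflow flag set.
  static CheckedInt poisoned() { return CheckedInt(int64_t{1} << 40); }

  bool _overflow;  // declared first so it is initialized before _value
  int32_t _value;
};
```

With such a type, an expression like `con + scale * index` can be written without intermediate checks, and the caller tests `overflowed()` once on the final result.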
------------- PR Comment: https://git.openjdk.org/jdk/pull/21875#issuecomment-2456957356 From epeter at openjdk.org Tue Nov 5 11:50:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 11:50:37 GMT Subject: Integrated: 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord In-Reply-To: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> References: <3jLdPTBJ5FF6j7XmU9qicyfHOiPLs8_NGHN-enTh4ak=.32de86ef-a76d-4e07-a8bb-749f6648219f@github.com> Message-ID: On Mon, 4 Nov 2024 13:13:58 GMT, Emanuel Peter wrote: > There used to be a bug where this happens: > - SuperWord vectorizes, and picks a field-store as the alignment reference, using a CastP2X on the object pointer. > - Later, all field loads disappear, and the Allocation of the object is eliminated. > - The GC code then thinks the CastP2X is part of the GC barrier code... and crashes with wrong assumptions about that part of the IR. > > We should obviously not use field-accesses as alignment references for SuperWord. A few other changes have fixed this issue: > - [JDK-8328544](https://bugs.openjdk.org/browse/JDK-8328544): it disallows any non-array accesses that do not have an int-index. This code was backported and so should on its own fix the issue everywhere. But maybe somebody has the idea and wants to be smarter... allowing such memory accesses without int-indices. For that we should add this regression test. > > // We did not find the int_index. Just to be safe, reject this VPointer. > if (!_has_int_index_after_convI2L) { > return false; > } > > - Recently, we now only allow memory access to be alignment references if they are actually vectorized... which cannot happen with field stores. > - Roberto's change with GC barriers also removed the asserting/crashing code. Though I'm not sure if that means the IR is then ok.
> > **At any rate**: the bug seems **fixed**, but we should add and possibly backport this **regression test** anyway. This pull request has now been integrated. Changeset: f62fc484 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f62fc4844125cc20a91dc2be39ba05a2d3aca8cf Stats: 183 lines in 1 file changed: 183 ins; 0 del; 0 mod 8342498: Add test for Allocation elimination after use as alignment reference by SuperWord Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21875 From mli at openjdk.org Tue Nov 5 11:57:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 11:57:33 GMT Subject: RFR: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 07:08:39 GMT, Emanuel Peter wrote: >> Hi, >> Can you help to review this simple patch? >> Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. >> Thanks! > > Ah, one more thing. I try to keep `SW_INFO` and `TraceSuperWord` in sync. So if we do decide to add `ALIGN_VECTOR` to `TraceSuperWord`, we should also add it to `SW_INFO`. @eme64 Thanks for the information. At first my thought was to make it easier to debug the SLP process, as I observed some test failures on riscv, but it won't print out the detailed failure reason. This PR made it easy to do so, but at the same time it also introduces some verbose information unconditionally, which is not useful when there is no failure/rejection or when the user doesn't care about it. I don't see a more reasonable way to modify the current log in SLP, so I'll use `-XX:CompileCommand=TraceAutoVectorization,*::*,ALIGN_VECTOR` instead, and close this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21715#issuecomment-2456973340 From mli at openjdk.org Tue Nov 5 11:57:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 11:57:34 GMT Subject: Withdrawn: 8343070: Enable is_trace_align_vector when TraceSuperWord is set In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 14:45:33 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Currently, in SuperWord::filter_packs_for_alignment(), there is some log not turned on when TraceSuperWord is set, but I think it should, as it's more convenient for users to debug. > Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21715 From duke at openjdk.org Tue Nov 5 12:08:36 2024 From: duke at openjdk.org (Benoit Daloze) Date: Tue, 5 Nov 2024 12:08:36 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava Link: https://github.com/openjdk/jdk/pull/21285 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21171#issuecomment-2456996412 From mli at openjdk.org Tue Nov 5 12:09:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 12:09:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: <-wQ1pBffv50RX52DhWCnrFt4eSUrd1biyyCt2LpbUg4=.ce604df2-3371-4dbf-ad7c-a2752c5c047c@github.com> On Tue, 5 Nov 2024 10:48:25 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Now we have normal and experimental. Changing this one, we would also have diagnostic. > And some we get from hwprobe and some not, so it's not easy to know which ones to manually turn on, etc... > Wouldn't it make more sense to turn all options which may be enabled by hwprobe to diagnostic instead? @robehn @RealFYang I guess you two have a similar opinion now? But I could be wrong. Are you suggesting turning all the Product options (retrieved by hwprobe) to DIAGNOSTIC? I won't suggest turning any EXPERIMENTAL option to DIAGNOSTIC in this PR, as we need to test it on real hardware first, but if you've tested some of them on real hardware please let me know, and I'll do it in this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2456997208 From roland at openjdk.org Tue Nov 5 12:21:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 12:21:52 GMT Subject: RFR: 8343068: C2: CastX2P Ideal transformation not always applied [v3] In-Reply-To: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: > The transformation: > > > (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) > > > when i fits in an int is not always applied: when the type of `i` is > narrowed so it fits in an int, the `CastX2P` is not enqueued for > igvn. This can get in the way of vectorization as shown by test case > `test2`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8343068 - fix test - test & fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21714/files - new: https://git.openjdk.org/jdk/pull/21714/files/12a471f0..31b4cdde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21714&range=01-02 Stats: 174646 lines in 1487 files changed: 23955 ins; 144716 del; 5975 mod Patch: https://git.openjdk.org/jdk/pull/21714.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21714/head:pull/21714 PR: https://git.openjdk.org/jdk/pull/21714 From roland at openjdk.org Tue Nov 5 12:22:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 12:22:15 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v3] In-Reply-To: References: Message-ID: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8341834 - review - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21660/files - new: https://git.openjdk.org/jdk/pull/21660/files/1070696f..9219a292 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21660&range=01-02 Stats: 174646 lines in 1487 files changed: 23955 ins; 144716 del; 5975 mod Patch: https://git.openjdk.org/jdk/pull/21660.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21660/head:pull/21660 PR: https://git.openjdk.org/jdk/pull/21660 From fyang at openjdk.org Tue Nov 5 12:30:29 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 12:30:29 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: <5xhRjghPegFr5EkU8s63LH3uvkOkYvqczBi1the4k8U=.3ffe78a3-476e-429b-9ccc-d8b959296e43@github.com> On Tue, 5 Nov 2024 10:48:25 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Now we have normal and experimental. Changing this one we would have also diagnostic. > And some we get from hwprobe and some not, it's not easy to know which ones to manually turn on, etc... > Wouldn't it make more sense to turn all options which may be enabled by hwprobe to diagnostic instead? > @robehn @RealFYang I guess you two have a similar opinion now? But I could be wrong. Are you suggesting to turn all the Product options (retrieved by hwprobe) to DIAGNOSTIC?
> > I won't suggest to turn any EXPERIMENTAL to DIAGNOSTIC in this pr, as we need to test it on real hardware first, but if you've tested some of them on real hardware please let me know, I'll do it in this pr. My personal opinion is that we can make the following ones DIAGNOSTIC as well, as they have been tested on real hardware. I agree with you to leave the other EXPERIMENTAL ones as they are. We can still turn them DIAGNOSTIC in the future when the hardware is available for testing. product(bool, UseRVC, false, "Use RVC instructions") \ product(bool, UseRVV, false, "Use RVV instructions") \ product(bool, UseZba, false, "Use Zba instructions") \ product(bool, UseZbb, false, "Use Zbb instructions") \ product(bool, UseZbs, false, "Use Zbs instructions") \ product(bool, UseZfh, false, "Use Zfh instructions") ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2457040315 From rehn at openjdk.org Tue Nov 5 12:51:28 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 5 Nov 2024 12:51:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Ok, so the path is: 1: Experimental (should they be turned on by hwprobe?) 2a: If hwprobe => Diagnostic 2b: No hwprobe => Normal Arguably hwprobe should only turn on diagnostic options then?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2457083353 From mli at openjdk.org Tue Nov 5 13:03:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 5 Nov 2024 13:03:29 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 12:49:20 GMT, Robbin Ehn wrote: > Ok, so the path is: > 1: Experimental (should they be turned on by hwprobe?) > 2a: If hwprobe => Diagnostic > 2b: No hwprobe => Normal > I agree. > should they be turned on by hwprobe? I suggest we keep it simple, i.e. keep it as it is now. > Arguably hwprobe should only turn on diagnostic options then? I still think we'd better keep it simple. And users can still turn them on/off by themselves if they want. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2457108591 From duke at openjdk.org Tue Nov 5 13:17:08 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:17:08 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines Message-ID: In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. Concerns were raised by @rwestrel in the previous PR: > When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens?
As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
------------- Commit messages: - Fix asset failures if printing is disabled - 8319850: PrintInlining should report late inlines - Revert "8319850: PrintInlining should report late inlines" - 8319850: PrintInlining should report late inlines Changes: https://git.openjdk.org/jdk/pull/21899/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319850 Stats: 22 lines in 2 files changed: 22 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Tue Nov 5 13:18:42 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:18:42 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method Message-ID: This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: ConINode* node = _igvn.intcon(i); set_ctrl(node, C->root()); and ConLNode* node = _igvn.longcon(i); set_ctrl(node, C->root()); Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
------------- Commit messages: - Add helper methods for zerocon, makecon, and integercon too - 8343148: C2: Refactor uses of "PhaseValues::intcon() + PhaseIdealLoop::set_ctrl()" into separate method Changes: https://git.openjdk.org/jdk/pull/21836/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343148 Stats: 112 lines in 5 files changed: 40 ins; 36 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From duke at openjdk.org Tue Nov 5 13:19:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:19:03 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' Message-ID: Printing incorrectly printed `nullptr` instead of `null` Buggy: ScopeDesc(pc=0x0000000104c05468 offset=2e8): java.lang.Class::desiredAssertionStatus at 20 (line 3984) Locals - l0: reg rfp [58],oop - l1: stack[0],oop - l2: nullptr - l3: empty Expression stack - @0: nullptr Fixed: ScopeDesc(pc=0x0000000106fdd468 offset=2e8): java.lang.Class::desiredAssertionStatus at 20 (line 3984) Locals - l0: reg rfp [58],oop - l1: stack[0],oop - l2: null - l3: empty Expression stack - @0: null ------------- Commit messages: - 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' Changes: https://git.openjdk.org/jdk/pull/21869/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21869&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323803 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21869.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21869/head:pull/21869 PR: https://git.openjdk.org/jdk/pull/21869 From chagedorn at openjdk.org Tue Nov 5 13:18:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 13:18:42 GMT Subject: RFR: 8343148: C2: 
Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. While at it, we could extend this to the other constant creation methods `zerocon`, `makecon`, and `integercon` as well (`uncached_makecon` is only called by the other `*con*` methods - could be made `private` at some point). I suggest to update the RFE title accordingly since it only mentions `intcon` now. Maybe something like `PhaseValue::*con*() + ...`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2453927465 From epeter at openjdk.org Tue Nov 5 13:41:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 13:41:51 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/918f9b3e...457735c9 FYI https://github.com/openjdk/jdk/pull/19970 is now integrated - thanks for the patience :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2457208051 From swen at openjdk.org Tue Nov 5 15:08:45 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 15:08:45 GMT Subject: RFR: 8333893: Optimization for StringBuilder append boolean & null [v20] In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 21:56:53 GMT, Shaojin Wen wrote: >> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 26 additional commits since the last revision: > > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - fix build error > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'origin/optim_str_builder_append_202406' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - revert test > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - Merge remote-tracking branch 'upstream/master' into optim_str_builder_append_202406 > - ... and 16 more: https://git.openjdk.org/jdk/compare/88a99fce...457735c9 It has been tested that mergeStore can work after the master branch is merged ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2457414686 From swen at openjdk.org Tue Nov 5 15:08:45 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 15:08:45 GMT Subject: Integrated: 8333893: Optimization for StringBuilder append boolean & null In-Reply-To: References: Message-ID: On Mon, 10 Jun 2024 12:12:58 GMT, Shaojin Wen wrote: > After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into primitive arrays by combining values into larger stores. > > This PR rewrites the code of appendNull and append(boolean) methods so that these two methods can be optimized by C2. This pull request has now been integrated.
Changeset: 5890d943 Author: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/5890d9438bbde88b89070052926a2eafe13d7b42 Stats: 133 lines in 5 files changed: 79 ins; 18 del; 36 mod 8333893: Optimization for StringBuilder append boolean & null Reviewed-by: liach ------------- PR: https://git.openjdk.org/jdk/pull/19626 From chagedorn at openjdk.org Tue Nov 5 15:19:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 15:19:29 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21869#pullrequestreview-2415938204 From kvn at openjdk.org Tue Nov 5 15:44:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 15:44:30 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Nice refactoring ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21790#pullrequestreview-2416007518 From swen at openjdk.org Tue Nov 5 15:45:04 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 15:45:04 GMT Subject: RFR: 8343629: More MergeStore benchmark Message-ID: 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. 
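As a rough illustration of the store pattern a `putBytes4`-style benchmark exercises, here is a minimal sketch. The class and method names are hypothetical and are not taken from the benchmark in this PR; the sketch only shows four adjacent constant byte stores spelling "null", which is the kind of sequence that C2's store merging may combine into a single wider store.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; this is NOT the benchmark code from the PR.
public class PutBytes4Sketch {
    // Four adjacent constant byte stores into consecutive array slots.
    // Store merging, when it applies, may turn these into one 4-byte store.
    static void putNull(byte[] buf, int off) {
        buf[off]     = 'n';
        buf[off + 1] = 'u';
        buf[off + 2] = 'l';
        buf[off + 3] = 'l';
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        putNull(buf, 2);
        // Decode the four written bytes back into a String.
        System.out.println(new String(buf, 2, 4, StandardCharsets.ISO_8859_1));
    }
}
```

Whether the stores are actually merged depends on the JIT and the build; the sketch only fixes the source-level shape of the code, not the generated machine code.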
------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 - Merge branch 'master' into merge_store_bench_202410 - add putBytes4 and improved put Changes: https://git.openjdk.org/jdk/pull/21659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343629 Stats: 315 lines in 1 file changed: 71 ins; 51 del; 193 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From chagedorn at openjdk.org Tue Nov 5 15:47:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 15:47:41 GMT Subject: RFR: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check 
that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... Thanks Vladimir! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21790#issuecomment-2457518979 From qamai at openjdk.org Tue Nov 5 15:52:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 5 Nov 2024 15:52:38 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v4] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 03:36:12 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Re-use optimize() and add backend-specific should_lower() Thanks a lot, the patch looks good to me. ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21599#pullrequestreview-2416032305 From chagedorn at openjdk.org Tue Nov 5 15:55:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 5 Nov 2024 15:55:36 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. Looks good to me. Since you could take over the patch from @caojoshua, you should add him as a contributor with `/contributor add @caojoshua`. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2416042408 From swen at openjdk.org Tue Nov 5 16:25:46 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 16:25:46 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 5 Nov 2024 11:45:37 GMT, Emanuel Peter wrote: >>> /contributor add chhagedorn >>> >>> You spent enough time on this already ;) >> >> Thanks Emanuel, I highly appreciate that :-) > > Thanks @chhagedorn for the extensive reviews and collaboration on improving the proofs. > Thanks @vnkozlov for the approval. > > I did an offline merge and testing (to avoid requiring a re-approval) - all looks good. @eme64 How do I use the TraceMergeStores option? It worked before, but now it gives an error. build/macosx-aarch64-server-fastdebug/jdk/bin/java -Dtest=appendNullLatin1 -XX:+TraceMergeStores output Unrecognized VM option 'TraceMergeStores' Did you mean '(+/-)MergeStores'? Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2457624721 From epeter at openjdk.org Tue Nov 5 16:47:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 5 Nov 2024 16:47:43 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v17] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 5 Nov 2024 16:22:35 GMT, Shaojin Wen wrote: >> Thanks @chhagedorn for the extensive reviews and collaboration on improving the proofs. >> Thanks @vnkozlov for the approval. >> >> I did an offline merge and testing (to avoid requiring a re-approval) - all looks good. > > @eme64 How do I use the TraceMergeStores option? It worked before, but now it gives an error. > > > build/macosx-aarch64-server-fastdebug/jdk/bin/java -Dtest=appendNullLatin1 -XX:+TraceMergeStores > > > output > > Unrecognized VM option 'TraceMergeStores' > Did you mean '(+/-)MergeStores'? > Error: Could not create the Java Virtual Machine. > Error: A fatal exception has occurred. Program will exit. @wenshao Ah, good question! I changed it from a "global" flag to a "compile option". You can now filter the methods! And you can enable different tags - so you can regulate how verbose it is. Example: `-XX:CompileCommand=TraceMergeStores,Test::test*,SUCCESS,ADJACENCY,ALIASING,BASIC` And to see all available tags: `-XX:CompileCommand=TraceMergeStores,Test::test*,help` Usage for CompileCommand TraceMergeStores: -XX:CompileCommand=TraceMergeStores,, tags descriptions BASIC Trace basic analysis steps POINTER Trace pointer IR ALIASING Trace MemPointerSimpleForm::get_aliasing_with ADJACENCY Trace adjacency SUCCESS Trace successful merges You might have to play around a little to see what is helpful to you.
And I'm always open to feedback :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2457676944 From rcastanedalo at openjdk.org Tue Nov 5 17:07:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 17:07:13 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: > This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: > > ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) > > Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). > > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
> > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Use DUIterator_Fast to traverse node outputs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21829/files - new: https://git.openjdk.org/jdk/pull/21829/files/e85ba7cb..b0aa39fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21829&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21829&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21829.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21829/head:pull/21829 PR: https://git.openjdk.org/jdk/pull/21829 From rcastanedalo at openjdk.org Tue Nov 5 17:07:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 17:07:13 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 22:58:47 GMT, Vladimir Ivanov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Use DUIterator_Fast to traverse node outputs > > src/hotspot/share/opto/matcher.cpp line 183: > >> 181: } >> 182: } >> 183: for (uint j = 0; j < n->outcnt(); j++) { > > Why don't you use DU iterator instead (e.g., `DUIterator_Fast`)? Right, done in commit b0aa39fc, thanks. Please re-review. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21829#discussion_r1829715261 From roland at openjdk.org Tue Nov 5 17:12:33 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 5 Nov 2024 17:12:33 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2457735470 From rcastanedalo at openjdk.org Tue Nov 5 17:21:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 17:21:44 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations Message-ID: This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) The end result is the generation of fewer explicit address computation instructions. 
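As a rough illustration of the flattening described above, the following toy model folds a chain of constant-offset address additions, `(base + c1) + c2`, into a single addition `base + (c1 + c2)`. This is only a sketch under simplifying assumptions: `Node`, `Base`, `AddP` and `flatten` are hypothetical stand-ins written for this note and not HotSpot's C2 classes, and the offset is modeled as a plain `long` rather than as a node input that becomes constant during IGVN.

```java
// Toy model only: these types are hypothetical stand-ins, not HotSpot's C2 node classes.
final class AddPChainExample {
    interface Node {}
    // Some opaque base address.
    record Base(String name) implements Node {}
    // An address computed as "address + constant offset".
    record AddP(Node address, long offset) implements Node {}

    // Folds AddP(AddP(x, c1), c2) into AddP(x, c1 + c2), repeatedly for longer chains.
    static Node flatten(Node n) {
        if (n instanceof AddP outer && outer.address() instanceof AddP inner) {
            return flatten(new AddP(inner.address(), inner.offset() + outer.offset()));
        }
        return n;
    }

    public static void main(String[] args) {
        Node chained = new AddP(new AddP(new Base("obj"), 16), 8);
        System.out.println(flatten(chained));
    }
}
```

In this model the rewrite only becomes possible once the inner offset is a known constant, which is roughly why the outer node has to be revisited when the inner node's offset changes.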
#### Testing ##### Functionality - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). ##### Performance - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. ------------- Commit messages: - Re-add to worklist only if it is the offset that changes - Simplify test - Remove test condition - Generalize test for aarch64 - Merge better with surrounding code - Add tentative solution (guarded with UseNewCode) - Add test case Changes: https://git.openjdk.org/jdk/pull/21898/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21898&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343067 Stats: 73 lines in 3 files changed: 70 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21898.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21898/head:pull/21898 PR: https://git.openjdk.org/jdk/pull/21898 From kvn at openjdk.org Tue Nov 5 18:05:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:05:29 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:02:16 GMT, Roberto Castañeda Lozano wrote: > This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: > > ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) > > The end result is the generation of fewer explicit address computation instructions. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
> > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. src/hotspot/share/opto/phaseX.cpp line 1647: > 1645: if (u->is_Mem()) { > 1646: worklist.push(u); > 1647: } else if (n == use->in(AddPNode::Offset) && `n == use->in(AddPNode::Offset)` result can be saved outside loop in local var. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21898#discussion_r1829796865 From kvn at openjdk.org Tue Nov 5 18:10:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:10:28 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null Trivial ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21869#pullrequestreview-2416366722 From duke at openjdk.org Tue Nov 5 18:19:28 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 18:19:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> On Tue, 5 Nov 2024 17:10:08 GMT, Roland Westrelin wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. > > What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? @rwestrel I think in case the inlining budget has been exceeded (i.e. try_to_inline and subsequently ok_to_inline fail), there are only two code locations where we would create a late inlining call generator: [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L380) and [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L292). Both are calls to CallGenerator::for_late_inline_virtual() that create a LateInlineVirtualCallGenerator, which only performs strength reduction AFAIK. There might be something I'm missing, though, since I've only been working on the C2 compiler for three days. So please feel free to point me to other cases of late inlining and I will investigate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2457866376 From kvn at openjdk.org Tue Nov 5 18:24:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:24:29 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. Do we have other places (not new constant node) where we set Root as control? May be we can add `set_root_as_ctrl(n)` method in `loop node.hpp` in such case. ------------- PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2416390797 From kvn at openjdk.org Tue Nov 5 18:24:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:24:29 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 18:20:58 GMT, Vladimir Kozlov wrote: > Do we have other places (not new constant node) where we set Root as control? In loop opts. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2457874998 From duke at openjdk.org Tue Nov 5 18:25:28 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 5 Nov 2024 18:25:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> References: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> Message-ID: <3GO6-2ZBvqEpDdVBejUIw4d_wIKNqh7tJbNOkt4UBHM=.aa92ed3b-44f6-4767-a90a-d1f472f0a74b@github.com> On Tue, 5 Nov 2024 18:16:41 GMT, theoweidmannoracle wrote: >>> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >>> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. >> >> What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? > > @rwestrel I think in case the inlining budget has been exceeded (i.e. try_to_inline and subsequently ok_to_inline fail), there's only two code locations where we would create a late inlining code generator: [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L380) and [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L292). Both are calls to CallGenerator::for_late_inline_virtual() that create a LateInlineVirtualCallGenerator, which only performs strength reduction AFAIK. > > There might be something I'm missing, though, since I've only been working on the C2 compiler for three days ? 
So please feel free to point me other cases of late inlining and I will investigate. Thanks @theoweidmannoracle for continuing this work and investigating the `CallGenerator::for_late_inline_virtual()` stuff. Not a reviewer, but LGTM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2457876403 From kvn at openjdk.org Tue Nov 5 18:28:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 18:28:33 GMT Subject: RFR: 8343173: Remove ZGC-specific non-JVMCI test groups [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 16:13:53 GMT, Leonid Mesnik wrote: >> The JVMCI should be supported by all GCs and specific >> hotspot_compiler_all_gcs >> group is not needed anymore. >> >> There are few failures of JVMCI tests with ZGC happened, the bug >> https://bugs.openjdk.org/browse/JDK-8343233 >> is filed and corresponding tests are problemlisted. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - typo fixed > - Merge branch 'master' of https://github.com/openjdk/jdk into 8343173 > - 8343173: Remove ZGC-specific non-JVMCI test groups Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21774#pullrequestreview-2416399409 From rcastanedalo at openjdk.org Tue Nov 5 19:50:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 19:50:12 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: > This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. 
This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: > > ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) > > The end result is the generation of fewer explicit address computation instructions. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Hoist changed offset input check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21898/files - new: https://git.openjdk.org/jdk/pull/21898/files/6dcbb0c6..deb7c4e1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21898&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21898&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21898.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21898/head:pull/21898 PR: https://git.openjdk.org/jdk/pull/21898 From rcastanedalo at openjdk.org Tue Nov 5 19:50:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 5 Nov 2024 19:50:12 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: <8yo89AtWDHWMseurCUy1o_-is_JMbtrogj1ci1-FNbk=.63b0f9e5-2c0c-4e9b-9ecd-fe0944bc160b@github.com> On Tue, 5 Nov 2024 18:03:15 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Hoist changed offset input check > > 
src/hotspot/share/opto/phaseX.cpp line 1647: > >> 1645: if (u->is_Mem()) { >> 1646: worklist.push(u); >> 1647: } else if (n == use->in(AddPNode::Offset) && > > `n == use->in(AddPNode::Offset)` result can be saved outside loop in local var. Thanks, done (commit deb7c4e1). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21898#discussion_r1829916712 From sparasa at openjdk.org Tue Nov 5 20:52:41 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 5 Nov 2024 20:52:41 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: update opcodes for load based operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21770/files - new: https://git.openjdk.org/jdk/pull/21770/files/fcc782b2..bca87165 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21770&range=05-06 Stats: 32 lines in 1 file changed: 0 ins; 10 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/21770.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21770/head:pull/21770 PR: https://git.openjdk.org/jdk/pull/21770 From kvn at openjdk.org Tue Nov 5 20:55:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 5 Nov 2024 20:55:28 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 
19:50:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: >> >> ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) >> >> The end result is the generation of fewer explicit address computation instructions. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Hoist changed offset input check Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21898#pullrequestreview-2416668042 From lmesnik at openjdk.org Tue Nov 5 20:55:35 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 5 Nov 2024 20:55:35 GMT Subject: Integrated: 8343173: Remove ZGC-specific non-JVMCI test groups In-Reply-To: References: Message-ID: <-1bZpI933zmujmTibsiiOkDdxnlxnKEGVGAPlqfvYik=.a0981eca-c8da-466c-a209-b266afea8513@github.com> On Tue, 29 Oct 2024 22:01:08 GMT, Leonid Mesnik wrote: > The JVMCI should be supported by all GCs and specific > hotspot_compiler_all_gcs > group is not needed anymore. > > There are few failures of JVMCI tests with ZGC happened, the bug > https://bugs.openjdk.org/browse/JDK-8343233 > is filed and corresponding tests are problemlisted. This pull request has now been integrated. 
Changeset: 847cc5eb Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/847cc5ebac43b83746d8f238c5f9ecf2972a2796 Stats: 12 lines in 2 files changed: 8 ins; 4 del; 0 mod 8343173: Remove ZGC-specific non-JVMCI test groups Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/21774 From cslucas at openjdk.org Tue Nov 5 21:02:29 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 5 Nov 2024 21:02:29 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: <2CcKaBp-JbbQ78T0ruK9IQtGMkexY9eiGF5xIHQh33M=.7dc91766-57f1-47b1-88d7-2c133d80011a@github.com> On Tue, 5 Nov 2024 08:57:14 GMT, Tobias Hartmann wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Address PR feedback: include test execution options. > > All green. Thank you @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2458138370 From duke at openjdk.org Tue Nov 5 21:02:30 2024 From: duke at openjdk.org (duke) Date: Tue, 5 Nov 2024 21:02:30 GMT Subject: RFR: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" [v3] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 21:53:45 GMT, Cesar Soares Lucas wrote: >> Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. The overall situation that causes the problem is like so: >> >> - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 >> >> - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. >> >> - In another iteration of the loop Obj2 is flagged as NSR. 
For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. >> >> After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. >> >> The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. >> >> --------- >> >> ### Tests >> >> Win, Mac & Linux tier1-4 on x64 & Aarch64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR feedback: include test execution options. @JohnTortugo Your change (at version 2449e42c8a01f600633d637651e6d53ff69297bc) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21778#issuecomment-2458140525 From cslucas at openjdk.org Tue Nov 5 21:22:41 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 5 Nov 2024 21:22:41 GMT Subject: Integrated: 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" In-Reply-To: References: Message-ID: <6PmRW_j30ZJZqXw8w7LkgPrvMA2ID0E0Eyjt5F-H4KU=.2f555ae0-8272-4b7a-87c4-4115e6465f3e@github.com> On Wed, 30 Oct 2024 00:40:22 GMT, Cesar Soares Lucas wrote: > Please, consider this patch to fix an issue that happens when a Phi previously considered reducible become later irreducible. 
The overall situation that causes the problem is like so: > > - Consider that there are at least 2 scalar replaceable objects (Obj1 and Obj2; Obj2 is stored in a field of Obj1) when we start iterating the loop at escape.cpp:301 > > - In the first iteration of the loop the call chain starting with `adjust_scalar_replaceable_state` ends up calling `can_reduce_phi` and considering Phi1 as reducible. This Phi has only Obj1 as *SR* input. > > - In another iteration of the loop Obj2 is flagged as NSR. For instance, because we are storing Obj2 in an unknown position of an array. This will cause `found_nsr_alloc` to be set to `true`. > > After the loop finishes, the execution will go to `find_scalar_replaceable_allocs`. The code will process Obj1, because it's still scalar replaceable, but will find that this object is stored in a field of a - **now** - NSR object. Therefore, correctly, Obj1 will also be marked as NSR. When Obj1 is marked as NSR Phi1 becomes irreducible because it doesn't have any more scalar replaceable input. > > The solution I'm proposing is simply revisit the "reducibility" of the Phis when an object is marked as NSR. > > --------- > > ### Tests > > Win, Mac & Linux tier1-4 on x64 & Aarch64. This pull request has now been integrated. 
Changeset: d4d9831c Author: Cesar Soares Lucas URL: https://git.openjdk.org/jdk/commit/d4d9831c9075c1a157d8375e6902bfc6c731389a Stats: 124 lines in 3 files changed: 121 ins; 0 del; 3 mod 8340454: C2 EA asserts with "previous reducible Phi is no longer reducible before SUT" Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21778 From vlivanov at openjdk.org Tue Nov 5 21:39:29 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 5 Nov 2024 21:39:29 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: <1XHa78pQg4aMT2bD_vFY9dCP3h4XpfOtw3skiKBjx-g=.f0eb7973-3b49-4667-9b20-45a4ea5b9c2e@github.com> On Tue, 5 Nov 2024 19:50:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: >> >> ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) >> >> The end result is the generation of fewer explicit address computation instructions. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Hoist changed offset input check Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21898#pullrequestreview-2416744542 From vlivanov at openjdk.org Tue Nov 5 21:43:31 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 5 Nov 2024 21:43:31 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 17:07:13 GMT, Roberto Castañeda Lozano wrote: >> This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: >> >> ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) >> >> Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). >> >> The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. >> >> Note that the pattern causing the failure should be in general optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
>> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Use DUIterator_Fast to traverse node outputs Marked as reviewed by vlivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21829#pullrequestreview-2416750596 From sviswanathan at openjdk.org Tue Nov 5 21:56:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 5 Nov 2024 21:56:30 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: <_7R5NelxVRX3Cze4kId-NsdQ17qSPoF2cavVJYyF5Qo=.2ddd34f2-4cb0-4c74-8c8b-8596d9c42c1b@github.com> On Tue, 5 Nov 2024 20:52:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update opcodes for load based operations Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21770#pullrequestreview-2416768268 From swen at openjdk.org Tue Nov 5 23:41:44 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 5 Nov 2024 23:41:44 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing. For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array.
Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without RC's smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck. Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian Currently, TraceMergeStores can only be used in fastdebug images. Are you planning to support it in release images? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2458416345 From swen at openjdk.org Wed Nov 6 00:31:28 2024 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 6 Nov 2024 00:31:28 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:03:33 GMT, Shaojin Wen wrote: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. @eme64 Below are the performance numbers running under AMD EPYC™
Genoa (x64), where the scenario of putBytes4GetBytes is "null".getBytes(0, 4, bytes4, off); Is it possible to do MergeStore in this scenario?

Benchmark                          Mode  Cnt      Score       Error  Units
MergeStoreBench.getCharB           avgt    5   6038.532 ±   533.982  ns/op
MergeStoreBench.getCharBU          avgt    5   4923.182 ±   163.872  ns/op
MergeStoreBench.getCharBV          avgt    5   3111.268 ±    84.077  ns/op
MergeStoreBench.getCharC           avgt    5   2245.270 ±    33.559  ns/op
MergeStoreBench.getCharL           avgt    5   6109.519 ±   249.512  ns/op
MergeStoreBench.getCharLU          avgt    5   4552.425 ±   161.933  ns/op
MergeStoreBench.getCharLV          avgt    5   2239.866 ±    91.853  ns/op
MergeStoreBench.getIntB            avgt    5   8163.035 ±   137.565  ns/op
MergeStoreBench.getIntBU           avgt    5   9136.199 ±   259.491  ns/op
MergeStoreBench.getIntBV           avgt    5    314.123 ±     4.510  ns/op
MergeStoreBench.getIntL            avgt    5   7879.011 ±    10.759  ns/op
MergeStoreBench.getIntLU           avgt    5   8968.715 ±   268.414  ns/op
MergeStoreBench.getIntLV           avgt    5   2228.228 ±     1.510  ns/op
MergeStoreBench.getIntRB           avgt    5   8618.141 ±    22.545  ns/op
MergeStoreBench.getIntRBU          avgt    5  11239.977 ±   447.754  ns/op
MergeStoreBench.getIntRL           avgt    5   9060.754 ±   236.147  ns/op
MergeStoreBench.getIntRLU          avgt    5   9365.050 ±   154.357  ns/op
MergeStoreBench.getIntRU           avgt    5   2540.704 ±    75.198  ns/op
MergeStoreBench.getIntU            avgt    5   2508.954 ±    74.999  ns/op
MergeStoreBench.getLongB           avgt    5  24940.668 ± 16857.311  ns/op
MergeStoreBench.getLongBU          avgt    5  14126.468 ±   329.241  ns/op
MergeStoreBench.getLongBV          avgt    5    607.128 ±    23.775  ns/op
MergeStoreBench.getLongL           avgt    5  25519.679 ± 15393.727  ns/op
MergeStoreBench.getLongLU          avgt    5  14598.271 ±   481.158  ns/op
MergeStoreBench.getLongLV          avgt    5   2227.659 ±    16.334  ns/op
MergeStoreBench.getLongRB          avgt    5  25158.839 ± 18209.451  ns/op
MergeStoreBench.getLongRBU         avgt    5  14005.082 ±   208.154  ns/op
MergeStoreBench.getLongRL          avgt    5  25303.319 ± 14775.524  ns/op
MergeStoreBench.getLongRLU         avgt    5  14481.847 ±   309.623  ns/op
MergeStoreBench.getLongRU          avgt    5   3065.744 ±    15.405  ns/op
MergeStoreBench.getLongU           avgt    5   3048.522 ±     0.704  ns/op
MergeStoreBench.putBytes4          avgt    5    933.283 ±     6.197  ns/op
MergeStoreBench.putBytes4GetBytes  avgt    5   5917.932 ±   199.901  ns/op
MergeStoreBench.putBytes4U         avgt    5    944.097 ±    25.902  ns/op
MergeStoreBench.putBytes4X         avgt    5    944.714 ±    18.924  ns/op
MergeStoreBench.putChars4B         avgt    5   5679.262 ±   154.030  ns/op
MergeStoreBench.putChars4BU        avgt    5   1143.133 ±     4.250  ns/op
MergeStoreBench.putChars4BV        avgt    5   4530.941 ±   124.318  ns/op
MergeStoreBench.putChars4C         avgt    5   1138.541 ±    27.843  ns/op
MergeStoreBench.putChars4L         avgt    5   5647.885 ±   112.363  ns/op
MergeStoreBench.putChars4LU        avgt    5   1142.501 ±     4.400  ns/op
MergeStoreBench.putChars4LV        avgt    5   1143.770 ±     3.435  ns/op
MergeStoreBench.putChars4S         avgt    5   1141.919 ±    36.528  ns/op
MergeStoreBench.setCharBS          avgt    5   6114.143 ±   144.826  ns/op
MergeStoreBench.setCharBV          avgt    5   3607.599 ±    87.720  ns/op
MergeStoreBench.setCharC           avgt    5   4510.196 ±     5.445  ns/op
MergeStoreBench.setCharLS          avgt    5   5641.424 ±   195.167  ns/op
MergeStoreBench.setCharLV          avgt    5   2267.712 ±    40.752  ns/op
MergeStoreBench.setIntB            avgt    5   8049.368 ±   233.618  ns/op
MergeStoreBench.setIntBU           avgt    5  18052.279 ±  2428.567  ns/op
MergeStoreBench.setIntBV           avgt    5   3287.905 ±    63.375  ns/op
MergeStoreBench.setIntL            avgt    5   2135.887 ±    62.601  ns/op
MergeStoreBench.setIntLU           avgt    5   4795.636 ±    74.974  ns/op
MergeStoreBench.setIntLV           avgt    5   2154.363 ±    81.324  ns/op
MergeStoreBench.setIntRB           avgt    5  13895.941 ±  7981.782  ns/op
MergeStoreBench.setIntRBU          avgt    5  14756.267 ±  1585.571  ns/op
MergeStoreBench.setIntRL           avgt    5   3284.792 ±    37.939  ns/op
MergeStoreBench.setIntRLU          avgt    5   5958.555 ±    27.404  ns/op
MergeStoreBench.setIntRU           avgt    5   5983.119 ±    79.627  ns/op
MergeStoreBench.setIntU            avgt    5   4848.655 ±   168.466  ns/op
MergeStoreBench.setLongB           avgt    5  31871.401 ±  1233.822  ns/op
MergeStoreBench.setLongBU          avgt    5  25704.975 ±  5105.792  ns/op
MergeStoreBench.setLongBV          avgt    5   2199.367 ±    69.511  ns/op
MergeStoreBench.setLongL           avgt    5   5486.926 ±    30.874  ns/op
MergeStoreBench.setLongLU          avgt    5   4503.212 ±    81.635  ns/op
MergeStoreBench.setLongLV          avgt    5   2144.943 ±    38.944  ns/op
MergeStoreBench.setLongRB          avgt    5  30338.353 ±  1631.512  ns/op
MergeStoreBench.setLongRBU         avgt    5  25025.442 ±  2690.138  ns/op
MergeStoreBench.setLongRL          avgt    5   4553.245 ±   128.721  ns/op
MergeStoreBench.setLongRLU         avgt    5   4793.427 ±     1.474  ns/op
MergeStoreBench.setLongRU          avgt    5   4803.963 ±    74.017  ns/op
MergeStoreBench.setLongU           avgt    5   4564.326 ±   146.283  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458465745 From dlong at openjdk.org Wed Nov 6 00:59:28 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 00:59:28 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> Message-ID: On Tue, 5 Nov 2024 10:05:02 GMT, Martin Doerr wrote: >> src/hotspot/cpu/s390/s390.ad line 2550: >> >>> 2548: // Unsigned Integer Immediate: 9-bit >>> 2549: operand SSlenDW() %{ >>> 2550: predicate(Immediate::is_uimm8((julong)n->get_long()-1)); >> >> Suggestion: >> >> predicate(n->get_long() >= 1 && Immediate::is_uimm8((julong)n->get_long()-1)); > > I don't think this is necessary. Unsigned subtraction with wrap-around is not undefined behavior. Right, it's not UB, but sometimes it is a bug, and would be flagged by things like -fsanitize=unsigned-integer-overflow, so my preference would be to avoid it if possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1830259148 From fyang at openjdk.org Wed Nov 6 03:24:28 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 6 Nov 2024 03:24:28 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch?
> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks We don't auto-enable experimental options through hwprobe until they are fully tested on real hardware. That's what we do for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2458651208 From chagedorn at openjdk.org Wed Nov 6 06:12:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 06:12:35 GMT Subject: Integrated: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor In-Reply-To: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> References: <5krLcDigRN94zhl5T8zX33E_7qyajZH96Xv-1biYGOc=.860f4223-44fa-4904-ae70-dafe74b2989c@github.com> Message-ID: <-4QGg9Ue9Sk2hwx6V07buYycJvFNcRBcY4tU9VI8dYg=.0141577d-1bec-42a6-bae0-1b802927dcb5@github.com> On Wed, 30 Oct 2024 15:18:56 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and post loop (this PR) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): Loop Unswitching and removing useless Assertion Predicates (upcoming) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called
for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > --- > > #### Refactorings of this Patch > This patch replaces the predicate walking and cloning code for **main and post loops**. The code can reuse the code established w... This pull request has now been integrated. 
Changeset: 4431852a Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4431852a880b06241231d346311170331c20ab2d Stats: 275 lines in 5 files changed: 94 ins; 164 del; 17 mod 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21790 From jbhateja at openjdk.org Wed Nov 6 06:36:30 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 06:36:30 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 20:52:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update opcodes for load based operations Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21770#pullrequestreview-2417359277 From jbhateja at openjdk.org Wed Nov 6 06:36:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 06:36:31 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> References: <9lYvrMFQKW4Cq9zKFDjkNg9zKgLvdaCtVQTZgUGgpNs=.78ee059a-5f85-4d6e-828a-a71351d44763@github.com> Message-ID: On Wed, 30 Oct 2024 09:05:50 GMT, Jatin Bhateja wrote: > Hi @vamsi-parasa, NDD is very flexible in terms of argument selection, i.e. 
ADDL NDD, SRC1 (ModRM.R/M), SRC2 (ModRM.REG) has opcode 0x01 Whereas, ADDL NDD, SRC1 (ModRM.REG), SRC2 (ModRM.R/M) has opcode 0x03 > > In this case, we are trying to match GCC encoding scheme. > > Can you please add the following comment here since the argument nomenclature does not match with parameter nomenclature? > > NDD shares its encoding bits with NDS bits for regular EVEX instruction. Therefore we are passing DST as the second argument to minimize changes in leaf level routine. Thanks for addressing this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21770#discussion_r1830459760 From chagedorn at openjdk.org Wed Nov 6 07:06:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 07:06:01 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor Message-ID: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> #### Replacing the Remaining Predicate Walking and Cloning Code The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) --- (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) #### Single Template Assertion Predicate Check This replacement allows us to have a single 
`TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). #### Common Refactorings for all the Patches in this Series In each of the patch, I will do similar refactoring ideas: - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. --- #### Refactorings of this Patch - This patch replaces the predicate walking in `PhaseIdealLoop::get_assertion_predicates()` which is used for Loop Unswitching and removing useless Template Assertion Predicates (called from `PhaseIdealLoop::collect_useful_template_assertion_predicates_for_loop()`). 
- Note that the cloning code in Loop Unswitching is not replaced yet, because we clone the Template Assertion Predicates in the original order as currently found in the graph, which also allowed us to use `PhaseIdealLoop::create_new_if_for_predicate()`. This means that we first walk from the loop entry to the last Template Assertion Predicate and then start cloning them in the reverse order (which ensures that we keep the original order of the Template Assertion Predicates). I don't think that keeping the original order is a strong requirement. Once we replace the UCTs with halt nodes, we no longer need to call `create_new_if_for_predicate()` and could theoretically just clone and initialize the Template Assertion Predicates in the opposite order as originally found in the graph, which is easier to implement. This is currently also done for the other loop opts that require Assertion Predicates cloning/initialization. I think it's probably safe to do this for Loop Unswitching as well once we replace UCTs with halt nodes (@rwestrel what do you think?). If at some point we need to keep the Assertion Predicate order, we can just add this functionality to the `PredicateIterator` classes. Anyhow, I'm leaving this code in `clone_assertion_predicates_to_unswitched_loop()` as it is for now and will revisit it later.
Thanks, Christian ------------- Commit messages: - 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor Changes: https://git.openjdk.org/jdk/pull/21918/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21918&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342945 Stats: 59 lines in 4 files changed: 22 ins; 22 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21918.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21918/head:pull/21918 PR: https://git.openjdk.org/jdk/pull/21918 From epeter at openjdk.org Wed Nov 6 07:25:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 6 Nov 2024 07:25:29 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:03:33 GMT, Shaojin Wen wrote: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > ```java > "null".getBytes(0, 4, bytes4, off); > ``` > > Is it possible to do MergeStore in this scenario? I don't know. What do the logs say? And what does it currently compile down to, i.e. what assembly instructions? Otherwise I think this update seems reasonable. It would be nice if you could do some summary / explanation: which cases do still not optimize, and why? For that, it would be helpful if you had a run with, and one without `MergeStores` enabled - then we can easily compare the performance! 
You can find an example of how to do that easily here: https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458877595 PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458878368 PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458880901 PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2458883551 From epeter at openjdk.org Wed Nov 6 07:32:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 6 Nov 2024 07:32:45 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: On Tue, 5 Nov 2024 23:38:59 GMT, Shaojin Wen wrote: > Currently, TraceMergeStores can only be used in fastdebug images. Are you planning to support it in release images? I generally don't support it in release builds, only debug - or rather `NOT_PRODUCT`. The issue with supporting in release is that some other printing methods I use are not available in product (`Node::dump`). And if we support it in release, then we have to create a CSR, clearly specify what it prints, and then we are going to be less flexible in the future with changing the behavior. So I would rather not make it product, at least for now ;) BTW: this is also why you can only disable `-XX:-MergeStores` with `-XX:+UnlockDiagnosticVMOptions `: it is not a full product flag, and so does not require a CSR, and we are able to remove it or change its behavior. But if someone really has an issue with the MergeStores optimization, they at least have a workaround until we are able to fix it ;) Does that make sense? 
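For readers following the thread, here is a minimal sketch of the kind of store pattern under discussion: four adjacent byte stores into a `byte[]`, which a store-merging optimization could in principle combine into one wider store. The class and method names (`MergeStoresSketch`, `putIntLE`, `getIntLE`) are made up for illustration and are not taken from `MergeStoreBench`. Whether C2 actually merges these stores depends on the JVM build, the platform and the flags in effect, so this only illustrates the source-level shape.

```java
class MergeStoresSketch {
    // Hypothetical helper: writes v into a[off..off+3] in little-endian order.
    // The four stores target adjacent bytes, the pattern MergeStores looks for.
    static void putIntLE(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }

    // Hypothetical helper: reads the same four bytes back as an int.
    static int getIntLE(byte[] a, int off) {
        return (a[off] & 0xFF)
             | ((a[off + 1] & 0xFF) << 8)
             | ((a[off + 2] & 0xFF) << 16)
             | ((a[off + 3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        putIntLE(buf, 2, 0x11223344);
        // Round trip: the value read back equals the value written.
        System.out.println(Integer.toHexString(getIntLE(buf, 2))); // prints 11223344
    }
}
```

To compare behavior with the optimization disabled, the flag combination mentioned above would be `-XX:+UnlockDiagnosticVMOptions -XX:-MergeStores`.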
------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2458892679 From thartmann at openjdk.org Wed Nov 6 08:09:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:09:32 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
> > # Solution > > In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) src/hotspot/share/opto/callGenerator.cpp line 734: > 732: } > 733: C->set_inlining_progress(true); > 734: C->set_do_cleanup(kit.stopped() || result->Opcode() == Op_VectorBox); // path is dead or vector box; needs cleanup This only triggers if the return value of the incrementally inlined method is a `VectorBox`, right? Is that sufficient? Could the `VectorBox` be hidden by another node? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1830549530 From thartmann at openjdk.org Wed Nov 6 08:14:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:14:29 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> References: <1-GKNeVswpUac-XTPAW78e7qQhfW8JRDlaOyY5mLVX4=.36089295-410b-477c-b657-82dd8109cc40@github.com> Message-ID: On Tue, 5 Nov 2024 18:16:41 GMT, theoweidmannoracle wrote: >>> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >>> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
>> >> What about regular calls that fail to inline initially because the compiler ran out of inlining budget but are inlined later on? > > @rwestrel I think in case the inlining budget has been exceeded (i.e. try_to_inline and subsequently ok_to_inline fail), there are only two code locations where we would create a late inlining code generator: [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L380) and [here](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L292). Both are calls to CallGenerator::for_late_inline_virtual() that create a LateInlineVirtualCallGenerator, which only performs strength reduction AFAIK. > > There might be something I'm missing, though, since I've only been working on the C2 compiler for three days. So please feel free to point me to other cases of late inlining and I will investigate. > @theoweidmannoracle @caojoshua was not found in the census. @caojoshua You might want to [associate your GitHub account and your OpenJDK username](https://wiki.openjdk.org/display/SKARA/Skara#Skara-AssociatingyourGitHubaccountandyourOpenJDKusername). @theoweidmannoracle you can add Joshua via `/contributor add jcao` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2458958121 From thartmann at openjdk.org Wed Nov 6 08:17:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:17:31 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered.
> > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. Please update the copyright dates. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2458962968 From thartmann at openjdk.org Wed Nov 6 08:29:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:29:28 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21918#pullrequestreview-2417539451 From dfenacci at openjdk.org Wed Nov 6 08:38:29 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 6 Nov 2024 08:38:29 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: <-Rus6nFTc2zKhUmxgsFbK1H6Q1yr9OKHKTcdej-I0jw=.193754f3-c396-42c0-86b5-73af1e11bd8b@github.com> On Mon, 4 Nov 2024 21:03:37 GMT, Dean Long wrote: > Would it be better to trigger cleanup based on the presence of nodes like CastPP/CheckCastPP instead? Good point. At first I wanted to restrict the extra cleanups as much as possible (checking for a VectorBox seemed more restrictive) but the "origin" of the issue are actually the CastPP/CheckCastPP nodes. I just want to check how "expensive" that is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2459002194 From roland at openjdk.org Wed Nov 6 08:42:28 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 08:42:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? 
If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
With a simple test case:

    public class TestLateInlining {
        public static void main(String[] args) {
            for (int i = 0; i < 20_000; i++) {
                test1();
            }
        }

        private static void test1() {
            inlined1();
        }

        private static void inlined1() {
        }
    }

Without your patch:

    $ java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:CompileOnly=TestLateInlining::test1 -XX:CompileCommand=quiet -XX:+PrintInlining -XX:+AlwaysIncrementalInline TestLateInlining
    87 1 n jdk.internal.vm.Continuation::enterSpecial (native) (static)
    87 2 n jdk.internal.vm.Continuation::doYield (native) (static)
    92 3 b TestLateInlining::test1 (4 bytes)
    @ 0 TestLateInlining::inlined1 (1 bytes) inline (hot)

With your patch:

    $ java -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:CompileOnly=TestLateInlining::test1 -XX:CompileCommand=quiet -XX:+PrintInlining -XX:+AlwaysIncrementalInline TestLateInlining
    86 1 n jdk.internal.vm.Continuation::enterSpecial (native) (static)
    86 2 n jdk.internal.vm.Continuation::doYield (native) (static)
    92 3 b TestLateInlining::test1 (4 bytes)
    @ 0 TestLateInlining::inlined1 (1 bytes) late inline

I think it would be nice to preserve the "inline (hot)" part of the first output as it's the reason for inlining. There can be other reasons for inlining (not many from a quick look at the code) but, who knows, there could be more in the future. Also, having a test case would be useful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2459009598 From shade at openjdk.org Wed Nov 6 09:18:02 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 6 Nov 2024 09:18:02 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint.
Looks fine. It is fairly cryptic why `43` is the default :) ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21921#pullrequestreview-2417645031 From tholenstein at openjdk.org Wed Nov 6 09:18:02 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 6 Nov 2024 09:18:02 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag Message-ID: Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint. ------------- Commit messages: - JDK-8331727: Increase upper limit of LoopOptsCount flag Changes: https://git.openjdk.org/jdk/pull/21921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321997 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21921/head:pull/21921 PR: https://git.openjdk.org/jdk/pull/21921 From mli at openjdk.org Wed Nov 6 09:18:32 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 6 Nov 2024 09:18:32 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Hey, so what's the conclusion for now? I'm fine with @RealFYang 's proposal above, what do you think?
@robehn

    product(bool, UseRVC, false, "Use RVC instructions") \
    product(bool, UseRVV, false, "Use RVV instructions") \
    product(bool, UseZba, false, "Use Zba instructions") \
    product(bool, UseZbb, false, "Use Zbb instructions") \
    product(bool, UseZbs, false, "Use Zbs instructions") \
    product(bool, UseZfh, false, "Use Zfh instructions")

------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2459086263 From rcastanedalo at openjdk.org Wed Nov 6 09:20:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 6 Nov 2024 09:20:38 GMT Subject: RFR: 8339303: C2: dead node after failing to match cloned address expression [v2] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 16:39:49 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Use DUIterator_Fast to traverse node outputs > > Looks good. Yes, it looks like code expect LShift here instead of constant. Thanks for reviewing, @vnkozlov and @iwanowww!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21829#issuecomment-2459086820 From rcastanedalo at openjdk.org Wed Nov 6 09:20:39 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 6 Nov 2024 09:20:39 GMT Subject: Integrated: 8339303: C2: dead node after failing to match cloned address expression In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:53:33 GMT, Roberto Castañeda Lozano wrote: > This changeset prevents the x86 platform-specific logic from cloning address expressions consisting of two chained `AddP` nodes with a small constant offset each, such as in the following example: > > ![example](https://github.com/user-attachments/assets/86c143a1-3895-4e0c-936b-0d22b7c80e73) > > Such patterns cannot be fully subsumed into x86 complex addressing modes, and cloning them can cause the matcher to introduce dead nodes that trigger a segmentation fault in the subsequent global code motion phase. See a detailed analysis of the failure in the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8339303). > > The changeset additionally extends the post-matching verification logic to check that no old node is reachable by traversing both node inputs and outputs. This extension would have caused the original test case to fail directly after matching with an informative assertion message rather than an opaque segmentation fault in an unrelated code generation phase. > > Note that the pattern causing the failure should in general be optimized by `AddPNode::Ideal` into a single `AddP` node with the constant sum of the offsets. While [JDK-8343067](https://bugs.openjdk.org/browse/JDK-8343067) should address the missing optimization, this changeset proposes a complementary solution that is easily backportable and avoids relying on specific optimizations for correctness.
> > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. This pull request has now been integrated. Changeset: 83f3d42d Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/83f3d42d6bcefac80449987f4d951f8280eeee3a Stats: 73 lines in 3 files changed: 67 ins; 3 del; 3 mod 8339303: C2: dead node after failing to match cloned address expression Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21829 From chagedorn at openjdk.org Wed Nov 6 09:50:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 09:50:30 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set the upper limit to 1000 or even max_jint. Looks good! I'm also curious what the story behind 43 is :-) ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21921#pullrequestreview-2417728600 From chagedorn at openjdk.org Wed Nov 6 09:51:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 09:51:32 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Thanks Tobias for your review! 
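The visitor/iterator split described in the quoted PR text can be sketched roughly as follows. This is a minimal illustration in Java under stated assumptions, not HotSpot's actual C++ code: the class names only mirror the concepts named in the description (`PredicateVisitor`, `PredicateIterator`, Parse Predicates, Template Assertion Predicates), while `CloneTemplatesVisitor`, `demo()`, and the predicate names used in it are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PredicateVisitorSketch {
    // Hypothetical stand-in for a predicate found above a loop.
    interface Predicate {
        void accept(PredicateVisitor visitor);
    }

    static final class ParsePredicate implements Predicate {
        final String name;
        ParsePredicate(String name) { this.name = name; }
        @Override
        public void accept(PredicateVisitor visitor) { visitor.visit(this); }
    }

    static final class TemplateAssertionPredicate implements Predicate {
        final String name;
        TemplateAssertionPredicate(String name) { this.name = name; }
        @Override
        public void accept(PredicateVisitor visitor) { visitor.visit(this); }
    }

    // Visitor interface with no-op defaults, so a concrete visitor only
    // overrides the predicate kinds it cares about.
    interface PredicateVisitor {
        default void visit(ParsePredicate predicate) {}
        default void visit(TemplateAssertionPredicate predicate) {}
    }

    // Walks all predicates of one loop and applies the visitor to each.
    static final class PredicateIterator {
        private final List<Predicate> predicates;
        PredicateIterator(List<Predicate> predicates) { this.predicates = predicates; }
        void forEach(PredicateVisitor visitor) {
            for (Predicate predicate : predicates) {
                predicate.accept(visitor);
            }
        }
    }

    // Hypothetical visitor: clones each Template Assertion Predicate into a
    // new chain; the caller would be responsible for wiring that chain to
    // the target loop.
    static final class CloneTemplatesVisitor implements PredicateVisitor {
        final List<TemplateAssertionPredicate> clonedChain = new ArrayList<>();
        @Override
        public void visit(TemplateAssertionPredicate predicate) {
            clonedChain.add(new TemplateAssertionPredicate(predicate.name + "_clone"));
        }
    }

    // Runs the iterator over a made-up predicate list and returns the
    // names of the cloned Template Assertion Predicates.
    static List<String> demo() {
        List<Predicate> atLoop = Arrays.asList(
            new ParsePredicate("parse_predicate"),
            new TemplateAssertionPredicate("init_value"),
            new TemplateAssertionPredicate("last_value"));
        CloneTemplatesVisitor visitor = new CloneTemplatesVisitor();
        new PredicateIterator(atLoop).forEach(visitor);
        List<String> names = new ArrayList<>();
        for (TemplateAssertionPredicate clone : visitor.clonedChain) {
            names.add(clone.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [init_value_clone, last_value_clone]
    }
}
```

The point of the split is that the walking logic lives in one place (the iterator) while each loop optimization supplies only its own visitor, which is presumably what makes a single predicate-matching check possible.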
------------- PR Comment: https://git.openjdk.org/jdk/pull/21918#issuecomment-2459156266 From galder at openjdk.org Wed Nov 6 10:19:03 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 6 Nov 2024 10:19:03 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Looking into the formatting errors ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2459129498 From galder at openjdk.org Wed Nov 6 10:19:03 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 6 Nov 2024 10:19:03 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic Message-ID: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes.
------------- Commit messages: - Fix formatting - Fix more formatting issues - Fix formatting - Add test that replicates issue Changes: https://git.openjdk.org/jdk/pull/21920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326369 Stats: 90 lines in 1 file changed: 90 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From thartmann at openjdk.org Wed Nov 6 11:34:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 11:34:33 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java line 1: > 1: /** The copyright header is missing. test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java line 3: > 1: /** > 2: * @test > 3: * bug This should be `@bug 8326369` right?
------------- PR Review: https://git.openjdk.org/jdk/pull/21920#pullrequestreview-2418023116 PR Review Comment: https://git.openjdk.org/jdk/pull/21920#discussion_r1830863161 PR Review Comment: https://git.openjdk.org/jdk/pull/21920#discussion_r1830862930 From roland at openjdk.org Wed Nov 6 12:39:31 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 12:39:31 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21918#pullrequestreview-2418162085 From chagedorn at openjdk.org Wed Nov 6 12:39:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 12:39:32 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Thanks Roland for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21918#issuecomment-2459641384 From roland at openjdk.org Wed Nov 6 14:51:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 14:51:40 GMT Subject: Integrated: 8343068: C2: CastX2P Ideal transformation not always applied In-Reply-To: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> References: <5k9Zq9E-2OJg0raPGHS1t_45N7QguwAToBfZiCfPjlY=.f41fd65a-7ac1-4886-a801-16753b864a2e@github.com> Message-ID: On Fri, 25 Oct 2024 14:09:48 GMT, Roland Westrelin wrote: > The transformation: > > > (CastX2P (AddL base i)) -> (AddP (CastX2P base) i) > > > when i fits in an int is not always applied: when the type of `i` is > narrowed so it fits in an int, the `CastX2P` is not enqueued for > igvn. This can get in the way of vectorization as shown by test case > `test2`. This pull request has now been integrated. Changeset: 57c3bb60 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/57c3bb6091f8ba0caced6f5ecf21dc998ffeee9f Stats: 93 lines in 3 files changed: 93 ins; 0 del; 0 mod 8343068: C2: CastX2P Ideal transformation not always applied Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21714 From roland at openjdk.org Wed Nov 6 14:53:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 14:53:38 GMT Subject: Integrated: 8341834: C2 compilation fails with "bad AD file" due to Replicate In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 08:30:15 GMT, Roland Westrelin wrote: > Superword creates a `Replicate` node at a `ConvL2I` node and uses the > type of the result of the `ConvL2I` to pick the type of the > `Replicate` instead of the type of the input to the `ConvL2I`. This pull request has now been integrated. 
Changeset: 72a45ddb Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/72a45ddbad9c343200197348ccfcf74105e6fefa Stats: 55 lines in 2 files changed: 54 ins; 0 del; 1 mod 8341834: C2 compilation fails with "bad AD file" due to Replicate Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21660 From tholenstein at openjdk.org Wed Nov 6 14:58:22 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 6 Nov 2024 14:58:22 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand Message-ID: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> color pick nodes Adds new option to IGV to color selected nodes: 1) select some nodes 2) `Ctrl + C` or `View` -> `Color action` 3) pick a color and apply ------------- Commit messages: - Update ColorAction.java - JDK-8343535: IGV: Colorize nodes on demand Changes: https://git.openjdk.org/jdk/pull/21925/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343535 Stats: 109 lines in 6 files changed: 105 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From chagedorn at openjdk.org Wed Nov 6 14:58:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 6 Nov 2024 14:58:23 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 12:19:47 GMT, Tobias Holenstein wrote: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply That's a nice feature! 
Works as expected on Linux with the shortcut. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2418178425 From rehn at openjdk.org Wed Nov 6 16:10:31 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 6 Nov 2024 16:10:31 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks Yes, I'm fine with that. Just so we try to keep some kind of common thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2460188120 From mdoerr at openjdk.org Wed Nov 6 16:23:49 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 6 Nov 2024 16:23:49 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling Message-ID: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation.
------------- Commit messages: - 8343724: [PPC64] Disallow OptoScheduling Changes: https://git.openjdk.org/jdk/pull/21935/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343724 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21935/head:pull/21935 PR: https://git.openjdk.org/jdk/pull/21935 From duke at openjdk.org Wed Nov 6 16:31:34 2024 From: duke at openjdk.org (duke) Date: Wed, 6 Nov 2024 16:31:34 GMT Subject: RFR: 8343214: Fix encoding errors in APX New Data Destination Instructions Support [v7] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 20:52:41 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) >> >> The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update opcodes for load based operations @vamsi-parasa Your change (at version bca87165b26116dd832b5e6b700cdaa89fa1f17e) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21770#issuecomment-2460243436 From sparasa at openjdk.org Wed Nov 6 16:44:33 2024 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 6 Nov 2024 16:44:33 GMT Subject: Integrated: 8343214: Fix encoding errors in APX New Data Destination Instructions Support In-Reply-To: References: Message-ID: <5DFO_wKi6Se-c4sbZjl5XG7TdiQcqw9UlE6RQFcgyog=.f04f6ac2-a7a4-4e3a-8190-ab07fa0348cb@github.com> On Tue, 29 Oct 2024 17:19:20 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to fix instruction encoding errors for some of the instructions which support the APX New Data Destination (NDD) and No Flags (NF) features (added in [JDK-8329035](https://bugs.openjdk.org/browse/JDK-8329035)) > > The correctness of the encoding is verified by comparing to the reference encoding by GCC using an automated tool (https://github.com/openjdk/jdk/pull/21795) This pull request has now been integrated. Changeset: c0e6c3b9 Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/c0e6c3b93c0d21debc538e0135805c2957053108 Stats: 72 lines in 1 file changed: 28 ins; 1 del; 43 mod 8343214: Fix encoding errors in APX New Data Destination Instructions Support Reviewed-by: jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/21770 From kvn at openjdk.org Wed Nov 6 17:32:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 6 Nov 2024 17:32:27 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking 
and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. 
> - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head.
> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point.
> ---...

Looks good to me too.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21918#pullrequestreview-2419001131

From jbhateja at openjdk.org Wed Nov 6 17:39:22 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 6 Nov 2024 17:39:22 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction
Message-ID: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com>

This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns:

    MulVL (AndV      SRC1, 0xFFFFFFFF) (AndV      SRC2, 0xFFFFFFFF)
    MulVL (URShiftVL SRC1, 32)         (URShiftVL SRC2, 32)
    MulVL (URShiftVL SRC1, 32)         (AndV      SRC2, 0xFFFFFFFF)
    MulVL (AndV      SRC1, 0xFFFFFFFF) (URShiftVL SRC2, 32)
    MulVL (VectorCastI2X SRC1)         (VectorCastI2X SRC2)
    MulVL (RShiftVL  SRC1, 32)         (RShiftVL  SRC2, 32)

A 64x64-bit multiplication produces a 128-bit result and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128-bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
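As a scalar illustration of the partial-product decomposition described above, the following Java sketch splits each 64-bit operand into 32-bit halves and assembles the unsigned 128-bit product. The class and method names (`PartialProducts`, `mulUnsigned128`, `referenceProduct`, `combine`) are made up for this example and are not part of the patch; it models the arithmetic only, not the C2 transformation.

```java
import java.math.BigInteger;

// Illustrative only: all names here are hypothetical and not taken from the patch.
final class PartialProducts {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Returns {high, low}: the two 64-bit halves of the unsigned 128-bit product a * b.
    static long[] mulUnsigned128(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;

        // Four 32x32 -> 64 bit partial products; each fits in 64 bits when read as unsigned.
        long ll = aLo * bLo;
        long lh = aLo * bHi;
        long hl = aHi * bLo;
        long hh = aHi * bHi;

        // Add up the middle 32-bit column and propagate its carry into the high half.
        long mid = (ll >>> 32) + (lh & LOW_MASK) + (hl & LOW_MASK);
        long low = (mid << 32) | (ll & LOW_MASK);
        long high = hh + (lh >>> 32) + (hl >>> 32) + (mid >>> 32);
        return new long[] { high, low };
    }

    // Reference result computed with BigInteger, used only for checking.
    static BigInteger referenceProduct(long a, long b) {
        BigInteger ua = new BigInteger(Long.toUnsignedString(a));
        BigInteger ub = new BigInteger(Long.toUnsignedString(b));
        return ua.multiply(ub);
    }

    // Joins the two unsigned 64-bit halves into one 128-bit value.
    static BigInteger combine(long high, long low) {
        return new BigInteger(Long.toUnsignedString(high)).shiftLeft(64)
                .add(new BigInteger(Long.toUnsignedString(low)));
    }

    public static void main(String[] args) {
        long a = 0xFEDCBA9876543210L;
        long b = 0x0123456789ABCDEFL;
        long[] p = mulUnsigned128(a, b);
        System.out.println(combine(p[0], p[1]).equals(referenceProduct(a, b)));

        // With both upper halves zero, only the lo*lo partial product contributes.
        long x = 0xDEADBEEFL, y = 0xCAFEBABEL;
        long[] q = mulUnsigned128(x, y);
        System.out.println(q[0] == 0 && q[1] == x * y);
    }
}
```

When `aHi` and `bHi` are both zero, three of the four partial products vanish, which is the case a single 32x32 -> 64 bit multiply can cover.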
If the upper 32 bits of the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32-bit multiplication instruction with quadword saturation. The patch matches this pattern in a target-dependent manner without introducing a new IR node.

The VPMUL[U]DQ instruction performs [unsigned] multiplication between even-numbered doubleword lanes of two long vectors and produces a 64-bit result. It has much lower latency than the full 64-bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P-core and E-core Xeons.

Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-

Sierra Forest :-
============
Baseline:-
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2   806.228          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   403.044          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   200.641          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   100.664          ops/ms

With Optimization:-
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  1299.407          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   504.995          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   327.544          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   160.963          ops/ms

Granite Rapids:-
=============
Baseline:-
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  2279.099          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  1148.609          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   570.848          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   268.872          ops/ms

With Optimization:-
Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  2612.484          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  1308.187          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   653.375          ops/ms
VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   316.182          ops/ms

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - Removing target specific hooks
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
 - Review resoultions
 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
 - Handle new I2L pattern, IR tests, Rewiring pattern inputs to MulVL further optimizes JIT code
 - Review resolutions
 - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

Changes: https://git.openjdk.org/jdk/pull/21244/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341137
  Stats: 528 lines in 7 files changed: 527 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21244.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244

PR: https://git.openjdk.org/jdk/pull/21244

From jkarthikeyan at openjdk.org Wed Nov 6 17:39:27 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 6 Nov 2024 17:39:27 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction
In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com>
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com>
Message-ID:

On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote:

> This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns:
> > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
> > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... > Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not full proof for above 4 patterns. Current patch takes care of this limitation. I think this is a good point. I've taken a look at the patch and added some comments below. Hmm, do you think this pattern could be matched in the ad-files instead of the middle end? I think that might be a lot cleaner since the backend already has systems for matching node trees, which could avoid a lot of the complexity here. I think it could make the patch a lot smaller and simpler. For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? 
I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) BTW, from the last conversation I had started working on PhaseLowering myself, you can see my work so far on my branch: https://github.com/jaskarth/jdk/tree/phase-lowering. I think I can publish an RFE in the coming two or three days (there were some optimizations and cleanup I was prototyping, I will remove them before sending a PR.) Do you think we should continue with my branch or do you want to approach the problem from a different way? Just want to check again to make sure we don't end up re-doing the same work :) src/hotspot/cpu/x86/matcher_x86.hpp line 184: > 182: // Does the CPU supports doubleword multiplication with quadword saturation. > 183: static constexpr bool supports_double_word_mult_with_quadword_staturation(void) { > 184: return true; Should this be `UseAVX > 0`? I'm wondering since we have a `MulVL` rule that applies when `UseAVX == 0`. src/hotspot/share/opto/vectornode.cpp line 2089: > 2087: if (Matcher::supports_double_word_mult_with_quadword_staturation() && > 2088: !is_mult_lower_double_word()) { > 2089: auto is_clear_upper_double_word_uright_shift_op = [](const Node *n) { Suggestion: auto is_clear_upper_double_word_uright_shift_op = [](const Node* n) { src/hotspot/share/opto/vectornode.cpp line 2093: > 2091: n->in(2)->Opcode() == Op_RShiftCntV && n->in(2)->in(1)->is_Con() && > 2092: n->in(2)->in(1)->bottom_type()->isa_int() && > 2093: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32L; Suggestion: n->in(2)->in(1)->bottom_type()->is_int()->get_con() == 32; Since you are comparing with a `TypeInt` I think this shouldn't be `32L`. src/hotspot/share/opto/vectornode.cpp line 2098: > 2096: auto is_lower_double_word_and_mask_op = [](const Node *n) { > 2097: if (n->Opcode() == Op_AndV) { > 2098: Node *replicate_operand = n->in(1)->Opcode() == Op_Replicate ? 
n->in(1) Suggestion: Node* replicate_operand = n->in(1)->Opcode() == Op_Replicate ? n->in(1) src/hotspot/share/opto/vectornode.cpp line 2124: > 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > 2123: if ((is_lower_double_word_and_mask_op(in(1)) || > 2124: is_lower_double_word_and_mask_op(in(1)) || `is_lower_double_word_and_mask_op(in(1)) || is_lower_double_word_and_mask_op(in(1))` is redundant, right? Shouldn't you only need it once? Same for the other 3 calls, which are similarly repeated. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 41: > 39: */ > 40: > 41: public class VectorMultiplyOpt { Could it be possible to also do IR verification in this test? It would be good to check that we don't generate `AndVL` or `URShiftVL` with this transform. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 43: > 41: public class VectorMultiplyOpt { > 42: > 43: public static long [] src1; Suggestion: public static long[] src1; And for the rest of the `long []` in this file too. 
test/micro/org/openjdk/bench/jdk/incubator/vector/VectorXXH3HashingBenchmark.java line 39: > 37: @Param({"1024", "2048", "4096", "8192"}) > 38: private int SIZE; > 39: private long [] accumulators; Suggestion: private long[] accumulators; ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2367683334 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407658405 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411538179 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414553899 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422700344 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800159123 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153755 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153568 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800153842 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800151177 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800167403 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800165261 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1800169840 From jbhateja at openjdk.org Wed Nov 6 17:39:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:27 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. 
> > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>
> Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>
>
> Sierra Forest :-
> ============
> Baseline:-
> Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2   806.228          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   403.044          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   200.641          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   100.664          ops/ms
>
> With Optimization:-
> Benchmark (SIZE) Mode ...

Hi @iwanowww, @sviswa7, @merykitty, can you kindly review this?

I re-evaluated the solution and feel that a lowering pass will complement such a transformation, especially in light of the re-wiring logic that directly feeds the pattern inputs to the multiplier. While x86 VPMULUDQ operates on the lower doubleword of each quadword lane, AARCH64 SVE has instructions which consider the upper doubleword of the quadword multiplier and multiplicand, and hence can optimize the following pattern too:

` MulVL ( SRC1 << 32 ) * ( SRC2 << 32 ) `

https://www.felixcloutier.com/x86/pmuludq
https://dougallj.github.io/asil/doc/umullt_z_zz_32.html

I am in the process of introducing a PhaseLowering which will have target-specific IR transformations for nodes of interest; until then I am moving the PR to draft stage.
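To make the lane semantics discussed above concrete, here is a small scalar model in Java. It assumes, following the description in this thread and the linked PMULUDQ reference, that the unsigned form multiplies the zero-extended low 32 bits of each 64-bit lane and that the signed form (VPMULDQ) does the same with sign extension. `pmuludqLane` and `pmuldqLane` are hypothetical helper names, not HotSpot or Vector API functions.

```java
// Illustrative scalar model of one 64-bit lane; helper names are hypothetical.
final class LowDoublewordMultiply {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Unsigned form: zero-extend the low 32 bits of each operand, then multiply.
    static long pmuludqLane(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Signed form: sign-extend the low 32 bits of each operand, then multiply.
    static long pmuldqLane(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long src1 = 0x123456789ABCDEF0L;
        long src2 = 0xFEDCBA9876543210L;

        // MulVL (AndV SRC1, 0xFFFFFFFF) (AndV SRC2, 0xFFFFFFFF):
        // the masks are implied by the instruction, so SRC1 and SRC2 can be fed directly.
        long masked = (src1 & LOW_MASK) * (src2 & LOW_MASK);
        System.out.println(masked == pmuludqLane(src1, src2));

        // MulVL (URShiftVL SRC1, 32) (URShiftVL SRC2, 32):
        // the shifted operands already have zero upper halves.
        long shifted = (src1 >>> 32) * (src2 >>> 32);
        System.out.println(shifted == pmuludqLane(src1 >>> 32, src2 >>> 32));

        // MulVL (RShiftVL SRC1, 32) (RShiftVL SRC2, 32):
        // arithmetic shifts leave sign-extended 32-bit values, matching the signed form.
        long signedShifted = (src1 >> 32) * (src2 >> 32);
        System.out.println(signedShifted == pmuldqLane(src1 >> 32, src2 >> 32));
    }
}
```

In each case the full 64-bit multiply and the low-doubleword multiply agree because the operands carry no information in their upper 32 bits beyond zero or sign extension.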
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2401895553 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422634178 From qamai at openjdk.org Wed Nov 6 17:39:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:29 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. 
>
> The VPMUL[U]DQ instruction performs [unsigned] multiplication between even-numbered doubleword lanes of two long vectors and produces a 64-bit result. It has much lower latency than the full 64-bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P-core and E-core Xeons.
>
> Please find below the performance of [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>
>
> Sierra Forest :-
> ============
> Baseline:-
> Benchmark                                 (SIZE)   Mode  Cnt     Score   Error   Units
> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2   806.228          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2   403.044          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2   200.641          ops/ms
> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2   100.664          ops/ms
>
> With Optimization:-
> Benchmark (SIZE) Mode ...

Another approach is to do something similar to `MacroLogicVNode`. You can make another node and transform `MulVL` to it before matching; this is more flexible than using match rules.

I have a similar idea, which is to group those transformations together into a `Phase` called `PhaseLowering`. It can be used to, e.g., split `ExtractI` into the 128-bit lane extraction and the element extraction from that lane. This allows us to do `GVN` on those, and `v.lane(5) + v.lane(7)` can be compiled nicely as:

    vextracti128 xmm0, ymm1, 1
    pextrd eax, xmm0, 1
    // vextracti128 xmm0, ymm1, 1 here will be gvn-ed
    pextrd ecx, xmm0, 3
    add eax, ecx

Personally, I think this optimization is not essential, so we should proceed with introducing lowering first and then add this transformation to that phase, instead of trying to integrate this transformation and then refactor it into phase lowering, which seems like a net extra step.
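The `ExtractI` split described earlier in this message can be sketched at the value level as follows. This is a plain-Java model, assuming a 256-bit vector of eight ints viewed as two 128-bit chunks of four ints each; `extract128` and `extractFrom128` are hypothetical helpers standing in for the two lowered steps and are not C2 nodes.

```java
import java.util.Arrays;

// Value-level model of a two-step lane extraction; helper names are hypothetical.
final class ExtractLowering {
    static final int INTS_PER_128 = 4;

    // Step 1: pick the 128-bit chunk that holds the element (cf. vextracti128).
    static int[] extract128(int[] vec256, int chunk) {
        return Arrays.copyOfRange(vec256, chunk * INTS_PER_128, (chunk + 1) * INTS_PER_128);
    }

    // Step 2: pick the element inside that chunk (cf. pextrd).
    static int extractFrom128(int[] vec128, int index) {
        return vec128[index];
    }

    // Unsplit form: one logical lane read expressed through the two steps.
    static int lane(int[] vec256, int i) {
        return extractFrom128(extract128(vec256, i / INTS_PER_128), i % INTS_PER_128);
    }

    public static void main(String[] args) {
        int[] v = {10, 11, 12, 13, 14, 15, 16, 17};
        // Lanes 5 and 7 both live in chunk 1, so step 1 is a common subexpression
        // that only needs to be computed once.
        int[] upper = extract128(v, 1);
        int sum = extractFrom128(upper, 1) + extractFrom128(upper, 3);
        System.out.println(sum == lane(v, 5) + lane(v, 7));
    }
}
```

Once the extraction is expressed as two steps, the shared first step is visible to value numbering, which is the effect the assembly listing above relies on.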
The issues I have with this patch are that: - It convolutes the graph with machine-dependent nodes early in the compiling process. - It overloads `MulVL` with alternative behaviours, it is fine now as we do not perform much analysis on this node but it would be problematic later. I think it is more preferable to have a separate IR node for this like `MulVLowIToLNode`, or have this transformation be done only just before matching, or both. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407793168 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414491182 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421157206 From jkarthikeyan at openjdk.org Wed Nov 6 17:39:29 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 6 Nov 2024 17:39:29 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 11 Oct 2024 16:54:23 GMT, Quan Anh Mai wrote: > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2407821557

From jbhateja at openjdk.org Wed Nov 6 17:39:29 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 6 Nov 2024 17:39:29 GMT
Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction
In-Reply-To:
References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com>
Message-ID:

On Fri, 11 Oct 2024 17:12:49 GMT, Jasmine Karthikeyan wrote:

> > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering`
>
> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code.

Hey @jaskarth, @merykitty, we already have infrastructure where, during parsing, we create macro nodes which can be lowered or expanded into multiple IR nodes during macro expansion. What we need in this case is a target-specific IR pattern check, since not all targets may support 32x32 multiplication with quadword saturation. The idea is to avoid creating a new IR node and to piggyback the needed information on the existing MulVL IR; we already use such tricks for relaxed unsafe reductions.

Going forward, infusion of KnownBits into our data flow analysis infrastructure will streamline such optimizations; this patch performs a point optimization for a specific set of constrained multiplication patterns.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411053693 From qamai at openjdk.org Wed Nov 6 17:39:30 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:30 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <2g_Hm5UuVBqoklekkaxtnYn05JYKmosnzaMefQi_q3s=.916470fa-352d-410c-b187-f6453bb53630@github.com> On Mon, 14 Oct 2024 12:12:58 GMT, Jatin Bhateja wrote: >>> I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. > >> > I am having a similar idea that is to group those transformations together into a `Phase` called `PhaseLowering` >> >> I think such a phase could be quite useful in general. Recently I was trying to implement the BMI1 instruction `bextr` for better performance with bit masks, but ran into a problem where it doesn't have an immediate encoding so we'd need to manifest a constant into a temporary register every time. With an (x86-specific) ideal node, we could simply let the register allocator handle placing the constant. It would also be nice to avoid needing to put similar backend-specific lowerings (such as `MacroLogicV`) in shared code. 
> > Hey @jaskarth , @merykitty , we already have an infrastructure where during parsing we create Macro Nodes which can be lowered / expanded to multiple IRs nodes during macro expansion, what we need in this case is a target specific IR pattern check since not all targets may support 32x32 multiplication with quadword saturation, idea is to avoid creating a new IR and piggyback needed information on existing MulVL IR, we already use such tricks for relaxed unsafe reductions. Going forward, infusion of KnownBits into our data flow analysis infrastructure will streamline such optimizations, this patch is performing point optimization for specific set of constrained multiplication patterns. @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411389030 From jbhateja at openjdk.org Wed Nov 6 17:39:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:31 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Mon, 14 Oct 2024 15:04:54 GMT, Jasmine Karthikeyan wrote: > For the record I think in this PR we could simply match the IR patterns in the ad file, since (from my understanding) the patterns we are matching could be supported there. We should do platform-specific lowering in a separate patch because it is pretty nuanced, and we could potentially move it to the new system afterwards. 
Hi @jaskarth , Bigger pattern matching is sensitive to [IR level node sharing](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/matcher.cpp#L1724), thus it may not be foolproof for the above 4 patterns. Current patch takes care of this limitation. > @jatin-bhateja That is machine-independent lowering, we are talking about machine-dependent lowering to which `MacroLogicV` transformation belongs. You can have `phaselowering_x86` and not have to add another method to `Matcher` as well as add default implementations to various architecture files. You can reuse `MulVL` node for that but I believe these transformations should be done as late as possible. Hi @merykitty, I see some scope for refactoring and carving out a separate target specific lowering pass going forward, I have brought this up in the past too. Existing optimizations are in line with current infrastructure and guard target specific optimizations with target specific match_rule_supported checks e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.cpp#L2898. As @jaskarth suggests we can pick this up going forward. > BTW, from the last conversation I had started working on PhaseLowering myself, you can see my work so far on my branch: https://github.com/jaskarth/jdk/tree/phase-lowering. I think I can publish an RFE in the coming two or three days (there were some optimizations and cleanup I was prototyping, I will remove them before sending a PR.) Do you think we should continue with my branch or do you want to approach the problem from a different way? Just want to check again to make sure we don't end up re-doing the same work :) Hi @jaskarth , Please add PhaseLowering skeleton code only and then we can add applicable lowering transforms in separate patches, e.g. I volunteer to move x86 side lowering transforms like MacroLogic Optimization along with this doubleword multiplication pass.
We need to carefully take such decisions keeping in the view the code duplication aspects, so only very specific IR transforms should be lowered, common transforms should still be part of shared code. Let me know if you have any concerns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411884206 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2422981643 From vlivanov at openjdk.org Wed Nov 6 17:39:32 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:32 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
> > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... Some time ago, there was a relevant experiment to optimize vectorized Poly1305 implementation by utilizing VPMULDQ instruction on x86 (see [JDK-8219881](https://bugs.openjdk.org/browse/JDK-8219881) for details). The implementation used int-to-long vector casts and produced the following IR shape: `MulVL (VectorCastI2X src1) (VectorCastI2X src2)`. Does it make sense to cover it as part of this particular enhancement? 
IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. Also, I briefly looked at #21599 in the context of this particular enhancement, but still don't see how it can improve the situation (except input rewiring part) and not simply duplicate what matcher already does well. src/hotspot/share/opto/vectornode.cpp line 2122: > 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) > 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) > 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immediate operands and passes them into `vpmuludq` which doesn't look right... ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2412582542 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421529658 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2436531693 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805886268 From qamai at openjdk.org Wed Nov 6 17:39:33 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:33 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 15 Oct 2024 17:00:26 GMT, Jasmine Karthikeyan wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets.
>> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. Thanks a lot :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2414605470 From jbhateja at openjdk.org Wed Nov 6 17:39:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:33 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 15 Oct 2024 00:28:25 GMT, Vladimir Ivanov wrote: > MulVL (VectorCastI2X src1) (VectorCastI2X src2 It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, thus we may not be able to neglect partial products of upper doublewords while performing 64x64 bit multiplication. 
Existing patterns guarantee that the upper doublewords are cleared, so the result depends only on the lower doubleword multiplication. > Personally, I think this optimization is not essential, so we should proceed with introducing lowering first, then add this transformation to that phase, instead of trying to integrate this transformation then refactor it into phase lowering, which seems like a net extra step. I think we should not block inflight patches in anticipation of new refactoring. We can always tune it later. > I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? > > About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. Just wanted to make sure we're not doing the same work twice :) It will be good to float an RFP with some use-cases upfront before development, such as the vectorization improvements @jaskarth pointed out. > IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. Hi @iwanowww , I have implemented the additional pattern you suggested. In addition, the patch now re-wires the pattern inputs to the MulVL IR to avoid emitting upper doubleword clearing logic in applicable scenarios. Hi @jaskarth , @merykitty , As discussed, I am waiting on the PhaseLowering skeleton to move some part of this patch to an x86 specific lowering pass.
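For readers following the arithmetic behind the zero-upper-half patterns, here is a minimal scalar sketch in Java. It is not taken from the patch: the class and method names are hypothetical, and it models a single 64-bit lane rather than a vector. It assumes only the standard identity that the low 64 bits of a 64x64 product can be assembled from 32-bit partial products, and checks that when both upper doublewords are zero the product reduces to an unsigned 32x32->64 multiply of the low doublewords.

```java
// Hypothetical scalar model of one 64-bit lane; names are illustrative only.
final class LowDwordMulSketch {
    static final long LOW32 = 0xFFFFFFFFL;

    // Low 64 bits of a 64x64 multiply, assembled from 32-bit partial products.
    // The aHi*bHi term is shifted out entirely and so never contributes.
    static long mulViaPartialProducts(long a, long b) {
        long aLo = a & LOW32, aHi = a >>> 32;
        long bLo = b & LOW32, bHi = b >>> 32;
        long cross = aHi * bLo + aLo * bHi; // only its low 32 bits survive the shift
        return aLo * bLo + (cross << 32);
    }

    // Models an unsigned 32x32->64 multiply that reads only the low
    // doubleword of each 64-bit input.
    static long mulLowDwordsUnsigned(long a, long b) {
        return Integer.toUnsignedLong((int) a) * Integer.toUnsignedLong((int) b);
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L, b = 0x0FEDCBA987654321L;
        // General case: the decomposition matches the plain 64-bit multiply.
        check(mulViaPartialProducts(a, b) == a * b, "decomposition");
        // Masked or logically shifted inputs have zero upper halves, so the
        // cross terms vanish and one 32x32->64 unsigned multiply is enough.
        check((a & LOW32) * (b & LOW32) == mulLowDwordsUnsigned(a, b), "And/And");
        check((a >>> 32) * (b >>> 32) == mulLowDwordsUnsigned(a >>> 32, b >>> 32), "URShift/URShift");
        check((a >>> 32) * (b & LOW32) == mulLowDwordsUnsigned(a >>> 32, b), "URShift/And");
        System.out.println("all identities hold for the sample inputs");
    }
}
```

The sketch only illustrates why the masked and logically shifted shapes are safe to strength-reduce; how the patch detects these shapes and selects the instruction is outside its scope.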
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420384086 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2423716135 From vlivanov at openjdk.org Wed Nov 6 17:39:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 17 Oct 2024 19:40:52 GMT, Jatin Bhateja wrote: >> MulVL (VectorCastI2X src1) (VectorCastI2X src2) > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one (32x32->64). Am I missing something important here? >> IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. > > Hi @iwanowww , > I have implemented additional pattern you suggested. > In addition re-wiring pattern inputs to MulVL IR to avoid emitting upper doubleword clearing logic in applicable scenarios. > > Hi @jaskarth , @merykitty , > As discussed, waiting on PhaseLowering skeleton to move some part of this patch to x86 specific lowering pass. Thanks, @jatin-bhateja. I took a look at the latest version and still think that IGVN is not the best place for it. First of all, flags on MulVL feel too adhoc and irregular. The original IR structure is still there (except the cases when inputs are rewired), so can be easily recomputed on demand. 
I noticed that the patterns can be generalized: what matters is whether upper half is filled with zeros/sign bits or not, so small enough masks (and large enough shifts) are amenable to the same optimization. But, in such case, input rewiring becomes applicable only to particular constant inputs. (BTW signed right shifts can be optimized in a similar way, since they populate upper half with the sign-bit.) So, IMO the best way to move this particular enhancement forward is: * perform the transformation during matching; * match a single MulVL node and shape the checks on argument shape as predicates on AD instructions * setting lower instruction costs should tell the matcher to prefer new specific instructions over generic ones; * avoid input rewiring for now (VPMULDQ/VPMULUDQ give enough performance improvement on its own). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420668490 PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2436528498 From vlivanov at openjdk.org Wed Nov 6 17:39:34 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 15 Oct 2024 17:26:49 GMT, Quan Anh Mai wrote: >> I'm pretty ambivalent, I think implementing it either way would be alright. Especially with unit tests, I think the lowering implementation wouldn't be that difficult. Maybe another reviewer has an opinion? >> >> About PhaseLowering though, I've found some more interesting things we could do with it, especially with improving vectorization support in the backend. @merykitty have you already started to work on it? I was thinking about prototyping it soon. 
Just wanted to make sure we're not doing the same work twice :) > > @jaskarth Please proceed with it, I have a really simple prototype for it but I don't have any plan to proceed further soon. Thanks a lot :) @merykitty The approach @jatin-bhateja proposes looks well-justified to me. Matching is essentially a lowering step which transforms platform-independent Ideal IR into platform-specific Mach IR. And collapsing non-trivial IR trees into platform-specific instructions is a well-established pattern in the code. Indeed, there are some constraints matching imposes, so it may not be flexible enough to cover all use cases. In particular, for `VPTERNLOGD`/`VPTERNLOGQ` it was decided it's worth the effort to handle them specially (see `Compile::optimize_logic_cones()`). As it is implemented now, it's part of the shared code, but if there's platform-specific custom lowering phase available one day, it can be moved there, of course. But speaking of `VPMULDQ`/`VPMULUDQ`, what kind of benefits do you see from custom logic to support them? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420732705 From jbhateja at openjdk.org Wed Nov 6 17:39:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:34 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 17 Oct 2024 21:53:16 GMT, Vladimir Ivanov wrote: > > > MulVL (VectorCastI2X src1) (VectorCastI2X src2) > > > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... > > Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one (32x32->64). Am I missing something important here? 
@iwanowww , Agreed! I missed that you were talking about **VPMULDQ**; it's a signed doubleword multiplier producing quadword results, so it should be ok to include the suggested pattern. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421132055 From jbhateja at openjdk.org Wed Nov 6 17:39:36 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:36 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 02:41:47 GMT, Quan Anh Mai wrote: > The issues I have with this patch are that: > > * It convolutes the graph with machine-dependent nodes early in the compiling process. MulVL is a machine independent IR, we create a machine dependent IR post matching. > * It overloads `MulVL` with alternative behaviours, it is fine now as we do not perform much analysis on this node but it would be problematic later. I think it is more preferable to have a separate IR node for this like `MulVLowIToLNode`, or have this transformation be done only just before matching, or both. I see this as a two-step optimization: in the first step we do analysis and annotate additional information on the existing IR, which is later used by the instruction selector. I plan to subsume the first stage with enhanced dataflow analysis going forward.
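On the `VPMULDQ` point agreed earlier in this exchange, a minimal scalar sketch in Java of why sign-extended inputs are safe for the signed instruction but not for the unsigned one. The class and method names are hypothetical and each method models a single 64-bit lane; `Math.multiplyFull(int, int)` is used here as a stand-in for a signed 32x32->64 multiply, which is an assumption about how best to model the instruction, not something stated in the thread.

```java
// Hypothetical scalar model of one 64-bit lane; names are illustrative only.
final class SignedDwordMulSketch {
    // Models a signed 32x32->64 multiply reading the low doubleword of each input.
    static long mulLowDwordsSigned(long a, long b) {
        return Math.multiplyFull((int) a, (int) b);
    }

    // Models the unsigned counterpart, for contrast.
    static long mulLowDwordsUnsigned(long a, long b) {
        return Integer.toUnsignedLong((int) a) * Integer.toUnsignedLong((int) b);
    }

    // Shape: MulVL (VectorCastI2X x) (VectorCastI2X y), one lane.
    static long mulSignExtended(int x, int y) {
        return (long) x * (long) y;
    }

    // Shape: MulVL (RShiftVL a, 32) (RShiftVL b, 32), one lane.
    static long mulArithShifted(long a, long b) {
        return (a >> 32) * (b >> 32);
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    public static void main(String[] args) {
        int x = -3, y = 7;
        // Sign-extended inputs: the full 64-bit product equals the signed
        // 32x32->64 product of the original ints.
        check(mulSignExtended(x, y) == mulLowDwordsSigned(x, y), "cast/cast signed");
        // The unsigned multiply gives a different answer for negative inputs,
        // which is the concern originally raised about this pattern.
        check(mulSignExtended(x, y) != mulLowDwordsUnsigned(x, y), "cast/cast unsigned differs");
        // Arithmetic right shift by 32 also leaves a sign-extended 32-bit value.
        long a = -5L << 32, b = 6L << 32;
        check(mulArithShifted(a, b) == mulLowDwordsSigned(a >> 32, b >> 32), "rshift/rshift signed");
        System.out.println("all identities hold for the sample inputs");
    }
}
```

The checks cover only the arithmetic identity on sample values; lane selection and operand wiring in the actual match rules are not modelled.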
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421300738 From vlivanov at openjdk.org Wed Nov 6 17:39:36 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:36 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <34KZVRjCMAl5-KAG6hLnJUe2RZF2fThQAWuresTL5Pk=.a797f2d0-2915-4175-8c7c-3381fdc578cb@github.com> On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > It convolutes the graph with machine-dependent nodes early in the compiling process. Ah, I see your point now! I took a closer look at the patch and indeed `MulVLNode::_mult_lower_double_word` with `MulVLNode::Ideal()` don't look pretty. @jatin-bhateja why don't you turn the logic it into match rules instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421372120 From qamai at openjdk.org Wed Nov 6 17:39:37 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:37 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. 
`Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421376285 From vlivanov at openjdk.org Wed Nov 6 17:39:37 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:37 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:05:16 GMT, Quan Anh Mai wrote: > The issue is that a node is not immutable. I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421412061 From qamai at openjdk.org Wed Nov 6 17:39:37 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:37 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov wrote: >>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. >> >> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. >> >> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. > >> The issue is that a node is not immutable. > > I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) 
But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. @iwanowww IMO there are 2 ways to view this: - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421441405 From jbhateja at openjdk.org Wed Nov 6 17:39:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:16:04 GMT, Vladimir Ivanov wrote: >>> I see this is as a twostep optimization, in the first step we do analysis and annotate additional information on existing IR, which is later used by instruction selector. I plan to subsume first stage with enhanced dataflow analysis going forward. >> >> The issue is that a node is not immutable. This puts a burden on every place to keep the annotation sane when doing transformations, which is easily missed when there are a lot of kinds of `Node`s out there. 
That's why I think it is most suitable to be done only right before matching. `Node::Ideal` is invoked in a really generous manner so I would prefer not to add analysis to it that can be done more efficiently somewhere else. Additionally, if you have a separate IR node for this operation, you can do some more beneficial transformations such as `MulVL (AndV x max_juint) (AndV y max_juint)` into `MulVLowIToL x y`. >> >> My suggestions are based on this PR as a standalone, so they may not be optimal when looking at a wider perspective, in case you think this approach would fit more nicely into a larger landscape of your planned enhancements please let us know. Thanks for your patience. > >> The issue is that a node is not immutable. > > I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. Hi @iwanowww , @merykitty , I am in process of addressing all your concerns. I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421448784 From vlivanov at openjdk.org Wed Nov 6 17:39:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:38 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. > > @iwanowww IMO there are 2 ways to view this: > > - You can see a `MulVL` nodes with `_mult_lower_double_word` being an entirely different kind of nodes which do a different thing (a.k.a throw away the upper bits and only multiply the lower bits), in this case it is a machine-dependent IR node hiding behind the opcode of `MulVL` and changing the inputs of it is not worrying because the node does not care about that anyway, its semantics is predetermined already. > - Or you can see `_mult_lower_double_word` being an annotation that adds information to `MulVL`, which means it is still a `MulVL` but annotated with information saying that all upper bits of the operands are 0. I think this is Jatin's point of view right now. The issue here would be to keep the annotation sane when the node inputs may be changed. @merykitty I was under an erroneous impression that `MulVL::Ideal()` folds operands of particular shapes into `MulVL::_mult_lower_double_word == true`. Now I see it's not the case. 
Indeed, what `MulVL::Ideal()` does is it caches the info about operand shapes in `MulVL::_mult_lower_double_word` which introduces unnecessary redundancy. I doubt it is possible for IR to diverge so much (through a sequence of equivalent transformations) that the bit gets out of sync (unless there's a bug in compiler or a paradoxical situation in effectively dead code occurs). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421504978 From qamai at openjdk.org Wed Nov 6 17:39:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:39 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <8p95gYaAnNAIfqVBosZgvMMCVhHn2M0fQx7FLLgCn9U=.852c7aef-327c-4c2f-a591-0efde9ccc2e6@github.com> On Fri, 18 Oct 2024 05:42:21 GMT, Jatin Bhateja wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNode::Ideal()` does.) But I agree with you that a dedicated ideal node type (e.g., `MulVI2L`) is much cleaner than `MulVLNode::_mult_lower_double_word`. Still, I'd prefer to see the logic confined in matcher-related code instead. > > Hi @iwanowww , @merykitty , I am in process of addressing all your concerns. > > I still feel idealization is the right place to execute this pattern detection, we just need to re-wire the effective inputs bypassing doubleword clearing logic to newly annotated MulVL node and allow clearing IR to sweepout during successive passes, moving it to final graph reshaping just before instruction selection will prevent dead IR cleanups. @jatin-bhateja I think you can do it at the same place as `Compile::optimize_logic_cones`, we do perform IGVN there. 
Unless you think this information is needed early in the compiling process, currently I see it is used during matching only, which makes it unnecessary to repeatedly checking it in `Node::Ideal` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2421519087 From jkarthikeyan at openjdk.org Wed Nov 6 17:39:39 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 6 Nov 2024 17:39:39 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sat, 19 Oct 2024 09:25:12 GMT, Jatin Bhateja wrote: >> IMO until C2 type system starts to track bitwise constant information ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there are not enough benefits to rely on IGVN here. So far, all the discussed patterns are simple enough for matcher to handle them without too much tweaking. > > Hi @iwanowww , > I have implemented additional pattern you suggested. > In addition re-wiring pattern inputs to MulVL IR to avoid emitting upper doubleword clearing logic in applicable scenarios. > > Hi @jaskarth , @merykitty , > As discussed, waiting on PhaseLowering skeleton to move some part of this patch to x86 specific lowering pass. Hi @jatin-bhateja, I've opened a PR for the new pass here: #21599. I've added just the skeleton code, like you suggested. Let me know what you think! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2425542577 From vlivanov at openjdk.org Wed Nov 6 17:39:40 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:40 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 24 Oct 2024 23:47:29 GMT, Vladimir Ivanov wrote: > So, IMO the best way to move this particular enhancement forward is: ... @jatin-bhateja here's a sketch (not tested): https://github.com/openjdk/jdk/compare/master...iwanowww:jdk:pr/21244 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2455955390 From jbhateja at openjdk.org Wed Nov 6 17:39:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 6 Nov 2024 17:39:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <0fLBeJHlkgf0PnTP6gnbYeZ2P7yEceS1MW5oSf3q43s=.25320887-be06-46b1-919c-f3d25d46c039@github.com> On Tue, 5 Nov 2024 00:07:51 GMT, Vladimir Ivanov wrote: >> Thanks, @jatin-bhateja. I took a look at the latest version and still think that IGVN is not the best place for it. >> >> First of all, flags on MulVL feel too adhoc and irregular. The original IR structure is still there (except the cases when inputs are rewired), so can be easily recomputed on demand. >> >> I noticed that the patterns can be generalized: what matters is whether upper half is filled with zeros/sign bits or not, so small enough masks (and large enough shifts) are amenable to the same optimization. But, in such case, input rewiring becomes applicable only to particular constant inputs. 
>> >> (BTW signed right shifts can be optimized in a similar way, since they populate upper half with the sign-bit.) >> >> So, IMO the best way to move this particular enhancement forward is: >> * perform the transformation during matching; >> * match a single MulVL node and shape the checks on argument shape as predicates on AD instructions >> * setting lower instruction costs should tell the matcher to prefer new specific instructions over generic ones; >> * avoid input rewiring for now (VPMULDQ/VPMULUDQ give enough performance improvement on its own). > >> So, IMO the best way to move this particular enhancement forward is: ... > > @jatin-bhateja here's a sketch (not tested): https://github.com/openjdk/jdk/compare/master...iwanowww:jdk:pr/21244 Hi @iwanowww , Thanks for refactoring! your suggestions are included. Points in favor of the current approach:- - Patch strength reduces 15 cycles full quadword multiplier to 5 cycles double word multiplier with quadword saturation. - IR remains target independent, we are not directly forwarding the pattern inputs to the multiplier, such rewiring is only possible when we mask out the upper double word of inputs, for other cases like right shifting (logical) inputs by 32 or upcasting integral to long lanes we still need to emit the input preparation/formatting instruction sequence. - Patch shows performance improvement on both E and P core Xeons. Following are the performance number for include micro benchmarks. 
![image](https://github.com/user-attachments/assets/6a19181a-7f55-4cd8-9dfb-23dd4c786428) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2459806910 From qamai at openjdk.org Wed Nov 6 17:39:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:41 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:35:28 GMT, Vladimir Ivanov wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. 
>> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > src/hotspot/share/opto/vectornode.cpp line 2122: > >> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) >> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) > > I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immedate operands and pass them into `vpmuludq` which doesn't look right... 
`vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively does a `(x & max_juint) * (y & max_juint)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805887594 From qamai at openjdk.org Wed Nov 6 17:39:42 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 6 Nov 2024 17:39:42 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:37:16 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/vectornode.cpp line 2122: >> >>> 2120: // MulL (URShift SRC1 , 32) (URShift SRC2, 32) >>> 2121: // MulL (URShift SRC1 , 32) ( And SRC2, 0xFFFFFFFF) >>> 2122: // MulL ( And SRC1, 0xFFFFFFFF) (URShift SRC2 , 32) >> >> I don't understand how it works... According to the documentation, `VPMULDQ`/`VPMULUDQ` consume vectors of double words and produce a vector of quadwords. But it looks like `SRC1`/`SRC2` are always vectors of longs (quadwords). And `vmuludq_reg` in `x86.ad` just takes the immedate operands and pass them into `vpmuludq` which doesn't look right... 
> > `vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively does a `(x & max_juint) * (y & max_juint)` You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq VPMULUDQ (VEX.256 Encoded Version) DEST[63:0] := SRC1[31:0] * SRC2[31:0] DEST[127:64] := SRC1[95:64] * SRC2[95:64] DEST[191:128] := SRC1[159:128] * SRC2[159:128] DEST[255:192] := SRC1[223:192] * SRC2[223:192] DEST[MAXVL-1:256] := 0 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805888984 From vlivanov at openjdk.org Wed Nov 6 17:39:42 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:42 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:39:08 GMT, Quan Anh Mai wrote: >> `vpmuludq` does a long multiplication but throws away the upper bits of the operands, effectively does a `(x & max_juint) * (y & max_juint)` > > You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq > > VPMULUDQ (VEX.256 Encoded Version) > DEST[63:0] := SRC1[31:0] * SRC2[31:0] > DEST[127:64] := SRC1[95:64] * SRC2[95:64] > DEST[191:128] := SRC1[159:128] * SRC2[159:128] > DEST[255:192] := SRC1[223:192] * SRC2[223:192] > DEST[MAXVL-1:256] := 0 Got it. Now it makes perfect sense. Thanks for the clarifications!
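The equivalence discussed above can be checked with a scalar model. The following is only a sketch, not code from the patch: the class and method names (`LowDwordMultiply`, `pmuludqLane`, `pmuldqLane`, `mulLane`) are hypothetical, and each method models a single 64-bit lane rather than a whole vector. It assumes the per-lane semantics quoted from the instruction pseudocode, with `VPMULDQ` taken to be the signed counterpart.

```java
class LowDwordMultiply {
    static final long MAX_JUINT = 0xFFFFFFFFL;

    // Hypothetical scalar model of one VPMULUDQ lane: the low 32 bits of each
    // 64-bit operand are multiplied as unsigned values into a 64-bit result.
    static long pmuludqLane(long src1, long src2) {
        return (src1 & MAX_JUINT) * (src2 & MAX_JUINT);
    }

    // Hypothetical scalar model of one VPMULDQ lane: the low 32 bits of each
    // operand are treated as signed ints and multiplied into a 64-bit result.
    static long pmuldqLane(long src1, long src2) {
        return (long) (int) src1 * (long) (int) src2;
    }

    // What one MulVL lane computes: the low 64 bits of the full product.
    static long mulLane(long a, long b) {
        return a * b;
    }

    public static void main(String[] args) {
        long x = 0x1234_5678_9ABC_DEF0L;
        long y = 0xFEDC_BA98_7654_3210L;

        // Masked inputs: the instruction ignores the upper halves, so it can
        // consume x and y directly and the AndV nodes become redundant.
        System.out.println(mulLane(x & MAX_JUINT, y & MAX_JUINT) == pmuludqLane(x, y));

        // Logical right shift by 32: upper halves are zero, but the shift
        // itself still has to be computed before the multiply.
        System.out.println(mulLane(x >>> 32, y >>> 32) == pmuludqLane(x >>> 32, y >>> 32));

        // Sign-extended ints and arithmetic right shift by 32: upper halves
        // hold copies of the sign bit, which matches the signed variant.
        int i = -7, j = 123456789;
        System.out.println(mulLane(i, j) == pmuldqLane(i, j));
        System.out.println(mulLane(x >> 32, y >> 32) == pmuldqLane(x >> 32, y >> 32));
    }
}
```

Only the masked case allows the inputs to be forwarded unchanged; in the shift and cast cases the operands still have to be prepared, which is consistent with the input-rewiring caveat raised earlier in the thread.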
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805894106 From vlivanov at openjdk.org Wed Nov 6 17:39:43 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 6 Nov 2024 17:39:43 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 18 Oct 2024 05:46:25 GMT, Vladimir Ivanov wrote: >> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq >> >> VPMULUDQ (VEX.256 Encoded Version) >> DEST[63:0] := SRC1[31:0] * SRC2[31:0] >> DEST[127:64] := SRC1[95:64] * SRC2[95:64] >> DEST[191:128] := SRC1[159:128] * SRC2[159:128] >> DEST[255:192] := SRC1[223:192] * SRC2[223:192] >> DEST[MAXVL-1:256] := 0 > > Got it. Now it makes perfect sense. Thanks for the clarifications! Actually, it makes detecting the pattern during matching even simpler than I initially thought. Since there's no need to match any non-trivial ideal IR tree, an AD instruction can just match a single `MulVL` and detect operand shapes using a predicate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1805903273 From aph at openjdk.org Wed Nov 6 17:55:35 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 6 Nov 2024 17:55:35 GMT Subject: Integrated: 8342540: InterfaceCalls micro-benchmark gives misleading results In-Reply-To: References: Message-ID: On Fri, 18 Oct 2024 11:53:06 GMT, Andrew Haley wrote: > `InterfaceCalls.java` makes highly predictable memory accesses, which leads to a gross underestimate of the time taken when a megamorphic access is unpredictable. > > Here's one example, with and without randomization. The unpredictable megamorphic call takes more than 4 times as long as the original benchmark reports.
> > > Benchmark (randomized) Mode Cnt Score Error Units > InterfaceCalls.test2ndInt3Types false avgt 4 5.013 ± 0.081 ns/op > InterfaceCalls.test2ndInt3Types true avgt 4 23.421 ± 0.102 ns/op > > This patch adds the "randomized" parameter, which allows the measurement of predictable and unpredictable megamorphic calls. This pull request has now been integrated. Changeset: 78b378ad Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/78b378ad03d0f6c85468ac208e84fabea79fc7de Stats: 34 lines in 1 file changed: 22 ins; 6 del; 6 mod 8342540: InterfaceCalls micro-benchmark gives misleading results Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21581 From kvn at openjdk.org Wed Nov 6 18:12:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 6 Nov 2024 18:12:34 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189).
This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. Would be nice to have simple test for this. ------------- PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2419083858 From mli at openjdk.org Wed Nov 6 18:42:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 6 Nov 2024 18:42:04 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should ber a diagnostic option, as we don't want to expose too many options to users. 
> Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: turn more verified extensions as DIAGNOSTIC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21885/files - new: https://git.openjdk.org/jdk/pull/21885/files/4b41bb91..e5bd3eef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21885&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21885&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21885/head:pull/21885 PR: https://git.openjdk.org/jdk/pull/21885 From mli at openjdk.org Wed Nov 6 18:42:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 6 Nov 2024 18:42:04 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 16:07:58 GMT, Robbin Ehn wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should ber a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Yes, I'm fine with that. Just so we try to keep somekind of common thread. @robehn Thanks for the confirmation, updated accordingly. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2460511903 From aturbanov at openjdk.org Wed Nov 6 18:57:29 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 6 Nov 2024 18:57:29 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 12:19:47 GMT, Tobias Holenstein wrote: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java line 226: > 224: > 225: public void colorSelectedFigures(Color color) { > 226: for (Figure figure : model.getSelectedFigures()) { Suggestion: for (Figure figure : model.getSelectedFigures()) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21925#discussion_r1831556933 From tholenstein at openjdk.org Wed Nov 6 20:30:26 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 6 Nov 2024 20:30:26 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: 
https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/0fd894fd..17205bab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From dlong at openjdk.org Wed Nov 6 21:15:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 21:15:54 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 08:06:33 GMT, Tobias Hartmann wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` nodes but it has none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes ?hide? the more ?precise? type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector`instead. 
>> The compiler then tries to inline one of the 2 `bynaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP` , has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match. >> >> # Solution >> >> In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > src/hotspot/share/opto/callGenerator.cpp line 734: > >> 732: } >> 733: C->set_inlining_progress(true); >> 734: C->set_do_cleanup(kit.stopped() || result->Opcode() == Op_VectorBox); // path is dead or vector box; needs cleanup > > This only triggers if the return value of the incrementally inlined method is a `VectorBox`, right? Is that sufficient? Could the `VectorBox` be hidden by another node? I'm failing to understand why this is only an issue with VectorBox. It doesn't feel quite right to be checking for a specific node type here. Maybe this should be something like needs_cleanup(result)? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1831715637 From vlivanov at openjdk.org Thu Nov 7 00:03:42 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 7 Nov 2024 00:03:42 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 13:31:37 GMT, Damon Fenacci wrote: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because the test `testAndMaskSameValue1`, when compiled by C2, is expected to contain 1 `AndV` node but contains none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
> > # Solution > > In order to fix this an extra cleanup has to be performed when we encounter a situation like the one above, i.e. when late inlining creates a `VectorBox`. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) The root cause of the bug is that type information obtained during inlining is not propagated until IGVN kicks in. Vector API is special here, because (1) it heavily relies on exact type information to perform intrinsification; and (2) vector intrinsics are processed during post-parse inlining. IMO the current fix (do cleanup when VectorBox is returned) is good enough as a stopgap fix for the Vector API issue (missed intrinsification opportunity). As an alternative fix, a limited IGVN pass over `CastPP`/`CheckCastPP` users of the result value may be enough to avoid a full-blown cleanup. I suspect some other intrinsics may be susceptible to a similar issue, but in such case it would be more like a corner case (few intrinsics fail in rare conditions). A proper fix would be to re-examine failed intrinsic call sites during IGVN and repeat the intrinsification attempt when their inputs improve (akin to what is done in `CallStaticJavaNode::Ideal()`/`CallDynamicJavaNode::Ideal()`).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2461038909 From vlivanov at openjdk.org Thu Nov 7 00:15:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 7 Nov 2024 00:15:41 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Please, reframe JDK-8326369 as an Enhancement to add a missing test case. Otherwise, it looks confusing. It's also fine to create new issue and close JDK-8326369 as a duplicate of JDK-8339299. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2461058157 From swen at openjdk.org Thu Nov 7 00:47:58 2024 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 7 Nov 2024 00:47:58 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> Message-ID: <3ItvC90tZHf_VJuHevQPlS71roWQAn0kyaiAr1JBtf4=.8a631b7a-cf49-45cd-b9de-2c95e2340cc3@github.com> On Mon, 4 Nov 2024 11:48:49 GMT, Emanuel Peter wrote: >> **Background** >> I am introducing the `MemPointer`, for enhanced pointer parsing.
For now, it replaces the much more limited `ArrayPointer` in `MergeStores` (see https://github.com/openjdk/jdk/pull/16245), but eventually it is supposed to be used widely in optimizations for pointer analysis: adjacency, aliasing, etc. I also plan to refactor the `VPointer` from auto-vectorization with it, and unlock more pointer patterns that way - possibly including scatter/gather. >> >> **Details** >> >> The `MemPointer` decomposes a pointer into the form `pointer = con + sum_i(scale_i * variable_i)` - a linear form with a sum of variables and scale-coefficients, plus some constant offset. >> >> This form allows us to perform aliasing checks - basically we can check if two pointers are always at a constant offset. This allows us to answer many questions, including if two pointers are adjacent. `MergeStores` needs to know if two stores are adjacent, so that we can safely merge them. >> >> More details can be found in the description in `mempointer.hpp`. Please read them when reviewing! >> >> `MemPointer` is more powerful than the previous `ArrayPointer`: the latter only allows arrays, the former also allows native memory accesses, `Unsafe` and `MemorySegment`. >> >> **What this change enables** >> >> Before this change, we only allowed merging stores to arrays, where the store had to have the same type as the array element (`StoreB` on `byte[]`, `StoreI` on `int[]`). >> >> Now we can do: >> - Merging `Unsafe` stores to array. Including "mismatched size": e.g. `putChar` to `byte[]`. >> - Merging `Unsafe` stores to native memory. >> - Merging `MemorySegment`: with array, native, ByteBuffer backing types. >> - However: there is still some problem with RangeCheck smearing (a type of RC elimination) for the examples I have tried. Without the RCs smeared, we can only ever merge 2 neighbouring stores. I hope we can improve this with better RangeCheck smearing. `MemorySegment` introduces `checkIndexL`, the long-variant of the RangeCheck.
Normal array accesses only use the equivalent of `checkIndex`, the int-variant that we already optimize away much better. >> >> **Dealing with Overflows** >> >> We have to be very careful with overflows when dealing with pointers. For this, I introduced a `NoOverflowInt`. It allows us to do "normal" int operations on it, and tracks if there was ever an overflow. This way, we can do all overflow checks implicitly, and do not clutter the code with overflow-check... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more changes for Christian If it is not provided in the release image, users need to find the source code of the current version of JDK to build the fastdebug image to analyze whether the MergeStore optimization of a certain code works. I can understand that MergeStore may still need to be improved, so it cannot be used as a product feature, but this is a useful optimization and I hope it can be provided in the product eventually. I hope that TraceMergeStore can eventually be used in the release image like `PrintInlining` and become a tool for performance optimizers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2461092539 From swen at openjdk.org Thu Nov 7 01:11:42 2024 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 7 Nov 2024 01:11:42 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: <0lmMa_RLLb-r3FaHFLY2zIIPcht-6Y000LW9CIDYJUc=.afca63f6-5326-40c1-9200-e87a07080dc3@github.com> On Wed, 6 Nov 2024 07:18:16 GMT, Emanuel Peter wrote: > > ```java > > "null".getBytes(0, 4, bytes4, off); > > ``` > > > > > > > > > > > > > > > > > > > > > > > > Is it possible to do MergeStore in this scenario? > > I don't know. What do the logs say? And what does it currently compile down to, i.e. what assembly instructions? > > Otherwise I think this update seems reasonable. 
My thinking is this:

    StringBuilder buf = new StringBuilder();
    // ...
    buf.append("null");

The calling path is as follows:

    AbstractStringBuilder::append -> AbstractStringBuilder::putStringAt -> String::getBytes(byte[], int, byte) -> System::arraycopy

In this scenario, if System::arraycopy can be optimized to use putInt or putLong, performance can be improved.

The String concatenation scenario is similar:

    String f(int i) {
        return "abcd" + i;
    }

Here `StringConcatHelper::prepend(int, byte, byte[], String, String)` is called, and then `String::getBytes(byte[], int, byte) -> System::arraycopy`:

    package java.lang;
    class StringConcatHelper {
        static int prepend(int index, byte coder, byte[] buf, String value, String prefix) {
            index -= value.length();
            if (coder == String.LATIN1) {
                value.getBytes(buf, index, String.LATIN1);
                index -= prefix.length();
                prefix.getBytes(buf, index, String.LATIN1);
            } else {
                value.getBytes(buf, index, String.UTF16);
                index -= prefix.length();
                prefix.getBytes(buf, index, String.UTF16);
            }
            return index;
        }
    }

This case is similar to the one above: can we optimize "abcd".getBytes to putInt or putLong?

In summary, can we optimize the System::arraycopy of a stable byte[] with a length of 4 to putInt?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2461114396

From fyang at openjdk.org Thu Nov 7 01:44:42 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 7 Nov 2024 01:44:42 GMT Subject: RFR: 8343555: RISC-V: make UseZvfh diagnostic option [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this simple patch?
>> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should be a diagnostic option, as we don't want to expose too many options to users.
>> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC Please also update the JBS title to reflect the latest version, as we are targeting more options than a single UseZvfh. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2461144551 From haosun at openjdk.org Thu Nov 7 01:44:49 2024 From: haosun at openjdk.org (Hao Sun) Date: Thu, 7 Nov 2024 01:44:49 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v3] In-Reply-To: References: Message-ID: <77bYQ44LNNQlSteh4rJSEvJ5QWIvpBcb7eNb0Sy-vVE=.a8c58c15-a7c9-4333-9a45-e49fc35797eb@github.com> On Mon, 4 Nov 2024 13:41:34 GMT, Roland Westrelin wrote: >> Nice, thanks for the added comments! >> >> Do you know what JDK versions are affected? > >> Do you know what JDK versions are affected? > > The failure doesn't reproduce with jdk21u. But that seems to be because we need JDK-8326139 (and JDK-8331575) for the bug to show up. Hi @rwestrel My JBS account is inactive recently. Hence I'd like to report the bug here. I encountered the following error with `-XX:MaxVectorSize=8` on both AArch64 and x86_64. Could you help take a look at this issue? Thanks. 
Test command: make test JTREG="VM_OPTIONS=-XX:MaxVectorSize=8" TEST=test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java Error message: CompileCommand: compileonly TestReplicateAtConv.test bool compileonly = true # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/tmp/jdk-dev/src/hotspot/share/opto/type.cpp:2499), pid=1424540, tid=1424557 # assert(Matcher::vector_size_supported(elem_bt, length)) failed: length in range # # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-git-63c19d3db58) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-git-63c19d3db58, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) # Problematic frame: # V [libjvm.so+0x17bca30] TypeVect::make(BasicType, unsigned int, bool)+0x150 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/core.1424540) # # An error report file with more information is saved as: # /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/hs_err_pid1424540.log # # Compiler replay data is saved as: # /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/replay_pid1424540.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp ------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2461145117 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation Message-ID: Lazy computation of TypeFunc. 
Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) ------------- Commit messages: - extra space - inline accessor methods - Revert "mac build workaround" - final change - mac build workaround - init change Changes: https://git.openjdk.org/jdk/pull/21782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330851 Stats: 894 lines in 5 files changed: 619 ins; 31 del; 244 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) @dean-long can you take a look at these changes "Pre-submit test" for zero are not related. [build.sh][INFO] Downloading https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.8-bin.zip to /home/runner/work/jdk/jdk/jtreg/src/make/../build/deps/apache-ant-1.10.8-bin.zip Error: sh][ERROR] wget exited with exit code 4 Error: Process completed with exit code 1. Sorry for delay, I was out for long weekend. >For example, rename LockNode::lock_type() to LockNode::lock_type_init(), and have it save the result in a static const field. Then have LockNode::lock_type() simply return the field. But as you mentioned "the data field is `static const`". So we can't do assignment operation in the class itself. To do that we have to go outside the scope of class and do the definition part there. Or do you have another way in mind ? 
With that change I am getting this error:

    === Output from failing command(s) repeated here ===
    * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_run_ld:
    Undefined symbols for architecture arm64:
      "LockNode::_lock_type_tf", referenced from:
          GraphKit::shared_lock(Node*) in graphKit.o
          LockNode::lock_type_init() in type.o
    ld: symbol(s) not found for architecture arm64
    clang++: error: linker command failed with exit code 1 (use -v to see invocation)

Here is a shorter version:

    class Temp {
    public:
        static const int* ptr;

    public:
        static void set_ptr() {
            const int *abs = new int(20);
            ptr = abs;
        }
    };

    // Initialize static member;
    const int* Temp::ptr = nullptr;

    int main() {
        Temp::set_ptr();
        cout << *Temp::ptr << endl;
        return 0;
    }

If I comment out `const int* Temp::ptr = nullptr;` then I get an error similar to the one I pasted above from the build failure. Here we might need to give the definition out of scope of the class.

Another solution is making the data-field inline:

    class Temp {
    public:
        static inline const int* ptr = nullptr;

    public:
        static void set_ptr() {
            const int *abs = new int(20);
            ptr = abs;
        }
    };

    // Initialize static member;
    //const int* Temp::ptr = nullptr;

    int main() {
        Temp::set_ptr();
        cout << *Temp::ptr << endl;
        return 0;
    }

Marking `ptr` as an inline variable is also acceptable to the compiler, but inline variables were only introduced in C++17, and the hotspot build throws a warning for them as well.

I don't see any way to shrink the code here. The methods with `*_Type` could be derived from a macro, because all of them do the same task, i.e. checking the assert and returning the field. But I am not sure that's a good choice, because it will sprinkle the macro everywhere. Overall I think the code will become less intuitive and more error prone.

    const TypeFunc *OptoRuntime::athrow_Type() {
      assert(_athrow_tf != nullptr, "should be initialized");
      return _athrow_tf;
    }

But if you want this or have another idea, I am happy to give it a try.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2446193033 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2446204918 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2453901207 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2456246788 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2458710642 From dlong at openjdk.org Thu Nov 7 04:32:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) It looks OK, but I think we are paying some overhead every time we try to get the TypeFunc, because C++ has to first check if it's the first time the function was called. Instead, how about getting rid of the lambda and making the initialization explicit? For example, rename LockNode::lock_type() to LockNode::lock_type_init(), and have it save the result in a static const field. Then have LockNode::lock_type() simply return the field.
This is what I meant:

    diff --git a/src/hotspot/share/opto/callnode.hpp b/src/hotspot/share/opto/callnode.hpp
    index 2d3835b71ad..f72e78745b5 100644
    --- a/src/hotspot/share/opto/callnode.hpp
    +++ b/src/hotspot/share/opto/callnode.hpp
    @@ -1190,9 +1190,11 @@ class AbstractLockNode: public CallNode {
     // 2 - a FastLockNode
     //
     class LockNode : public AbstractLockNode {
    +  static const TypeFunc *_lock_type_tf;
     public:
    -  static const TypeFunc *lock_type() {
    +  static void lock_type_init() {
    +    assert(_lock_type_tf == nullptr, "lock_type_init() already called");
         // create input type (domain)
         const Type **fields = TypeTuple::fields(3);
         fields[TypeFunc::Parms+0] = TypeInstPtr::NOTNULL; // Object to be Locked
    @@ -1205,7 +1207,12 @@ class LockNode : public AbstractLockNode {
         const TypeTuple *range = TypeTuple::make(TypeFunc::Parms+0,fields);

    -    return TypeFunc::make(domain,range);
    +    _lock_type_tf = TypeFunc::make(domain,range);
    +  }
    +
    +  static const TypeFunc *lock_type() {
    +    assert(_lock_type_tf != nullptr, "lock_type_init() not called");
    +    return _lock_type_tf;
       }

       virtual int Opcode() const;

Nice work so far. I would suggest making the _Type() accessors inlined, and trying to reduce boiler-plate code with macros if possible (field name and accessor function name can both be derived from a common root, which is pretty common practice in HotSpot code).

If you move all these accessor functions into the .hpp or .inline.hpp file, so they can be inlined, then I think the benefit of a macro will become more apparent, but I won't insist. Let's see what other reviewers think.
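As a rough illustration of the macro idea mentioned above, here is a minimal, self-contained sketch. It is not HotSpot code: `Signature`, `Runtime`, `DECLARE_TYPE_ACCESSOR`, `DEFINE_TYPE_FIELD` and `initialize_types` are hypothetical names chosen for this example, and `Signature` merely stands in for `TypeFunc`. The point is only that the field name `_<root>_tf` and the accessor name `<root>_Type()` can both be generated from one root token.

```cpp
#include <cassert>

// Hypothetical stand-in for TypeFunc; not the real HotSpot class.
struct Signature {
  int arg_count;
};

// Hypothetical macro: from one root name, declare the static field
// `_<root>_tf` and an inlinable accessor `<root>_Type()` that asserts
// the field was initialized before returning it.
#define DECLARE_TYPE_ACCESSOR(root)                               \
  static const Signature* _##root##_tf;                           \
  static const Signature* root##_Type() {                         \
    assert(_##root##_tf != nullptr && "should be initialized");   \
    return _##root##_tf;                                          \
  }

// Hypothetical macro: the out-of-class definition each static field
// still needs (omitting it gives an undefined-symbol link error).
#define DEFINE_TYPE_FIELD(klass, root) \
  const Signature* klass::_##root##_tf = nullptr;

struct Runtime {
  DECLARE_TYPE_ACCESSOR(athrow)
  DECLARE_TYPE_ACCESSOR(rethrow)

  // Explicit one-time initialization, called once at startup.
  // The objects are intentionally never freed in this sketch.
  static void initialize_types() {
    assert(_athrow_tf == nullptr && "initialize_types() already called");
    _athrow_tf = new Signature{1};
    _rethrow_tf = new Signature{2};
  }
};

DEFINE_TYPE_FIELD(Runtime, athrow)
DEFINE_TYPE_FIELD(Runtime, rethrow)
```

With this shape, adding a new accessor costs one `DECLARE_TYPE_ACCESSOR` line, one `DEFINE_TYPE_FIELD` line and one assignment in the init function; whether that trade-off is preferable to hand-written accessors is exactly the open question in this thread.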
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2450827514 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2455940131 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2458525693 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2461241654 From dlong at openjdk.org Thu Nov 7 04:32:14 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> References: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> Message-ID: On Tue, 5 Nov 2024 05:08:33 GMT, Amit Kumar wrote: > Here we might need to give the definition out of scope of the class. Yes. For example, in callnode.cpp: const TypeFunc *LockNode::_lock_type_tf = nullptr; ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2456376861 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: <1EYmbEDooBKIFVBWjsqwBwbQyipe_g3pqA30V-o3lOY=.af345b94-532e-42ef-8665-95969ddd3e4e@github.com> Message-ID: <4tYIvAd_-1u5crvXKVFKrzVeMMqw2284G--OFy2PzcU=.ec8d4a1e-6907-4f67-9acd-2739beffd60a@github.com> On Tue, 5 Nov 2024 06:53:46 GMT, Dean Long wrote: >> with that change I am getting this error: >> >> === Output from failing command(s) repeated here === >> * For target hotspot_variant-server_libjvm_objs_BUILD_LIBJVM_run_ld: >> Undefined symbols for architecture arm64: >> "LockNode::_lock_type_tf", referenced from: >> GraphKit::shared_lock(Node*) in graphKit.o >> LockNode::lock_type_init() in type.o >> ld: symbol(s) not found for architecture arm64 >> clang++: error: linker command failed with exit code 1 (use -v to see invocation) >> >> >> Here is 
shorter version: >> >> class Temp { >> public: >> static const int* ptr; >> >> public: >> static void set_ptr() { >> const int *abs = new int(20); >> ptr = abs; >> } >> }; >> >> // Initialize static member; >> const int* Temp::ptr = nullptr; >> >> int main() { >> Temp::set_ptr(); >> cout << *Temp::ptr << endl; >> return 0; >> } >> >> >> If I comment out `const int* Temp::ptr = nullptr;` then I am getting the similar error as I pasted above which I got from the build failure. Here we might need to give the definition out of scope of the class. >> >> >> Another solution is making the data-field inline: >> >> class Temp { >> public: >> static inline const int* ptr = nullptr; >> >> public: >> static void set_ptr() { >> const int *abs = new int(20); >> ptr = abs; >> } >> }; >> >> // Initialize static member; >> //const int* Temp::ptr = nullptr; >> >> int main() { >> Temp::set_ptr(); >> cout << *Temp::ptr << endl; >> return 0; >> } >> >> >> Here If we mark `ptr` as inline variable that is also acceptable, though C++17 started accepting it, but hotspot code is throwing warning over there as well. > >> Here we might need to give the definition out of scope of the class. > > Yes. For example, in callnode.cpp: > > const TypeFunc *LockNode::_lock_type_tf = nullptr; @dean-long I have updated the patch, please have a look at the current changes :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2457088617 From amitkumar at openjdk.org Thu Nov 7 04:32:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:32:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 03:24:31 GMT, Dean Long wrote: >If you move all these accessor functions into the .hpp or .inline.hpp file, so they can be inlined, then I think the benefit of a macro will be come more apparent, but I won't insist. Let's see what other reviewers think. 
I have moved them, and marking changes ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2461294664 From amitkumar at openjdk.org Thu Nov 7 04:46:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 04:46:41 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 13:45:03 GMT, Martin Doerr wrote: > My point is that I think that the riscv solution is better. See assembler_riscv.inline.hpp. @TheRealMDoerr can we do it with another RFE ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21864#issuecomment-2461308296 From galder at openjdk.org Thu Nov 7 05:20:41 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 7 Nov 2024 05:20:41 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 7 Nov 2024 00:13:00 GMT, Vladimir Ivanov wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Please, reframe JDK-8326369 as an Enhancement to add a missing test case. Otherwise, it looks confusing. > > It's also fine to create new issue and close JDK-8326369 as a duplicate of JDK-8339299. @iwanowww I've reframed JDK-8326369 as per your suggestion.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2461339721 From amitkumar at openjdk.org Thu Nov 7 05:41:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 05:41:41 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Wed, 6 Nov 2024 16:18:11 GMT, Martin Doerr wrote: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. I see that test is passing for s390x (with ubsan enabled). But still do you think we should disable for s390x as well ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21935#issuecomment-2461368447 From epeter at openjdk.org Thu Nov 7 06:39:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 06:39:54 GMT Subject: RFR: 8335392: C2 MergeStores: enhanced pointer parsing [v18] In-Reply-To: <3ItvC90tZHf_VJuHevQPlS71roWQAn0kyaiAr1JBtf4=.8a631b7a-cf49-45cd-b9de-2c95e2340cc3@github.com> References: <8oIUayQwwcLTY0Hp04GJyxq7ZgVGvdGVC2JpFPVpEgs=.2006412d-a35d-4588-a751-4b9f978bd87d@github.com> <3ItvC90tZHf_VJuHevQPlS71roWQAn0kyaiAr1JBtf4=.8a631b7a-cf49-45cd-b9de-2c95e2340cc3@github.com> Message-ID: <5OKKjGShXckCMGeNBZNwzgfI-1X5NyD4mRrzPQy2jEk=.effcbecf-713b-4161-8a73-5e579c4ae685@github.com> On Thu, 7 Nov 2024 00:45:22 GMT, Shaojin Wen wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more changes for Christian > > If it is not provided in the release image, users need to find the source code of the current version of JDK to build the fastdebug image to analyze whether the MergeStore optimization of a certain code works. 
> > I can understand that MergeStore may still need to be improved, so it cannot be used as a product feature, but this is a useful optimization and I hope it can be provided in the product eventually. > > I hope that TraceMergeStore can eventually be used in the release image like `PrintInlining` and become a tool for performance optimizers. @wenshao I suppose we could consider making `TraceMergeStores` and `TraceAutoVectorization` available in product, but under the `-XX:+UnlockDiagnosticVMOptions` flag... I will discuss this with other VM engineers. That means it is available, but there is no promise of stability. Still, once people become dependent on it, maybe even tools become dependent, then it is harder to make changes without everybody complaining ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19970#issuecomment-2461437763 From epeter at openjdk.org Thu Nov 7 06:56:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 06:56:15 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures Message-ID: **History** This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. **Summary of Problem** As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. **Benchmark** I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). The benchmarks look different on different machines, but they all have a pattern similar to this: ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). The reason is that for low offsets, the latency dominates the runtime, and for high offsets the throughput dominates. If there are store-to-load-failures from every iteration `i` -> `i+offset`, and we have a total of `n` iterations, then we have a chain of `n/offset` latencies. 
Hence, as the `offset` increases, this latency chain becomes shorter and shorter. As an example: with `offset = 3`, the 3rd iteration depends on the 0th, the 6th on the 3rd, the 9th on the 6th, the 12th on the 9th ... all the way to the nth iteration.

**Current Solution: a new heuristic**

Any heuristic is going to be somewhat inaccurate, but we now want to fix this issue in JDK24, and so I'd rather have a quick solution that works most of the time, rather than a sophisticated solution that works almost always. The sophisticated solution would carefully compute the expected latency and throughput for both the scalar and vectorized loop, and pick the faster one. I hope to experiment with that in the future.

For now, we just implement a "hard cutoff": if we predict that there will be ANY store-to-load-forwarding failure within some `N` iterations, then we bail out of vectorization. This `N` can be configured with the new diagnostic flag `SuperWordStoreToLoadForwardingFailureDetection`. The benchmarks indicated that `x64` machines should have a value of `16`, and `aarch64 asimd/neon` machines a value of `8`. I do not know what the value should be on other machines ... I just guessed it to be `16`, but **platform maintainers are welcome to adjust this value** - my benchmarks may be a helpful guide.

Note: we only detect store-to-load-forwarding failures when the loads and stores are known at compile time to go to the same memory object.

**Should someone experience performance regressions due to this fix**: you can disable the detection by setting the diagnostic flag: `-XX:+UnlockDiagnosticVMOptions -XX:SuperWordStoreToLoadForwardingFailureDetection=0`. Maybe you just need to lower it from the default. Increasing it is probably not going to help - but why not try anyway.

**Tests**

I had to adapt some tests. Primarily `TestDependencyOffsets.java`, which I just refactored in https://github.com/openjdk/jdk/pull/21541 to make these changes here easier.
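To make the latency chain from the problem summary more concrete, here is a minimal sketch written in C++ for brevity (the benchmark itself is Java). The loop shape `a[i + offset] = a[i] + 1` is an assumption about what such a kernel looks like, and `shift_add` and `chain_length` are hypothetical names for this illustration; `chain_length` simply encodes the `n/offset` estimate from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed loop shape: iteration i stores to a[i + offset], and iteration
// i + offset later loads that same element. Each such store -> load pair
// is one edge of the dependency chain.
void shift_add(std::vector<int>& a, std::size_t offset) {
  assert(offset > 0);
  for (std::size_t i = 0; i + offset < a.size(); ++i) {
    a[i + offset] = a[i] + 1;
  }
}

// Estimate from the text: with n iterations and a dependency from
// iteration i to iteration i + offset, the chain has about n / offset links.
std::size_t chain_length(std::size_t n, std::size_t offset) {
  assert(offset > 0);
  return n / offset;
}
```

Starting from an all-zero array of 12 elements and `offset = 3`, the value stored at index 9 ends up as 3, because it was produced through the chain 0 -> 3 -> 6 -> 9. With a larger offset the same number of iterations forms a much shorter chain, which is why latency matters less there and throughput starts to dominate.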
**Performance Testing** [I ran my benchmark on 7 machines](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698), and the new heuristic seems to perform very well. I also ran extensive performance testing, and I did not see any significant change. This was the originally reported regression (MacOSX x64 - SPECjvm2008-Crypto.signverify-G1: specjvm2008): ![image](https://github.com/user-attachments/assets/ed68344a-c4aa-47b7-96a1-60c91faee503) (drop from `promo-24-b1` to `promo-24-b2`) And that seems to be fixed now: ![image](https://github.com/user-attachments/assets/394f7a44-5fb5-4217-bf0e-f5a585268f2a) ------------- Commit messages: - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding - fix whitespace - fix tests and build - fix store-to-load forward IR rules - updates before the weekend ... who knows if they are any good - refactor to iteration threshold - use jvmArgs again, and apply same fix as 8343345 - revert to jvmArgsPrepend - manual merge - ... 
and 14 more: https://git.openjdk.org/jdk/compare/06d8216a...9b2efe1a Changes: https://git.openjdk.org/jdk/pull/21521/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334431 Stats: 4386 lines in 17 files changed: 4324 ins; 4 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Thu Nov 7 06:56:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 06:56:15 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 11:33:04 GMT, Emanuel Peter wrote: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. 
Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). > > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... I'm experimenting now. I have taken my benchmark from https://github.com/openjdk/jdk/pull/19880, and extended it a little. Here the full results in the [PDF](https://github.com/user-attachments/files/17518027/table2.pdf). 
And here are some charts: ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) I ran this on my avx512 machine, so results may vary on different platforms - especially with different vector-lengths and different store-to-load-forwarding mechanisms. But on my machine, it is pretty clear that the cut-off is at about an offset of 32. I explain it like this: with an offset smaller than 32, the latency is the main issue: the store-to-load-forwarding failures incur a higher latency on that store-load "edge", and that shows in the final runtime. But if the offset is larger than 32, then we are limited by throughput: we can only run so many scalar ops per cycle. But if we turn them into vector ops, we have fewer ops, and so we are faster. Of course, optimally we would have some sort of cost model that takes into account both latency and throughput. But that is for Future Work. For now, we need some cut-off heuristic. And it looks like - at least for avx512 - the heuristic is that we must check if there is any store-to-load-forwarding failure within 32 (virtually unrolled?) iterations. Of course this will not be fully accurate - hand-unrolling and a number of other factors can confuse this heuristic. Now I also ran it on an ASIMD aarch64 machine. Here is the [PDF](https://github.com/user-attachments/files/17522538/table_aarch64.pdf).
![image](https://github.com/user-attachments/assets/ccaa73b0-1659-4ead-873c-39cc7c9b4e53) ![image](https://github.com/user-attachments/assets/a662e555-66f2-43ac-9d43-66e2a739a108) ![image](https://github.com/user-attachments/assets/d8dd329e-19e7-4e3b-8e25-b8d69aacc2ac) ![image](https://github.com/user-attachments/assets/9cee426b-46a3-4359-a547-d20d8d03dbce) A few observations. - The `short` benchmark is a bit noisy. Maybe someone else got on to that machine while I was running the benchmark. But everything else looks quite clean and nice, so I won't re-run the benchmark now. - In all 4 plots, we see a similar pattern: with offsets smaller than 8, there are a few instances where vectorization is slower, but with higher offsets, vectorization seems to always pay off. - We see a similar "stepping" pattern with `byte`, a little with `short`, and not much at all with `int` and `long`. - The vector size on that machine is only 16 bytes, so the throughput difference on the `long` benchmark can only be a factor 2x, with `int` 4x, with `short` 8x and with `byte` 16x. That seems to roughly hold true for high offsets. If I had to come up with a hypothesis, I would say that the cut-off is at `X` iterations, where `X = MaxVectorSize / 2`. I need to confirm that with different machines, maybe AVX2. And now I also ran it on an `AVX2` machine (Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz). Here is the [PDF](https://github.com/user-attachments/files/17586850/ghost11_avx2_table_v2.pdf). ![image](https://github.com/user-attachments/assets/bacb12ca-dbb3-4ae7-abab-f0690a93f6b1) ![image](https://github.com/user-attachments/assets/f1e321d6-4589-4c4a-9640-755146fe0961) ![image](https://github.com/user-attachments/assets/73ef7756-821c-4142-8ad3-faa6121b78a5) ![image](https://github.com/user-attachments/assets/4defda17-4408-4750-be50-0b3f31f69e44) It looks like the cut-off is consistently at 16 iterations, though 32 iterations would be fine as well.
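The arithmetic behind the observations above can be checked with a small stand-alone sketch. It is only an illustration: `lanes` and `hypothesized_cutoff` are hypothetical helper names, not HotSpot functions, and the vector widths (64 bytes for AVX-512, 32 for AVX2, 16 for ASIMD) are assumed to equal `MaxVectorSize` on the three machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of elements of type T that fit into one vector of `vector_bytes`
// bytes. This bounds the throughput gain from vectorizing a loop over T.
// Hypothetical helper for illustration, not a HotSpot API.
template <typename T>
constexpr std::size_t lanes(std::size_t vector_bytes) {
  return vector_bytes / sizeof(T);
}

// Hypothesis stated in the text: the cut-off is at X iterations,
// where X = MaxVectorSize / 2 (MaxVectorSize given in bytes).
constexpr std::size_t hypothesized_cutoff(std::size_t max_vector_size_bytes) {
  return max_vector_size_bytes / 2;
}

// With 16-byte ASIMD vectors: long 2x, int 4x, short 8x, byte 16x.
static_assert(lanes<std::int64_t>(16) == 2, "long");
static_assert(lanes<std::int32_t>(16) == 4, "int");
static_assert(lanes<std::int16_t>(16) == 8, "short");
static_assert(lanes<std::int8_t>(16) == 16, "byte");

// Compare the hypothesis with the cut-offs reported for the three machines.
inline void check_reported_cutoffs() {
  assert(hypothesized_cutoff(64) == 32);  // AVX-512: cut-off at about 32
  assert(hypothesized_cutoff(32) == 16);  // AVX2: cut-off at 16
  assert(hypothesized_cutoff(16) == 8);   // ASIMD: cut-off at 8
}
```

Under these assumptions the `MaxVectorSize / 2` hypothesis is consistent with all three measurements, though three machines are of course a small sample.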
**Some Thoughts** Every hardware will behave differently. It depends on latency and throughput. The latency depends on the L1 cache latency, especially for the store-to-load-forwarding failures. And the throughput depends on the vector length, and the number of ports that can execute the instructions. This is quite complex, and would require fine-tuning. For now, I will just have to set a hard limit, which is going to be inaccurate. But it is probably better on average than doing nothing for now. Now I'm trying to consider how to set the iteration threshold. The benchmark here is a very simple case, and it is (to my understanding) maximally sensitive to store-to-load-forwarding failure latency: we only perform load, add and store. If the loop contained more other instructions that could be parallelized, then we would be more quickly limited by throughput. Hence, vectorization would be profitable earlier, i.e. for lower iteration thresholds. I would therefore wager that it is better to err on the lower side, and set the iteration threshold lower than the `StoreToLoadForwarding` benchmark indicates. Thus, I will set the iteration threshold at `16` for `x86` (and by default for all platforms), and at `8` for `aarch64`. Of course, in the current implementation the iteration threshold is only a lower bound: at this point we cannot avoid having more iterations than the threshold in the unrolled loop at the time we vectorize. If there are more iterations in the loop, the threshold is effectively higher. I've run benchmarks on 7 machines now. Here is my [micro.ods](https://github.com/user-attachments/files/17643625/micro.ods).
scalar: SuperWord disabled no_detect: SuperWord without detecting store-to-load-forwarding failures (old behaviour, before this patch) default: new default SuperWord behaviour (detect store-to-load-forwarding failures for small offsets -> disable vectorization if detected) **Conclusion** For this benchmark, it seems the new behaviour (`default`) is very accurately choosing the best option between `scalar` and `no_detect`. The only exception is on the `windows x64` machine: for ints and longs in the offset range from 17-31 `default` decides to vectorize (same as `no_detect`), where the `scalar` option would have been a little faster. But no heuristic is perfect, and as said above: this benchmark is maximally sensitive to latency, and if the amount of work per iteration was increased, then we should expect the balance to tip towards vectorization being preferable. x64 AVX2 machine ![image](https://github.com/user-attachments/assets/c234f922-aab9-4c1d-bade-9d9ce1363372) OCI aarch64 (asimd / neon) ![image](https://github.com/user-attachments/assets/cd1e6bf2-a5c5-42d7-9690-c13a0474313b) Linux aarch64 (asimd / neon) ![image](https://github.com/user-attachments/assets/488e0c12-9f2a-4601-8f06-d976531b9d7e) Linux x64 ![image](https://github.com/user-attachments/assets/24d04e84-7206-4ce6-9576-41c3090ba326) MacOSX aarch64 (asimd / neon) ![image](https://github.com/user-attachments/assets/c33fa869-3a06-4fff-af9c-fcb794e2f1a1) MacOSX x64 ![image](https://github.com/user-attachments/assets/93f16e92-f2e6-44b9-a0f4-2223ef7c619a) Windows x64 ![image](https://github.com/user-attachments/assets/79221029-9733-4488-ab27-654674b44c03) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2437135550 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2437764165 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2449602737 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2449611215 PR Comment:
https://git.openjdk.org/jdk/pull/21521#issuecomment-2451423352 PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2458938698 From chagedorn at openjdk.org Thu Nov 7 07:07:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 07:07:45 GMT Subject: RFR: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... Thanks Vladimir for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21918#issuecomment-2461470724 From chagedorn at openjdk.org Thu Nov 7 07:07:46 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 07:07:46 GMT Subject: Integrated: 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor In-Reply-To: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> References: <5rOjn2UG-Qg9QFlM4wjWWFZ2X7TicU25nS4VSI1NAu8=.f73abba2-8e1e-4494-80e0-5e092763ecec@github.com> Message-ID: On Wed, 6 Nov 2024 07:00:57 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (this PR) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (upcoming) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790)) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. > ---... This pull request has now been integrated. 
Changeset: a6c85daa Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/a6c85daa1c5e685ab64cbf9860a022aaa4a0d7f8 Stats: 59 lines in 4 files changed: 22 ins; 22 del; 15 mod 8342945: Replace predicate walking code in get_assertion_predicates() used for Loop Unswitching and cleaning useless Template Assertion Predicates with a predicate visitor Reviewed-by: thartmann, roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21918 From duke at openjdk.org Thu Nov 7 07:31:47 2024 From: duke at openjdk.org (duke) Date: Thu, 7 Nov 2024 07:31:47 GMT Subject: RFR: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null @theoweidmannoracle Your change (at version 0fa7e4e52dcebcd0694afae77908a50101e820da) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21869#issuecomment-2461506657 From epeter at openjdk.org Thu Nov 7 07:51:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 07:51:44 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: Message-ID: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> On Sun, 3 Nov 2024 03:10:24 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2461538717 From epeter at openjdk.org Thu Nov 7 07:55:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 07:55:43 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
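The wrapper idea described in this patch can be illustrated with a self-contained mock. This is a minimal sketch, not the HotSpot code: `MockIgvn`, `MockPhaseIdealLoop`, `has_ctrl` and the simplified `Node` are hypothetical stand-ins, and only the shape of `intcon` (create the constant, then set its control to the root) follows the description above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Mock stand-ins for illustration only; these are NOT the HotSpot classes.
struct Node {
  int value;
};

// Plays the role of `_igvn`: creates constant nodes, knows nothing about control.
struct MockIgvn {
  std::vector<std::unique_ptr<Node>> nodes;  // owns all created nodes

  Node* intcon(int i) {
    nodes.push_back(std::unique_ptr<Node>(new Node{i}));
    return nodes.back().get();
  }
};

struct MockPhaseIdealLoop {
  MockIgvn _igvn;
  Node _root{0};                              // stands in for C->root()
  std::map<const Node*, const Node*> _ctrl;   // node -> its control node

  void set_ctrl(Node* n, Node* ctrl) { _ctrl[n] = ctrl; }
  bool has_ctrl(const Node* n) const { return _ctrl.count(n) != 0; }

  // The wrapper: creating the constant and setting its control happen in
  // one call, so a caller cannot forget the second step.
  Node* intcon(int i) {
    Node* node = _igvn.intcon(i);
    set_ctrl(node, &_root);
    return node;
  }
};
```

In this mock, a constant created directly through `_igvn.intcon` has no control entry, while one created through the wrapper always does. That is the class of omission the new helper methods are meant to rule out.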
src/hotspot/share/opto/loopTransform.cpp line 2963: > 2961: // Kill the eliminated test > 2962: C->set_major_progress(); > 2963: Node *kill_con = intcon(1-flip); Suggestion: Node* kill_con = intcon(1-flip); We generally now have the pointer `*` with the type. So if you touch any new code please update it ;) src/hotspot/share/opto/loopopts.cpp line 334: > 332: } > 333: // 'con' is set to true or false to kill the dominated test. > 334: Node *con = makecon(pop == Op_IfTrue ? TypeInt::ONE : TypeInt::ZERO); Suggestion: Node* con = makecon(pop == Op_IfTrue ? TypeInt::ONE : TypeInt::ZERO); src/hotspot/share/opto/loopopts.cpp line 2907: > 2905: int proj_con = live_proj->_con; > 2906: assert(proj_con == 0 || proj_con == 1, "false or true projection"); > 2907: Node *con = intcon(proj_con); Suggestion: Node* con = intcon(proj_con); src/hotspot/share/opto/loopopts.cpp line 3245: > 3243: stay_in_loop(lp_proj, loop)->is_If() && > 3244: stay_in_loop(lp_proj, loop)->in(1)->in(1)->Opcode() == Op_CmpU, "inserted cmpi before cmpu"); > 3245: Node *con = makecon(lp_proj->is_IfTrue() ? TypeInt::ONE : TypeInt::ZERO); Suggestion: Node* con = makecon(lp_proj->is_IfTrue() ? TypeInt::ONE : TypeInt::ZERO); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832203491 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832204711 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832205017 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1832205177 From duke at openjdk.org Thu Nov 7 08:12:42 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 08:12:42 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 18:20:58 GMT, Vladimir Kozlov wrote: > Do we have other places (not new constant node) where we set Root as control? 
Maybe we can add `set_root_as_ctrl(n)` method in `loopnode.hpp` in such case. There are only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there are only about three places where this pattern occurs now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2461576384 From roland at openjdk.org Thu Nov 7 08:25:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 08:25:50 GMT Subject: RFR: 8341834: C2 compilation fails with "bad AD file" due to Replicate [v3] In-Reply-To: <77bYQ44LNNQlSteh4rJSEvJ5QWIvpBcb7eNb0Sy-vVE=.a8c58c15-a7c9-4333-9a45-e49fc35797eb@github.com> References: <77bYQ44LNNQlSteh4rJSEvJ5QWIvpBcb7eNb0Sy-vVE=.a8c58c15-a7c9-4333-9a45-e49fc35797eb@github.com> Message-ID: On Thu, 7 Nov 2024 01:42:30 GMT, Hao Sun wrote: >>> Do you know what JDK versions are affected? >> >> The failure doesn't reproduce with jdk21u. But that seems to be because we need JDK-8326139 (and JDK-8331575) for the bug to show up. > > Hi @rwestrel > > My JBS account is inactive recently. Hence I'd like to report the bug here. > > I encountered the following error with `-XX:MaxVectorSize=8` on both AArch64 and x86_64. > Could you help take a look at this issue? Thanks.
> > Test command: > > make test JTREG="VM_OPTIONS=-XX:MaxVectorSize=8" TEST=test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java > > > Error message: > > CompileCommand: compileonly TestReplicateAtConv.test bool compileonly = true > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/tmp/jdk-dev/src/hotspot/share/opto/type.cpp:2499), pid=1424540, tid=1424557 > # assert(Matcher::vector_size_supported(elem_bt, length)) failed: length in range > # > # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-git-63c19d3db58) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-git-63c19d3db58, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0x17bca30] TypeVect::make(BasicType, unsigned int, bool)+0x150 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/core.1424540) > # > # An error report file with more information is saved as: > # /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/hs_err_pid1424540.log > # > # Compiler replay data is saved as: > # /tmp/jdk-build/test-support/jtreg_test_hotspot_jtreg_compiler_vectorization_TestReplicateAtConv_java/scratch/0/replay_pid1424540.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp @shqking thanks for the report. 
I filed https://bugs.openjdk.org/browse/JDK-8343747 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21660#issuecomment-2461598820 From rrich at openjdk.org Thu Nov 7 08:55:45 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 7 Nov 2024 08:55:45 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Wed, 6 Nov 2024 16:18:11 GMT, Martin Doerr wrote: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. src/hotspot/cpu/ppc/c2_init_ppc.cpp line 58: > 56: warning("OptoScheduling is not supported on this CPU."); > 57: FLAG_SET_DEFAULT(OptoScheduling, false); > 58: } Makes sense but better do it in `VM_Version::initialize()` because `Compile::pd_compiler2_init()` is called after initialization of flags has been completed and the setting will not be shown with `PrintFlagsFinal`. I'd even suggest to move the other flag settings there with this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832283840 From tholenstein at openjdk.org Thu Nov 7 08:58:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 7 Nov 2024 08:58:46 GMT Subject: RFR: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set to upper limit to 1000 or even max_jint. I'm not really sure why 43 was chosen as the default. With this PR, we can experiment with higher values and potentially adjust the default in the future. 
From my own tests, I have rarely seen the 43 limit hit, but I have observed a few edge cases where loop optimizations were applied in the hundreds (after removing the 43 limit). We would need to look into those cases more closely to see if they actually improve performance or if they might even reveal issues in the loop optimizations. Thanks for the reviews @shipilev and @chhagedorn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21921#issuecomment-2461662396 From tholenstein at openjdk.org Thu Nov 7 08:58:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 7 Nov 2024 08:58:46 GMT Subject: Integrated: 8321997: Increase upper limit of LoopOptsCount flag In-Reply-To: References: Message-ID: <-qLB2HgbRaNRFNIFN-oie0Nd2mAh-Hc4pKMF1Ub2te4=.efeb241c-f8a0-4185-9fdf-f5464910adac@github.com> On Wed, 6 Nov 2024 09:13:12 GMT, Tobias Holenstein wrote: > Currently `LoopOptsCount` has a range of 5-43 with default value 43. For stress testing we want to set values higher than 43. Set to upper limit to 1000 or even max_jint. This pull request has now been integrated.
Changeset: 592a48b1 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/592a48b163ed582872b686e7a606cf8b96fcbcbc Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8321997: Increase upper limit of LoopOptsCount flag Reviewed-by: shade, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/21921 From chagedorn at openjdk.org Thu Nov 7 09:37:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 09:37:03 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor Message-ID: #### Replacing the Remaining Predicate Walking and Cloning Code The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) --- (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) #### Single Template Assertion Predicate Check This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
#### Common Refactorings for all the Patches in this Series In each of the patches, I will apply similar refactoring ideas: - Replace the existing code in the corresponding `PhaseIdealLoop` method with a call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. - Keep the semantics, which includes only applying the Template Assertion Predicate processing if there are Parse Predicates. This limitation should eventually be removed. But I want to do that separately at a later point. --- #### Refactorings of this Patch This patch replaces the predicate walking in `PhaseIdealLoop::update_main_loop_assertion_predicates()` which is used during Loop Unrolling to update the Template Assertion Predicates for the new unrolled stride and create new Initialized Assertion Predicates reflecting that change while the old Initialized Assertion Predicates with the pre-unrolled stride are killed. - New visitor `UpdateStrideForAssertionPredicates` takes care of these tasks. - Update Template Assertion Predicates: `replace_opaque_stride_input()` - Uses new class `ReplaceOpaqueStrideInput` which does a simple BFS on a Template Assertion Expression to find the `OpaqueLoopStrideNode` to update it.
Note that the existing class `DataNodesOnPathsToTargets` is not suitable since this class collects all nodes in between which is unnecessary for this task. - Create Initialized Assertion Predicate from template: `initialize_from_updated_template()` - Calls `clone_and_fold_opaque_loop_nodes()` that uses new strategy class `RemoveOpaqueLoopNodesStrategy` which is passed to the existing method `TemplateAssertionExpression::clone()` to do the Template Assertion Expression cloning. This strategy just folds the `OpaqueLoop*nodes` away for the cloned expression and only keeps their inputs. #### Follow-up Work In Loop Unrolling, we only update the stride and not the init value. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This was already an inefficiency before but could now be tackled since we keep track of whether an Assertion Predicate is for the init or last value with `AssertionPredicateType`. I filed [JDK-8343745](https://bugs.openjdk.org/browse/JDK-8343745) for that. 
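The "simple BFS" used to find and update the stride node can be sketched on a mock expression graph. This is only an illustration under stated assumptions: `ExprNode`, `Op` and the free function below are hypothetical stand-ins and not the actual `ReplaceOpaqueStrideInput` class, and the real Template Assertion Expression has more node kinds than this mock.

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

// Mock expression node, for illustration only (not HotSpot's Node class).
enum class Op { Bool, Cmp, Add, OpaqueInit, OpaqueStride, Con };

struct ExprNode {
  Op op;
  int con;                         // only meaningful for Op::Con
  std::vector<ExprNode*> inputs;

  ExprNode(Op o, int c = 0, std::vector<ExprNode*> in = {})
      : op(o), con(c), inputs(std::move(in)) {}
};

// Breadth-first search from `root` along the input edges. When the first
// OpaqueStride node is found, its single input is replaced by `new_stride`.
// Returns true if such a node was found and updated. Unlike a path-collecting
// walk, nothing is recorded besides the visited set.
bool replace_opaque_stride_input(ExprNode* root, ExprNode* new_stride) {
  std::queue<ExprNode*> worklist;
  std::unordered_set<ExprNode*> visited;
  worklist.push(root);
  visited.insert(root);
  while (!worklist.empty()) {
    ExprNode* n = worklist.front();
    worklist.pop();
    if (n->op == Op::OpaqueStride) {
      assert(n->inputs.size() == 1);
      n->inputs[0] = new_stride;
      return true;
    }
    for (ExprNode* in : n->inputs) {
      // insert().second is true only for nodes not seen before
      if (in != nullptr && visited.insert(in).second) {
        worklist.push(in);
      }
    }
  }
  return false;
}
```

The sketch stops at the first stride node it reaches and never collects the nodes in between, which is the property that makes a plain BFS sufficient for this task.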
Thanks, Christian ------------- Commit messages: - Add const - 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor Changes: https://git.openjdk.org/jdk/pull/21944/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21944&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342946 Stats: 203 lines in 4 files changed: 161 ins; 29 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21944.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21944/head:pull/21944 PR: https://git.openjdk.org/jdk/pull/21944 From duke at openjdk.org Thu Nov 7 10:06:49 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 10:06:49 GMT Subject: Integrated: 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 09:23:16 GMT, theoweidmannoracle wrote: > Printing incorrectly printed `nullptr` instead of `null` > > Buggy: > > > ScopeDesc(pc=0x0000000104c05468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: nullptr > - l3: empty > Expression stack > - @0: nullptr > > > Fixed: > > > ScopeDesc(pc=0x0000000106fdd468 offset=2e8): > java.lang.Class::desiredAssertionStatus at 20 (line 3984) > Locals > - l0: reg rfp [58],oop > - l1: stack[0],oop > - l2: null > - l3: empty > Expression stack > - @0: null This pull request has now been integrated. 
Changeset: 7620b129 Author: Theo Weidmann Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/7620b129888d57514d9ef588e0681f1d43377236 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8323803: ConstantOopReadValue::print_on should print 'null' instead of 'nullptr' Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21869 From galder at openjdk.org Thu Nov 7 10:50:19 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 7 Nov 2024 10:50:19 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes.
Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Added copyright and @bug identifiers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21920/files - new: https://git.openjdk.org/jdk/pull/21920/files/1f548010..1bf6992c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=00-01 Stats: 25 lines in 1 file changed: 24 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From galder at openjdk.org Thu Nov 7 10:50:19 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 7 Nov 2024 10:50:19 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 11:31:37 GMT, Tobias Hartmann wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Added copyright and @bug identifiers > > Changes requested by thartmann (Reviewer). @TobiHartmann I've added `@bug` and copyright header. I've put Red Hat's copyright. @fzhinkin do you want me to add a line for JetBrains to the copyright? I see it has been done in the past, e.g. `ComplexURITest`: /* Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. * Copyright (c) 2024 JetBrains s.r.o.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2461903598 From mdoerr at openjdk.org Thu Nov 7 11:26:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 11:26:42 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. Looks correct. Additional improvements could be done separately. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21864#pullrequestreview-2420693793 From lucy at openjdk.org Thu Nov 7 11:33:41 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 7 Nov 2024 11:33:41 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21864#pullrequestreview-2420710850 From mdoerr at openjdk.org Thu Nov 7 13:23:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 13:23:19 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21935/files - new: https://git.openjdk.org/jdk/pull/21935/files/a9330b32..db0d279e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=00-01 Stats: 44 lines in 2 files changed: 22 ins; 21 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21935/head:pull/21935 PR: https://git.openjdk.org/jdk/pull/21935 From mdoerr at openjdk.org Thu Nov 7 13:23:19 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 13:23:19 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 08:52:46 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. > > src/hotspot/cpu/ppc/c2_init_ppc.cpp line 58: > >> 56: warning("OptoScheduling is not supported on this CPU."); >> 57: FLAG_SET_DEFAULT(OptoScheduling, false); >> 58: } > > Makes sense but better do it in `VM_Version::initialize()` because `Compile::pd_compiler2_init()` is called after initialization of flags has been completed and the setting will not be shown with `PrintFlagsFinal`. > I'd even suggest to move the other flag settings there with this PR. This makes sense. Please see my update. I have also added `guarantee(CodeEntryAlignment >= InteriorEntryAlignment, "");` to `Compile::pd_compiler2_init()` which is there on other platforms. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832669129 From mdoerr at openjdk.org Thu Nov 7 13:27:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 13:27:44 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 05:39:24 GMT, Amit Kumar wrote: > I see that test is passing for s390x (with ubsan enabled). But still do you think we should disable for s390x as well ? You're free to decide. If there are no issues, there's no urgent need to change anything. On the other side, if it's not well maintained, then allowing the usage probably makes no sense. You could check if there's any performance difference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21935#issuecomment-2462234538 From rrich at openjdk.org Thu Nov 7 13:32:50 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 7 Nov 2024 13:32:50 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 13:23:19 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. Looks good. Cheers, Richard. src/hotspot/cpu/ppc/vm_version_ppc.cpp line 174: > 172: > 173: // Power7 and later. > 174: if (PowerArchitecturePPC64 > 6) { Settings that depend on `PowerArchitecturePPC64` seem to be ordered. You might want to keep it like that. 
------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21935#pullrequestreview-2420983914 PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832682141 From fzhinkin at openjdk.org Thu Nov 7 14:04:47 2024 From: fzhinkin at openjdk.org (Filipp Zhinkin) Date: Thu, 7 Nov 2024 14:04:47 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 7 Nov 2024 10:45:00 GMT, Galder Zamarreño wrote: >> Changes requested by thartmann (Reviewer). > > @TobiHartmann I've added `@bug` and copyright header. I've put Red Hat's copyright. > > @fzhinkin do you want me to add a line for Jetbrains to the copyright? I see it has been done in the past, e.g. `ComplexURITest`: > > > /* Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > * Copyright (c) 2024 JetBrains s.r.o. @galderz, I'd appreciate it if you can add `Copyright (c) 2024 JetBrains s.r.o.. All rights reserved.` to the header. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2462315472 From mdoerr at openjdk.org Thu Nov 7 14:09:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 14:09:02 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move Power7 flags up.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21935/files - new: https://git.openjdk.org/jdk/pull/21935/files/db0d279e..f8257242 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21935&range=01-02 Stats: 22 lines in 1 file changed: 11 ins; 11 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21935/head:pull/21935 PR: https://git.openjdk.org/jdk/pull/21935 From mdoerr at openjdk.org Thu Nov 7 14:09:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 14:09:02 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v2] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 13:28:49 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move flag configuration to VM_Version::initialize(). Add EntryAlignment guarantee like on other platforms. > > src/hotspot/cpu/ppc/vm_version_ppc.cpp line 174: > >> 172: >> 173: // Power7 and later. >> 174: if (PowerArchitecturePPC64 > 6) { > > Settings that depend on `PowerArchitecturePPC64` seem to be ordered. You might want to keep it like that. I have moved these flags up. Note that the checks will get removed by [JDK-8331859](https://bugs.openjdk.org/browse/JDK-8331859). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21935#discussion_r1832740450 From mbaesken at openjdk.org Thu Nov 7 14:16:49 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 7 Nov 2024 14:16:49 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 14:09:02 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move Power7 flags up. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21935#pullrequestreview-2421105178 From mdoerr at openjdk.org Thu Nov 7 14:22:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 14:22:44 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 14:09:02 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move Power7 flags up. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21935#issuecomment-2462358079 From roland at openjdk.org Thu Nov 7 14:48:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 14:48:00 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue Message-ID: A `CountedLoopEnd` (that marks the end of a still existing `CountedLoop`) is optimized out because a dominating identical `CountedLoopEnd` (that no longer marks the end of an existing `CountedLoop` but was left behind by previous loop opts) is found. That causes the path out of `CountedLoopEnd` to become dead including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` looses its backedge as a consequence. The `CountedLoop` is still marked as strip mined but the outer loop doesn't exist anymore. The fix I propose for this corner case is to simply detect when that happens (during igvn AFAICT) and clear the strip mined flag from the `CountedLoop`. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/21956/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21956&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340532 Stats: 75 lines in 3 files changed: 74 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21956.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21956/head:pull/21956 PR: https://git.openjdk.org/jdk/pull/21956 From roland at openjdk.org Thu Nov 7 14:54:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 14:54:41 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): 
Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. 
In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... Looks reasonable to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21944#pullrequestreview-2421213969 From rrich at openjdk.org Thu Nov 7 15:03:49 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 7 Nov 2024 15:03:49 GMT Subject: RFR: 8343724: [PPC64] Disallow OptoScheduling [v3] In-Reply-To: References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Thu, 7 Nov 2024 14:09:02 GMT, Martin Doerr wrote: >> Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move Power7 flags up. Marked as reviewed by rrich (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21935#pullrequestreview-2421239929 From chagedorn at openjdk.org Thu Nov 7 15:05:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 15:05:41 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: <6EfxekMMTeswejTgNj2oHlzScpoW4LpHj5YkiXwM7Aw=.a0e3f667-4ad8-4308-90cd-3d1519c06e00@github.com> On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21944#issuecomment-2462463398 From chagedorn at openjdk.org Thu Nov 7 15:05:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 7 Nov 2024 15:05:44 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: References: Message-ID: <7z556krCUH5mTfEeTgD75L3MHiUG9k-1_7Ox4LcH0F4=.a2830249-8597-4ff7-95cf-358a30f044bb@github.com> On Thu, 7 Nov 2024 14:42:41 GMT, Roland Westrelin wrote: > A `CountedLoopEnd` (that marks the end of a still existing > `CountedLoop`) is optimized out because a dominating identical > `CountedLoopEnd` (that no longer marks the end of an existing > `CountedLoop` but was left behind by previous loop opts) is > found. That causes the path out of `CountedLoopEnd` to become dead > including the `OuterStripMinedLoopEnd`. 
The `OuterStripMinedLoop` > looses its backedge as a consequence. The `CountedLoop` is still > marked as strip mined but the outer loop doesn't exist anymore. > > The fix I propose for this corner case is to simply detect when that > happens (during igvn AFAICT) and clear the strip mined flag from the > `CountedLoop`. Looks reasonable to me. test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java line 28: > 26: * @bug 8340532 > 27: * @summary C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue > 28: * Since you use C2 only flags, you should add: Suggestion: * @requires vm.compiler2.enabled ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21956#pullrequestreview-2421244527 PR Review Comment: https://git.openjdk.org/jdk/pull/21956#discussion_r1832837462 From acobbs at openjdk.org Thu Nov 7 15:46:45 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Thu, 7 Nov 2024 15:46:45 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> References: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> Message-ID: On Thu, 7 Nov 2024 07:48:43 GMT, Emanuel Peter wrote: >> Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Update copyright years. >> - Merge branch 'master' into SuppressWarningsCleanup-hotspot >> - Merge branch 'master' into SuppressWarningsCleanup-graal >> - Remove unnecessary @SuppressWarnings annotations. > > Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? 
Hi @eme64, > Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? FYI there are [several other](https://github.com/openjdk/jdk/pulls?q=author%3Aarchiecobbs+is%3Apr+%22Remove+unnecessary%22+in%3Atitle+) PRs like this one. I haven't checked exhaustively, but all of the ones I've checked appear to be due to either (a) the warning was never needed, or (b) a subsequent refinement of the warning itself which made the code no longer qualify as "warnable". For an example of (a) see commit 8fb70c710afa which added `@SuppressWarnings("unchecked")` for a cast to type `Key`, even though `Key` is not a generic type and so the cast was never unchecked in the first place. For an example of (b), see commit b431c6929d12 which added `@SuppressWarnings("serial")` because an anonymous class did not declare `serialVersionUID`, but then later the warning was changed to no longer trigger in that situation by [JDK-7152104](https://bugs.openjdk.org/browse/JDK-7152104), but the annotation was not removed as part of that commit. In this particular PR, it looks like (for example) the useless `@SuppressWarnings("try")` annotation on `compileMethod()` was [added in this commit](https://github.com/openjdk/jdk/commit/3b0ee5a6d8b89a52b0dacc51399955631d6aa597#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4) - probably a copy & paste error. This is typical. I guess the only other possibility is that the warning stopped working at some point due to a bug, but I haven't seen any examples of that.
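To make case (a) concrete, here is a minimal, hypothetical sketch. It is not code from this PR or from the commits mentioned above, and the class and method names are made up for illustration. A cast to a non-generic type is an ordinary checked cast, so javac has no "unchecked" warning to report and the annotation suppresses nothing. A cast to a parameterized type such as `List<String>` is where the warning actually applies.

```java
import java.util.ArrayList;
import java.util.List;

class SuppressWarningsSketch {

    // Hypothetical non-generic type, standing in for the `Key` type mentioned above.
    static final class Key {
        private final String name;
        Key(String name) { this.name = name; }
        String name() { return name; }
    }

    // Case (a): Key has no type parameters, so this is an ordinary checked cast.
    // javac reports no "unchecked" warning here; the annotation suppresses nothing.
    @SuppressWarnings("unchecked")
    static Key asKeyWithUselessAnnotation(Object o) {
        return (Key) o;
    }

    // The same method without the annotation behaves identically.
    static Key asKey(Object o) {
        return (Key) o;
    }

    // For contrast: the type argument cannot be verified at run time,
    // so this cast is unchecked and the annotation does real work.
    @SuppressWarnings("unchecked")
    static List<String> asStringList(Object o) {
        return (List<String>) o;
    }

    public static void main(String[] args) {
        Object k = new Key("k1");
        System.out.println(asKeyWithUselessAnnotation(k).name());
        System.out.println(asKey(k).name());

        List<String> names = new ArrayList<>();
        names.add("x");
        Object o = names;
        System.out.println(asStringList(o));
    }
}
```

Removing the annotation from `asKeyWithUselessAnnotation` changes neither the behavior nor the set of warnings, which is the kind of cleanup described above.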
------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2462566874 From duke at openjdk.org Thu Nov 7 16:08:08 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 16:08:08 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v2] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/38d5bd0d..d1817ee8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From duke at openjdk.org Thu Nov 7 16:11:07 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 16:11:07 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v3] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are 
wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/d1817ee8..798a6172 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From roland at openjdk.org Thu Nov 7 16:18:09 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 16:18:09 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: > A `CountedLoopEnd` (that marks the end of a still existing > `CountedLoop`) is optimized out because a dominating identical > `CountedLoopEnd` (that no longer marks the end of an existing > `CountedLoop` but was left behind by previous loop opts) is > found. That causes the path out of `CountedLoopEnd` to become dead > including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` > looses its backedge as a consequence. The `CountedLoop` is still > marked as strip mined but the outer loop doesn't exist anymore. > > The fix I propose for this corner case is to simply detect when that > happens (during igvn AFAICT) and clear the strip mined flag from the > `CountedLoop`. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21956/files - new: https://git.openjdk.org/jdk/pull/21956/files/5e9ca1bf..a4649dd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21956&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21956&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21956.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21956/head:pull/21956 PR: https://git.openjdk.org/jdk/pull/21956 From roland at openjdk.org Thu Nov 7 16:22:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 7 Nov 2024 16:22:49 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: <7z556krCUH5mTfEeTgD75L3MHiUG9k-1_7Ox4LcH0F4=.a2830249-8597-4ff7-95cf-358a30f044bb@github.com> References: <7z556krCUH5mTfEeTgD75L3MHiUG9k-1_7Ox4LcH0F4=.a2830249-8597-4ff7-95cf-358a30f044bb@github.com> Message-ID: On Thu, 7 Nov 2024 15:02:48 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java line 28: > >> 26: * @bug 8340532 >> 27: * @summary C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue >> 28: * > > Since you use C2 only flags, you should add: > Suggestion: > > * @requires vm.compiler2.enabled Right! Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21956#discussion_r1832973222 From kvn at openjdk.org Thu Nov 7 17:35:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Nov 2024 17:35:41 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
> > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21944#pullrequestreview-2421716128 From kvn at openjdk.org Thu Nov 7 17:38:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Nov 2024 17:38:44 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: <0m4ib_nqLieISo4dU7rYBEwzL2EndD8AOWtrgH_qJwQ=.94fabea9-54ac-45fa-b6dc-f5ba94b04f13@github.com> On Thu, 7 Nov 2024 16:18:09 GMT, Roland Westrelin wrote: >> A `CountedLoopEnd` (that marks the end of a still existing >> `CountedLoop`) is optimized out because a dominating identical >> `CountedLoopEnd` (that no longer marks the end of an existing >> `CountedLoop` but was left behind by previous loop opts) is >> found. That causes the path out of `CountedLoopEnd` to become dead >> including the `OuterStripMinedLoopEnd`. 
The `OuterStripMinedLoop` >> looses its backedge as a consequence. The `CountedLoop` is still >> marked as strip mined but the outer loop doesn't exist anymore. >> >> The fix I propose for this corner case is to simply detect when that >> happens (during igvn AFAICT) and clear the strip mined flag from the >> `CountedLoop`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java > > Co-authored-by: Christian Hagedorn Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21956#pullrequestreview-2421726221 From kvn at openjdk.org Thu Nov 7 17:51:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 7 Nov 2024 17:51:44 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 08:09:55 GMT, theoweidmannoracle wrote: > > Do we have other places (not new constant node) where we set Root as control? May be we can add `set_root_as_ctrl(n)` method in `loop node.hpp` in such case. > > There's only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. > > Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there's only about three places where this pattern occurs now? My suggestion is about additional code cleanup. I think 3 + 5 places are enough to justify having a new function in the header file. Also, `set_root_as_ctrl(n)` could be a copy of `set_ctrl(n, ctrl)` without the 2 asserts that check `ctrl`. It will be faster.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2462873767 From mdoerr at openjdk.org Thu Nov 7 22:14:47 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 7 Nov 2024 22:14:47 GMT Subject: Integrated: 8343724: [PPC64] Disallow OptoScheduling In-Reply-To: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> References: <4CarWHRIvPn8OIHzBEVx2di8Q3FJkdKaU3dcLZtgAjk=.fd50f8d4-e36e-4df9-afbe-19d6f93c9548@github.com> Message-ID: On Wed, 6 Nov 2024 16:18:11 GMT, Martin Doerr wrote: > Force off OptoScheduling for PPC64. It should be properly maintained before allowing users to switch it on. It's probably not important for this platform. Also see JBS issue for motivation. This pull request has now been integrated. Changeset: f621f26c Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/f621f26cd113090a0305598cfc50f0eac9a263c6 Stats: 39 lines in 2 files changed: 22 ins; 16 del; 1 mod 8343724: [PPC64] Disallow OptoScheduling Reviewed-by: rrich, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/21935 From fyang at openjdk.org Fri Nov 8 02:18:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Nov 2024 02:18:06 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions Message-ID: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Hello, please review this trivial change. The reason of the crash is that we will use more space for compiler stubs during stubRoutines generation when compressed instructions is disabled. So this simply increases the reserved size of compiler stubs for this CPU platform. 
After this change, we have: $ java -Xlog:stubs -XX:-UseRVC -version [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 ------------- Commit messages: - 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions Changes: https://git.openjdk.org/jdk/pull/21966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21966&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343805 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21966/head:pull/21966 PR: https://git.openjdk.org/jdk/pull/21966 From dlong at openjdk.org Fri Nov 8 03:12:46 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Nov 2024 03:12:46 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? 
If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. I agree with Roland: rather than overwriting the old information, it would be nice to append to it. Unfortunately, this late inlining print support is a bit complicated and also a bit broken, as I discovered recently. It could probably use a cleanup. I hit one assert because there was no message printed in do_late_inline_check when allow_inline was set to true. When I investigated that, I discovered that print_inlining_commit() will happily append a new message next to an old message on the same line. Something is going wrong with the logic in print_inlining_update() when cg() is null.
I'm wondering if we could simplify things by placing the stringStream inside the CallGenerator. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2463671322 From amitkumar at openjdk.org Fri Nov 8 04:54:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 04:54:32 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments Message-ID: trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. ------------- Commit messages: - int64_t -> uint64_t Changes: https://git.openjdk.org/jdk/pull/21967/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21967&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343810 Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21967/head:pull/21967 PR: https://git.openjdk.org/jdk/pull/21967 From chagedorn at openjdk.org Fri Nov 8 06:21:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 06:21:31 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 16:18:09 GMT, Roland Westrelin wrote: >> A `CountedLoopEnd` (that marks the end of a still existing >> `CountedLoop`) is optimized out because a dominating identical >> `CountedLoopEnd` (that no longer marks the end of an existing >> `CountedLoop` but was left behind by previous loop opts) is >> found. That causes the path out of `CountedLoopEnd` to become dead >> including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` >> looses its backedge as a consequence. The `CountedLoop` is still >> marked as strip mined but the outer loop doesn't exist anymore. >> >> The fix I propose for this corner case is to simply detect when that >> happens (during igvn AFAICT) and clear the strip mined flag from the >> `CountedLoop`. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21956#pullrequestreview-2422794662 From syan at openjdk.org Fri Nov 8 06:44:18 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 8 Nov 2024 06:44:18 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt Message-ID: Hi all, The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't exclude from `test/hotspot/jtreg/ProblemList.txt` correctly. The test only contains a single test, so it do not need to set test suffix. This PR remove the test suffix to make the Problemlist work normally, trivial fix, no risk. ------------- Commit messages: - 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt Changes: https://git.openjdk.org/jdk/pull/21968/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21968&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343488 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21968.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21968/head:pull/21968 PR: https://git.openjdk.org/jdk/pull/21968 From chagedorn at openjdk.org Fri Nov 8 07:05:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 07:05:28 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't exclude from `test/hotspot/jtreg/ProblemList.txt` correctly. 
The test only contains a single test, so it do not need to set test suffix. > This PR remove the test suffix to make the Problemlist work normally, trivial fix, no risk. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21968#pullrequestreview-2422849876 From syan at openjdk.org Fri Nov 8 07:14:40 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 8 Nov 2024 07:14:40 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't exclude from `test/hotspot/jtreg/ProblemList.txt` correctly. The test only contains a single test, so it do not need to set test suffix. > This PR remove the test suffix to make the Problemlist work normally, trivial fix, no risk. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21968#issuecomment-2463926413 From chagedorn at openjdk.org Fri Nov 8 07:19:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 07:19:16 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling Message-ID: (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. 
This patch implements that improvement. To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. Thanks, Christian ------------- Depends on: https://git.openjdk.org/jdk/pull/21944 Commit messages: - 8343745: Only update Last Value Assertion Predicates in Loop Unrolling Changes: https://git.openjdk.org/jdk/pull/21969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343745 Stats: 101 lines in 7 files changed: 16 ins; 13 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From chagedorn at openjdk.org Fri Nov 8 07:19:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 07:19:16 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 07:12:12 GMT, Christian Hagedorn wrote: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. 
This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian src/hotspot/share/opto/predicates.cpp line 881: > 879: // Only Last Value Assertion Predicates have an OpaqueLoopStrideNode. > 880: return; > 881: } Skipping to update Init Value Template Assertion Predicate. src/hotspot/share/opto/predicates.hpp line 1073: > 1071: // Only Last Value Initialized Assertion Predicates need to be killed and updated. > 1072: initialized_assertion_predicate.kill(_phase); > 1073: } Only killing old Last Value Initialized Assertion Predicate ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21969#discussion_r1833814389 PR Review Comment: https://git.openjdk.org/jdk/pull/21969#discussion_r1833813957 From roland at openjdk.org Fri Nov 8 07:54:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 07:54:42 GMT Subject: RFR: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue [v2] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:17:49 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/loopstripmining/TestIdenticalDominatingCLE.java >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by chagedorn (Reviewer). 
@chhagedorn @vnkozlov thanks for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/21956#issuecomment-2463981674 From roland at openjdk.org Fri Nov 8 07:57:18 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 07:57:18 GMT Subject: Integrated: 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue In-Reply-To: References: Message-ID: <57sjv4VxZ2KunadWfkprDW5tlKiWMM45J4UEOJhCQPI=.c17d2245-a825-46db-b365-40c203fcc9eb@github.com> On Thu, 7 Nov 2024 14:42:41 GMT, Roland Westrelin wrote: > A `CountedLoopEnd` (that marks the end of a still existing > `CountedLoop`) is optimized out because a dominating identical > `CountedLoopEnd` (that no longer marks the end of an existing > `CountedLoop` but was left behind by previous loop opts) is > found. That causes the path out of `CountedLoopEnd` to become dead > including the `OuterStripMinedLoopEnd`. The `OuterStripMinedLoop` > looses its backedge as a consequence. The `CountedLoop` is still > marked as strip mined but the outer loop doesn't exist anymore. > > The fix I propose for this corner case is to simply detect when that > happens (during igvn AFAICT) and clear the strip mined flag from the > `CountedLoop`. This pull request has now been integrated. 
Changeset: a10b1ccd Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/a10b1ccd377335354db7505e9944496729e539ce Stats: 75 lines in 3 files changed: 74 ins; 1 del; 0 mod 8340532: C2: assert(is_OuterStripMinedLoop()) failed: invalid node class: IfTrue Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21956 From jbhateja at openjdk.org Fri Nov 8 08:15:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 8 Nov 2024 08:15:32 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
> > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... 
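The arithmetic behind the quoted description can be checked with a small scalar sketch. It only models the reasoning (a 64-bit product assembled from 32-bit partial products, and what is left when the upper halves are zero or are sign extensions); it is not the C2 pattern match and not the Vector API code from the patch. The class and method names are hypothetical, and the signed variant assumes VPMULDQ semantics, i.e. that the low doubleword of each quadword lane is treated as a signed 32-bit value.

```java
// Scalar model of the partial-product argument; all names are illustrative only.
final class PartialProductSketch {
    private static final long LOW_MASK = 0xFFFFFFFFL;

    // 64x64 -> low 64 bits, assembled from 32-bit halves (arithmetic is mod 2^64).
    static long mulViaPartialProducts(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        long lowProduct = aLo * bLo;              // unsigned 32x32 -> 64-bit product
        long crossTerms = aHi * bLo + aLo * bHi;  // only the low 32 bits survive the shift
        return lowProduct + (crossTerms << 32);   // aHi * bHi is shifted out completely
    }

    // Unsigned case: with both upper halves zero, only the low product remains.
    static long mulLowUnsigned(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Signed case: operands that are sign-extended 32-bit values.
    static long mulLowSigned(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L, b = 0x0FEDCBA987654321L;
        System.out.println(mulViaPartialProducts(a, b) == a * b);

        long x = a & LOW_MASK, y = b & LOW_MASK;  // upper 32 bits cleared
        System.out.println(mulLowUnsigned(x, y) == x * y);

        long s = -7L, t = 123456789L;             // both fit in a signed 32-bit int
        System.out.println(mulLowSigned(s, t) == s * t);
    }
}
```

All three comparisons hold because Java `long` multiplication wraps modulo 2^64, so the cross terms and the high product drop out exactly as described.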
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Creating specialized IR to shield pattern from subsequent transforms in optimization pipeline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/43320063..613f491b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=00-01 Stats: 69 lines in 7 files changed: 57 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From rcastanedalo at openjdk.org Fri Nov 8 08:52:34 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 8 Nov 2024 08:52:34 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 20:30:26 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java > > Co-authored-by: Andrey Turbanov Nice improvement, thanks for working on this! If the user selects a dark color, the node labels might become hard to read. Here's a simple change that addresses that by coloring the labels in white in that case.
Please consider merging it into this PR: diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java index a469d196a6b..495d844eb34 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/widgets/FigureWidget.java @@ -113,7 +113,6 @@ public FigureWidget(final Figure f, DiagramScene scene) { LayoutFactory.SerialAlignment.LEFT_TOP : LayoutFactory.SerialAlignment.CENTER; middleWidget.setLayout(LayoutFactory.createVerticalFlowLayout(textAlign, 0)); - middleWidget.setBackground(f.getColor()); middleWidget.setOpaque(true); middleWidget.getActions().addAction(new DoubleClickAction(this)); middleWidget.setCheckClipping(false); @@ -143,7 +142,6 @@ public FigureWidget(final Figure f, DiagramScene scene) { textWidget.addChild(lw); lw.setLabel(displayString); lw.setFont(Diagram.FONT); - lw.setForeground(getTextColor()); lw.setAlignment(LabelWidget.Alignment.CENTER); lw.setVerticalAlignment(LabelWidget.VerticalAlignment.CENTER); lw.setBorder(BorderFactory.createEmptyBorder()); @@ -151,6 +149,8 @@ public FigureWidget(final Figure f, DiagramScene scene) { } formatExtraLabel(false); + refreshColor(); + if (getFigure().getWarning() != null) { ImageWidget warningWidget = new ImageWidget(scene, warningSign); Point warningLocation = new Point(getFigure().getWidth() - Figure.WARNING_WIDTH - Figure.INSET / 2, 0); @@ -186,6 +186,9 @@ protected Sheet createSheet() { public void refreshColor() { middleWidget.setBackground(figure.getColor()); + for (LabelWidget lw : labelWidgets) { + lw.setForeground(getTextColor()); + } } @Override ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2423097780 From rcastanedalo at openjdk.org Fri Nov 8 09:06:33 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 8 Nov 2024 09:06:33 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 20:30:26 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java > > Co-authored-by: Andrey Turbanov In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there. On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes.
Here's my suggestion: diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java index e68abd3297e..c4f2ac670e7 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java @@ -100,6 +100,7 @@ public EditorTopComponent(DiagramViewModel diagramViewModel) { }; Action[] actionsWithSelection = new Action[]{ + ColorAction.get(ColorAction.class), ExtractAction.get(ExtractAction.class), HideAction.get(HideAction.class), null, @@ -168,8 +169,6 @@ public void mouseMoved(MouseEvent e) {} toolBar.add(ReduceDiffAction.get(ReduceDiffAction.class)); toolBar.add(ExpandDiffAction.get(ExpandDiffAction.class)); toolBar.addSeparator(); - toolBar.add(ColorAction.get(ColorAction.class)); - toolBar.addSeparator(); toolBar.add(ExtractAction.get(ExtractAction.class)); toolBar.add(HideAction.get(HideAction.class)); toolBar.add(ShowAllAction.get(ShowAllAction.class)); diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java index a51934a4322..92921c81512 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java @@ -43,7 +43,7 @@ @ActionReference(path = "Shortcuts", name = "D-C") }) @Messages({ - "CTL_ColorAction=Color action", + "CTL_ColorAction=Color", "HINT_ColorAction=Color current set of selected nodes" }) public final class ColorAction extends ModelAwareAction { diff --git 
a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java index 24815527a0e..c2329cbb26f 100644 --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java @@ -42,7 +42,7 @@ @ActionReference(path = "Shortcuts", name = "D-X") }) @Messages({ - "CTL_ExtractAction=Extract action", + "CTL_ExtractAction=Extract", "HINT_ExtractAction=Extract current set of selected nodes" }) public final class ExtractAction extends ModelAwareAction { @tobiasholenstein @chhagedorn what do you think? If you agree, feel free to merge the patch into this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464175649 From chagedorn at openjdk.org Fri Nov 8 09:13:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 09:13:51 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 08:49:33 GMT, Roberto Castañeda Lozano wrote: > If the user selects a dark color, the node labels might become hard to read. Here's a simple change that addresses that by coloring the labels in white in that case. Please consider merging it into this PR: Great idea! > In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there I agree with this. Maybe we should generally think about cleaning up the toolbar and dropping some of the less-used icons. > On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes. I thought about this, too.
I think that would be quite handy and an intuitive thing to do when not being aware of the feature and checking if there is an option to do it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464190597 From mli at openjdk.org Fri Nov 8 10:29:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 8 Nov 2024 10:29:14 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 01:41:51 GMT, Fei Yang wrote: > Please also update the JBS title to reflect the latest version, as we are targeting more options than a single UseZvfh. Thanks, modified. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2464345402 From tholenstein at openjdk.org Fri Nov 8 10:29:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 10:29:32 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v3] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <_3cywd3W1exRW34ru-WV9_7X2OdeZHG_mmVfyuarMuQ=.540d6650-e043-45e5-9f22-91bfab76cb61@github.com> > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: white font for dark colors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/17205bab..56e046ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=01-02 Stats: 17 lines in 1 file changed: 14 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff 
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 10:32:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 10:32:32 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: move from toolbar to menu ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/56e046ca..6d7856ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=02-03 Stats: 5 lines in 3 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 10:39:20 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 10:39:20 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <-nIYtO7mMceSo3Ux84ByTa1FWrZu48pXXw4ED7f4QYc=.dd22ae23-4de4-4ff0-917d-49e59d972e5e@github.com> On Fri, 8 Nov 2024 09:02:53 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update
src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java >> >> Co-authored-by: Andrey Turbanov > > In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there. On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes. Here's my suggestion: > > > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > index e68abd3297e..c4f2ac670e7 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > @@ -100,6 +100,7 @@ public EditorTopComponent(DiagramViewModel diagramViewModel) { > }; > > Action[] actionsWithSelection = new Action[]{ > + ColorAction.get(ColorAction.class), > ExtractAction.get(ExtractAction.class), > HideAction.get(HideAction.class), > null, > @@ -168,8 +169,6 @@ public void mouseMoved(MouseEvent e) {} > toolBar.add(ReduceDiffAction.get(ReduceDiffAction.class)); > toolBar.add(ExpandDiffAction.get(ExpandDiffAction.class)); > toolBar.addSeparator(); > - toolBar.add(ColorAction.get(ColorAction.class)); > - toolBar.addSeparator(); > toolBar.add(ExtractAction.get(ExtractAction.class)); > toolBar.add(HideAction.get(HideAction.class)); > toolBar.add(ShowAllAction.get(ShowAllAction.class)); > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > index a51934a4322..92921c81512 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java 
> +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > @@ -43,7 +43,7 @@ > @ActionReference(path = "Shortcuts", name = "D-C") > }) > @Messages({ > - "CTL_ColorAction=Color action", > + "CTL_ColorAction=Color", > "HINT_ColorAction=Color current set of selected nodes" > }) > public final class ColorAction extends ModelAwareAction { > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java b/src/utils/I... @robcasloz I have applied both your patches ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464366255 From fyang at openjdk.org Fri Nov 8 10:40:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Nov 2024 10:40:22 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang , this one should ber a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21885#pullrequestreview-2423360257 From rcastanedalo at openjdk.org Fri Nov 8 10:50:06 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 8 Nov 2024 10:50:06 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 10:32:32 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > move from toolbar to menu Looks good, thanks Toby! I see that you also fixed the extra label color, nice! would be good to factor out that code together with that of `FigureWidget::getTextColor()`, but not a must. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2423381186 From chagedorn at openjdk.org Fri Nov 8 11:33:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 11:33:24 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <8pFlnBnaHRzeynpL2wS6sd7kiOCAy08J8wc1jRAC8AU=.579c3e98-6b9c-476d-bca2-d9bf05ec1c51@github.com> On Fri, 8 Nov 2024 10:32:32 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > move from toolbar to menu Looks good!
Just tried it out. One more thing I've noticed: When selecting a color for a node and then trying to color another node, the color selection resets back to `#ffffff`. Would be nice if the last selection would have been stored. But I'm not sure how easy this is. Could also be done separately. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2423469608 From amitkumar at openjdk.org Fri Nov 8 12:48:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 12:48:33 GMT Subject: RFR: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: <-_6fxDHjNO3eG0JX_Nqscc0bVUQgLcmK5OjoVME7xNk=.6ae3a749-36a4-47d2-9ff1-a29c28f97dd8@github.com> Message-ID: <80RUEwYH-agh78uyArNV-ZD6Thxj-76vDyNnwCMYwm0=.1b97264a-4d7b-48c3-8eab-696dbbc01de9@github.com> On Wed, 6 Nov 2024 00:56:24 GMT, Dean Long wrote: >> I don't think this is necessary. Unsigned subtraction with wrap-around is not undefined behavior. > > Right, it's not UB, but sometimes it is a bug, and would be flagged by things like -fsanitize=unsigned-integer-overflow, so my preference would be to avoid it if possible. As it is not really required and for `storage to storage` instructions `length = 0` is invalid case, which current code is already taking care of. So I would just simply keep it that way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21864#discussion_r1834351053 From amitkumar at openjdk.org Fri Nov 8 12:48:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 12:48:33 GMT Subject: Integrated: 8343506: [s390x] multiple test failures with ubsan In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 05:27:59 GMT, Amit Kumar wrote: > This is trivial patch which fixes the error I am seeing on s390x, while running tier1 with ubsan enabled. Please see JBS for more details. This pull request has now been integrated. 
Changeset: f6edfe58 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/f6edfe58d6931b058a5fec722615740818711065 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8343506: [s390x] multiple test failures with ubsan Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/21864 From rrich at openjdk.org Fri Nov 8 14:24:21 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:24:21 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms Message-ID: Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. ------------- Commit messages: - Exclude ppc64 Changes: https://git.openjdk.org/jdk/pull/21975/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343774 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21975/head:pull/21975 PR: https://git.openjdk.org/jdk/pull/21975 From rrich at openjdk.org Fri Nov 8 14:24:21 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:24:21 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 11:44:21 GMT, Richard Reingruber wrote: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. @offamitkumar want me to exclude s390x as well? @MBaesken said TestCastX2NotProcessedIGVN.java was failing there too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464886794 From tholenstein at openjdk.org Fri Nov 8 14:31:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:31:58 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v5] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remember recent colors and have 10 defaults ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/6d7856ed..f0b78af7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=03-04 Stats: 78 lines in 1 file changed: 72 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 14:31:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:31:58 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: <8pFlnBnaHRzeynpL2wS6sd7kiOCAy08J8wc1jRAC8AU=.579c3e98-6b9c-476d-bca2-d9bf05ec1c51@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> <8pFlnBnaHRzeynpL2wS6sd7kiOCAy08J8wc1jRAC8AU=.579c3e98-6b9c-476d-bca2-d9bf05ec1c51@github.com> Message-ID: On Fri, 8 Nov 2024 11:29:47 GMT, Christian Hagedorn wrote: > Looks good! Just tried it out. 
One more thing I've noticed: When selecting a color for a node and then trying to color another node, the color selection resets back to `#ffffff`. Would be nice if the last selection would have been stored. But I'm not sure how easy this is. Could also be done separately. ![Screenshot 2024-11-08 at 15 27 06](https://github.com/user-attachments/assets/6dbf0732-f643-4ee2-add9-6adaee380fc0) right. I have updated it now to save the last colors and provide some default colors ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464903951 From roland at openjdk.org Fri Nov 8 14:41:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 14:41:11 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 11:44:21 GMT, Richard Reingruber wrote: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) See `TestBoolNodeGVN.java` for instance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464920227 From amitkumar at openjdk.org Fri Nov 8 14:45:17 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 8 Nov 2024 14:45:17 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: <0Q64LeFu-0amchhchBKVMWTr6CZjv8LQrrF7RtPW_Po=.78656350-0d90-4d58-84d4-670536f537c7@github.com> On Fri, 8 Nov 2024 14:21:26 GMT, Richard Reingruber wrote: > @offamitkumar want me to exclude s390x as well? @MBaesken said TestCastX2NotProcessedIGVN.java was failing there too. 
Yes it is failing for s390x as well. I think we should exclude s390x as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464931057 From tholenstein at openjdk.org Fri Nov 8 14:50:45 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:50:45 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v6] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: refactor getTextColor() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/f0b78af7..30dc5261 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=04-05 Stats: 34 lines in 2 files changed: 9 ins; 10 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Fri Nov 8 14:50:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:50:46 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v4] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 10:46:35 GMT, Roberto Castañeda Lozano wrote: > Looks good, thanks Toby! I see that you also fixed the extra label color, nice!
would be good to factor out that code together with that of `FigureWidget::getTextColor()`, but not a must. done ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2464943221 From rrich at openjdk.org Fri Nov 8 14:53:25 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:53:25 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: References: Message-ID: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Positive list for test2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21975/files - new: https://git.openjdk.org/jdk/pull/21975/files/c6bac710..a7c2872b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21975/head:pull/21975 PR: https://git.openjdk.org/jdk/pull/21975 From rrich at openjdk.org Fri Nov 8 14:53:25 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 14:53:25 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:37:05 GMT, Roland Westrelin wrote: > Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: > > ``` > applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) > ``` > > See `TestBoolNodeGVN.java` for instance. Ok. I've done that. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2464947087 From tholenstein at openjdk.org Fri Nov 8 14:53:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 14:53:53 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v7] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/30dc5261..403d8b5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=05-06 Stats: 5 lines in 5 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From duke at openjdk.org Fri Nov 8 14:55:53 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 14:55:53 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
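The wrapper pattern described in the PR text above can be sketched roughly as follows. This is only a minimal illustration with stand-in types: `MockIgvn`, `MockIdealLoop`, and the stub `Node` hierarchy are hypothetical and are not the HotSpot classes. The real code obtains the root via `C->root()` and the real `intcon`/`longcon` share equal constants through GVN; both are simplified away here. The sketch only shows why a wrapper makes it hard to forget the `set_ctrl` call.

```cpp
#include <memory>
#include <unordered_map>
#include <vector>

// Stub node hierarchy (assumption: stands in for C2's Node classes).
struct Node { virtual ~Node() = default; };
struct RootNode : Node {};
struct ConINode : Node { int value; explicit ConINode(int v) : value(v) {} };
struct ConLNode : Node { long long value; explicit ConLNode(long long v) : value(v) {} };

// Hypothetical stand-in for the GVN phase: creates and owns constant nodes.
// Unlike the real phase, it does not share nodes for equal constants.
class MockIgvn {
  std::vector<std::unique_ptr<Node>> _nodes;
public:
  ConINode* intcon(int i) {
    _nodes.push_back(std::make_unique<ConINode>(i));
    return static_cast<ConINode*>(_nodes.back().get());
  }
  ConLNode* longcon(long long l) {
    _nodes.push_back(std::make_unique<ConLNode>(l));
    return static_cast<ConLNode*>(_nodes.back().get());
  }
};

// Hypothetical stand-in for the loop phase: records a control node per node.
class MockIdealLoop {
  MockIgvn _igvn;
  RootNode _root;
  std::unordered_map<const Node*, Node*> _ctrl;
public:
  Node* root() { return &_root; }
  void set_ctrl(Node* n, Node* ctrl) { _ctrl[n] = ctrl; }
  bool has_ctrl(const Node* n) const { return _ctrl.count(n) != 0; }
  Node* get_ctrl(const Node* n) const { return _ctrl.at(n); }

  // Wrapper: create the constant and attach it to the root in one step,
  // so callers cannot create a constant without a control.
  ConINode* intcon(int i) {
    ConINode* node = _igvn.intcon(i);
    set_ctrl(node, root());
    return node;
  }
  ConLNode* longcon(long long l) {
    ConLNode* node = _igvn.longcon(l);
    set_ctrl(node, root());
    return node;
  }
};
```

With this shape, a caller writes `phase.intcon(0)` instead of the two-statement sequence, and the returned node already has the root recorded as its control.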
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add set_root_as_ctrl ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/798a6172..3dc3befd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=02-03 Stats: 23 lines in 6 files changed: 4 ins; 6 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From roland at openjdk.org Fri Nov 8 15:00:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 8 Nov 2024 15:00:49 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:53:25 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Positive list for test2 Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21975#pullrequestreview-2424024608 From duke at openjdk.org Fri Nov 8 15:07:30 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 15:07:30 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:48:27 GMT, Vladimir Kozlov wrote: >>> Do we have other places (not new constant node) where we set Root as control? May be we can add `set_root_as_ctrl(n)` method in `loop node.hpp` in such case. 
>> >> There's only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. >> >> Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there's only about three places where this pattern occurs now? > >> > Do we have other places (not new constant node) where we set Root as control? May be we can add `set_root_as_ctrl(n)` method in `loop node.hpp` in such case. >> >> There's only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. >> >> Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there's only about three places where this pattern occurs now? > > My suggesting is about additional cleaning code. I think 3 + 5 places are enough to justify to have a new function in header file. Also `set_root_as_ctrl(n)` could be copy of `set_ctrl(n, ctrl)` without 2 asserts which checks `ctrl`. It will be faster. @vnkozlov I implemented your suggestion. Would you like to take another look? (It also helped me discover some more cases which can be replaced with the new *con*() functions.) 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2464983061 From chagedorn at openjdk.org Fri Nov 8 15:15:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 8 Nov 2024 15:15:30 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v7] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <_GEY6_2QtpdQv7_4xLHACtWQE3QZC4EWZjq1tTxbNgI=.2b8bf7ef-71a8-4297-87ed-caf47766557f@github.com> On Fri, 8 Nov 2024 14:53:53 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > copyright year Great idea with the defaults! I've just tried it out on Linux. It does seem to remember my last choice and offers me some defaults. But somehow the window look off: ![image](https://github.com/user-attachments/assets/626f6aef-195a-4b37-9869-16e8ccf26517) The defaults are hard to see and the rectangle saying "Preview" to the left is strange. It also says "Color Name: #ffffff" even though it chooses the last selected one correctly when pressing "OK". 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2465002719 From tholenstein at openjdk.org Fri Nov 8 15:46:33 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 15:46:33 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v8] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: use MetalLookAndFeel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/403d8b5c..c9af2285 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=06-07 Stats: 42 lines in 1 file changed: 40 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From fyang at openjdk.org Fri Nov 8 15:48:26 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 8 Nov 2024 15:48:26 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: References: Message-ID: <--JLrmhyB78xAe6PkT73i-CdbWyJiXAIShdz7Qh_OTE=.3914a796-ae28-4d35-982f-0e2e3cbef663@github.com> On Fri, 8 Nov 2024 14:53:25 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. 
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Positive list for test2 test/hotspot/jtreg/compiler/c2/TestCastX2NotProcessedIGVN.java line 66: > 64: @Test > 65: @IR(counts = {IRNode.LOAD_VECTOR_I, "> 1"}, > 66: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) Hi, Could you please remove `riscv64` from this line? I just found that this test also fails when testing on riscv64 platforms where the vector extension is not available. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21975#discussion_r1834607797 From tholenstein at openjdk.org Fri Nov 8 15:57:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 15:57:28 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v9] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <5P3xMWErZk3AumFMcgTpnUtQkGhJDbKk9H18YsoPRGQ=.26f408a2-ead1-4e88-a9df-96bc5f2280fe@github.com> > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: set font only for ColorChooser ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/c9af2285..5cf1e8b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=07-08 Stats: 16 lines in 1 file changed: 1 ins; 13 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From 
rrich at openjdk.org Fri Nov 8 15:57:37 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 15:57:37 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v3] In-Reply-To: References: Message-ID: > Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Remove riscv64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21975/files - new: https://git.openjdk.org/jdk/pull/21975/files/a7c2872b..82ac4751 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21975&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21975/head:pull/21975 PR: https://git.openjdk.org/jdk/pull/21975 From rrich at openjdk.org Fri Nov 8 15:57:39 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 8 Nov 2024 15:57:39 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v2] In-Reply-To: <--JLrmhyB78xAe6PkT73i-CdbWyJiXAIShdz7Qh_OTE=.3914a796-ae28-4d35-982f-0e2e3cbef663@github.com> References: <--JLrmhyB78xAe6PkT73i-CdbWyJiXAIShdz7Qh_OTE=.3914a796-ae28-4d35-982f-0e2e3cbef663@github.com> Message-ID: On Fri, 8 Nov 2024 15:44:46 GMT, Fei Yang wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Positive list for test2 > > test/hotspot/jtreg/compiler/c2/TestCastX2NotProcessedIGVN.java line 66: > >> 64: @Test >> 65: @IR(counts = {IRNode.LOAD_VECTOR_I, "> 1"}, >> 66: applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) > > Hi, Could you please remove `riscv64` from this line? 
I just found that this test also fails when testing on riscv64 platforms where the vector extension is not available. Thanks. Sure. I've removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21975#discussion_r1834622612 From kvn at openjdk.org Fri Nov 8 16:23:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 16:23:31 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: <0S9oX4sARVGPJZHQ_RvQcKWXwFEIt_W--uSruO2wKF8=.359ff9de-b7c8-42c2-acc6-a106b416d386@github.com> On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't exclude from `test/hotspot/jtreg/ProblemList.txt` correctly. The test only contains a single test, so it do not need to set test suffix. > This PR remove the test suffix to make the Problemlist work normally, trivial fix, no risk. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21968#pullrequestreview-2424243443 From kvn at openjdk.org Fri Nov 8 16:23:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 16:23:34 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling In-Reply-To: References: Message-ID: <7msdSYuP6w8tSJ-GXs2riNuwRFhLIXdIWxa8UiLXWXw=.b17b2cb1-994d-4540-aa9d-b808e6522088@github.com> On Fri, 8 Nov 2024 07:12:12 GMT, Christian Hagedorn wrote: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. 
Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian Looks fine to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21969#pullrequestreview-2424253334 From tschatzl at openjdk.org Fri Nov 8 16:46:43 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 16:46:43 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 Message-ID: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Hi all, please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. 
Testing: gha, tier1-3 Thanks, Thomas ------------- Commit messages: - 8343824 Changes: https://git.openjdk.org/jdk/pull/21973/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21973&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343824 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21973/head:pull/21973 PR: https://git.openjdk.org/jdk/pull/21973 From kvn at openjdk.org Fri Nov 8 17:47:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 17:47:11 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:55:53 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add set_root_as_ctrl Looks good. I have few additional comments. src/hotspot/share/opto/loopnode.cpp line 3147: > 3145: ConINode* zero = igvn->intcon(0); > 3146: if (iloop != nullptr) { > 3147: iloop->set_root_as_ctrl(zero); Please look on history of this code. This is suspicious - constant nodes should be always attached to Root. src/hotspot/share/opto/loopnode.hpp line 996: > 994: } > 995: void set_root_as_ctrl(Node* n) { > 996: assert( !has_node(n) || has_ctrl(n), "" ); We don't use spaces after and before `()` in assert(). Ignore old style in previous lines. 
src/hotspot/share/opto/loopopts.cpp line 195: > 193: set_root_as_ctrl(x); > 194: continue; > 195: } This looks like a "band-aid"; this should be an assert. Maybe investigate in a separate RFE. ------------- PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2424513220 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1834825217 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1834821279 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1834827466 From kvn at openjdk.org Fri Nov 8 17:50:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 17:50:29 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas @tschatzl do you know the history of these flags and why they are not used? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465420995 From rehn at openjdk.org Fri Nov 8 18:58:30 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 8 Nov 2024 18:58:30 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help review this simple patch? >> Previously, UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, it should be a diagnostic option, as we don't want to expose too many options to users.
>> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC Sure, thanks. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21885#pullrequestreview-2424660232 From acobbs at openjdk.org Fri Nov 8 19:06:58 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Fri, 8 Nov 2024 19:06:58 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v3] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/21c83e93..a574dda6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=01-02 Stats: 131587 lines in 749 files changed: 103986 ins; 9680 del; 17921 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From tschatzl at openjdk.org Fri Nov 8 19:52:59 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 19:52:59 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas Fwiw, the GHA failures are infrastructure issues, some dependencies could not be installed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465635907 From tschatzl at openjdk.org Fri Nov 8 19:52:59 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 19:52:59 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 17:46:49 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. 
>> >> Testing: gha, tier1-3 >> >> Thanks, >> Thomas > > @tschatzl do you know history of these flags and why they are not used? @vnkozlov: no and no - I am just starting looking at the C1 compiler to implement frequency based generation of post-write barrier filters (i.e. add the counters for later C2 compilation) as a follow-up to the post-write barrier changes. I only noticed that they were unused; looking back right now they are unused since at least JDK7u. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465632718 From acobbs at openjdk.org Fri Nov 8 19:59:44 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Fri, 8 Nov 2024 19:59:44 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> Message-ID: On Thu, 7 Nov 2024 15:43:45 GMT, Archie Cobbs wrote: > but all of the ones I've checked appear to be ... Correction - there is actually one case that revealed a compiler bug: [JDK-8343286](https://bugs.openjdk.org/browse/JDK-8343286). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2465649956 From kvn at openjdk.org Fri Nov 8 20:24:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 20:24:19 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas Good. Yes, it looks like leftover from JDK 6 development. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21973#pullrequestreview-2424858930 From vlivanov at openjdk.org Fri Nov 8 20:28:07 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 8 Nov 2024 20:28:07 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 08:15:32 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. 
It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Creating specialized IR to shield pattern from subsequent transforms in optimization pipeline In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. src/hotspot/share/opto/vectornode.cpp line 2132: > 2130: // Directly forward masked inputs if > 2131: if (n->Opcode() == Op_AndV) { > 2132: return n->in(1)->Opcode() == Op_Replicate ? 
n->in(2) : n->in(1); This particular check should ensure that Replicate constant is `0xFFFFFFFF`. ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2424864897 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1835023354 From dlong at openjdk.org Fri Nov 8 20:53:15 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Nov 2024 20:53:15 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: <4PVt1yp3fEDDRWyMYSmwOfS6N4zWQHnjbUqyxekI1Ac=.479c7d6a-de00-4179-a03a-69c57f9b8159@github.com> On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21973#pullrequestreview-2424897413 From tholenstein at openjdk.org Fri Nov 8 22:38:35 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 8 Nov 2024 22:38:35 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v10] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Create a panel for color chooser and apply the LAF to it - save location ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/5cf1e8b4..46024d07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=08-09 Stats: 27 lines in 1 file changed: 16 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From dlong at openjdk.org Fri Nov 8 23:09:19 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 8 Nov 2024 23:09:19 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: <8X6_A2Urx4zdtPDcHxFowwn14TMxNF2LxvfGq8-8dh4=.4439fd66-9db3-4f7a-897d-11b70281b050@github.com> On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change
that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas SCCS archeology reveals that these 3 were converted from boolean fields by JDK-4649182: bool _needs_write_barrier; bool _needs_store_check; bool _is_eliminated; // Set by store elimination InWorkListFlag was later added by JDK-7153771. As far as I can tell, the only one that was ever used is _is_eliminated/IsEliminatedFlag, which seems to have gone away between jdk5 and jdk6. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2465880763 From sviswanathan at openjdk.org Fri Nov 8 23:20:29 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 8 Nov 2024 23:20:29 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 10:24:53 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> [vectorapi] Refactor VectorShuffle implementation > > I have adapted the patch in accordance with https://github.com/openjdk/jdk/pull/20634, I moved the index wrapping into C2 instead of making it a separate step as I think it seems clearer. Also, I think in the future we can eliminate this step so putting it in C2 would make the progress easier. > > Please take a look, thanks a lot. @merykitty Could you please merge with the latest and resolve conflicts? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2465889165 From sviswanathan at openjdk.org Fri Nov 8 23:21:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 8 Nov 2024 23:21:30 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 20:25:10 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Creating specialized IR to shield pattern from subsequent transforms in optimization pipeline > > src/hotspot/share/opto/vectornode.cpp line 2132: > >> 2130: // Directly forward masked inputs if >> 2131: if (n->Opcode() == Op_AndV) { >> 2132: return n->in(1)->Opcode() == Op_Replicate ? n->in(2) : n->in(1); > > This particular check should ensure that Replicate constant is `0xFFFFFFFF`. Yes, this should ensure 0xFFFFFFFF. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1835148834 From fyang at openjdk.org Sat Nov 9 01:13:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 9 Nov 2024 01:13:54 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v3] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21975#pullrequestreview-2425114315 From dlong at openjdk.org Sat Nov 9 01:55:12 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 9 Nov 2024 01:55:12 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. Nevermind about print_inlining_commit() doing an append -- that is apparently the intended behavior. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2465981620 From swen at openjdk.org Sat Nov 9 02:36:14 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 9 Nov 2024 02:36:14 GMT Subject: RFR: 8343629: More MergeStore benchmark [v2] In-Reply-To: References: Message-ID: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - from @eme64 add MergeStoresDisabled - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410 - Merge branch 'master' into merge_store_bench_202410 - add putBytes4 and improved put ------------- Changes: https://git.openjdk.org/jdk/pull/21659/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=01 Stats: 320 lines in 1 file changed: 76 ins; 51 del; 193 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From amitkumar at openjdk.org Sat Nov 9 03:03:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 9 Nov 2024 03:03:42 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms [v3] In-Reply-To: References: Message-ID: <6XRfvjzDnmSSAWIGJ7vU1M0bP2_YZRsq1gbtbNr3hyk=.9cf5b9b8-55fe-47cd-b61f-93d6332a247d@github.com> On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 Marked as reviewed by amitkumar (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21975#pullrequestreview-2425172328 From amitkumar at openjdk.org Sat Nov 9 03:45:30 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 9 Nov 2024 03:45:30 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 14:49:09 GMT, Richard Reingruber wrote: >> Thanks for taking care of that. >> Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. 
>> The whole test doesn't need to excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: >> >> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >> >> See `TestBoolNodeGVN.java` for instance. > >> Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: >> >> ``` >> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >> ``` >> >> See `TestBoolNodeGVN.java` for instance. > > Ok. I've done that. @reinrich Sorry for creating mess here. Yesterday, this test failed while testing changes for JEP 450 related to compact headers. However, now I checked and head stream testing job shows that it does not fail with `jdk-head`; I have verified and It fails only on s390x when I enable UseCompactObjectHeaders: `make test TEST=jtreg:$(find . -name TestCastX2NotProcessedIGVN.java) JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders"`. While this issue could potentially occur with other settings, `UseCompactObjectHeaders` is the only one I have observed causing this failure. Do you suggest disabling this, or is separate debugging required to investigate this behaviour?" ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2466026795 From swen at openjdk.org Sat Nov 9 03:55:37 2024 From: swen at openjdk.org (Shaojin Wen) Date: Sat, 9 Nov 2024 03:55:37 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 07:22:40 GMT, Emanuel Peter wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. 
> > You can find an example of how to do that easily here: > https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749 @eme64 Why is there no noticeable difference in the performance of `+/-MergeStores`

| | -MergeStores | +MergeStores | delta |
| --- | --- | --- | --- |
| getCharB | 5900.246 | 5902.316 | -0.04% |
| getCharBU | 4865.881 | 4866.630 | -0.02% |
| getCharBV | 3084.194 | 3078.657 | 0.18% |
| getCharC | 2233.422 | 2232.788 | 0.03% |
| getCharL | 6032.213 | 6028.447 | 0.06% |
| getCharLU | 4492.928 | 4482.773 | 0.23% |
| getCharLV | 2220.004 | 2220.231 | -0.01% |
| getIntB | 7996.907 | 8050.658 | -0.67% |
| getIntBU | 9041.783 | 9035.892 | 0.07% |
| getIntBV | 309.469 | 308.076 | 0.45% |
| getIntL | 7887.687 | 7881.362 | 0.08% |
| getIntLU | 8856.416 | 8863.707 | -0.08% |
| getIntLV | 2225.803 | 2225.789 | 0.00% |
| getIntRB | 8619.974 | 8616.985 | 0.03% |
| getIntRBU | 11098.237 | 11100.091 | -0.02% |
| getIntRL | 8959.808 | 8958.688 | 0.01% |
| getIntRLU | 9237.407 | 9236.465 | 0.01% |
| getIntRU | 2502.967 | 2503.585 | -0.02% |
| getIntU | 2492.784 | 2492.675 | 0.00% |
| getLongB | 24807.583 | 24797.555 | 0.04% |
| getLongBU | 14022.093 | 14008.556 | 0.10% |
| getLongBV | 601.878 | 600.904 | 0.16% |
| getLongL | 25076.552 | 25111.661 | -0.14% |
| getLongLU | 14470.997 | 14474.230 | -0.02% |
| getLongLV | 2223.678 | 2223.882 | -0.01% |
| getLongRB | 24769.555 | 24778.684 | -0.04% |
| getLongRBU | 14017.091 | 14024.421 | -0.05% |
| getLongRL | 25070.811 | 25085.936 | -0.06% |
| getLongRLU | 14462.097 | 14467.410 | -0.04% |
| getLongRU | 3056.826 | 3056.270 | 0.02% |
| getLongU | 3045.057 | 3045.650 | -0.02% |
| putBytes4 | 928.032 | 928.111 | -0.01% |
| putBytes4GetBytes | 5876.794 | 5875.995 | 0.01% |
| putBytes4U | 926.596 | 928.596 | -0.22% |
| putBytes4X | 927.929 | 927.928 | 0.00% |
| putChars4B | 5635.803 | 5635.872 | 0.00% |
| putChars4BU | 1142.948 | 1141.809 | 0.10% |
| putChars4BV | 4482.613 | 4480.597 | 0.04% |
| putChars4C | 1132.133 | 1132.881 | -0.07% |
| putChars4L | 5640.644 | 5632.055 | 0.15% |
| putChars4LU | 1141.009 | 1142.132 | -0.10% |
| putChars4LV | 1133.833 | 1133.137 | 0.06% |
| putChars4S | 1132.469 | 1132.250 | 0.02% |
| setCharBS | 6080.539 | 6081.117 | -0.01% |
| setCharBV | 3598.374 | 3591.190 | 0.20% |
| setCharC | 4497.279 | 4544.706 | -1.04% |
| setCharLS | 5615.475 | 5620.162 | -0.08% |
| setCharLV | 2249.104 | 2245.083 | 0.18% |
| setIntB | 7999.139 | 8030.850 | -0.39% |
| setIntBU | 17922.810 | 17942.929 | -0.11% |
| setIntBV | 3237.265 | 3224.414 | 0.40% |
| setIntL | 2124.492 | 2109.906 | 0.69% |
| setIntLU | 4772.256 | 4801.314 | -0.61% |
| setIntLV | 2110.382 | 2120.022 | -0.45% |
| setIntRB | 13773.518 | 13775.889 | -0.02% |
| setIntRBU | 14752.651 | 14754.926 | -0.02% |
| setIntRL | 3226.597 | 3227.019 | -0.01% |
| setIntRLU | 5862.400 | 5882.564 | -0.34% |
| setIntRU | 5915.139 | 5917.139 | -0.03% |
| setIntU | 4794.627 | 4780.927 | 0.29% |
| setLongB | 31661.626 | 31598.635 | 0.20% |
| setLongBU | 25681.380 | 25622.835 | 0.23% |
| setLongBV | 2167.426 | 2164.900 | 0.12% |
| setLongL | 5380.433 | 5321.645 | 1.10% |
| setLongLU | 4281.526 | 4280.263 | 0.03% |
| setLongLV | 2109.982 | 2110.138 | -0.01% |
| setLongRB | 29807.728 | 29826.089 | -0.06% |
| setLongRBU | 24973.926 | 24903.052 | 0.28% |
| setLongRL | 4518.310 | 4518.594 | -0.01% |
| setLongRLU | 4792.258 | 4795.612 | -0.07% |
| setLongRU | 4796.491 | 4792.139 | 0.09% |
| setLongU | 4280.624 | 4507.839 | -5.04% |

------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2466029503 From syan at openjdk.org Sat Nov 9 11:41:03 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 9 Nov 2024 11:41:03 GMT Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr Message-ID: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr ------------- Commit messages: - support
deal with "cbnz\tx0, Stub::_large_arrays_hashcode_short" - support deal with "adrp\tx0 = mnaddF_reg_regNode::pipeline_class()" - support deal with b\tStub::indexof_linear_ul - fix the var name bugs - deal with ": cbnz\tx16, Stub:: " difference and ": adrp\tx16, = TemplateInterpreterGenerator::generate_CRC32_update_entry()+32" difference - 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr Changes: https://git.openjdk.org/jdk/pull/21955/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21955&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343763 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21955.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21955/head:pull/21955 PR: https://git.openjdk.org/jdk/pull/21955 From syan at openjdk.org Sat Nov 9 12:14:42 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 9 Nov 2024 12:14:42 GMT Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 13:43:10 GMT, SendaoYan wrote: > Hi all, > The `Gtest codestrings.validate_vm` test fails intermittently because the disassembly contains differing symbol names, for example for the instructions `adrp`/`b` etc. I think the difference in symbol names is acceptable, so this PR removes the related symbol names to make the fragile byte-for-byte disassembly comparison more robust. > The change has been verified locally: the gtest passed in all of 20k runs, except that the subtest `ThreadsListHandle::sanity_vm` sometimes fails intermittently, which is already tracked by [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk. GHA reports two failures; they look like environmental issues and are unrelated to this PR. 1. macos-aarch64 jdk/tier1 part1 fails at the `install dependencied` stage with `invalid developer directory` 2.
macos-aarch64 hs/tier1 runtime at `install dependencied` stage fails `invalid developer directory` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21955#issuecomment-2466191838 From jbhateja at openjdk.org Sun Nov 10 07:43:48 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 10 Nov 2024 07:43:48 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 20:25:23 GMT, Vladimir Ivanov wrote: > In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. > > So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. Hi Vladimir, Problem occurs if AndV gets shared, in such case matcher will not be able to absorb the masking pattern. Specialized IR overrules any such limitations and shields pattern it represents from downstream optimizations. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2466624605 From jbhateja at openjdk.org Sun Nov 10 07:43:49 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 10 Nov 2024 07:43:49 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Fri, 8 Nov 2024 23:18:18 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2132: >> >>> 2130: // Directly forward masked inputs if >>> 2131: if (n->Opcode() == Op_AndV) { >>> 2132: return n->in(1)->Opcode() == Op_Replicate ? n->in(2) : n->in(1); >> >> This particular check should ensure that Replicate constant is `0xFFFFFFFF`. > > Yes, this should ensure 0xFFFFFFFF. We land here after checking if inputs are uints. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1835611481 From jbhateja at openjdk.org Sun Nov 10 08:22:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 10 Nov 2024 08:22:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v3] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. 
>
>
> MulVL (     AndV SRC1, 0xFFFFFFFF) (     AndV SRC2, 0xFFFFFFFF)
> MulVL (URShiftVL SRC1, 32)         (URShiftVL SRC2, 32)
> MulVL (URShiftVL SRC1, 32)         (     AndV SRC2, 0xFFFFFFFF)
> MulVL (     AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2, 32)
> MulVL (VectorCastI2X SRC1)         (VectorCastI2X SRC2)
> MulVL (RShiftVL SRC1, 32)          (RShiftVL SRC2, 32)
>
>
>
> A 64x64 bit multiplication produces a 128 bit result and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
>
> If the upper 32 bits of the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target dependent manner without introducing a new IR node.
>
> The VPMUL[U]DQ instruction performs [unsigned] multiplication between the even numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency than the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for the zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons.
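
The zero-upper-half argument above can be checked with plain scalar arithmetic. The following is a minimal C++ sketch and not code from the patch: the helper names `mul_full`, `mul_lo32_unsigned` and `mul_lo32_signed` are made up for illustration, and each one models a single 64 bit lane rather than the vector instruction itself.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of a full 64x64 bit product, as an ordinary quadword
// multiply would compute them for one lane.
static uint64_t mul_full(uint64_t a, uint64_t b) {
    return a * b;
}

// Scalar model of one unsigned doubleword multiply lane (assumption: this
// mirrors what a VPMULUDQ lane does). Only the low 32 bits of each operand
// are read; the result is the full 64 bit product of those two values.
static uint64_t mul_lo32_unsigned(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>(static_cast<uint32_t>(a)) *
           static_cast<uint64_t>(static_cast<uint32_t>(b));
}

// Signed counterpart (assumption: mirrors a VPMULDQ lane). It is meant for
// operands that are sign-extended 32 bit values, e.g. the result of an
// int-to-long cast, so the narrowing casts below never lose information.
static int64_t mul_lo32_signed(int64_t a, int64_t b) {
    return static_cast<int64_t>(static_cast<int32_t>(a)) *
           static_cast<int64_t>(static_cast<int32_t>(b));
}
```

When both operands have been masked with `0xFFFFFFFF` or logically shifted right by 32, `mul_full` and `mul_lo32_unsigned` agree, which is why the cheaper doubleword multiply can stand in for the quadword one in those patterns. For instance, `mul_lo32_unsigned(0xFFFFFFFF, 0xFFFFFFFF)` is `0xFFFFFFFE00000001`, which still fits in 64 bits.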
> > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Refining comment - Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/613f491b..eba586b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=01-02 Stats: 17 lines in 2 files changed: 8 ins; 7 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From chagedorn at openjdk.org Mon Nov 11 06:21:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:21:18 GMT Subject: RFR: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop 
(integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. 
> - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21944#issuecomment-2467329571 From chagedorn at openjdk.org Mon Nov 11 06:21:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:21:18 GMT Subject: Integrated: 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor In-Reply-To: References: Message-ID: <2kqGwmXSFfj0wWGRIdXjk9WHG94L4CKAzYPKdi7AtuI=.f1d7623d-84ec-46dd-9dae-cc68ee13b8ee@github.com> On Thu, 7 Nov 2024 09:22:02 GMT, Christian Hagedorn wrote: > #### Replacing the Remaining Predicate Walking and Cloning Code > The goal is to replace and unify all the remaining custom predicate walking and cloning code currently used for: > - [JDK-8341977](https://bugs.openjdk.org/browse/JDK-8341977): Loop Peeling (integrated with https://github.com/openjdk/jdk/pull/21679)) > - [JDK-8342943](https://bugs.openjdk.org/browse/JDK-8342943): Main and Post Loop (integrated with https://github.com/openjdk/jdk/pull/21790) > - [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945): `PhaseIdealLoop::get_assertion_predicates()` used for Loop Unswitching and removing useless Template Assertion Predicate (integrated with https://github.com/openjdk/jdk/pull/21918) > - [JDK-8342946](https://bugs.openjdk.org/browse/JDK-8342946): Loop Unrolling (this PR) > > --- > (Sections taken over from https://github.com/openjdk/jdk/pull/21679 / https://github.com/openjdk/jdk/pull/21790 / https://github.com/openjdk/jdk/pull/21918) > > #### Single Template Assertion Predicate Check > This replacement allows us to have a single `TemplateAssertionPredicate::is_predicate()` check that is called for all predicate matching code. 
This enables the removal of uncommon traps for Template Assertion Predicates with [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) which is a missing piece in order to fix the remaining problems with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). > > #### Common Refactorings for all the Patches in this Series > In each of the patch, I will do similar refactoring ideas: > - Replace the existing code in the corresponding `PhaseIdealLoop` method with call to a new (or existing) predicate visitor which extends the `PredicateVisitor` interface. > - The visitor implements the Assertion Predicate `visit()` methods to implement the cloning and initialization of the Template Assertion Predicates. > - The predicate visitor is then passed to the `PredicateIterator` which walks through all predicates found at a loop and applies the visitor for each predicate. > - The visitor creates new nodes (if there are Template Assertion Predicates) either in place or at the loop entry of a target loop. In the latter case, the calling code of the `PredicateIterator` must make sure to connect the tail of the newly created predicate chain after the old loop entry to the target loop head. > - Keep the semantics which includes to only apply the Template Assertion Predicate processing if there are Parse Predicates. T... This pull request has now been integrated. 
Changeset: 5f338e9a Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/5f338e9adbcf7fe7ee90abfd34a24a3a93c22211 Stats: 203 lines in 4 files changed: 161 ins; 29 del; 13 mod 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21944 From chagedorn at openjdk.org Mon Nov 11 06:26:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:26:28 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v2] In-Reply-To: References: Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21969/files - new: https://git.openjdk.org/jdk/pull/21969/files/e9161d16..e9161d16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From chagedorn at openjdk.org Mon Nov 11 06:30:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:30:18 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v2] In-Reply-To: References: Message-ID: <_eqJKUiwXpMRDabUbh96ktkMtEaic0l13KEovL4FA40=.8813cbf6-c359-40f7-9c1f-2f2d3acb4a83@github.com> On Mon, 11 Nov 2024 06:26:28 GMT, Christian Hagedorn wrote: >> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) >> >> This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. >> >> In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. >> >> To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. 
For this patch, I make this information available in product builds and use it to implement this feature. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21969#issuecomment-2467340652 From chagedorn at openjdk.org Mon Nov 11 06:54:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:54:32 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v3] In-Reply-To: References: Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8343745 - 8343745: Only update Last Value Assertion Predicates in Loop Unrolling - Add const - 8342946: Replace predicate walking code in Loop Unrolling with a predicate visitor ------------- Changes: https://git.openjdk.org/jdk/pull/21969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=02 Stats: 130 lines in 7 files changed: 47 ins; 13 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From chagedorn at openjdk.org Mon Nov 11 06:57:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 06:57:37 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v4] In-Reply-To: References: Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21969/files - new: https://git.openjdk.org/jdk/pull/21969/files/fb9dadfd..7279d42e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21969&range=02-03 Stats: 31 lines in 1 file changed: 0 ins; 31 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21969/head:pull/21969 PR: https://git.openjdk.org/jdk/pull/21969 From rrich at openjdk.org Mon Nov 11 07:18:47 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 07:18:47 GMT Subject: RFR: 8343774: compiler/c2/TestCastX2NotProcessedIGVN.java fails on ppc64(le) & s390x platforms In-Reply-To: References: Message-ID: On Sat, 9 Nov 2024 03:41:56 GMT, Amit Kumar wrote: >>> Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: >>> >>> ``` >>> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >>> ``` >>> >>> See `TestBoolNodeGVN.java` for instance. >> >> Ok. I've done that. > > @reinrich Sorry for creating mess here. > > Yesterday, this test failed while testing changes for JEP 450 related to compact headers. However, now I checked and head stream testing job shows that it does not fail with `jdk-head`; > > I have verified and It fails only on s390x when I enable UseCompactObjectHeaders: `make test TEST=jtreg:$(find . -name TestCastX2NotProcessedIGVN.java) JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders"`. 
>
> While this issue could potentially occur with other settings, `UseCompactObjectHeaders` is the only one I have observed causing this failure. Do you suggest disabling this, or is separate debugging required to investigate this behaviour?

It's really up to you @offamitkumar. For PPC we have opened an internal bug (actually it should be mirrored by a JBS-issue) to revise the compilation of `test2`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467401535

From epeter at openjdk.org Mon Nov 11 07:26:29 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 11 Nov 2024 07:26:29 GMT
Subject: RFR: 8343629: More MergeStore benchmark
In-Reply-To:
References:
Message-ID: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com>

On Wed, 6 Nov 2024 07:22:40 GMT, Emanuel Peter wrote:

>> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull
>> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences.
>
> You can find an example of how to do that easily here:
> https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749

> @eme64 Why is there no noticeable difference in the performance of +/-MergeStores

What did you do to find out yourself? Did you use the trace flags to see if there is a difference in what is optimized / the output assembly code?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2467415168

From epeter at openjdk.org Mon Nov 11 07:26:30 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 11 Nov 2024 07:26:30 GMT
Subject: RFR: 8343629: More MergeStore benchmark [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 9 Nov 2024 02:36:14 GMT, Shaojin Wen wrote:

>> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull
>> 2.
Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences.
>
> Shaojin Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
>  - from @eme64 add MergeStoresDisabled
>  - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410
>  - Merge remote-tracking branch 'upstream/master' into merge_store_bench_202410
>  - Merge branch 'master' into merge_store_bench_202410
>  - add putBytes4 and improved put

test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 1153:

> 1151:     }
> 1152:
> 1153:     @Fork(value = 1, jvmArgsPrepend = {

Suggestion:

    @Fork(value = 1, jvmArgs = {

Can you make this change, and run the benchmarks again? There was a recent JMH build script change, and all usages of `jvmArgsPrepend` in JMH tests were supposed to be changed to `jvmArgs`. I think in your case the flag is actually not applied. Not sure if that is true, but it looks that way to me.
https://github.com/openjdk/jdk/pull/21800

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21659#discussion_r1836062193

From duke at openjdk.org Mon Nov 11 07:43:16 2024
From: duke at openjdk.org (theoweidmannoracle)
Date: Mon, 11 Nov 2024 07:43:16 GMT
Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4]
In-Reply-To:
References:
Message-ID: <-1uIsg-ge9MgmoMQFqE7ojuoKr16S4v545Vy71uCs18=.a37c4555-9f94-4aa8-ae59-037f33ff8f05@github.com>

On Fri, 8 Nov 2024 17:39:32 GMT, Vladimir Kozlov wrote:

>> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add set_root_as_ctrl
>
> src/hotspot/share/opto/loopnode.cpp line 3147:
>
>> 3145:   ConINode* zero = igvn->intcon(0);
>> 3146:   if (iloop != nullptr) {
>> 3147:     iloop->set_root_as_ctrl(zero);
>
> Please look on history of this code.
This is suspicious - constant nodes should be always attached to Root.

@TobiHartmann pointed out that this method is also called from code outside of loop opts, for example, `PhaseMacroExpand::expand_macro_nodes`. Since there's no PhaseIdealLoop in this case, nullptr is passed instead and we cannot set control as we are not inside a loop opt. Maybe @rwestrel can also take a look as he originally introduced this code in [this PR](https://github.com/openjdk/jdk/pull/7364/files#diff-d49652d43244d52415873c37bf6990269b0d6e2f2111f4f971660470b6bca738R2860).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836075707

From duke at openjdk.org Mon Nov 11 07:48:00 2024
From: duke at openjdk.org (theoweidmannoracle)
Date: Mon, 11 Nov 2024 07:48:00 GMT
Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v5]
In-Reply-To:
References:
Message-ID:

> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for:
>
>
> ConINode* node = _igvn.intcon(i);
> set_ctrl(node, C->root());
>
>
> and
>
>
> ConLNode* node = _igvn.longcon(i);
> set_ctrl(node, C->root());
>
>
> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods.
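
The shape of such a wrapper can be sketched outside HotSpot. The following is a simplified, self-contained C++ mock written for illustration only: `Node`, `MockIgvn` and `MockLoopPhase` are hypothetical stand-ins and not the real `ConINode`, `PhaseIterGVN` or `PhaseIdealLoop` classes, and the real methods may differ in detail. It only shows the idea of creating a constant and recording the root as its control in a single call.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for an IR node; not the HotSpot Node class.
struct Node {
    int value;
};

// Hypothetical stand-in for the value-numbering phase: it hands out one
// shared constant node per integer value.
struct MockIgvn {
    std::unordered_map<int, std::unique_ptr<Node>> constants;

    Node* intcon(int i) {
        std::unique_ptr<Node>& slot = constants[i];
        if (!slot) {
            slot = std::make_unique<Node>(Node{i});
        }
        return slot.get();
    }
};

// Hypothetical stand-in for the loop optimization phase.
struct MockLoopPhase {
    MockIgvn igvn;
    Node root{0};
    std::unordered_map<const Node*, const Node*> ctrl;

    void set_ctrl(Node* n, Node* c) { ctrl[n] = c; }

    // The wrapper: callers no longer have to remember the second step of
    // attaching the freshly requested constant to the root.
    Node* intcon(int i) {
        Node* n = igvn.intcon(i);
        set_ctrl(n, &root);
        return n;
    }
};
```

With this shape, a call site that previously needed two statements becomes a single `phase.intcon(i)`, and forgetting the control assignment is no longer possible at the call site.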
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Improve brace style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/3dc3befd..b472aafe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From rrich at openjdk.org Mon Nov 11 08:12:15 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 08:12:15 GMT Subject: RFR: 8343774: Positiv list ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: <05tFL9qXcLev2gBaotCrbqvPVv4Zb5pVN1tPNypCBBs=.db0999d8-64c9-41ac-90a4-019dd7ec4adf@github.com> On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 I've converted this issue [JDK-8343774](https://bugs.openjdk.org/browse/JDK-8343774) into a subtask. In the subtask the platforms where the ir checks of `test2` succeed are positive listed. The issues of other platforms are tracked in the parent task. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467486579 From duke at openjdk.org Mon Nov 11 08:30:46 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 08:30:46 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 17:41:33 GMT, Vladimir Kozlov wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add set_root_as_ctrl > > src/hotspot/share/opto/loopopts.cpp line 195: > >> 193: set_root_as_ctrl(x); >> 194: continue; >> 195: } > > This looks like "band-aid" - this should be assert. May be investigate in separate RFE. I opened an RFE for this https://bugs.openjdk.org/browse/JDK-8343907 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836126244 From jbhateja at openjdk.org Mon Nov 11 08:32:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 11 Nov 2024 08:32:18 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 04:48:00 GMT, Jasmine Karthikeyan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into phase-lowering >> - Remove platform-dependent node definitions, rework PhaseLowering implementation >> - Address some changes from code review >> - Implement PhaseLowering > > Thanks everyone for the discussion. 
I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends like `DivMod` to it. Lowering is also now the final phase before final graph reshaping now, since late inlines could also use IGVN. Some more comments: > >> It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). > > This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset. > >> BTW it's not clear to me now what particular benefits IGVN brings. `DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well. > > The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). Some examples where GVN could be useful in final graph reshaping is when reshaping shift nodes and `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction is emitted instead. I'm working on refactoring these cases in my example patch. 
I think it would be nice to make lowering the place where these platform-specific optimizations occur, while final graph reshaping focuses on preparing the graph for matching. > >> I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks will scale well. > > My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, in final graph reshaping for `DivMod` it can be the case ... Hi @jaskarth, I was trying to lower the LShiftVB and URShiftVB IR for the x86 backend, intending to factor out the upfront byte-vector to short-vector conversion of the input and shift vectors through GVN when both are shared across two operations, since the x86 ISA does not support direct byte vector shifts. To begin with, I simply made the following change, expecting the status quo, but I am getting the following fatal error at build time. Can you kindly check?
diff --git a/src/hotspot/cpu/x86/c2_lowering_x86.cpp b/src/hotspot/cpu/x86/c2_lowering_x86.cpp index cf4c014ffda..bc8df186396 100644 --- a/src/hotspot/cpu/x86/c2_lowering_x86.cpp +++ b/src/hotspot/cpu/x86/c2_lowering_x86.cpp @@ -32,6 +32,6 @@ Node* PhaseLowering::lower_node_platform(Node* n) { } bool PhaseLowering::should_lower() { - return false; + return true; } #endif // COMPILER2 ``` ERROR: Build failed for target 'images' in configuration 'linux-x86_64-server-fastdebug' (exit code 2) === Output from failing command(s) repeated here === * For target support_interim-image-jlink__jlink_interim_image_exec: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/jatinbha/sandboxes/jdk-trunk/jdk/src/hotspot/share/opto/node.hpp:960), pid=1961256, tid=1961293 # assert(is_MachReturn()) failed: invalid node class: Con # # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.root.jdk) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.root.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x140f939] Matcher::Fixup_Save_On_Entry()+0x279 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/jatinbha/sandboxes/jdk-trunk/jdk/make/core.1961256) # # An error report file with more information is saved as: # /home/jatinbha/sandboxes/jdk-trunk/jdk/make/hs_err_pid1961256.log ... (rest of output omitted) * All command lines available in /home/jatinbha/sandboxes/jdk-trunk/jdk/build/linux-x86_64-server-fastdebug/make-support/failure-logs. 
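The quoted remark that multiplication and subtraction are emitted when a backend has no `DivMod` node rests on a plain arithmetic identity. The following is a minimal sketch of that identity only, not C2 code; the function name `mod_via_div_mul_sub` is made up for illustration, and the preconditions in the comment are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: computes a remainder with a divide, a multiply and a
// subtract, the shape a backend without a combined DivMod node might fall
// back to. Preconditions (assumed, not checked): b != 0, and not
// (a == INT32_MIN && b == -1), which would overflow.
constexpr std::int32_t mod_via_div_mul_sub(std::int32_t a, std::int32_t b) {
  const std::int32_t q = a / b;  // quotient, truncated toward zero
  return a - q * b;              // remainder = dividend - quotient * divisor
}

// Small runtime check comparing against the built-in remainder operator.
inline void demo_mod_via_div_mul_sub() {
  assert(mod_via_div_mul_sub(17, 5) == 17 % 5);
  assert(mod_via_div_mul_sub(-17, 5) == -17 % 5);
  assert(mod_via_div_mul_sub(17, -5) == 17 % -5);
}
```

Because the quotient is reused for the remainder, a de-duplicating pass only has to recognise the shared divide once, which is the kind of redundancy the discussion above hopes GVN can remove.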
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2467523805 From duke at openjdk.org Mon Nov 11 08:37:26 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 08:37:26 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v6] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Merge branch 'master' into JDK-8343148 - Improve brace style - Add set_root_as_ctrl - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter - Add helper methods for zerocon, makecon, and integercon too - 8343148: C2: Refactor uses of "PhaseValues::intcon() + PhaseIdealLoop::set_ctrl()" into separate method ------------- Changes: https://git.openjdk.org/jdk/pull/21836/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=05 Stats: 130 lines in 7 files changed: 44 ins; 42 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From amitkumar at openjdk.org Mon Nov 11 08:44:51 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 08:44:51 GMT Subject: RFR: 8343774: Positiv list platforms 
for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java In-Reply-To: References: Message-ID: On Sat, 9 Nov 2024 03:41:56 GMT, Amit Kumar wrote: >>> Thanks for taking care of that. Maybe it would be more robust to only enable the test on x86, aarch64 and riscv64. The whole test doesn't need to be excluded actually. Only IR matching on `test2` needs to be disabled. This can be done with: >>> >>> ``` >>> applyIfPlatformOr = {"x64", "true", "aarch64", "true", "riscv64", "true"}) >>> ``` >>> >>> See `TestBoolNodeGVN.java` for instance. >> >> Ok. I've done that. > > @reinrich Sorry for creating a mess here. > > Yesterday, this test failed while testing changes for JEP 450 related to compact headers. However, I have now checked, and the head stream testing job shows that it does not fail with `jdk-head`. > > I have verified that it fails only on s390x when I enable UseCompactObjectHeaders: `make test TEST=jtreg:$(find . -name TestCastX2NotProcessedIGVN.java) JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders"`. > > While this issue could potentially occur with other settings, `UseCompactObjectHeaders` is the only one I have observed causing this failure. Do you suggest disabling this, or is separate debugging required to investigate this behaviour? > It's really up to you @offamitkumar. For PPC we have opened an internal bug (actually it should be mirrored by a JBS-issue) to revise the compilation of `test2`. I did the same yesterday for s390x as well. I have added it to the to-do list for the internal tracker. If possible, you can add s390x as an affected architecture for the JBS issue: [JDK-8343906](https://bugs.openjdk.org/browse/JDK-8343906). I am fine now with integrating it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467547961 From amitkumar at openjdk.org Mon Nov 11 08:50:56 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 08:50:56 GMT Subject: RFR: 8343774: Positiv list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 15:57:37 GMT, Richard Reingruber wrote: >> Vectorization of the loop in `test2` does not work on ppc therefore I want to exclude it there. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Remove riscv64 just a minor title update `Positiv` -> `Positive` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467556536 From rrich at openjdk.org Mon Nov 11 08:50:56 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 08:50:56 GMT Subject: RFR: 8343774: Positiv list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: <3f0OD3ntQxIsA5RmuBT_hifohIFreR9H18htd77NLkA=.f5108f38-9b34-4ae3-b66a-207ac4e91d72@github.com> On Mon, 11 Nov 2024 08:44:50 GMT, Amit Kumar wrote: > just a minor title update `Positiv` -> `Positive` Thanks :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467560664 From rrich at openjdk.org Mon Nov 11 08:58:16 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 08:58:16 GMT Subject: RFR: 8343774: Positiv list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java [v3] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 08:44:50 GMT, Amit Kumar wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove riscv64 > > just a minor title update `Positiv` -> `Positive` > > It's really up to you @offamitkumar. 
For PPC we have opened an internal bug (actually it should be mirrored by a JBS-issue) to revise the compilation of `test2`. > > I did same yesterday for s390x as well. I have added it in todo list for the internal tracker. Maybe If possible, you can add s390x as affected architecture for JBS issue: [JDK-8343906](https://bugs.openjdk.org/browse/JDK-8343906). I've added s390x. Please feel free to add details if needed. > I am fine now with integrating it. Ok, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21975#issuecomment-2467579388 From amitkumar at openjdk.org Mon Nov 11 08:58:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 08:58:53 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: References: Message-ID: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> On Fri, 8 Nov 2024 04:47:23 GMT, Amit Kumar wrote: > trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. @RealLucy ? PS: Should we backport it as well ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21967#issuecomment-2467577885 From tschatzl at openjdk.org Mon Nov 11 09:11:50 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 09:11:50 GMT Subject: RFR: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 17:46:49 GMT, Vladimir Kozlov wrote: >> Hi all, >> >> please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. >> >> Testing: gha, tier1-3 >> >> Thanks, >> Thomas > > @tschatzl do you know history of these flags and why they are not used? Thanks @vnkozlov @dean-long for your reviews. Thanks for the additional archeology information. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21973#issuecomment-2467603389 From tschatzl at openjdk.org Mon Nov 11 09:11:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 11 Nov 2024 09:11:51 GMT Subject: Integrated: 8343824: Remove unused InstructionFlags in C1 In-Reply-To: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> References: <_4kRSWzc_WTe7oVsSWZZh0n6s3aRBF8sHS2rxVD50jA=.b63db74f-da56-4ecf-aef5-ef8ce8a5a198@github.com> Message-ID: On Fri, 8 Nov 2024 11:09:27 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes unused/unreferenced flags from the `Instruction::InstructionFlag` enum for the C1 compiler. > > Testing: gha, tier1-3 > > Thanks, > Thomas This pull request has now been integrated. Changeset: ae6bb3cd Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/ae6bb3cd29bd4cdbb2df320fbfe0dabb7c0647d7 Stats: 10 lines in 1 file changed: 0 ins; 10 del; 0 mod 8343824: Remove unused InstructionFlags in C1 Reviewed-by: kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/21973 From lucy at openjdk.org Mon Nov 11 09:31:14 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 11 Nov 2024 09:31:14 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: References: Message-ID: <3ZMlCyS_CutzkhsO4vdfx4GRpWNqUZmapdlIvkJEaAM=.a29c06ea-37c2-42fe-995d-022e82351107@github.com> On Fri, 8 Nov 2024 04:47:23 GMT, Amit Kumar wrote: > trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21967#pullrequestreview-2426510952 From lucy at openjdk.org Mon Nov 11 09:31:14 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 11 Nov 2024 09:31:14 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> References: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> Message-ID: <8Mv6cUvdNcSdihLLYdH7aHW47mwFMyqP-voEbKGQ1Ro=.d569dc4e-acd9-4ede-bb2d-10abf6500de4@github.com> On Mon, 11 Nov 2024 08:55:02 GMT, Amit Kumar wrote: > PS: Should we backport it as well ? I'm not so much in favor of backporting all that scanner noise. But I may be alone with my opinion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21967#issuecomment-2467647457 From amitkumar at openjdk.org Mon Nov 11 09:35:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 09:35:53 GMT Subject: RFR: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: <8Mv6cUvdNcSdihLLYdH7aHW47mwFMyqP-voEbKGQ1Ro=.d569dc4e-acd9-4ede-bb2d-10abf6500de4@github.com> References: <1WgN6l7bp21jxLEkNjVEzNIb6M0egdURLq2yyli0xHc=.55f2ea60-5a70-4978-a442-3c5ad724b697@github.com> <8Mv6cUvdNcSdihLLYdH7aHW47mwFMyqP-voEbKGQ1Ro=.d569dc4e-acd9-4ede-bb2d-10abf6500de4@github.com> Message-ID: On Mon, 11 Nov 2024 09:27:50 GMT, Lutz Schmidt wrote: > I'm not so much in favor of backporting all that scanner noise. But I may be alone with my opinion. Sure, Let's skip it then. Thanks for the approval. 
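As a rough illustration of what changing the parameter type from `int64_t` to `uint64_t` means for a range check, here is a minimal sketch. The helpers `is_uimm_signed` and `is_uimm_unsigned` are hypothetical stand-ins and are not copied from the s390x assembler; the actual motivation for the patch may differ from what this sketch suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signed variant: a negative input has to be rejected
// explicitly before the upper bound is compared.
constexpr bool is_uimm_signed(std::int64_t x, unsigned nbits) {
  return x >= 0 && (nbits >= 63 || x < (std::int64_t{1} << nbits));
}

// Hypothetical unsigned variant: the type already rules out negative
// values, so only the upper bound remains to be checked.
constexpr bool is_uimm_unsigned(std::uint64_t x, unsigned nbits) {
  return nbits >= 64 || x < (std::uint64_t{1} << nbits);
}

// Runtime checks: both variants agree on in-range and out-of-range values.
inline void demo_is_uimm() {
  assert(is_uimm_signed(255, 8) && is_uimm_unsigned(255, 8));
  assert(!is_uimm_signed(256, 8) && !is_uimm_unsigned(256, 8));
  // A negative value converted to uint64_t becomes a very large number and
  // fails the upper-bound test without a separate sign check.
  assert(!is_uimm_signed(-1, 8));
  assert(!is_uimm_unsigned(static_cast<std::uint64_t>(std::int64_t{-1}), 8));
}
```

With the unsigned signature, a caller that passes a negative signed value gets an implicit conversion, which static analysis tools tend to flag at the call site rather than inside the helper.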
------------- PR Comment: https://git.openjdk.org/jdk/pull/21967#issuecomment-2467655641 From amitkumar at openjdk.org Mon Nov 11 09:35:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 11 Nov 2024 09:35:53 GMT Subject: Integrated: 8343810: [s390x] is_uimm* methods should take unsigned arguments In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 04:47:23 GMT, Amit Kumar wrote: > trivial patch which just updates the argument datatype of `is_uimm*` methods, from `int64_t` to `uint64_t`. This pull request has now been integrated. Changeset: a93bd9df Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/a93bd9dfdd7e340b10c24a15fb70a3801bfb373d Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod 8343810: [s390x] is_uimm* methods should take unsigned arguments Reviewed-by: lucy ------------- PR: https://git.openjdk.org/jdk/pull/21967 From rcastanedalo at openjdk.org Mon Nov 11 10:08:36 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Nov 2024 10:08:36 GMT Subject: RFR: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 20:53:03 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Hoist changed offset input check > > Good. Thanks for reviewing @vnkozlov and @iwanowww!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21898#issuecomment-2467732067 From rcastanedalo at openjdk.org Mon Nov 11 10:08:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Nov 2024 10:08:38 GMT Subject: Integrated: 8343067: C2: revisit constant-offset AddP chains after successful input idealizations In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:02:16 GMT, Roberto Castañeda Lozano wrote: > This changeset re-adds a constant-offset AddP node (`u`) to C2's IGVN worklist when its address is given by another AddP node (`use`) whose offset has changed. This makes it possible for `AddPNode::Ideal` to flatten the address computation in cases where the offset of the latter (`use->in(AddPNode::Offset)`) is found to be constant during IGVN: > > ![idealization](https://github.com/user-attachments/assets/6b632642-c037-457f-bd19-6b30f24e6ac6) > > The end result is the generation of fewer explicit address computation instructions. > > #### Testing > > ##### Functionality > > - tier1-5 (linux-x64, windows-x64, macosx-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. This pull request has now been integrated.
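The flattening described in the changeset summary above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the `AddP` struct, `ideal_step`, and `flatten` are invented for illustration, every offset is already a known constant, and the worklist mechanics that the actual change is about are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an address computation: `address` points at another AddP
// whose result is used as the address input, or is nullptr when the
// address is the raw base pointer. `offset` is a known constant.
struct AddP {
  const AddP* address;
  std::int64_t offset;
};

// One idealization step: AddP(AddP(x, c1), c2) becomes AddP(x, c1 + c2).
inline AddP ideal_step(const AddP& n) {
  if (n.address == nullptr) {
    return n;  // already expressed directly on the base pointer
  }
  return AddP{n.address->address, n.address->offset + n.offset};
}

// Repeat the step until the chain is a single base-plus-constant node.
inline AddP flatten(AddP n) {
  while (n.address != nullptr) {
    n = ideal_step(n);
  }
  return n;
}

inline void demo_flatten() {
  const AddP inner{nullptr, 16};   // base + 16
  const AddP middle{&inner, 8};    // (base + 16) + 8
  const AddP outer{&middle, 4};    // ((base + 16) + 8) + 4
  const AddP flat = flatten(outer);
  assert(flat.address == nullptr);
  assert(flat.offset == 28);       // base + 28 in a single node
}
```

In the toy model the chain collapses to one addition, which mirrors the stated end result of fewer explicit address computation instructions.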
Changeset: ec13364c Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/ec13364cdab5a52f704bc5d1575f3da17380b4f2 Stats: 73 lines in 3 files changed: 70 ins; 0 del; 3 mod 8343067: C2: revisit constant-offset AddP chains after successful input idealizations Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21898 From duke at openjdk.org Mon Nov 11 10:14:50 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 11 Nov 2024 10:14:50 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v7] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods.
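The wrapper pattern described in the patch summary above can be sketched outside HotSpot with mock types. Everything here (`Node`, `MockIgvn`, `MockIdealLoop`, `get_ctrl`) is a hypothetical stand-in; only the two-step shape, create the constant and then set the root as its control, is taken from the description.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Mock constant node; the real ConINode carries far more state.
struct Node {
  int value;
};

// Mock of the value-numbering phase: returns one shared node per constant.
struct MockIgvn {
  std::map<int, std::unique_ptr<Node>> cache;

  Node* intcon(int i) {
    std::unique_ptr<Node>& slot = cache[i];
    if (!slot) {
      slot = std::make_unique<Node>(Node{i});
    }
    return slot.get();
  }
};

// Mock of the loop phase that owns the wrapper.
class MockIdealLoop {
  MockIgvn _igvn;
  Node _root{0};
  std::map<const Node*, const Node*> _ctrl;

 public:
  Node* root() { return &_root; }

  void set_ctrl(Node* n, Node* ctrl) { _ctrl[n] = ctrl; }

  // Returns the recorded control of n, or nullptr if none was set.
  const Node* get_ctrl(const Node* n) const {
    auto it = _ctrl.find(n);
    return it == _ctrl.end() ? nullptr : it->second;
  }

  // The wrapper: callers can no longer forget the set_ctrl step.
  Node* intcon(int i) {
    Node* n = _igvn.intcon(i);
    set_ctrl(n, root());
    return n;
  }
};

inline void demo_intcon_wrapper() {
  MockIdealLoop phase;
  Node* one = phase.intcon(1);
  assert(one->value == 1);
  assert(phase.get_ctrl(one) == phase.root());
  assert(phase.intcon(1) == one);  // same constant, same shared node
}
```

Folding both steps into one call is what lets the later review comments reduce call sites to a single line such as `Node* one = intcon(1);`.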
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Cover another case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/aaa7cf20..8c51ec99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From tholenstein at openjdk.org Mon Nov 11 11:14:24 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 11 Nov 2024 11:14:24 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v11] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: simplify ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/46024d07..ace2ebfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=09-10 Stats: 51 lines in 1 file changed: 15 ins; 30 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From chagedorn at openjdk.org Mon Nov 11 11:56:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: 
Mon, 11 Nov 2024 11:56:17 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v7] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:14:50 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Cover another case src/hotspot/share/opto/loopTransform.cpp line 2054: > 2052: Node *newcle = old_new[loop_end->_idx]; > 2053: _igvn.hash_delete(newcle); > 2054: Node *one = intcon(1); While at it, you can also fix the wrong `*` placement (should be at type): Suggestion: Node* one = intcon(1); src/hotspot/share/opto/loopTransform.cpp line 2434: > 2432: } > 2433: if (p_offset != nullptr) { > 2434: Node *zero = zerocon(bt); Suggestion: Node* zero = zerocon(bt); src/hotspot/share/opto/loopTransform.cpp line 2485: > 2483: if (p_offset != nullptr) { > 2484: if (which == 1) { // must negate the extracted offset > 2485: Node *zero = integercon(0, exp_bt); Suggestion: Node* zero = integercon(0, exp_bt); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836340828 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836343303 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836344082 From chagedorn at openjdk.org Mon Nov 11 11:56:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 11:56:17 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: 
References: Message-ID: <7-rX03iJvNPTk_sjMgAEI4ki96PwMO3jt1YTDuddgkE=.e98d59ff-d768-407a-abb2-ddc2416a3b06@github.com> On Mon, 11 Nov 2024 08:28:02 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/loopopts.cpp line 195: >> >>> 193: set_root_as_ctrl(x); >>> 194: continue; >>> 195: } >> >> This looks like "band-aid" - this should be assert. May be investigate in separate RFE. > > I opened an RFE for this https://bugs.openjdk.org/browse/JDK-8343907 If you modify the following code above to use your new `makecon()` (could be done either way), could this then be turned into an assert? By looking at the code, it suggests that we only miss to set ctrl in the `singleton` case which would then be covered. https://github.com/openjdk/jdk/blob/5ca6698ba418e82ff93471fbb495759850f26f63/src/hotspot/share/opto/loopopts.cpp#L123-L125 You could also only change `makecon()` above and revisit this code later again to remove the `set_root_as_ctrl()` and add an assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1836547506 From swen at openjdk.org Mon Nov 11 12:42:24 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 11 Nov 2024 12:42:24 GMT Subject: RFR: 8343925: Test HugeToString.java crashes at java.util.BitSet.toString()Ljava/lang/String Message-ID: 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, so submit this PR to roll back ------------- Commit messages: - Revert "8342650: Move getChars to DecimalDigits" Changes: https://git.openjdk.org/jdk/pull/22012/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22012&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343925 Stats: 757 lines in 12 files changed: 352 ins; 381 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22012.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22012/head:pull/22012 PR: https://git.openjdk.org/jdk/pull/22012 From tholenstein at openjdk.org Mon Nov 11 12:51:12 2024 From: tholenstein at openjdk.org (Tobias 
Holenstein) Date: Mon, 11 Nov 2024 12:51:12 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v12] In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: make it work in Linux and MacOS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21925/files - new: https://git.openjdk.org/jdk/pull/21925/files/ace2ebfa..99e2ed7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21925&range=10-11 Stats: 36 lines in 1 file changed: 4 ins; 18 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/21925.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21925/head:pull/21925 PR: https://git.openjdk.org/jdk/pull/21925 From chagedorn at openjdk.org Mon Nov 11 12:53:55 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 11 Nov 2024 12:53:55 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v12] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Mon, 11 Nov 2024 12:51:12 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > make it work in Linux and MacOS Now, after many tries, it seems to work! :-) Thanks for investigating further. Looks good! 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2427218879 From rcastanedalo at openjdk.org Mon Nov 11 13:11:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 11 Nov 2024 13:11:44 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v12] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: <7ZgYZpPUL1d2nabYrCdMkZ_m1L1i71xtDvBb-n2g1M8=.64d14226-756e-49e8-a5d8-5b5cc0d35247@github.com> On Mon, 11 Nov 2024 12:51:12 GMT, Tobias Holenstein wrote: >> color >> >> pick >> >> nodes >> >> Adds new option to IGV to color selected nodes: >> 1) select some nodes >> 2) `Ctrl + C` or `View` -> `Color action` >> 3) pick a color and apply > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > make it work in Linux and MacOS Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21925#pullrequestreview-2427253534 From tholenstein at openjdk.org Mon Nov 11 13:28:22 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 11 Nov 2024 13:28:22 GMT Subject: Integrated: 8343535: IGV: Colorize nodes on demand In-Reply-To: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Wed, 6 Nov 2024 12:19:47 GMT, Tobias Holenstein wrote: > color > > pick > > nodes > > Adds new option to IGV to color selected nodes: > 1) select some nodes > 2) `Ctrl + C` or `View` -> `Color action` > 3) pick a color and apply This pull request has now been integrated.
Changeset: f3ba7676 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/f3ba7676043756f7cf95d5215e18bd65e9f167e6 Stats: 231 lines in 7 files changed: 209 ins; 15 del; 7 mod 8343535: IGV: Colorize nodes on demand Co-authored-by: Roberto Castañeda Lozano Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/21925 From tholenstein at openjdk.org Mon Nov 11 13:28:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 11 Nov 2024 13:28:21 GMT Subject: RFR: 8343535: IGV: Colorize nodes on demand [v2] In-Reply-To: References: <3JYpZzOT0FmT4ggXTj3FPphh_aqBcoXgPxneb3315WM=.4d80fdfe-46cd-4d08-a706-1b4c00a9fd7b@github.com> Message-ID: On Fri, 8 Nov 2024 09:02:53 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/DiagramScene.java >> >> Co-authored-by: Andrey Turbanov > > In my opinion, the IGV toolbar is already pretty crowded (this hurts most when opening two graphs side-by-side) and I would prefer not adding the color icon there. On the other hand, we could add the action to the pop-up menu that's opened when right-clicking into a node or set of nodes.
Here's my suggestion: > > > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > index e68abd3297e..c4f2ac670e7 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/EditorTopComponent.java > @@ -100,6 +100,7 @@ public EditorTopComponent(DiagramViewModel diagramViewModel) { > }; > > Action[] actionsWithSelection = new Action[]{ > + ColorAction.get(ColorAction.class), > ExtractAction.get(ExtractAction.class), > HideAction.get(HideAction.class), > null, > @@ -168,8 +169,6 @@ public void mouseMoved(MouseEvent e) {} > toolBar.add(ReduceDiffAction.get(ReduceDiffAction.class)); > toolBar.add(ExpandDiffAction.get(ExpandDiffAction.class)); > toolBar.addSeparator(); > - toolBar.add(ColorAction.get(ColorAction.class)); > - toolBar.addSeparator(); > toolBar.add(ExtractAction.get(ExtractAction.class)); > toolBar.add(HideAction.get(HideAction.class)); > toolBar.add(ShowAllAction.get(ShowAllAction.class)); > diff --git a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > index a51934a4322..92921c81512 100644 > --- a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > +++ b/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ColorAction.java > @@ -43,7 +43,7 @@ > @ActionReference(path = "Shortcuts", name = "D-C") > }) > @Messages({ > - "CTL_ColorAction=Color action", > + "CTL_ColorAction=Color", > "HINT_ColorAction=Color current set of selected nodes" > }) > public final class ColorAction extends ModelAwareAction { > diff --git 
a/src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/ExtractAction.java b/src/utils/I... thanks for the reviews @robcasloz and @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/21925#issuecomment-2468173021 From swen at openjdk.org Mon Nov 11 13:47:13 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 11 Nov 2024 13:47:13 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. It has been verified that it is caused by unsafe offset overflow. The problem has been reproduced and fixed. I submitted PR #22014. Would you consider fixing it this way? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22012#issuecomment-2468215673 From alanb at openjdk.org Mon Nov 11 13:55:11 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 11 Nov 2024 13:55:11 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. Changes in this area need to be very carefully reviewed and tested. I think we should continue with the current plan to back out the original change and seek wider review and testing for the REDO. Chen is testing the backout now.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22012#issuecomment-2468230634 From jpai at openjdk.org Mon Nov 11 14:10:05 2024 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 11 Nov 2024 14:10:05 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. I have verified that this backout matches a `git revert` of the commit that introduced the change in https://bugs.openjdk.org/browse/JDK-8342650. So on that front, this backout looks OK to me. Alan has noted that Chen is running some tests with this backout. So please wait for that review, before integrating. ------------- Marked as reviewed by jpai (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22012#pullrequestreview-2427380530 From alanb at openjdk.org Mon Nov 11 14:34:26 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 11 Nov 2024 14:34:26 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. Thanks for the BACKOUT, looks right. ------------- Marked as reviewed by alanb (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22012#pullrequestreview-2427435024 From liach at openjdk.org Mon Nov 11 14:56:27 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 11 Nov 2024 14:56:27 GMT Subject: RFR: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. CI results look good. ------------- Marked as reviewed by liach (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22012#pullrequestreview-2427501471 From swen at openjdk.org Mon Nov 11 15:17:21 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 11 Nov 2024 15:17:21 GMT Subject: Integrated: 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 12:34:44 GMT, Shaojin Wen wrote: > 8343925 Feedback PR #21593 test/jdk/java/util/BitSet/HugeToString.java crash, > > I can't reproduce the problem on a MacBook M1 Max, but I agree that more testing is needed, so let's roll it back first. This pull request has now been integrated. 
Changeset: b0a371b0 Author: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/b0a371b0850b8f467ed985ef39a6fce476b62acf Stats: 757 lines in 12 files changed: 352 ins; 381 del; 24 mod 8343925: [BACKOUT] JDK-8342650 Move getChars to DecimalDigits Reviewed-by: jpai, alanb, liach ------------- PR: https://git.openjdk.org/jdk/pull/22012 From rrich at openjdk.org Mon Nov 11 16:38:22 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 11 Nov 2024 16:38:22 GMT Subject: Integrated: 8343774: Positive list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 11:44:21 GMT, Richard Reingruber wrote: > Vectorization of the loop in `test2` does not work on ppc, therefore I want to exclude it there. This pull request has now been integrated. Changeset: 889f9062 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/889f906235e99b7207f2e30e1f6f5771188f5a56 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8343774: Positive list platforms for ir checks of compiler/c2/TestCastX2NotProcessedIGVN.java Reviewed-by: fyang, amitkumar, roland ------------- PR: https://git.openjdk.org/jdk/pull/21975 From mli at openjdk.org Mon Nov 11 21:36:52 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 11 Nov 2024 21:36:52 GMT Subject: RFR: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 18:42:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this simple patch? >> Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > turn more verified extensions as DIAGNOSTIC Thanks for your reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21885#issuecomment-2469056625 From mli at openjdk.org Mon Nov 11 21:36:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 11 Nov 2024 21:36:53 GMT Subject: Integrated: 8343555: RISC-V: make some verified (on hardware) extension options diagnostic In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:33:00 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously this UseZvfh was changed from experimental to product because we have real hardware to verify the feature, but as pointed out by @RealFYang, this one should be a diagnostic option, as we don't want to expose too many options to users. > Thanks This pull request has now been integrated. Changeset: cbf4dd58 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/cbf4dd588bf371e13e81204b1585d34bfadddb42 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod 8343555: RISC-V: make some verified (on hardware) extension options diagnostic Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/21885 From swen at openjdk.org Tue Nov 12 01:30:29 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 12 Nov 2024 01:30:29 GMT Subject: RFR: 8342650: Move getChars to DecimalDigits Message-ID: This PR is a resubmission after PR #21593 was rolled back, and the unsafe offset overflow issue has been fixed. Move getChars methods of StringLatin1 and StringUTF16 to DecimalDigits to reduce duplication. HexDigits and OctalDigits also include getCharsLatin1 and getCharsUTF16. Putting these two methods into DecimalDigits can avoid the need to expose them in JavaLangAccess. Eliminate duplicate code in BigDecimal. This PR will improve the performance of Integer/Long.toString and StringBuilder.append(int/long) scenarios. This is because Unsafe.putByte is used to eliminate array bounds checks, and of course this elimination is safe.
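On the "unsafe offset overflow" mentioned above: the message does not show the faulty expression, so the following is only a generic sketch of how an offset computed in `int` arithmetic can wrap around for a very large index, and how widening to `long` before the shift avoids it. The class name, the method names, and the `BASE_OFFSET` value are all hypothetical; nothing here is taken from the actual patch.

```java
// Illustration only: BASE_OFFSET is a made-up constant, not a value read from Unsafe,
// and these methods are not part of DecimalDigits or any JDK class.
final class OffsetOverflowSketch {
    static final int BASE_OFFSET = 16;

    // Offset of a 2-byte element computed entirely in int arithmetic.
    // For a large index, (index << 1) no longer fits in an int and wraps around.
    static long offsetWithIntMath(int index) {
        return BASE_OFFSET + (index << 1);
    }

    // Same computation, but the index is widened to long before shifting,
    // so the intermediate result cannot wrap for any int index.
    static long offsetWithLongMath(int index) {
        return BASE_OFFSET + ((long) index << 1);
    }

    public static void main(String[] args) {
        int index = 1_200_000_000;
        System.out.println("int math:  " + offsetWithIntMath(index));
        System.out.println("long math: " + offsetWithLongMath(index));
    }
}
```

With raw memory access there is no bounds check to catch a wrapped offset, which is presumably why a test that builds a huge string (such as the `HugeToString` test named earlier in the thread) would be the one to expose this kind of mistake.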
In previous versions, in Integer/Long.toString and StringBuilder.append(int/long) scenarios, -COMPACT_STRING performed better than +COMPACT_STRING. This is because StringUTF16.getChars uses StringUTF16.putChar, which is similar to Unsafe.putChar, and there is no bounds check. ------------- Commit messages: - fix unsafe address overflow - add benchmark - remove comments, from @liach - Merge remote-tracking branch 'upstream/master' into int_get_chars_dedup_202410 - fix Helper - fix Helper - fix Helper - unsafe putByte - remove digitPair - fix import - ... and 4 more: https://git.openjdk.org/jdk/compare/5890d943...cd9ba309 Changes: https://git.openjdk.org/jdk/pull/22023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22023&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342650 Stats: 757 lines in 12 files changed: 381 ins; 352 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22023/head:pull/22023 PR: https://git.openjdk.org/jdk/pull/22023 From dholmes at openjdk.org Tue Nov 12 01:51:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 12 Nov 2024 01:51:15 GMT Subject: RFR: 8342650: Move getChars to DecimalDigits In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 01:25:16 GMT, Shaojin Wen wrote: > This PR is a resubmission after PR #21593 was rolled back, and the unsafe offset overflow issue has been fixed. > > Move getChars methods of StringLatin1 and StringUTF16 to DecimalDigits to reduce duplication > > HexDigits and OctalDigits also include getCharsLatin1 and getCharsUTF16 > > Putting these two methods into DecimalDigits can avoid the need to expose them in JavaLangAccess > Eliminate duplicate code in BigDecimal > > This PR will improve the performance of Integer/Long.toString and StringBuilder.append(int/long) scenarios. This is because Unsafe.putByte is used to eliminate array bounds checks, and of course this elimination is safe. 
> > In previous versions, in Integer/Long.toString and StringBuilder.append(int/long) scenarios, -COMPACT_STRING performed better than +COMPACT_STRING. This is because StringUTF16.getChars uses StringUTF16.putChar, which is similar to Unsafe.putChar, and there is no bounds check. @wenshao you need a new JBS issue to complete this work under. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22023#issuecomment-2469426151 From fyang at openjdk.org Tue Nov 12 03:42:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 03:42:08 GMT Subject: RFR: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node Message-ID: Hi, please review this small change. Currently, we print a simple `lwu` for this node, which is not accurate because we do a `ld` and then a logical right shift of the loaded 64-bit value for this node. This change simply prints `load_narrow_klass_compact` instead, like other CPU platforms. After this change, we have: 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders (Tagging: @Hamlin-Li) ------------- Commit messages: - 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node Changes: https://git.openjdk.org/jdk/pull/22025/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22025&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343964 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22025/head:pull/22025 PR: https://git.openjdk.org/jdk/pull/22025 From fyang at openjdk.org Tue Nov 12 06:44:26 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 06:44:26 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> References:
<0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: > Hello, please review this trivial change. > > The reason for the crash is that we use more space for compiler stubs during stubRoutines generation when compressed instructions are disabled, especially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform. After this change, we have (without B extension): > > > $ java -Xlog:stubs -XX:-UseRVC -version > [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 > [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 > [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 > [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Add more space for hardware platforms with vector extension ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21966/files - new: https://git.openjdk.org/jdk/pull/21966/files/be8bff6d..b24ce03d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21966&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21966&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21966/head:pull/21966 PR: https://git.openjdk.org/jdk/pull/21966 From dlunden at openjdk.org Tue Nov 12 07:03:04 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 12 Nov 2024 07:03:04 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:41:46 GMT,
Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Looks great, and I can confirm the new phases are very useful!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2468592185 From rcastanedalo at openjdk.org Tue Nov 12 07:03:04 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 07:03:04 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps Message-ID: This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: - Initial liveness: after initial liveness information is computed. - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. - Initial spilling: after initial round of spilling derived from physical interference graph construction. - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). - Iterative spilling: after each round of spilling. - After iterative spilling: after the main register allocation loop. - Post-allocation copy removal: after peephole copy removal. - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). #### Testing - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`).
------------- Commit messages: - Fix IR framework definitions - Dump graph at intermediate register allocation points Changes: https://git.openjdk.org/jdk/pull/22017/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343941 Stats: 46 lines in 3 files changed: 46 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22017/head:pull/22017 PR: https://git.openjdk.org/jdk/pull/22017 From rcastanedalo at openjdk.org Tue Nov 12 07:03:04 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 07:03:04 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 16:37:04 GMT, Daniel Lundén wrote: > Looks great, and I can confirm the new phases are very useful! Thanks Daniel, feel free to review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2469743531 From dlunden at openjdk.org Tue Nov 12 08:17:27 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 12 Nov 2024 08:17:27 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: <7erkwjsUNJJR5xrK2_DapO59QrIUTzn_wrqm-8Jo4EQ=.ebbbf82d-722f-459c-bbbc-871a4151e7f8@github.com> On Mon, 11 Nov 2024 14:41:46 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled).
> - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2428842886 From chagedorn at openjdk.org Tue Nov 12 08:29:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 08:29:54 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:41:46 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling.
> - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Looks good! Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? src/hotspot/share/opto/phasetype.hpp line 104: > 102: flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \ > 103: flags(MERGE_MULTIDEFS, "Merge multiple definitions") \ > 104: flags(FIXUP_SPILLS, "Fix up spills") \ Should we split at the word boundary? Suggestion: flags(FIX_UP_SPILLS, "Fix up spills") \ ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2428867541 PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1837667475 From thartmann at openjdk.org Tue Nov 12 09:28:42 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 09:28:42 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v4] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 06:57:37 GMT, Christian Hagedorn wrote: >> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) >> >> This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. >> >> In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. >> >> To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix after merge Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21969#pullrequestreview-2429031032 From mli at openjdk.org Tue Nov 12 09:30:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 09:30:14 GMT Subject: RFR: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 02:55:48 GMT, Fei Yang wrote: > Hi, please review this small change. > > Currently, we print a simple `lwu` for this node, which is not accurate because we do a `ld` and then a logical right shift of the loaded 64-bit value for this node. This change simply prints `load_narrow_klass_compact` instead, like other CPU platforms. After this change, we have: > > > 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 > 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders > > > (Tagging: @Hamlin-Li) Looks good to me. Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22025#pullrequestreview-2429039168 From mli at openjdk.org Tue Nov 12 09:35:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 09:35:39 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 09:28:33 GMT, Hamlin Li wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more space for hardware platforms with vector extension > > src/hotspot/cpu/riscv/stubRoutines_riscv.hpp line 42: > >> 40: _initial_stubs_code_size = 10000, >> 41: _continuation_stubs_code_size = 2000, >> 42: _compiler_stubs_code_size = 45000, > > Hey, why do we remove the `ZGC_ONLY` here?
It seems to me this could trigger a similar issue unexpectedly, because G1 is still the default collector for now, so developers might only test the default one before pushing their code? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837766886 From mli at openjdk.org Tue Nov 12 09:35:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 09:35:38 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 06:44:26 GMT, Fei Yang wrote: >> Hello, please review this trivial change. >> >> The reason for the crash is that we use more space for compiler stubs during stubRoutines generation when compressed instructions are disabled, especially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform. After this change, we have (without B extension): >> >> >> $ java -Xlog:stubs -XX:-UseRVC -version >> [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 >> [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 >> [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 >> [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 >> >> >> (PS: Same issue also triggers when building without ZGC (`--disable-jvm-feature-zgc`)) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Add more space for hardware platforms with vector extension Thanks for catching and fixing this. Just one minor comment.
src/hotspot/cpu/riscv/stubRoutines_riscv.hpp line 42: > 40: _initial_stubs_code_size = 10000, > 41: _continuation_stubs_code_size = 2000, > 42: _compiler_stubs_code_size = 45000, Hey, why do we remove the `ZGC_ONLY` here? ------------- PR Review: https://git.openjdk.org/jdk/pull/21966#pullrequestreview-2429041244 PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837761273 From fyang at openjdk.org Tue Nov 12 09:47:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 09:47:22 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 09:31:47 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubRoutines_riscv.hpp line 42: >> >>> 40: _initial_stubs_code_size = 10000, >>> 41: _continuation_stubs_code_size = 2000, >>> 42: _compiler_stubs_code_size = 45000, >> >> Hey, why do we remove the `ZGC_ONLY` here? > > Seems to me it could trigger the similar issue unexpectedly, because for now G1 is still the default one, developers could only test default one before push their code? Yeah, I removed the `ZGC_ONLY` check as I think it doesn't seem necessary here. I simply did two jdk builds with and without the ZGC feature configured and compared the used compiler stubs from the log output. I witnessed no difference. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837785528 From chagedorn at openjdk.org Tue Nov 12 10:12:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:12:04 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation Message-ID: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen` of both input types instead of the maximum, which leads to endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. The fix is straightforward: use `MAX2()` instead of `MIN2()`, as we already do for `MinINode::add_ring()`: https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 Details about how this endless widening is happening are provided as comments in the test case.
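As a rough intuition for the `MIN2()` versus `MAX2()` difference described above, here is a toy model in Java. It is not HotSpot code: the class and method names are hypothetical, the values 0 and 3 are arbitrary, and the only assumption carried over from the message is that each input type has a widen value and that the result's widen value is derived from both. The actual mechanism of the endless widening is documented in the test case, not here.

```java
// Toy model only: these names are hypothetical and this is not HotSpot code.
final class WidenCombineSketch {
    // Combine the widen values of two inputs by taking the smaller one
    // (the behavior the message describes as wrong for MinLNode::add_ring()).
    static int combineWithMin(int widenA, int widenB) {
        return Math.min(widenA, widenB);
    }

    // Combine by taking the larger one (the fix, matching what the message
    // says MinINode::add_ring() already does).
    static int combineWithMax(int widenA, int widenB) {
        return Math.max(widenA, widenB);
    }

    public static void main(String[] args) {
        int freshInput = 0;    // an input whose type has not been widened yet
        int widenedInput = 3;  // an input whose type has already been widened

        // With the minimum, the widening already recorded on one input is dropped.
        System.out.println("min: " + combineWithMin(freshInput, widenedInput));
        // With the maximum, the result keeps the larger widen value.
        System.out.println("max: " + combineWithMax(freshInput, widenedInput));
    }
}
```

Under this reading, taking the minimum lets the result forget widening progress that one input has already made, whereas taking the maximum preserves it.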
Thanks, Christian ------------- Commit messages: - 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless compilation Changes: https://git.openjdk.org/jdk/pull/22033/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22033&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343944 Stats: 81 lines in 2 files changed: 80 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22033.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22033/head:pull/22033 PR: https://git.openjdk.org/jdk/pull/22033 From mli at openjdk.org Tue Nov 12 10:14:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 10:14:07 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 06:44:26 GMT, Fei Yang wrote: >> Hello, please review this trivial change. >> >> The reason for the crash is that we use more space for compiler stubs during stubRoutines generation when compressed instructions are disabled, especially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform.
After this change, we have (without B extension): >> >> >> $ java -Xlog:stubs -XX:-UseRVC -version >> [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 >> [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 >> [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 >> [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 >> >> >> (PS: Same issue also triggers when building without ZGC (`--disable-jvm-feature-zgc`)) > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Add more space for hardware platforms with vector extension Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21966#pullrequestreview-2429149280 From mli at openjdk.org Tue Nov 12 10:14:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 12 Nov 2024 10:14:07 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Tue, 12 Nov 2024 09:43:34 GMT, Fei Yang wrote: >> Seems to me it could trigger the similar issue unexpectedly, because for now G1 is still the default one, developers could only test default one before push their code? > > Yeah, I removed the `ZGC_ONLY` check as I think it doesn't seem necessary here. I simply did two jdk builds with and without the ZGC feature configured and compared the used compiler stubs from the log output. I witnessed no difference. 
OK, I think for now it's safe, I only found below code in the stub generator related to UseZGC, and it's for final stubs: // The size of copy32_loop body increases significantly with ZGC GC barriers. // Need conditional far branches to reach a point beyond the loop in this case. bool is_far = UseZGC; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21966#discussion_r1837827471 From chagedorn at openjdk.org Tue Nov 12 10:14:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:14:11 GMT Subject: RFR: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling [v4] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 06:57:37 GMT, Christian Hagedorn wrote: >> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) >> >> This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. >> >> In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. >> >> To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix after merge Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21969#issuecomment-2470119152 From chagedorn at openjdk.org Tue Nov 12 10:14:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:14:11 GMT Subject: Integrated: 8343745: Only update Last Value Assertion Predicates in Loop Unrolling In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 07:12:12 GMT, Christian Hagedorn wrote: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/21944 which is fully reviewed but I'd prefer to integrate it on Monday and run again some testing over the weekend) > > This is a follow-up feature to https://github.com/openjdk/jdk/pull/21944 which updated Loop Unrolling to use a new predicate visitor and enables this patch. > > In Loop Unrolling, we only update the stride and not the init value of a loop. Thus, we actually only require to update the Last Value Assertion Predicates because the Init Value Assertion Predicates do not use `OpaqueLoopStride`. So, we also would not be required to kill the old Init Value Initialized Assertion Predicates. This patch implements that improvement. > > To make this work, we need to query the associated `AssertionPredicateType` of an Assertion Predicate which is stored in the `If/RangeCheckNode` (I guess it's okay to have this additional node field in product). This was guarded by `NOT_PRODUCT` before. For this patch, I make this information available in product builds and use it to implement this feature. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 3727f404 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/3727f4046188bb623f9efec6fa149f767a9ffa30 Stats: 101 lines in 7 files changed: 16 ins; 13 del; 72 mod 8343745: Only update Last Value Assertion Predicates in Loop Unrolling Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21969 From thartmann at openjdk.org Tue Nov 12 10:26:56 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 10:26:56 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes Message-ID: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). 
Thanks, Tobias ------------- Commit messages: - 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes Changes: https://git.openjdk.org/jdk/pull/22034/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22034&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344018 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22034/head:pull/22034 PR: https://git.openjdk.org/jdk/pull/22034 From thartmann at openjdk.org Tue Nov 12 10:34:00 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 10:34:00 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Good catch! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22033#pullrequestreview-2429198930 From roland at openjdk.org Tue Nov 12 10:34:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 12 Nov 2024 10:34:30 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias Looks good and trivial to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22034#pullrequestreview-2429203557 From thartmann at openjdk.org Tue Nov 12 10:48:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 10:48:20 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). 
Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias Thanks Roland. I'll integrate this when testing finished. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22034#issuecomment-2470194485 From chagedorn at openjdk.org Tue Nov 12 10:52:55 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 10:52:55 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Thanks Tobias! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22033#issuecomment-2470204011 From chagedorn at openjdk.org Tue Nov 12 11:02:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 11:02:30 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: <8GUS7Y3V5IZdeV1qK3y6C25fgIl4gBpUobiGw1KPo34=.948bd499-1b45-46ed-a04d-47184d6928ca@github.com> On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22034#pullrequestreview-2429268446 From rcastanedalo at openjdk.org Tue Nov 12 11:55:09 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 11:55:09 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. 
> - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Split FIXUP_SPILLS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22017/files - new: https://git.openjdk.org/jdk/pull/22017/files/90f9a24e..e44fa796 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=00-01 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22017/head:pull/22017 PR: https://git.openjdk.org/jdk/pull/22017 From rcastanedalo at openjdk.org Tue Nov 12 11:55:09 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 11:55:09 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 08:26:26 GMT, Christian Hagedorn wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Split FIXUP_SPILLS > > src/hotspot/share/opto/phasetype.hpp line 104: > >> 102: flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \ >> 103: flags(MERGE_MULTIDEFS, "Merge multiple definitions") \ >> 104: flags(FIXUP_SPILLS, "Fix up spills") \ > > Should we split at the word boundary? > Suggestion: > > flags(FIX_UP_SPILLS, "Fix up spills") \ Thanks, done in commit e44fa796.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1837972013 From rcastanedalo at openjdk.org Tue Nov 12 12:02:42 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 12:02:42 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 08:27:45 GMT, Christian Hagedorn wrote: > Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? I tried this out but could not find a good way to interleave code comments and `flags` entries (only using multi-line comments with additional backslashes, which looks too convoluted in my opinion). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470345440 From rcastanedalo at openjdk.org Tue Nov 12 12:13:22 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 12:13:22 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. 
> > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22033#pullrequestreview-2429423589 From chagedorn at openjdk.org Tue Nov 12 12:18:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 12:18:39 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range.
>> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2429432836 From chagedorn at openjdk.org Tue Nov 12 12:18:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 12:18:40 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 11:59:28 GMT, Roberto Castañeda Lozano wrote: > > Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? > > I tried this out but could not find a good way to interleave code comments and `flags` entries (only using multi-line comments with additional backslashes, which looks too convoluted in my opinion). I see, that does not seem to be straightforward. I guess then it's okay to omit these descriptions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470376497 From chagedorn at openjdk.org Tue Nov 12 12:26:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 12:26:42 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Thanks Roberto for your review! 
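For readers skimming the thread, the `_widen` fix quoted above can be pictured with a small stand-alone sketch. Everything here is an assumption made for illustration: `LongRange`, `min_range_buggy`, and `min_range_fixed` are hypothetical names, and the struct is only a simplified stand-in for C2's long range type, not the actual HotSpot code. The sketch shows only the one detail the PR changes: the bounds of a min are combined with a minimum, while the widen counter is combined with a maximum rather than a minimum.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified stand-in for a C2 long range type (assumption,
// not HotSpot's real class): a [lo, hi] interval plus a widening counter.
struct LongRange {
  std::int64_t lo;
  std::int64_t hi;
  int widen;
};

// Shape of the problem described in the PR: the widen counter of the result
// is the MINIMUM of the inputs' counters.
inline LongRange min_range_buggy(const LongRange& a, const LongRange& b) {
  return LongRange{std::min(a.lo, b.lo), std::min(a.hi, b.hi),
                   std::min(a.widen, b.widen)};
}

// Shape of the fix described in the PR: bounds still take the minimum,
// but the widen counter takes the MAXIMUM of the inputs' counters.
inline LongRange min_range_fixed(const LongRange& a, const LongRange& b) {
  return LongRange{std::min(a.lo, b.lo), std::min(a.hi, b.hi),
                   std::max(a.widen, b.widen)};
}
```

With inputs whose counters are 1 and 3, the first variant yields a counter of 1 and the second a counter of 3. The precise way the smaller counter prevents CCP from reaching a fixed point is documented in the test case of the PR, not in this sketch.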
------------- PR Comment: https://git.openjdk.org/jdk/pull/22033#issuecomment-2470392404 From rcastanedalo at openjdk.org Tue Nov 12 12:30:20 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 12:30:20 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64).
>> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Thanks Daniel and Christian for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470401879 From galder at openjdk.org Tue Nov 12 12:36:39 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 12 Nov 2024 12:36:39 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> Message-ID: On Tue, 12 Nov 2024 12:31:52 GMT, Galder Zamarreño wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Added copyright and @bug identifiers > > macos-aarch64 CI failed with, is this transitory or something needs fixing? > > > xcode-select: error: invalid developer directory '/Applications/Xcode_14.3.1.app/Contents/Developer' > @galderz, I'd appreciate it if you can add `Copyright (c) 2024 JetBrains s.r.o.. All rights reserved.` to the header. Thanks! Just pushed a commit to add that.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2470412553 From galder at openjdk.org Tue Nov 12 12:36:38 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 12 Nov 2024 12:36:38 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v3] In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Added Jetbrains copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21920/files - new: https://git.openjdk.org/jdk/pull/21920/files/1bf6992c..9d9909f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From galder at openjdk.org Tue Nov 12 12:36:39 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 12 Nov 2024 12:36:39 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> References:
<2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> Message-ID: On Thu, 7 Nov 2024 10:50:19 GMT, Galder Zamarreño wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Added copyright and @bug identifiers macos-aarch64 CI failed with, is this transitory or something needs fixing? xcode-select: error: invalid developer directory '/Applications/Xcode_14.3.1.app/Contents/Developer' ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2470411444 From thartmann at openjdk.org Tue Nov 12 12:45:16 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 12:45:16 GMT Subject: RFR: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824).
> > Thanks, > Tobias Thanks for the review Christian. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22034#issuecomment-2470431373 From thartmann at openjdk.org Tue Nov 12 12:45:16 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 12 Nov 2024 12:45:16 GMT Subject: Integrated: 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes In-Reply-To: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> References: <6fpouEC9uo00iQIEwYKDvcftT6IIrwGQZBssvZYe95w=.2ed5adc7-74e1-4c96-96ae-31b3c710e31c@github.com> Message-ID: On Tue, 12 Nov 2024 10:22:16 GMT, Tobias Hartmann wrote: > [JDK-8342612](https://bugs.openjdk.org/browse/JDK-8342612) increased the memory limit as a workaround for [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) which was found to be a separate issue ([JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). Let's remove the setting now that [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038) got resolved by [JDK-8340824](https://bugs.openjdk.org/browse/JDK-8340824). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 67d1ef14 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/67d1ef14798be5dbd083ba23b9e3ae8e80f72728 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8344018: Remove unlimited memory setting from TestScalarReplacementMaxLiveNodes Reviewed-by: roland, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22034 From rcastanedalo at openjdk.org Tue Nov 12 13:37:16 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 13:37:16 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty Message-ID: This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660) ). 
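The emptiness rule described in this changeset can be sketched roughly as follows. This is only an illustration under stated assumptions: `NodeKind`, `emits_code`, and `is_removable_empty_block` are hypothetical names invented for the sketch, and the real check in C2 operates on HotSpot's node classes rather than on an enum. The point it illustrates is that a block holding only `BoxLock` and other non-Mach nodes must not be classified as empty, because `BoxLock` still results in machine code.

```cpp
#include <vector>

// Hypothetical node classification (assumption, not HotSpot's hierarchy).
enum class NodeKind { Mach, BoxLock, OtherNonMach };

// A node counts as "real" content if it is a Mach node, or if it is a
// BoxLock, which is not a Mach node but still yields machine code.
inline bool emits_code(NodeKind kind) {
  return kind == NodeKind::Mach || kind == NodeKind::BoxLock;
}

// A block may be treated as empty (and hence removable) only if none of
// its nodes emits code.
inline bool is_removable_empty_block(const std::vector<NodeKind>& nodes) {
  for (NodeKind kind : nodes) {
    if (emits_code(kind)) {
      return false;
    }
  }
  return true;
}
```

Under this sketch, a block containing `{BoxLock, OtherNonMach}` is kept, whereas a check that looked at Mach nodes alone would have classified it as empty, which is the situation the changeset describes.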
#### Testing - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) ------------- Commit messages: - Take into account BoxLock nodes when determining if a block is empty Changes: https://git.openjdk.org/jdk/pull/22038/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22038&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337660 Stats: 93 lines in 2 files changed: 88 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22038.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22038/head:pull/22038 PR: https://git.openjdk.org/jdk/pull/22038 From dfenacci at openjdk.org Tue Nov 12 13:45:25 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Nov 2024 13:45:25 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: <6hPXv28ApdHBkkBnRwvT-qs1d6a0Jadm7iip5anKPU0=.e669a111-42b9-43e8-b470-fcff12bc2ce8@github.com> On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal.
>> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Very very cool! @robcasloz do you think it could make sense to add a few IR tests just to make sure that the new steps are actually dumped? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2470555444 From dfenacci at openjdk.org Tue Nov 12 13:45:25 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Nov 2024 13:45:25 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 11:51:31 GMT, Roberto Casta?eda Lozano wrote: >> src/hotspot/share/opto/phasetype.hpp line 104: >> >>> 102: flags(POST_ALLOCATION_COPY_REMOVAL, "Post-allocation copy removal") \ >>> 103: flags(MERGE_MULTIDEFS, "Merge multiple definitions") \ >>> 104: flags(FIXUP_SPILLS, "Fix up spills") \ >> >> Should we split at the word boundary? >> Suggestion: >> >> flags(FIX_UP_SPILLS, "Fix up spills") \ > > Thanks, done in commit e44fa796. 
To be consistent I guess the same could be done for `MERGE_MULTIDEFS` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1838122312 From stuefe at openjdk.org Tue Nov 12 13:50:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 12 Nov 2024 13:50:15 GMT Subject: RFR: 8344014: Simplify TracePhase constructor Message-ID: As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: trace strings can be kept in via x-macro with the IDs, and it is sufficient to pass in the IDs, no need to pass the pointer to the counters since we use the same counters anyway. Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. Test: I checked manually with +CITimeVerbose with and without patch and compared the output; output format is preserved. 
------------- Commit messages: - fixes - Rework TracePhase construction Changes: https://git.openjdk.org/jdk/pull/22029/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22029&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344014 Stats: 199 lines in 18 files changed: 75 ins; 53 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/22029.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22029/head:pull/22029 PR: https://git.openjdk.org/jdk/pull/22029 From stuefe at openjdk.org Tue Nov 12 13:50:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 12 Nov 2024 13:50:15 GMT Subject: RFR: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 07:56:24 GMT, Thomas Stuefe wrote: > As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: trace strings can be kept in via x-macro with the IDs, and it is sufficient to pass in the IDs, no need to pass the pointer to the counters since we use the same counters anyway. > > Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). > > There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. > > The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. > > Test: I checked manually with +CITimeVerbose with and without patch and compared the output; output format is preserved. Mac OS error unrelated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22029#issuecomment-2470568530 From qamai at openjdk.org Tue Nov 12 14:01:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 12 Nov 2024 14:01:32 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: <6Fw6s8C3ovd8wuJEqp0CmvcjyUg_Ar-avXL_uVTyog4=.3aadfc7e-b92c-44f9-9ecf-cc3572ecf185@github.com> On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) May I ask what's wrong with making `BoxLock` a subclass of `MachNode`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2470602453 From fyang at openjdk.org Tue Nov 12 15:31:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 15:31:11 GMT Subject: RFR: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions [v2] In-Reply-To: References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: <1yr1z8KFY3b6KRAjF3cwNUUaT368zoxCWi-oU63_pYY=.18d15c0c-fe71-466a-991c-281c8ac1418e@github.com> On Tue, 12 Nov 2024 10:11:09 GMT, Hamlin Li wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more space for hardware platforms with vector extension > > Looks good, Thanks! @Hamlin-Li : Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21966#issuecomment-2470831692 From fyang at openjdk.org Tue Nov 12 15:31:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 12 Nov 2024 15:31:11 GMT Subject: Integrated: 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions In-Reply-To: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> References: <0MGF-2vv_88fCBAwbnHUgug-nizGHUDJNCZHKOwGjSU=.d9b3fa18-7f5f-4231-ab5f-26504becf025@github.com> Message-ID: On Fri, 8 Nov 2024 01:54:55 GMT, Fei Yang wrote: > Hello, please review this trivial change. > > The reason for the crash is that we use more space for compiler stubs during stubRoutines generation when compressed instructions are disabled, especially when the CPU is not equipped with the RISC-V B extension. So this simply increases the reserved size of compiler stubs for this CPU platform.
After this change, we have (without B extension): > > > $ java -Xlog:stubs -XX:-UseRVC -version > [0.010s][info][stubs] StubRoutines (initial stubs) [0x0000003f8f3cf340, 0x0000003f8f3d1cd0] used: 604, free: 10036 > [0.117s][info][stubs] StubRoutines (continuation stubs) [0x0000003f8f3d25c0, 0x0000003f8f3d3010] used: 628, free: 2012 > [0.153s][info][stubs] StubRoutines (final stubs) [0x0000003f8f4025c0, 0x0000003f8f409d70] used: 9380, free: 21260 > [0.199s][info][stubs] StubRoutines (compiler stubs) [0x0000003f8f4d7c40, 0x0000003f8f4e3180] used: 38924, free: 7476 > > > (PS: Same issue also triggers when building without ZGC (`--disable-jvm-feature-zgc`)) This pull request has now been integrated. Changeset: 2989d873 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/2989d8734c70e1db87d2a708719fd2d966903a93 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8343805: RISC-V: JVM crashes on startup when disabling compressed instructions Reviewed-by: mli ------------- PR: https://git.openjdk.org/jdk/pull/21966 From chagedorn at openjdk.org Tue Nov 12 15:33:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 12 Nov 2024 15:33:53 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps Message-ID: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. ### Goal of Assertion Predicates #### Initialized Assertion Predicates These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. #### Template Assertion Predicates Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. 
Conceptually, it does not matter whether the failing path uses a UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). ### Why Did we Use UCTs for Template Assertion Predicates? When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straightforward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? #### Missing UCTs for Predicates above Loops Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. #### Missing UCTs to Create Template Assertion Predicates Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that a UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking a UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptually don't even need to use UCTs at all for Template Assertion Predicates.
There is already some special logic for a main loop, where we create Template Assertion Predicates with a Halt node because there is no UCT available for the main loop. But this logic and implementation are not easily reusable and we would need to keep supporting both formats with UCTs and halt nodes. ### Solution: Assertion Predicates with Halt Nodes only As a simple solution to the problems described above, I propose to get rid of UCTs completely. This not only enables us to fix the remaining unresolved bugs where Assertion Predicates are missing but also simplifies the logic and the IR itself. I've added some comments in the PR to better explain the refactoring steps. Thanks, Christian ------------- Commit messages: - 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps Changes: https://git.openjdk.org/jdk/pull/22040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342047 Stats: 205 lines in 6 files changed: 30 ins; 77 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/22040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22040/head:pull/22040 PR: https://git.openjdk.org/jdk/pull/22040 From duke at openjdk.org Tue Nov 12 15:36:49 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Tue, 12 Nov 2024 15:36:49 GMT Subject: RFR: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen` of both input types instead of the maximum, which leads to an endless widening in CCP without reaching a fixed point with the test case.
We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straightforward: use `MAX2()` instead of `MIN2()`, as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/22033#pullrequestreview-2429948361 From duke at openjdk.org Tue Nov 12 15:37:50 2024 From: duke at openjdk.org (Abdelhak Zaaim) Date: Tue, 12 Nov 2024 15:37:50 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range.
>> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Marked as reviewed by abdelhak-zaaim at github.com (no known OpenJDK username). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2429956837 From dlunden at openjdk.org Tue Nov 12 15:46:30 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 12 Nov 2024 15:46:30 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> Message-ID: On Tue, 12 Nov 2024 11:55:09 GMT, Roberto Casta?eda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. 
>> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split FIXUP_SPILLS Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2429976564 From swen at openjdk.org Tue Nov 12 16:49:02 2024 From: swen at openjdk.org (Shaojin Wen) Date: Tue, 12 Nov 2024 16:49:02 GMT Subject: RFR: 8343629: More MergeStore benchmark [v3] In-Reply-To: References: Message-ID: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. 
Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: Update test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21659/files - new: https://git.openjdk.org/jdk/pull/21659/files/c6f05f20..4293ced9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From rcastanedalo at openjdk.org Tue Nov 12 17:07:33 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 17:07:33 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v3] In-Reply-To: References: Message-ID: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. 
> > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: Split MERGE_MULTIDEFS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22017/files - new: https://git.openjdk.org/jdk/pull/22017/files/e44fa796..fb35674f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22017&range=01-02 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22017/head:pull/22017 PR: https://git.openjdk.org/jdk/pull/22017 From sviswanathan at openjdk.org Tue Nov 12 17:11:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 12 Nov 2024 17:11:02 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <1SFuaJR81N43qYbZw30JIV-XkAYjglS8Cecr3oUg0os=.9b39cd20-9fd1-43db-8011-3b6d90bdf371@github.com> On Sun, 10 Nov 2024 07:36:55 GMT, Jatin Bhateja wrote: >> Yes, this should ensure 0xFFFFFFFF. > > We land here only after checking if inputs are uints, didn't want redundant match, its just a convince routine for forwarding inputs. I will create a lambda for this. uint check only ensures value <= 0xFFFFFFFF. Here we need value to be 0xFFFFFFFF. 
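The distinction drawn above (a value that is at most `0xFFFFFFFF` versus a mask that is exactly `0xFFFFFFFF`) can be checked with a scalar model of a single 64-bit lane. This is a sketch under assumptions: `mul_low32_unsigned` models an unsigned doubleword multiply that reads only the low 32 bits of each lane, in the way `VPMULUDQ` does, and `masked_mul` models the IR shape where both inputs are masked before a 64-bit multiply. Both function names are illustrative and do not appear in the patch.

```cpp
#include <cstdint>

// One lane of an unsigned doubleword multiply: only the low 32 bits of
// each 64-bit input take part, and the result is the full 64-bit product.
constexpr uint64_t mul_low32_unsigned(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFULL) * (b & 0xFFFFFFFFULL);
}

// The pattern before optimization: mask both inputs explicitly, then
// perform an ordinary 64-bit multiply.
constexpr uint64_t masked_mul(uint64_t a, uint64_t b, uint64_t mask) {
  return (a & mask) * (b & mask);
}

// With the mask 0xFFFFFFFF the explicit masking is redundant: the low-32
// multiply applied to the unmasked inputs gives the same result.
static_assert(masked_mul(0xABCDEF0123456789ULL, 3, 0xFFFFFFFFULL) ==
              mul_low32_unsigned(0xABCDEF0123456789ULL, 3),
              "mask 0xFFFFFFFF can be absorbed");

// With a smaller mask the masked value is still a valid uint, but
// forwarding the unmasked input changes the result.
static_assert(masked_mul(0x12345ULL, 1, 0xFFFFULL) !=
              mul_low32_unsigned(0x12345ULL, 1),
              "mask 0xFFFF cannot be absorbed");
```

The second assertion is the case the review comment warns about: a range check on the masked value passes, yet skipping the mask would be incorrect.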
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1838479496 From rcastanedalo at openjdk.org Tue Nov 12 17:20:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 12 Nov 2024 17:20:25 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v3] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 13:41:35 GMT, Damon Fenacci wrote: >> Thanks, done in commit e44fa796. > > To be consistent I guess the same could be done for `MERGE_MULTIDEFS` Thanks Damon, done in commit fb35674f. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22017#discussion_r1838492084 From epeter at openjdk.org Tue Nov 12 17:21:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 12 Nov 2024 17:21:35 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v2] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> <14vSVV6FCG5GxZjE2heaPLpQZGTCB9xWB075R_bz_SA=.e3f8bcdd-5060-4a3c-9918-cd515e23feea@github.com> Message-ID: On Tue, 12 Nov 2024 12:32:24 GMT, Galder Zamarreño wrote: >> macos-aarch64 CI failed with, is this transitory or something needs fixing? >> >> >> xcode-select: error: invalid developer directory '/Applications/Xcode_14.3.1.app/Contents/Developer' > >> @galderz, I'd appreciate it if you can add `Copyright (c) 2024 JetBrains s.r.o.. All rights reserved.` to the header. Thanks! > > Just pushed a commit to add that. @galderz You have a warning for title mismatch.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2471132110 From rcastanedalo at openjdk.org Tue Nov 12 17:25:31 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 12 Nov 2024 17:25:31 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <6hPXv28ApdHBkkBnRwvT-qs1d6a0Jadm7iip5anKPU0=.e669a111-42b9-43e8-b470-fcff12bc2ce8@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> <6hPXv28ApdHBkkBnRwvT-qs1d6a0Jadm7iip5anKPU0=.e669a111-42b9-43e8-b470-fcff12bc2ce8@github.com> Message-ID: On Tue, 12 Nov 2024 13:38:58 GMT, Damon Fenacci wrote: > Very very cool! Thanks! > @robcasloz do you think it could make sense to add a few IR tests just to make sure that the new steps are actually dumped? I think that would be a good idea, but since we do not have any such test yet, I propose to address that in a separate RFE, starting from the most important phase dumps (i.e. in increasing graph dump level). Does that make sense @dafedafe? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2471139621 From dfenacci at openjdk.org Tue Nov 12 18:03:19 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Nov 2024 18:03:19 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v2] In-Reply-To: <6hPXv28ApdHBkkBnRwvT-qs1d6a0Jadm7iip5anKPU0=.e669a111-42b9-43e8-b470-fcff12bc2ce8@github.com> References: <61U1yfefGGXjAMffpUIz5nGvsmWUxoAGoOkY8hN1mHs=.a7ae0994-4c0d-42c1-afb5-f719630ddb61@github.com> <6hPXv28ApdHBkkBnRwvT-qs1d6a0Jadm7iip5anKPU0=.e669a111-42b9-43e8-b470-fcff12bc2ce8@github.com> Message-ID: On Tue, 12 Nov 2024 13:38:58 GMT, Damon Fenacci wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Split FIXUP_SPILLS > > Very very cool! 
> @robcasloz do you think it could make sense to add a few IR tests just to make sure that the new steps are actually dumped? > I think that would be a good idea, but since we do not have any such test yet, I propose to address that in a separate RFE, starting from the most important phase dumps (i.e. in increasing graph dump level). Does that make sense @dafedafe? Totally! Thanks @robcasloz. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2471217614 From dfenacci at openjdk.org Tue Nov 12 18:10:42 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 12 Nov 2024 18:10:42 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v3] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 17:07:33 GMT, Roberto Casta?eda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). 
>> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split MERGE_MULTIDEFS Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2430350117 From dlong at openjdk.org Tue Nov 12 20:59:30 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 12 Nov 2024 20:59:30 GMT Subject: RFR: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 07:56:24 GMT, Thomas Stuefe wrote: > As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: trace strings can be kept in via x-macro with the IDs, and it is sufficient to pass in the IDs, no need to pass the pointer to the counters since we use the same counters anyway. > > Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). > > There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. > > The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. 
> > Test: I checked manually with +CITimeVerbose with and without patch and compared the output; output format is preserved. Marked as reviewed by dlong (Reviewer). This looks OK, but doesn't seem strictly necessary for JDK-8344009. We could get the PhaseTraceId from `accumulator - &Phase::timers[0]`. ------------- PR Review: https://git.openjdk.org/jdk/pull/22029#pullrequestreview-2430676045 PR Comment: https://git.openjdk.org/jdk/pull/22029#issuecomment-2471563850 From vlivanov at openjdk.org Tue Nov 12 21:53:09 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 12 Nov 2024 21:53:09 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 10 Nov 2024 07:40:30 GMT, Jatin Bhateja wrote: >> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. >> >> So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. > >> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. 
For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. >> >> So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. > > Hi @iwanowww , > Problem occurs only if AndV gets shared; in such a case, matcher will not be able to identify the constrained multiplication pattern and absorb the masking pattern. Specialized IR overrules such limitations and shields the pattern from downstream optimization passes, thereby removing any non-determinism. In addition, it facilitates forwarding inputs to the multiplier, the new IR is explicit in its semantics of considering only lower doublewords of quadword lanes for multiplication, hence we can safely save emitting redundant input masking instructions. We already have specialized IR nodes like MulAddVS2VINode and I see these new IR nodes similar to it. @jatin-bhateja in case when `AndV` is shared, it can't be eliminated unless all users absorb it. For such cases, matcher can perform adhoc node cloning, but in this particular case it looks like an overkill either way. IMO the pattern is too niche to focus on it (either to justify input forwarding or adhoc handling on matcher side). It's good you mentioned `MulAddVS2VI`. On one hand, VNNI operations are more complex (similar to FMA), so such complexity *may* be justified there. On the other hand, it doesn't look like VNNI support in C2 age well. It is tied to auto-vectorizer and, by now, Vector API doesn't benefit from it. So, instead of doubling down on `MulAddVS2VI` path, I'd prefer to leave it aside and reimplement it later in a more maintainable manner. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2471654154 From fyang at openjdk.org Wed Nov 13 01:42:12 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Nov 2024 01:42:12 GMT Subject: RFR: 8344074: RISC-V: More accurate _exception_handler_size and _deopt_handler_size Message-ID: Hi, please review this small change. I find that the reserved size for these two handlers are not accurate and are larger than needed. For _exception_handler_size, the used size is only 20 bytes for release build and about 60 bytes for debug build. Considering that exception_handler is not trivial, I reserved a little bit more than needed for release build. For _deopt_handler_size, `far_jump` will always emit two instructions. Testing on linux-riscv64: - [x] tier1 (release) - [x] hotspot:tier1 (fastdebug) ------------- Commit messages: - 8344074: RISC-V: More accurate _exception_handler_size and _deopt_handler_size Changes: https://git.openjdk.org/jdk/pull/22053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344074 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22053/head:pull/22053 PR: https://git.openjdk.org/jdk/pull/22053 From jbhateja at openjdk.org Wed Nov 13 02:43:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Nov 2024 02:43:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v4] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <6fxu6YabwpKc13hCZ7Aw46C02K68kozOCBZY3Rn8R8g=.c42f98dc-c253-4972-b2a5-ea8ff5e6061b@github.com> > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR 
pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
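The arithmetic behind the patterns listed above can be sketched in scalar Java, one 64-bit lane at a time. This is only a model of the reasoning, not HotSpot or Vector API code: the class name `MulLaneSketch` and its methods are hypothetical, and the two helper methods merely imitate what VPMULUDQ and VPMULDQ compute for a single lane (a 32x32 -> 64 bit product of the lower doublewords).

```java
// Scalar model of a single quadword lane. All names are illustrative assumptions.
final class MulLaneSketch {
    static final long LOW32 = 0xFFFFFFFFL;

    // What MulVL computes per lane: the low 64 bits of a 64x64 bit product.
    static long fullMul(long a, long b) {
        return a * b;
    }

    // Model of VPMULUDQ for one lane: zero-extend the lower doublewords, then multiply.
    static long mulUnsignedLow32(long a, long b) {
        return Integer.toUnsignedLong((int) a) * Integer.toUnsignedLong((int) b);
    }

    // Model of VPMULDQ for one lane: sign-extend the lower doublewords, then multiply.
    static long mulSignedLow32(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L;
        long b = 0x0FEDCBA987654321L;
        // AndV with 0xFFFFFFFF: upper halves are zero, so the unsigned form suffices.
        System.out.println(fullMul(a & LOW32, b & LOW32) == mulUnsignedLow32(a, b));
        // URShiftVL by 32: inputs again fit in 32 unsigned bits.
        System.out.println(fullMul(a >>> 32, b >>> 32) == mulUnsignedLow32(a >>> 32, b >>> 32));
        // RShiftVL by 32: inputs are sign-extended 32 bit values, so the signed form applies.
        System.out.println(fullMul(a >> 32, b >> 32) == mulSignedLow32(a >> 32, b >> 32));
    }
}
```

When both inputs are known to fit in 32 bits (zero- or sign-extended), the full 64-bit product and the doubleword product agree, which is why the cheaper instruction can stand in for the full multiply in these cases.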
> > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains seven commits: - Removing target specific hooks - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - Review resoultions - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 - Handle new I2L pattern, IR tests, Rewiring pattern inputs to MulVL further optimizes JIT code - Review resolutions - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/eba586b5..43320063 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=02-03 Stats: 70 lines in 7 files changed: 3 ins; 58 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Wed Nov 13 02:43:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 13 Nov 2024 02:43:12 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: 
<9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 12 Nov 2024 21:49:22 GMT, Vladimir Ivanov wrote: >>> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. >>> >>> So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. >> >> Hi @iwanowww , >> Problem occurs only if AndV gets shared; in such a case, matcher will not be able to identify the constrained multiplication pattern and absorb the masking pattern. Specialized IR overrules such limitations and shields the pattern from downstream optimization passes, thereby removing any non-determinism. In addition, it facilitates forwarding inputs to the multiplier, the new IR is explicit in its semantics of considering only lower doublewords of quadword lanes for multiplication, hence we can safely save emitting redundant input masking instructions. We already have specialized IR nodes like MulAddVS2VINode and I see these new IR nodes similar to it. > > @jatin-bhateja in case when `AndV` is shared, it can't be eliminated unless all users absorb it. For such cases, matcher can perform adhoc node cloning, but in this particular case it looks like an overkill either way. IMO the pattern is too niche to focus on it (either to justify input forwarding or adhoc handling on matcher side). > > It's good you mentioned `MulAddVS2VI`. 
On one hand, VNNI operations are more complex (similar to FMA), so such complexity *may* be justified there. On the other hand, it doesn't look like VNNI support in C2 age well. It is tied to auto-vectorizer and, by now, Vector API doesn't benefit from it. So, instead of doubling down on `MulAddVS2VI` path, I'd prefer to leave it aside and reimplement it later in a more maintainable manner. Thanks @iwanowww , The patch in its current form addresses several common patterns, and focusing on optimizing one niche case looks like overkill, given that strength-reducing a 15-cycle multiplier to a lighter 5-cycle version is by itself sufficient to offset the redundant input masking instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2472244154 From fjiang at openjdk.org Wed Nov 13 02:51:54 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 13 Nov 2024 02:51:54 GMT Subject: RFR: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node In-Reply-To: References: Message-ID: <6NPHrbJQOdeNYGTcfFB91Nh2ivJrnOExVN3NVWxKdkM=.dcb1c4c1-136f-4334-9ad2-6dcb08d44fba@github.com> On Tue, 12 Nov 2024 02:55:48 GMT, Fei Yang wrote: > Hi, please review this small change. > > Currently, we print a simple `lwu` for this node, which is not accurate because we do a `ld` and a logical right shift of the loaded 64-bit value for this node. This simply changes it into `load_narrow_klass_compact` like other CPU platforms. After this change, we have: > > > 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 > 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders > > > (Tagging: @Hamlin-Li) LGTM, thanks! ------------- Marked as reviewed by fjiang (Committer).
PR Review: https://git.openjdk.org/jdk/pull/22025#pullrequestreview-2431651715 From fyang at openjdk.org Wed Nov 13 02:59:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Nov 2024 02:59:27 GMT Subject: RFR: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 09:27:44 GMT, Hamlin Li wrote: >> Hi, please review this small change. >> >> Currently, we print a simple `lwu` for this node, which is not accurate because we do a `ld` and a logical right shift of the loaded 64-bit value for this node. This simply changes it into `load_narrow_klass_compact` like other CPU platforms. After this change, we have: >> >> >> 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 >> 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders >> >> >> (Tagging: @Hamlin-Li) > > Looks good to me. Thanks! @Hamlin-Li @feilongjiang : Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22025#issuecomment-2472265275 From fyang at openjdk.org Wed Nov 13 02:59:28 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Nov 2024 02:59:28 GMT Subject: Integrated: 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 02:55:48 GMT, Fei Yang wrote: > Hi, please review this small change. > > Currently, we print a simple `lwu` for this node, which is not accurate because we do a `ld` and a logical right shift of the loaded 64-bit value for this node. This simply changes it into `load_narrow_klass_compact` like other CPU platforms. After this change, we have: > > > 070 B2: # out( B8 B3 ) <- in( B1 ) Freq: 0.9 > 070 + load_narrow_klass_compact R28, [R12, #4] # compressed class ptr, #@loadNKlassCompactHeaders > > > (Tagging: @Hamlin-Li) This pull request has now been integrated.
Changeset: c78de7bf Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/c78de7bf5fc5a4da50c6c64e181abf02a5b12630 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8343964: RISC-V: Improve PrintOptoAssembly output for loadNKlassCompactHeaders node Reviewed-by: mli, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/22025 From dlong at openjdk.org Wed Nov 13 06:22:32 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 13 Nov 2024 06:22:32 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 10:19:51 GMT, theoweidmannoracle wrote: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. I've been trying to understand all these print_inlining_*() functions for a few days now, and I still don't understand the rules about when each can be called, when we should overwrite and when we should append, when the stringStream should be empty or not empty, and how _print_inlining_list works. Then there is the parallel InlineTree that we build, and it has a success/fail message attached too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2472546054 From duke at openjdk.org Wed Nov 13 08:04:11 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 08:04:11 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 06:19:09 GMT, Dean Long wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. 
Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > I've been trying to understand all these print_inlining_*() functions for a few days now, and I still don't understand the rules about when each can be called, when we should overwrite and when we should append, when the stringStream should be empty or not empty, and how _print_inlining_list works. Then there is the parallel InlineTree that we build, and it has a success/fail message attached too. 
@dean-long I felt exactly the same way when I started to work on this. The way this works is indeed a bit obscure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2472738935 From rcastanedalo at openjdk.org Wed Nov 13 08:42:10 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 13 Nov 2024 08:42:10 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v3] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:14:37 GMT, Christian Hagedorn wrote: >>> Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? >> >> I tried this out but could not find a good way to interleave code comments and `flags` entries (only using multi-line comments with additional backslashes, which looks too convoluted in my opinion). > >> > Just an idea, since you've provided a nice description for each phase in the PR description, should we add these in phasetype.hpp at the phases? >> >> I tried this out but could not find a good way to interleave code comments and `flags` entries (only using multi-line comments with additional backslashes, which looks too convoluted in my opinion). > > I see, that does not seem to be straightforward. I guess then it's okay to omit these descriptions. Thanks Daniel, Christian, and Damon for reviewing! @chhagedorn may I get a re-approval?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2472841322 From chagedorn at openjdk.org Wed Nov 13 08:42:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 08:42:43 GMT Subject: Integrated: 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation In-Reply-To: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> References: <0RLXKFCTyWLfTDOcGzEd8PdDIzXMhD2VvFOSFNWUOM4=.70b9d267-127e-44bf-9382-9c4856ee5edf@github.com> Message-ID: On Tue, 12 Nov 2024 10:07:24 GMT, Christian Hagedorn wrote: > In `MinLNode::add_ring()`, we wrongly take the minimum of the `_widen`of both input types instead of the maximum which leads to an endless widening in CCP without reaching a fixed point with the test case. We eventually hit the memlimit because we keep creating new types endlessly. > > The fix is straight forward to use `MAX2()` instead of `MIN2()` as we are already doing for `MinINode::add_ring()`: > https://github.com/openjdk/jdk/blob/b53ee053f7f7ffcf02ff47e1895ce7be4bc32486/src/hotspot/share/opto/addnode.cpp#L1437-L1443 > > Details about how this endless widening is happening are provided as comments in the test case. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 2eeaa57b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/2eeaa57b19780723ad7c74b1a62dea491241b686 Stats: 81 lines in 2 files changed: 80 ins; 0 del; 1 mod 8343944: C2: MinLNode::add_ring() computes _widen wrongly leading to an endless widening/compilation Reviewed-by: thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22033 From roland at openjdk.org Wed Nov 13 08:46:19 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 13 Nov 2024 08:46:19 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 06:19:09 GMT, Dean Long wrote: > I've been trying to understand all these print_inlining_*() functions for a few days now, and I still don't understand the rules about when each can be called, when we should overwrite and when we should append, when the stringStream should be empty or not empty, and how _print_inlining_list works. Then there is the parallel InlineTree that we build, and it has a success/fail message attached too. It is indeed a mess. The way this works, I think, is that the message for the inlining that's currently happening is accumulated in `_print_inlining_stream`. `_print_inlining_list` is the list of inlining messages. A single entry of `_print_inlining_list` may contain the aggregated messages for multiple call sites. So once we are done, we simply iterate over the list and output each entry. If there's no late inlining involved, then `_print_inlining_list` only has a single entry. When a call site is a candidate for late inlining (i.e. there is a chance that some messages need to be inserted at the current point at a later time), then a new element is added to `_print_inlining_list`. If late inlining does happen at that call site, the logic iterates over `_print_inlining_list` and finds the entry with the matching `CallGenerator`.
When the call site is inlined, it's possible that this will cause some inlining to happen right away (and so messages to be appended to the current `_print_inlining_list` entry) and some more late inlining to happen later on (and so a new entry to be added to `_print_inlining_list` right after the current one, possibly in the middle of the list). If I remember correctly, I tried using `InlineTree` instead but that didn't work well. I don't remember the details though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2472855578 From chagedorn at openjdk.org Wed Nov 13 08:56:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 08:56:57 GMT Subject: RFR: 8344089: Fix wrong location of TestWrongMinLWiden.java Message-ID: Just noticed this a second too late. Somehow I must have applied the patch wrongly when moving the fix from one local repo to another. Anyway, this patch moves the test to the proper location inside the `test` folder. Thanks, Christian ------------- Commit messages: - 8344089: Fix wrong location of TestWrongMinLWiden.java Changes: https://git.openjdk.org/jdk/pull/22060/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22060&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344089 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22060/head:pull/22060 PR: https://git.openjdk.org/jdk/pull/22060 From thartmann at openjdk.org Wed Nov 13 09:19:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Nov 2024 09:19:34 GMT Subject: RFR: 8344089: Fix wrong location of TestWrongMinLWiden.java In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 08:50:41 GMT, Christian Hagedorn wrote: > Just noticed this a second too late. Somehow I must have applied the patch wrongly when moving the fix from one local repo to another.
Anyway, this patch moves the test to the proper location inside the `test` folder. > > Thanks, > Christian Good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22060#pullrequestreview-2432362101 From rcastanedalo at openjdk.org Wed Nov 13 09:19:36 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 13 Nov 2024 09:19:36 GMT Subject: RFR: 8344089: Fix wrong location of TestWrongMinLWiden.java In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 08:50:41 GMT, Christian Hagedorn wrote: > Just noticed this a second too late. Somehow I must have applied the patch wrongly when moving the fix from one local repo to another. Anyway, this patch moves the test to the proper location inside the `test` folder. > > Thanks, > Christian Trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22060#pullrequestreview-2432387951 From duke at openjdk.org Wed Nov 13 10:24:33 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 10:24:33 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v7] In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 10:19:10 GMT, Christian Hagedorn wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Cover another case > > src/hotspot/share/opto/loopTransform.cpp line 2054: > >> 2052: Node *newcle = old_new[loop_end->_idx]; >> 2053: _igvn.hash_delete(newcle); >> 2054: Node *one = intcon(1); > > While at it, you can also fix the wrong `*` placement (should be at type): > Suggestion: > > Node* one = intcon(1); Thanks, Christian, I have now configured my idea to take care of fixing the formatting of changed lines.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1839911939 From duke at openjdk.org Wed Nov 13 10:24:32 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 10:24:32 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v8] In-Reply-To: References: Message-ID: <8pnfJI4IfgdC6YHouqDb3rso-3X4xsvr1KYrZl_BxPI=.f10811b5-620e-416b-8585-7bd74ee575e8@github.com> > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/8c51ec99..38c6f510 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From stuefe at openjdk.org Wed Nov 13 10:27:37 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 13 Nov 2024 10:27:37 GMT Subject: RFR: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 20:55:51 GMT, Dean Long wrote: > This looks OK, but doesn't seem strictly necessary for JDK-8344009. 
We could get the PhaseTraceId from `accumulator - &Phase::timers[0]`. I thought so too, but it seemed a bit ugly and brittle, and the simplification seemed worth it anyhow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22029#issuecomment-2473105069 From epeter at openjdk.org Wed Nov 13 10:56:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Nov 2024 10:56:11 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps In-Reply-To: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Tue, 12 Nov 2024 15:06:52 GMT, Christian Hagedorn wrote: > This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. > > ### Goal of Assertion Predicates > #### Initialized Assertion Predicates > These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. > > #### Template Assertion Predicates > Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). > > ### Why Did we Use UCTs for Template Assertion Predicates? > When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. 
This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. > > ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? > #### Missing UCTs for Predicates above Loops > Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. > > #### Missing UCTs to Create Template Assertion Predicates > Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Template Assertion Predicates. > > There ... Reasonable :) Looks like a nice simplification that enables more improvements in the future! src/hotspot/share/opto/loopPredicate.cpp line 314: > 312: template_assertion_predicate_success_proj->has_out(j); > 313: j++) { > 314: Node* fast_node = template_assertion_predicate_success_proj->out(j); Suggestion: Node* true_path_node = template_assertion_predicate_success_proj->out(j); Now that you have renamed it `fast_proj` -> `true_path_loop_proj`. And why `true_path_loop_proj` and not just `true_path_proj`?
src/hotspot/share/opto/loopPredicate.cpp line 332: > 330: } > 331: > 332: // Put all Assertion Predicate projections on a list, starting at 'predicate' and going up in the tree. If 'get_opaque' Suggestion: // Put all Template Assertion Predicate projections on a list, starting at 'predicate' and going up in the tree. If 'get_opaque' Would that be more accurate? src/hotspot/share/opto/loopPredicate.cpp line 362: > 360: void PhaseIdealLoop::clone_parse_and_assertion_predicates_to_unswitched_loop(IdealLoopTree* loop, Node_List& old_new, > 361: IfProjNode*& true_path_loop_entry, > 362: IfProjNode*& false_path_loop_entry) { Oh boy... passing pointers by reference... I suppose that was already here like this. Looks adventurous ? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22040#pullrequestreview-2432573068 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1839920839 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1839923320 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1839943652 From epeter at openjdk.org Wed Nov 13 10:56:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Nov 2024 10:56:11 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Wed, 13 Nov 2024 10:36:05 GMT, Emanuel Peter wrote: >> This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. >> >> ### Goal of Assertion Predicates >> #### Initialized Assertion Predicates >> These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. 
>> >> #### Template Assertion Predicates >> Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). >> >> ### Why Did we Use UCTs for Template Assertion Predicates? >> When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. >> >> ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? >> #### Missing UCTs for Predicates above Loops >> Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. >> >> #### Missing UCTs to Create Template Assertion Predicates >> Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. 
But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Te... > > src/hotspot/share/opto/loopPredicate.cpp line 362: > >> 360: void PhaseIdealLoop::clone_parse_and_assertion_predicates_to_unswitched_loop(IdealLoopTree* loop, Node_List& old_new, >> 361: IfProjNode*& true_path_loop_entry, >> 362: IfProjNode*& false_path_loop_entry) { > > Oh boy... passing pointers by reference... I suppose that was already here like this. Looks adventurous ? Maybe there should be some sort of `UnswitchingResult`, that has the projections and `old_new` mapping? That could then be passed as a result. What do you think? Can be a separate RFE of course. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1839946217 From mli at openjdk.org Wed Nov 13 11:09:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 13 Nov 2024 11:09:15 GMT Subject: RFR: 8344074: RISC-V: More accurate _exception_handler_size and _deopt_handler_size In-Reply-To: References: Message-ID: <_K07cPBxEj71Io0B_djd7lo4R1UddaaOO7Sr2DiJt8s=.cde56672-087a-4589-9fff-aa1e2202a9f4@github.com> On Wed, 13 Nov 2024 00:38:11 GMT, Fei Yang wrote: > Hi, please review this small change. > > I find that the reserved sizes for these two handlers are not accurate and are larger than needed. For `_exception_handler_size`, the used size is only 20 bytes for release build and about 60 bytes for debug build. Considering that the exception handler is not trivial, I reserved a little bit more than needed for release build (32 bytes). For `_deopt_handler_size`, `far_jump` will always emit two instructions. > > Testing on linux-riscv64: > - [x] tier1 (release) > - [x] hotspot:tier1 (fastdebug) Makes sense to me. Thanks! Just an unrelated question, how did you get the instruction size of `emit_exception_handler`? ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22053#pullrequestreview-2432717535 PR Comment: https://git.openjdk.org/jdk/pull/22053#issuecomment-2473226208 From rcastanedalo at openjdk.org Wed Nov 13 11:18:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 13 Nov 2024 11:18:30 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: <6Fw6s8C3ovd8wuJEqp0CmvcjyUg_Ar-avXL_uVTyog4=.3aadfc7e-b92c-44f9-9ecf-cc3572ecf185@github.com> References: <6Fw6s8C3ovd8wuJEqp0CmvcjyUg_Ar-avXL_uVTyog4=.3aadfc7e-b92c-44f9-9ecf-cc3572ecf185@github.com> Message-ID: <-jvMMci4hwNOhW8hvxkAeJIe68QOqj7snIvueko7GVs=.892d4087-5355-4721-b715-c2f04074f284@github.com> On Tue, 12 Nov 2024 13:58:31 GMT, Quan Anh Mai wrote: > May I ask what's wrong with making `BoxLock` a subclass of `MachNode`? Thanks for the suggestion, this might also address the issue, but would break C2's "no Mach nodes before matching" invariant, require pretty invasive code changes, and incur a higher risk of introducing regressions. A more principled solution (out of scope for this bug fix IMO) would be to extend ADL with stack location operands, as hinted [here](https://github.com/openjdk/jdk/blob/79345bbbae2564f9f523859d1227a1784293b20f/src/hotspot/share/opto/matcher.cpp#L2278). This would make it possible to treat BoxLock as any other Ideal node and define platform-specific Mach nodes to replace it after matching.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2473269283 From fyang at openjdk.org Wed Nov 13 11:44:48 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 13 Nov 2024 11:44:48 GMT Subject: RFR: 8344074: RISC-V: More accurate _exception_handler_size and _deopt_handler_size In-Reply-To: <_K07cPBxEj71Io0B_djd7lo4R1UddaaOO7Sr2DiJt8s=.cde56672-087a-4589-9fff-aa1e2202a9f4@github.com> References: <_K07cPBxEj71Io0B_djd7lo4R1UddaaOO7Sr2DiJt8s=.cde56672-087a-4589-9fff-aa1e2202a9f4@github.com> Message-ID: On Wed, 13 Nov 2024 11:05:58 GMT, Hamlin Li wrote: > Just an unrelated question, how did you get the instruction size of `emit_exception_handler`? I simply modified the code to dump the real size (`code_offset() - offset`) immediately before the guarantee in `LIR_Assembler::emit_exception_handler` [1]. Thanks for the review! [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp#L310 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22053#issuecomment-2473336096 From chagedorn at openjdk.org Wed Nov 13 11:57:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 11:57:17 GMT Subject: Integrated: 8344089: Fix wrong location of TestWrongMinLWiden.java In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 08:50:41 GMT, Christian Hagedorn wrote: > Just noticed this a second too late. Somehow I must have applied the patch wrongly when moving the fix from one local repo to another. Anyway, this patch moves the test to the proper location inside the `test` folder. > > Thanks, > Christian This pull request has now been integrated.
Changeset: d334af08 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d334af084100133fd6186c9dec70ff01a3809a48 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod 8344089: Fix wrong location of TestWrongMinLWiden.java Reviewed-by: thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22060 From chagedorn at openjdk.org Wed Nov 13 11:57:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 11:57:17 GMT Subject: RFR: 8344089: Fix wrong location of TestWrongMinLWiden.java In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 08:50:41 GMT, Christian Hagedorn wrote: > Just noticed this a second too late. Somehow I must have applied the patch wrongly when moving the fix from one local repo to another. Anyway, this patch moves the test to the proper location inside the `test` folder. > > Thanks, > Christian Thanks Tobias and Roberto for your quick reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22060#issuecomment-2473396348 From roland at openjdk.org Wed Nov 13 11:58:55 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 13 Nov 2024 11:58:55 GMT Subject: RFR: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 07:56:24 GMT, Thomas Stuefe wrote: > As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), the `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: trace strings can be kept in an x-macro together with the IDs, and it is sufficient to pass in the IDs, no need to pass the pointer to the counters since we use the same counters anyway. > > Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). > > There are no functional changes.
Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. > > The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. > > Test: I checked manually with +CITimeVerbose with and without patch and compared the output; output format is preserved. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22029#pullrequestreview-2432879408 From duke at openjdk.org Wed Nov 13 12:01:23 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 12:01:23 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes Message-ID: This PR introduces - several new optimizations to unsigned division and modulo - x % 1, x % x, x % 2^k - x / 1, x / x, x / 2^k - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. - tests to test existing optimizations for signed division and modulo - does not test the Granlund and Montgomery algorithm directly ------------- Commit messages: - Remove transform_unsigned_* and inline - Fix test comments - Minor fixes - Add 2^k-1 test - Fix code style - Move is into Type - Add more ModI/L tests - DRY unsigned div - Improve UModL test - Add long - ... 
and 6 more: https://git.openjdk.org/jdk/compare/8cb12211...117d1f41 Changes: https://git.openjdk.org/jdk/pull/22061/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332268 Stats: 858 lines in 11 files changed: 841 ins; 9 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From qamai at openjdk.org Wed Nov 13 12:01:23 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 12:01:23 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 09:45:37 GMT, theoweidmannoracle wrote: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly This seems similar to #9947 . Feel free to take over if you are working on this as I am not working on the PR right now and I forgot which state it is in. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2473300847 From duke at openjdk.org Wed Nov 13 12:01:23 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 12:01:23 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: <7wuTjPJnAqjwAhIh5mUvk-pcsJGiXqZDWrXnXAmBJM8=.5d2d213a-5cb4-44fe-9776-69dc6f94db6d@github.com> On Wed, 13 Nov 2024 11:27:57 GMT, Quan Anh Mai wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > This seems similar to #9947 . Feel free to take over if you are working on this as I am not working on the PR right now and I forgot which state it is in. @merykitty Thanks for the pointer. If I understand correctly, your PR is more focused on improving division by applying numerical transformations, while the focus of this PR is to add basic optimizations to unsigned division and modulo (such as division by 1) and test the optimizations present for signed division. So I think your PR would indeed complement this PR very well. 
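For readers skimming the thread, the "basic" unsigned identities listed in the PR description (x / 1, x / 2^k, x % 1, x % 2^k) can be checked at the value level with a small standalone sketch. This is only an illustration under stated assumptions: `is_power_of_2`, `log2_exact`, `udiv`, `umod` and `check_identities` are hypothetical helper names invented here, not HotSpot functions, and the code says nothing about how the `UDivI/L` and `UModI/L` nodes are actually transformed in C2. The `x / x` and `x % x` cases are left out because they depend on both inputs being the same node (and on `x` being non-zero), which a value-level function cannot express.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of HotSpot.

// True if d == 2^k for some k >= 0.
bool is_power_of_2(uint32_t d) {
  return d != 0 && (d & (d - 1)) == 0;
}

// Returns k such that d == 2^k. Requires is_power_of_2(d).
unsigned log2_exact(uint32_t d) {
  unsigned k = 0;
  while ((d >> k) != 1) {
    k++;
  }
  return k;
}

// Unsigned division with the power-of-two case strength-reduced to a shift.
// d == 1 is covered as 2^0 (shift by zero). Requires d != 0.
uint32_t udiv(uint32_t x, uint32_t d) {
  if (is_power_of_2(d)) {
    return x >> log2_exact(d);  // x / 2^k  ==  x >>> k
  }
  return x / d;                 // general case
}

// Unsigned remainder with the power-of-two case reduced to a mask.
// d == 1 yields x & 0 == 0. Requires d != 0.
uint32_t umod(uint32_t x, uint32_t d) {
  if (is_power_of_2(d)) {
    return x & (d - 1);         // x % 2^k  ==  x & (2^k - 1)
  }
  return x % d;                 // general case
}

// Compares the reduced forms against plain / and % for every power of two
// and a handful of boundary values.
void check_identities() {
  const uint32_t samples[] = {0u, 1u, 7u, 8u, 0x7fffffffu, 0x80000000u, 0xffffffffu};
  for (uint32_t x : samples) {
    for (unsigned k = 0; k < 32; k++) {
      const uint32_t d = uint32_t(1) << k;
      assert(udiv(x, d) == x / d);
      assert(umod(x, d) == x % d);
    }
  }
}
```

Calling `check_identities()` exercises all 32 power-of-two divisors. Note that the shift and mask forms hold without any fix-up only because the operands are unsigned; signed division rounds toward zero, so a plain arithmetic shift would give a different result for negative dividends.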
------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2473392391 From chagedorn at openjdk.org Wed Nov 13 12:07:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 12:07:27 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps In-Reply-To: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Tue, 12 Nov 2024 15:06:52 GMT, Christian Hagedorn wrote: > This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. > > ### Goal of Assertion Predicates > #### Initialized Assertion Predicates > These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. > > #### Template Assertion Predicates > Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). > > ### Why Did we Use UCTs for Template Assertion Predicates? > When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. > > ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? 
> #### Missing UCTs for Predicates above Loops > Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. > > #### Missing UCTs to Create Template Assertion Predicates > Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Template Assertion Predicates. > > There ... Just noticed that I forgot to submit my accompanying comment. src/hotspot/share/opto/loopPredicate.cpp line 290: > 288: // cloned predicates. > 289: void PhaseIdealLoop::clone_assertion_predicates_to_unswitched_loop(IdealLoopTree* loop, const Node_List& old_new, > 290: Deoptimization::DeoptReason reason, unused src/hotspot/share/opto/loopPredicate.cpp line 296: > 294: // original predicate order. > 295: Unique_Node_List list; > 296: get_template_assertion_predicates(old_parse_predicate_proj, list); More precise since we only care about Template Assertion Predicates. 
src/hotspot/share/opto/loopPredicate.cpp line 304: > 302: Node_List to_process; > 303: // Process in reverse order such that 'create_new_if_for_predicate' can be used in > 304: // 'clone_assertion_predicate_for_unswitched_loops' and the original order is maintained. Covered by updated comment on L293. src/hotspot/share/opto/loopPredicate.cpp line 313: > 311: for (DUIterator j = template_assertion_predicate_success_proj->outs(); > 312: template_assertion_predicate_success_proj->has_out(j); > 313: j++) { renaming `predicate` -> `template_assertion_predicate_success_proj` src/hotspot/share/opto/loopPredicate.cpp line 315: > 313: assert(assertion_predicate_has_loop_opaque_node(fast_proj->in(0)->as_If()), "must find Assertion Predicate for fast loop"); > 314: IfProjNode* slow_proj = clone_assertion_predicate_for_unswitched_loops(iff, predicate_proj, reason, slow_loop_parse_predicate_proj); > 315: assert(assertion_predicate_has_loop_opaque_node(slow_proj->in(0)->as_If()), "must find Assertion Predicate for slow loop"); `assert` moved to `clone_assertion_predicate_for_unswitched_loops()`. src/hotspot/share/opto/loopPredicate.cpp line 350: > 348: ParsePredicateNode* unswitched_loop_parse_predicate) { > 349: TemplateAssertionPredicate template_assertion_predicate(template_assertion_predicate_success_proj); > 350: IfTrueNode* template_success_proj = template_assertion_predicate.clone(unswitched_loop_parse_predicate->in(0), this); Introduced new `clone()` method to do the cloning without an UCT and thus we no longer use `create_new_if_for_predicate()`. src/hotspot/share/opto/loopPredicate.cpp line 362: > 360: void PhaseIdealLoop::clone_parse_and_assertion_predicates_to_unswitched_loop(IdealLoopTree* loop, Node_List& old_new, > 361: IfProjNode*& true_path_loop_entry, > 362: IfProjNode*& false_path_loop_entry) { Should have renamed these earlier: We no longer call the unswitched loop versions fast and slow but rather true path and false path. 
src/hotspot/share/opto/loopPredicate.cpp line 1292: > 1290: > 1291: TemplateAssertionPredicateCreator template_assertion_predicate_creator(loop_head, scale, offset, range, this); > 1292: IfTrueNode* template_success_proj = template_assertion_predicate_creator.create(new_control); Reuse existing `TemplateAssertionPredicateCreator.create()` (which was named `create_with_halt()` before) to create a new Template Assertion Predicate with a halt node. src/hotspot/share/opto/loopnode.cpp line 2837: > 2835: } > 2836: if (is_main_loop() || is_post_loop()) { > 2837: AssertionPredicates assertion_predicates(ctrl); Dropped the `WithHalt` suffix since we only have Assertion Predicates with Halt nodes now. src/hotspot/share/opto/predicates.cpp line 85: > 83: } > 84: > 85: Deoptimization::DeoptReason RuntimePredicate::uncommon_trap_reason(IfProjNode* if_proj) { We no longer have Template Assertion Predicates with UCTs. Thus, we could move this check to `RuntimePredicate`, which uses UCTs. src/hotspot/share/opto/predicates.cpp line 155: > 153: > 154: // Clone this Template Assertion Predicate and replace the OpaqueLoopInitNode with the provided 'new_opaque_init' node. > 155: IfTrueNode* TemplateAssertionPredicate::clone(Node* new_control, PhaseIdealLoop* phase) const { New `clone()` method which creates a Template Assertion Predicate with a halt node by reusing the existing `AssertionPredicateIfCreator` class. src/hotspot/share/opto/predicates.cpp line 622: > 620: // Creates an init and last value Template Assertion Predicate connected together from a Parse Predicate with an UCT on > 621: // the failing path. Returns the success projection of the last value Template Assertion Predicate. > 622: IfTrueNode* TemplateAssertionPredicateCreator::create_with_uncommon_trap( No longer required. src/hotspot/share/opto/predicates.cpp line 653: > 651: } > 652: > 653: IfTrueNode* TemplateAssertionPredicateCreator::create_if_node_with_uncommon_trap( No longer required.
src/hotspot/share/opto/predicates.hpp line 300: > 298: static bool is_predicate(const Node* node, Deoptimization::DeoptReason deopt_reason); > 299: static bool has_valid_uncommon_trap(const Node* success_proj); > 300: }; No longer required. Most of these methods have been moved to the `RuntimePredicate` class. src/hotspot/share/opto/predicates.hpp line 459: > 457: public: > 458: OpaqueTemplateAssertionPredicateNode* clone(Node* new_control, PhaseIdealLoop* phase); > 459: OpaqueTemplateAssertionPredicateNode* clone_and_replace_init(Node* new_control, Node* new_init, Renamed `new_ctrl` -> `new_control` and swapped order of parameters. src/hotspot/share/opto/predicates.hpp line 590: > 588: ParsePredicateSuccessProj* parse_predicate_success_proj, > 589: Deoptimization::DeoptReason deopt_reason, int if_opcode, > 590: bool does_overflow, AssertionPredicateType assertion_predicate_type); No longer required. src/hotspot/share/opto/predicates.hpp line 591: > 589: NONCOPYABLE(TemplateAssertionPredicateCreator); > 590: > 591: IfTrueNode* create(Node* new_control); Renamed to `create()`. src/hotspot/share/opto/predicates.hpp line 593: > 591: IfTrueNode* create_if_node_with_halt(Node* new_control, > 592: OpaqueTemplateAssertionPredicateNode* template_assertion_predicate_expression, > 593: bool does_overflow, AssertionPredicateType assertion_predicate_type); Renamed to `create_if_node()`. src/hotspot/share/opto/predicates.hpp line 606: > 604: > 605: IfTrueNode* create_with_uncommon_trap(Node* new_control, ParsePredicateSuccessProj* parse_predicate_success_proj, > 606: Deoptimization::DeoptReason deopt_reason, int if_opcode); No longer required.
------------- PR Review: https://git.openjdk.org/jdk/pull/22040#pullrequestreview-2429875583 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838268869 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838269937 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838270663 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838272261 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838271271 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838274015 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838275182 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838278848 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838279751 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838281378 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838306376 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838303944 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838303822 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838303121 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838300383 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838301020 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838302058 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838301285 PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1838302243 From duke at openjdk.org Wed Nov 13 12:14:59 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 12:14:59 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v9] In-Reply-To: References: Message-ID: 
<3qmQ2xqKOdepqTBQCFUv8oM5MxTbN9wGrQhRnqH3EHE=.8c43f2f2-5581-45a0-997f-20320dc2d7c9@github.com> > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/38c6f510..48ab32f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From chagedorn at openjdk.org Wed Nov 13 12:15:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 12:15:22 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v2] In-Reply-To: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: <1frynJJTu8tTrQCKvJ2jODFPuMYyuewpSFtLbHg2e58=.ebb995db-d9ea-4fa8-a06f-e0ad25e2e69d@github.com> > This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. 
> > ### Goal of Assertion Predicates > #### Initialized Assertion Predicates > These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. > > #### Template Assertion Predicates > Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). > > ### Why Did we Use UCTs for Template Assertion Predicates? > When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. > > ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? > #### Missing UCTs for Predicates above Loops > Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. > > #### Missing UCTs to Create Template Assertion Predicates > Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. 
However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Template Assertion Predicates. > > There ... Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopPredicate.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22040/files - new: https://git.openjdk.org/jdk/pull/22040/files/537251f1..1b0379b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22040&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22040&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22040/head:pull/22040 PR: https://git.openjdk.org/jdk/pull/22040 From chagedorn at openjdk.org Wed Nov 13 12:15:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 12:15:22 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v2] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: <6ryTQh4xRadQ1VP6kXPsMP8nUmjmQq8cq_5MWkWOZGM=.f9573b81-d6d2-4b52-a158-fd58c9ff1067@github.com> On Wed, 13 Nov 2024 10:22:40 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/loopPredicate.cpp >> >> Co-authored-by: Emanuel Peter > > src/hotspot/share/opto/loopPredicate.cpp line 314: > >> 312: 
template_assertion_predicate_success_proj->has_out(j); >> 313: j++) { >> 314: Node* fast_node = template_assertion_predicate_success_proj->out(j); > > Suggestion: > > Node* true_path_node = template_assertion_predicate_success_proj->out(j); > > Now that you have renamed it `fast_proj` -> `true_path_loop_proj`. > And why `true_path_loop_proj` and not just `true_path_proj`? Good catch. I think `true_path_loop_node` would be the most accurate. I'd prefer to have "loop" in the name since `true_path` on its own could be a true path of any If node (though it could be inferred from the context). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1840141259 From chagedorn at openjdk.org Wed Nov 13 12:15:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 12:15:22 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v2] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Wed, 13 Nov 2024 10:37:49 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopPredicate.cpp line 362: >> >>> 360: void PhaseIdealLoop::clone_parse_and_assertion_predicates_to_unswitched_loop(IdealLoopTree* loop, Node_List& old_new, >>> 361: IfProjNode*& true_path_loop_entry, >>> 362: IfProjNode*& false_path_loop_entry) { >> >> Oh boy... passing pointers by reference... I suppose that was already here like this. Looks adventurous ? > > Maybe there should be some sort of `UnswitchingResult`, that has the projections and `old_new` mapping? That could then be passed as a result. What do you think? Can be a separate RFE of course. Adventurous indeed. However, I'm planning to get rid of all this predicate code for Loop Unswitching anyway with [JDK-8344035](https://bugs.openjdk.org/browse/JDK-8344035). So, I guess it's fine to not further update the code at this point. 
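For readers following the `UnswitchingResult` idea above, here is a minimal sketch of what returning the two loop entries by value (instead of updating `IfProjNode*&` parameters) might look like. Everything in it is hypothetical: the `IfProjNode` stand-in, the `control_in` field and `add_predicate_sketch()` are illustration-only names and are not part of HotSpot or of this PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the real C2 node class; illustration only.
struct IfProjNode {
  IfProjNode* control_in;  // the node above this projection, or nullptr
};

// Hypothetical result type in the spirit of the suggestion: both loop
// entries travel together as one return value.
struct UnswitchingResult {
  IfProjNode* true_path_loop_entry;
  IfProjNode* false_path_loop_entry;
};

// Hangs one new predicate projection below each current entry and returns the
// new entries. The caller's pointers are never modified behind its back.
UnswitchingResult add_predicate_sketch(UnswitchingResult entries,
                                       IfProjNode* true_pred,
                                       IfProjNode* false_pred) {
  assert(true_pred != nullptr && false_pred != nullptr);
  true_pred->control_in = entries.true_path_loop_entry;
  false_pred->control_in = entries.false_path_loop_entry;
  return UnswitchingResult{true_pred, false_pred};
}
```

A caller would write `entries = add_predicate_sketch(entries, p, q);`, which makes the update visible at the call site. Whether this is worth doing before the code is reworked under JDK-8344035 is a separate question.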
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1840145772 From chagedorn at openjdk.org Wed Nov 13 12:19:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 12:19:12 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v3] In-Reply-To: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: > This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. > > ### Goal of Assertion Predicates > #### Initialized Assertion Predicates > These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. > > #### Template Assertion Predicates > Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). > > ### Why Did we Use UCTs for Template Assertion Predicates? > When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. > > ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? 
> #### Missing UCTs for Predicates above Loops > Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. > > #### Missing UCTs to Create Template Assertion Predicates > Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Template Assertion Predicates. > > There ... 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: More renaming and comment fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22040/files - new: https://git.openjdk.org/jdk/pull/22040/files/1b0379b1..ce759f6b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22040&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22040&range=01-02 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/22040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22040/head:pull/22040 PR: https://git.openjdk.org/jdk/pull/22040 From stuefe at openjdk.org Wed Nov 13 12:32:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 13 Nov 2024 12:32:39 GMT Subject: RFR: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 20:55:51 GMT, Dean Long wrote: >> As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: trace strings can be kept in via x-macro with the IDs, and it is sufficient to pass in the IDs, no need to pass the pointer to the counters since we use the same counters anyway. >> >> Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). >> >> There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. >> >> The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. 
>> >> Test: I checked manually with +CITimeVerbose with and without patch and compared the output; output format is preserved. > > This looks OK, but doesn't seem strictly necessary for JDK-8344009. We could get the PhaseTraceId from `accumulator - &Phase::timers[0]`. Thank you @dean-long and @rwestrel ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22029#issuecomment-2473476182 From stuefe at openjdk.org Wed Nov 13 12:32:41 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 13 Nov 2024 12:32:41 GMT Subject: Integrated: 8344014: Simplify TracePhase constructor In-Reply-To: References: Message-ID: <_204RQZNAueU66JnHQbhpQTj2II8wND55aT8_Agdyqc=.753744fc-10b9-48ee-acd5-fa3cf7a19df2@github.com> On Tue, 12 Nov 2024 07:56:24 GMT, Thomas Stuefe wrote: > As a prerequisite for [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009), `TracePhase` constructor needs to know the PhaseTraceId. And while we are at it, it can be simplified: trace strings can be kept in via x-macro with the IDs, and it is sufficient to pass in the IDs, no need to pass the pointer to the counters since we use the same counters anyway. > > Since this is a somewhat invasive but purely mechanical change, I separate this work from [JDK-8344009](https://bugs.openjdk.org/browse/JDK-8344009). > > There are no functional changes. Trace texts have been faithfully taken over, even in the case where the original TracePhase constructor invocation got fed an empty string (`_t_vector` and `_t_renumberLive`) - whether this was intentional or not, this patch does not change it. > > The patch preserves the possibility to override the phase name with an explicit argument to the constructor. This is used in one existing case ("computeLive (sbplr)"), again, to faithfully preserve the log format. > > Test: I checked manually with +CITimeVerbose with and without patch and compared the output; output format is preserved. This pull request has now been integrated. 
Changeset: 133f8f31 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/133f8f318675d5825defc8587911b53ecb9a7136 Stats: 199 lines in 18 files changed: 75 ins; 53 del; 71 mod 8344014: Simplify TracePhase constructor Reviewed-by: dlong, roland ------------- PR: https://git.openjdk.org/jdk/pull/22029 From dlunden at openjdk.org Wed Nov 13 13:17:10 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 13 Nov 2024 13:17:10 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:17:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head. This `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in a register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, which greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites. This may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies. >> >> A downside of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix uncommon_freq Keep active. I intend to review this when time allows. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2473573924 From epeter at openjdk.org Wed Nov 13 13:48:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Nov 2024 13:48:55 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v3] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: <2aeybw-uLPUUPw1vNvcfMDZCC0Y9Re5-of6jhPehrp4=.57665861-efbb-4dc7-b1ca-86b44cffbae5@github.com> On Wed, 13 Nov 2024 12:19:12 GMT, Christian Hagedorn wrote: >> This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. >> >> ### Goal of Assertion Predicates >> #### Initialized Assertion Predicates >> These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. >> >> #### Template Assertion Predicates >> Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. 
Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). >> >> ### Why Did we Use UCTs for Template Assertion Predicates? >> When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. >> >> ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? >> #### Missing UCTs for Predicates above Loops >> Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. >> >> #### Missing UCTs to Create Template Assertion Predicates >> Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Te... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > More renaming and comment fixes Thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22040#pullrequestreview-2433226757 From qamai at openjdk.org Wed Nov 13 13:48:58 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 13:48:58 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: <602Z14TWDGr8LttbyQ22jyrclZAORmCKgO2Qs7SHEnQ=.d8a7b429-be6d-456a-91fe-844abb8f8e94@github.com> On Mon, 4 Nov 2024 08:57:23 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: >> >> - Merge branch 'master' into unsignedbounds >> - address reviews >> - comment adjust_lo empty case >> - formality >> - address reviews >> - add comments, refactor functions to helper class >> - refine comments >> - remove leftover code >> - add doc to TypeInt, rename parameters, remove unused methods >> - change (~v & ones) == 0 to (v & ones) == ones >> - ... and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa > > src/hotspot/share/opto/type.hpp line 620: > >> 618: * >> 619: * 2. Either _lo == jint(_ulo) and _hi == jint(_uhi), or all elements of a >> 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] > > The `[_lo, jint(_uhi)] or [jint(_ulo), _hi]` in English is not precise enough. > - Is it a mathematical `OR`: the element can also be in both? In that case I would add "or both". > - Is it a mathematical `XOR`? Then I would write "either ... or ... but not both" In this case the intervals are disjoint, so it makes no difference whether it is read as an `OR` or an `XOR`. > src/hotspot/share/opto/type.hpp line 630: > >> 628: * For a TypeInt t, there are 3 possible cases: >> 629: * >> 630: * a. t._lo >= 0.
Since 0 <= t._lo <= jint(t._ulo), we have: > > I think you should say why `t._lo <= jint(t._ulo)` ... it seems intuitively true... hmm It is because `t._lo` is the smallest element of `t` in the signed domain, so `t._ulo` must not be less than `t._lo` in the signed domain. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1840307875 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1840309915 From qamai at openjdk.org Wed Nov 13 13:48:59 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 13:48:59 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: <-d-R7jGoZ1OUrfIP23mumrC1L-WDQd3ylYoTf7TX6vs=.d83a9c38-c11b-4173-a1a9-bba2d691207a@github.com> References: <-d-R7jGoZ1OUrfIP23mumrC1L-WDQd3ylYoTf7TX6vs=.d83a9c38-c11b-4173-a1a9-bba2d691207a@github.com> Message-ID: <08DKSsZX3IMUOGnVZurgWC1taIW3mYPgslyj8x4dBuI=.6d9f2f34-36fd-49b9-8e37-59e17f3b4473@github.com> On Mon, 4 Nov 2024 12:41:58 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/type.hpp line 622: >> >>> 620: * TypeInt lie in the intervals [_lo, jint(_uhi)] or [jint(_ulo), _hi] >>> 621: * >>> 622: * Proof: For 2 jint value x, y such that they are both >= 0 or < 0. Then: >> >> Suggestion: >> >> * Proof: For 2 jint value x, y such that they are both >= 0 or both < 0. Then: >> >> Or are you allowing one of them to be positive and the other negative? > > Also: this is more of a "Lemma", and could be stated before the "Proof" of your property 2... it is property 2 that you are trying to prove here, right? The indentation would help for that as well. Yes, I have changed it to a lemma and adjusted the indentation.
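As a reading aid for the constraint set this PR is about, the membership condition from the PR description (`x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`) can be written out as a small predicate. This is only a sketch: `IntConstraints` and `contains()` are hypothetical names, not the `TypeInt` layout or API, and no canonicalization of the constraints is attempted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the six constraints; not the real TypeInt.
struct IntConstraints {
  int32_t  lo, hi;       // signed bounds
  uint32_t ulo, uhi;     // unsigned bounds
  uint32_t zeros, ones;  // bits known to be 0 / known to be 1
};

// True if x satisfies every constraint of t.
inline bool contains(const IntConstraints& t, int32_t x) {
  // A bit cannot be known-zero and known-one at once; such a set would be empty.
  assert((t.zeros & t.ones) == 0);
  const uint32_t ux = static_cast<uint32_t>(x);  // same bits, unsigned view
  return x >= t.lo && x <= t.hi &&
         ux >= t.ulo && ux <= t.uhi &&
         (ux & t.zeros) == 0 &&
         (ux & t.ones) == t.ones;
}
```

With the example from the description, `IntConstraints{0, 3, 0u, 3u, ~3u, 0u}` accepts 0 through 3 and rejects both 4 and -1: the signed range [0, 3], the unsigned range [0, 3] and "all bits but the last 2 unset" all describe the same set, which is why the constraints have to be tightened together.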
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1840308577 From chagedorn at openjdk.org Wed Nov 13 13:59:54 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 13:59:54 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v3] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: <7PO6v-Bfx9Scjs1eMBkxANLOYbAGdrSkSaQPCCL0pHY=.92d72004-118f-4a80-a419-e4873b765b9c@github.com> On Wed, 13 Nov 2024 12:19:12 GMT, Christian Hagedorn wrote: >> This patch replaces the creation of Template Assertion Predicates with uncommon traps with Halt nodes. >> >> ### Goal of Assertion Predicates >> #### Initialized Assertion Predicates >> These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. >> >> #### Template Assertion Predicates >> Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptionally, it does not matter whether the failing path uses an UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). >> >> ### Why Did we Use UCTs for Template Assertion Predicates? >> When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straight forward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. 
>> >> ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? >> #### Missing UCTs for Predicates above Loops >> Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. >> >> #### Missing UCTs to Create Template Assertion Predicates >> Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that an UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking an UCT or doing other special logic. But this seems rather fragile and could introduce quite some complexity - especially since we conceptionally don't even need to use UCTs at all for Te... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > More renaming and comment fixes Thanks Emanuel for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22040#issuecomment-2473691556 From qamai at openjdk.org Wed Nov 13 14:03:52 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 14:03:52 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v26] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
> > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: - whitespace - further reviews - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - address reviews - comment adjust_lo empty case - formality - address reviews - add comments, refactor functions to helper class - refine comments - ... 
and 25 more: https://git.openjdk.org/jdk/compare/889f9062...c2d7d36e ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=25 Stats: 1995 lines in 10 files changed: 1435 ins; 325 del; 235 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Wed Nov 13 14:03:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 14:03:53 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 16:08:52 GMT, Emanuel Peter wrote: >> @eme64 Thanks to your suggestions, I have managed to come up with a (fairly) formal proof for the algorithm here! > > @merykitty FYI: I'm going on vacation for 3 weeks, so I'll hope to come back to this afterward. @eme64 Thanks for your reviews, I have addressed those. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2473700226 From qamai at openjdk.org Wed Nov 13 14:03:55 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 14:03:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v25] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 12:51:49 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: >> >> - Merge branch 'master' into unsignedbounds >> - address reviews >> - comment adjust_lo empty case >> - formality >> - address reviews >> - add comments, refactor functions to helper class >> - refine comments >> - remove leftover code >> - add doc to TypeInt, rename parameters, remove unused methods >> - change (~v & ones) == 0 to (v & ones) == ones >> - ... 
and 22 more: https://git.openjdk.org/jdk/compare/309b9291...7f3316fa > > test/hotspot/gtest/opto/test_rangeinference.cpp line 33: > >> 31: #include >> 32: >> 33: #ifdef ASSERT > > Why do you have this here? Removed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1840331286 From qamai at openjdk.org Wed Nov 13 14:09:59 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 14:09:59 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: References: Message-ID: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> > Hi, > > This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. > > Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. > > This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - indentation - Merge branch 'master' into constanttable - Merge branch 'master' into constanttable - refactor array constant, fix codebuffer reallocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21596/files - new: https://git.openjdk.org/jdk/pull/21596/files/2efa68db..bd0628ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=01-02 Stats: 236223 lines in 2368 files changed: 139416 ins; 72265 del; 24542 mod Patch: https://git.openjdk.org/jdk/pull/21596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21596/head:pull/21596 PR: https://git.openjdk.org/jdk/pull/21596 From qamai at openjdk.org Wed Nov 13 14:10:01 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 14:10:01 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v2] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 11:43:33 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into constanttable >> - refactor array constant, fix codebuffer reallocation > > src/hotspot/cpu/x86/x86.ad line 2743: > >> 2741: case T_BYTE: val->at(i) = con; break; >> 2742: case T_SHORT: { >> 2743: jshort c = con; > > Why are these casts needed? Isn't `T con` already of the appropriate j-type? No. For example, when `bt == T_BYTE`, `T` is actually `jint`. I therefore do this in all cases for uniformity. Also, a mismatch would not result in a crash but could silently write the wrong data, so I'm being extra cautious here.
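To illustrate the point about narrowing before the copy, here is a minimal standalone sketch. It is not the `x86.ad` code: `store_element()` and `load_element()` are hypothetical helpers, and `std::vector<int8_t>` merely stands in for an array of `jbyte`. The assumption being illustrated is that copying `sizeof(T)` bytes of a wider `con` would spill into the neighbouring element without any crash, whereas narrowing to the element type first copies exactly one element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Writes `con` as element `index` of a byte buffer holding Elem-sized slots.
// Narrowing to Elem first guarantees that exactly sizeof(Elem) bytes are copied.
template <typename Elem, typename T>
void store_element(std::vector<int8_t>& bytes, std::size_t index, T con) {
  const Elem c = static_cast<Elem>(con);
  const std::size_t offset = index * sizeof(Elem);
  assert(offset + sizeof(Elem) <= bytes.size());
  std::memcpy(bytes.data() + offset, &c, sizeof(Elem));
}

// Reads element `index` back out of the byte buffer.
template <typename Elem>
Elem load_element(const std::vector<int8_t>& bytes, std::size_t index) {
  Elem c;
  const std::size_t offset = index * sizeof(Elem);
  assert(offset + sizeof(Elem) <= bytes.size());
  std::memcpy(&c, bytes.data() + offset, sizeof(Elem));
  return c;
}
```

For instance, storing `int32_t(-1)` and then `int32_t(0x1234)` as two `int16_t` elements of a 4-byte buffer leaves both values intact when read back with `load_element<int16_t>()`.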
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1840350024 From duke at openjdk.org Wed Nov 13 14:33:27 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 14:33:27 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build Message-ID: Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. ------------- Commit messages: - Fix merge issue Changes: https://git.openjdk.org/jdk/pull/22073/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22073&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344124 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22073.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22073/head:pull/22073 PR: https://git.openjdk.org/jdk/pull/22073 From thartmann at openjdk.org Wed Nov 13 14:33:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Nov 2024 14:33:28 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Looks good and trivial to me. FTR, the conflicting change was [JDK-8338383](https://bugs.openjdk.org/browse/JDK-8338383). ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433368706 From chagedorn at openjdk.org Wed Nov 13 14:33:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 14:33:28 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: <099LGU_wYlNw91hFw5Qaw7I_tSNocXn-0CNB8TKmXTI=.32182ea2-821b-4d85-a057-806d0fc3a24d@github.com> On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Looks good to me, too. That was bad luck. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433387841 From epeter at openjdk.org Wed Nov 13 14:33:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Nov 2024 14:33:28 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: <49CbESM_VUW52TapqeGGSBiAUsecfqzF4jzagvVeT5w=.ddfcee57-136e-4425-b93d-ead5cda3fe17@github.com> On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433392753 From qamai at openjdk.org Wed Nov 13 14:41:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 14:41:32 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v27] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
> > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: build failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/c2d7d36e..dcc9030f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=25-26 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From duke at openjdk.org Wed Nov 13 14:52:21 2024 From: duke at openjdk.org (duke) Date: Wed, 13 Nov 2024 14:52:21 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. @theoweidmannoracle Your change (at version caaff97eebbfa11da3e7d713d71fdbbdf965b09c) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22073#issuecomment-2473834454 From jwaters at openjdk.org Wed Nov 13 14:58:27 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 13 Nov 2024 14:58:27 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Marked as reviewed by jwaters (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433493345 From thartmann at openjdk.org Wed Nov 13 14:58:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Nov 2024 14:58:27 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Whoops, wrong command :) Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22073#issuecomment-2473850725 From duke at openjdk.org Wed Nov 13 14:58:27 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 14:58:27 GMT Subject: Integrated: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. This pull request has now been integrated. 
Changeset: b80ca490 Author: theoweidmannoracle Committer: Julian Waters URL: https://git.openjdk.org/jdk/commit/b80ca4902af71938b32634d3fd230f4d65cde454 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8344124: JDK-8341411 Broke the build Reviewed-by: thartmann, chagedorn, epeter, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/22073 From fjiang at openjdk.org Wed Nov 13 15:06:43 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 13 Nov 2024 15:06:43 GMT Subject: RFR: 8344074: RISC-V: C1: More accurate _exception_handler_size and _deopt_handler_size In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 00:38:11 GMT, Fei Yang wrote: > Hi, please review this small change. > > I find that the reserved size for these two handlers are not accurate and are larger than needed. For `_exception_handler_size`, the used size is only 20 bytes for release build and 126 bytes for debug build (with -XX:+VerifyOops). Considering that the exception handler is not trivial, I reserved a little bit more than needed for release build (32 bytes). For `_deopt_handler_size`, `far_jump` will always emit two instructions. > > Testing on linux-riscv64: > - [x] tier1 (release) > - [x] hotspot:tier1 (fastdebug) Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/22053#pullrequestreview-2433529963 From epeter at openjdk.org Wed Nov 13 15:42:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Nov 2024 15:42:16 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException Message-ID: Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask.
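To make the failure concrete, here is a minimal Java sketch of the arithmetic involved. The class and method names are illustrative assumptions and do not come from `TestMergeStores.java`; only `Math.abs`, the `%` operator and `Integer.remainderUnsigned` are standard Java.

```java
public class AbsOverflowDemo {
    // Problematic pattern: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE,
    // so the remainder can come out negative.
    static int absOffset(int r) {
        return Math.abs(r) % 100;
    }

    // Clearing the top bits first guarantees a non-negative value before the remainder.
    static int maskedOffset(int r) {
        return Math.abs(r & 0x0fffffff) % 100;
    }

    // Treating the int as unsigned also keeps the remainder in [0, 100).
    static int unsignedOffset(int r) {
        return Integer.remainderUnsigned(r, 100);
    }

    public static void main(String[] args) {
        System.out.println(absOffset(Integer.MIN_VALUE));      // prints -48
        System.out.println(maskedOffset(Integer.MIN_VALUE));   // prints 0
        System.out.println(unsignedOffset(Integer.MIN_VALUE)); // prints 48
    }
}
```

A negative result such as -48, used as an array offset, is what would produce the out-of-bounds access.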
------------- Commit messages: - JDK-8344104 Changes: https://git.openjdk.org/jdk/pull/22080/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22080&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344104 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22080.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22080/head:pull/22080 PR: https://git.openjdk.org/jdk/pull/22080 From qamai at openjdk.org Wed Nov 13 15:44:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 13 Nov 2024 15:44:18 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v28] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
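The membership condition quoted above can be sketched in Java as follows. This only illustrates the predicate and the `[0, 3]` example from the description; it is not HotSpot's C++ implementation, and the class name, method name and parameter order are hypothetical.

```java
public class TypeIntSketch {
    // Returns true if x satisfies all six constraints named in the description:
    // signed bounds [lo, hi], unsigned bounds [ulo, uhi], known-zero bits and known-one bits.
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }

    public static void main(String[] args) {
        for (int x = -8; x <= 8; x++) {
            // Only the signed range [0, 3] is given; unsigned range and bits are unconstrained.
            boolean loose = contains(x, 0, 3, 0, -1, 0, 0);
            // Tightened form: unsigned range [0, 3] and every bit above the low two known zero.
            boolean tight = contains(x, 0, 3, 0, 3, ~3, 0);
            if (loose != tight) {
                throw new AssertionError("constraint sets differ at " + x);
            }
        }
        System.out.println("both constraint sets describe the same values");
    }
}
```

Both parameter sets accept exactly the values 0 through 3, which is the sense in which the tightened constraints are equivalent to the loose ones.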
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: build failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/dcc9030f..71646530 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=26-27 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From acobbs at openjdk.org Wed Nov 13 16:47:55 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Wed, 13 Nov 2024 16:47:55 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/a574dda6..64d958b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=02-03 Stats: 95175 lines in 2626 files changed: 19948 ins; 68244 del; 6983 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From swen at openjdk.org Wed Nov 13 17:06:12 2024 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 13 Nov 2024 17:06:12 GMT Subject: RFR: 8343629: More MergeStore benchmark [v4] In-Reply-To: References: Message-ID: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: fix jvmArgs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21659/files - new: https://git.openjdk.org/jdk/pull/21659/files/4293ced9..2e88b024 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From shade at openjdk.org Wed Nov 13 18:07:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 13 Nov 2024 18:07:24 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 15:34:07 GMT, Emanuel Peter wrote: > Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. 
And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. `Math.abs(Integer.MIN_VALUE)` strikes *AGAIN*, it is tremendous fun every time. Why not just `RANDOM.nextInt(100)`? ------------- PR Review: https://git.openjdk.org/jdk/pull/22080#pullrequestreview-2434063567 From rkennke at openjdk.org Wed Nov 13 18:41:49 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 13 Nov 2024 18:41:49 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers Message-ID: We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller.
Testing: - [x] tier1 aarch64 +UCOH - [x] tier1 x86_64 +UCOH ------------- Commit messages: - x86 parts - 8340453: C2: Improve encoding of LoadNKlass for compact headers Changes: https://git.openjdk.org/jdk/pull/22078/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340453 Stats: 48 lines in 8 files changed: 8 ins; 33 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22078.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22078/head:pull/22078 PR: https://git.openjdk.org/jdk/pull/22078 From rkennke at openjdk.org Wed Nov 13 19:45:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 13 Nov 2024 19:45:36 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: > We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. > > However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. 
> > Testing: > - [x] tier1 aarch64 +UCOH > - [x] tier1 x86_64 +UCOH Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Improve opto asm of LoadNKlass ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22078/files - new: https://git.openjdk.org/jdk/pull/22078/files/4b734742..d2010d1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=00-01 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22078.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22078/head:pull/22078 PR: https://git.openjdk.org/jdk/pull/22078 From sviswanathan at openjdk.org Wed Nov 13 23:00:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 13 Nov 2024 23:00:19 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v4] In-Reply-To: <6fxu6YabwpKc13hCZ7Aw46C02K68kozOCBZY3Rn8R8g=.c42f98dc-c253-4972-b2a5-ea8ff5e6061b@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> <6fxu6YabwpKc13hCZ7Aw46C02K68kozOCBZY3Rn8R8g=.c42f98dc-c253-4972-b2a5-ea8ff5e6061b@github.com> Message-ID: On Wed, 13 Nov 2024 02:43:12 GMT, Jatin Bhateja wrote:
>> This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns.
>>
>>     MulVL (AndV SRC1, 0xFFFFFFFF) (AndV SRC2, 0xFFFFFFFF)
>>     MulVL (URShiftVL SRC1, 32) (URShiftVL SRC2, 32)
>>     MulVL (URShiftVL SRC1, 32) (AndV SRC2, 0xFFFFFFFF)
>>     MulVL (AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2, 32)
>>     MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2)
>>     MulVL (RShiftVL SRC1, 32) (RShiftVL SRC2, 32)
>>
>> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower double words of the multiplier with the multiplicand and assembling the partial products to compute the full width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors.
>>
>> If the upper 32 bits of the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower double words and can be performed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target dependent manner without introducing a new IR node.
>>
>> The VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency compared to the full 64 bit multiplication instruction "VPMULLQ"; in addition, non-AVX512DQ targets do not support direct quadword multiplication, thus we can save the redundant partial products for the zeroed out upper 32 bits. This results in throughput improvements on both P and E core Xeons.
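The arithmetic behind these patterns can be checked per lane with scalar Java. The sketch below is an illustration under assumptions: it models one 64-bit lane with a `long`, and the class and method names are made up for the example. It does not use the Vector API or show what C2 emits.

```java
public class LowWordMultiplyDemo {
    // Low 64 bits of a full 64x64 multiply, i.e. what a long multiply computes per lane.
    static long fullMultiply(long a, long b) {
        return a * b;
    }

    // Unsigned 32x32 -> 64 multiply of the low doublewords (the VPMULUDQ-style operation).
    static long unsignedLowMultiply(long a, long b) {
        return Integer.toUnsignedLong((int) a) * Integer.toUnsignedLong((int) b);
    }

    // Signed 32x32 -> 64 multiply of the low doublewords (the VPMULDQ-style operation).
    static long signedLowMultiply(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long x = 0xDEADBEEF12345678L;
        long y = 0xFFFFFFFB00000000L;

        // Operands with zeroed upper halves: masked (AndV) or logically shifted (URShiftVL).
        long u1 = x & 0xFFFFFFFFL;
        long u2 = x >>> 32;
        if (fullMultiply(u1, u2) != unsignedLowMultiply(u1, u2)) throw new AssertionError();

        // Operands that are sign-extended 32-bit values: arithmetic shift (RShiftVL) or an int-to-long cast.
        long s1 = y >> 32; // -5
        long s2 = 7L;
        if (fullMultiply(s1, s2) != signedLowMultiply(s1, s2)) throw new AssertionError();

        // Without the zeroed upper half the shortcut no longer matches the full multiply.
        long wide = 0x100000002L;
        if (fullMultiply(wide, 3L) == unsignedLowMultiply(wide, 3L)) throw new AssertionError();

        System.out.println("low-word multiplies match the full multiply for the matched operand shapes");
    }
}
```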
>>
>> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:-
>>
>> Sierra Forest :-
>> ============
>> Baseline:-
>> Benchmark                                 (SIZE)   Mode  Cnt    Score   Error   Units
>> VectorXXH3HashingBenchmark.hashingKernel    1024  thrpt    2  806.228          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    2048  thrpt    2  403.044          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    4096  thrpt    2  200.641          ops/ms
>> VectorXXH3HashingBenchmark.hashingKernel    8192  thrpt    2  100.664          ops/ms
>>
>> With Optimizati...
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. The pull request now contains seven commits:
>
> - Removing target specific hooks
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
> - Review resoultions
> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137
> - Handle new I2L pattern, IR tests, Rewiring pattern inputs to MulVL further optimizes JIT code
> - Review resolutions
> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction

Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2434642849 From dlong at openjdk.org Wed Nov 13 23:40:55 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 13 Nov 2024 23:40:55 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 15:34:07 GMT, Emanuel Peter wrote: > Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask.
test/hotspot/jtreg/compiler/c2/TestMergeStores.java line 344: > 342: > 343: offset1 = Math.abs(RANDOM.nextInt() & 0x0fffffff) % 100; > 344: offset2 = Math.abs(RANDOM.nextInt() & 0x0fffffff) % 100; Suggestion: offset1 = Integer.remainderUnsigned(RANDOM.nextInt(), 100); offset2 = Integer.remainderUnsigned(RANDOM.nextInt(), 100); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22080#discussion_r1841323146 From dlong at openjdk.org Wed Nov 13 23:52:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 13 Nov 2024 23:52:02 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 19:45:36 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. >> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Improve opto asm of LoadNKlass It looks like this only works for little-endian. Is that documented somewhere? 
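The byte-order concern raised here can be illustrated with a small Java sketch that models the 8-byte header word as a `ByteBuffer`. Everything in it is an assumption made for illustration: the shift of 42 is a placeholder rather than a value taken from the patch, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NarrowKlassLoadDemo {
    // Placeholder shift (must be >= 32 for the trick to apply); not taken from HotSpot sources.
    static final int SHIFT = 42;

    // Load the full 8-byte word at offset 0, then shift the upper bits down.
    static int viaFullWord(ByteBuffer header) {
        return (int) (header.getLong(0) >>> SHIFT);
    }

    // Load only 4 bytes at offset 4 and shift by an amount that is 32 smaller.
    static int viaUpperHalf(ByteBuffer header) {
        return header.getInt(4) >>> (SHIFT - 32);
    }

    // Store a 64-bit header word into memory using the given byte order.
    static ByteBuffer headerWith(long word, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        buf.putLong(0, word);
        return buf;
    }

    public static void main(String[] args) {
        long word = 0x123456789ABCDEF0L;

        ByteBuffer le = headerWith(word, ByteOrder.LITTLE_ENDIAN);
        // Bytes 4..7 hold the upper 32 bits, so both loads agree.
        System.out.println(viaFullWord(le) == viaUpperHalf(le)); // prints true

        ByteBuffer be = headerWith(word, ByteOrder.BIG_ENDIAN);
        // Bytes 4..7 hold the lower 32 bits, so the narrower load reads the wrong half.
        System.out.println(viaFullWord(be) == viaUpperHalf(be)); // prints false
    }
}
```

On a big-endian layout the equivalent narrow load would have to come from offset 0 instead, which is why the offset-4 formulation is tied to byte order.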
------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2475054880 From swen at openjdk.org Thu Nov 14 00:05:43 2024 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 14 Nov 2024 00:05:43 GMT Subject: RFR: 8343629: More MergeStore benchmark [v4] In-Reply-To: References: Message-ID: <8H78dc4hfhKJvmojAR2WF6kSgI1HWJEeCKk2Y1FVqd0=.7c0590d5-aacb-4219-8c25-c0972bcbb79c@github.com> On Wed, 13 Nov 2024 17:06:12 GMT, Shaojin Wen wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix jvmArgs

After jvm_args is set correctly, +/-MergeStores has a significant performance difference.

## List of tests with significant performance improvements

The putBytes series and LittleEndian-based tests show that +MergeStores has a significant performance improvement under x64.

| Benchmark | -MergeStores | +MergeStores | delta |
| --- | --- | --- | --- |
| MergeStoreBench.putBytes4 | 4475.891 | 929.327 | 381.63% |
| MergeStoreBench.putBytes4U | 4479.133 | 928.502 | 382.40% |
| MergeStoreBench.putBytes4X | 4477.133 | 929.183 | 381.84% |
| MergeStoreBench.putChars4B | 9008.350 | 5638.550 | 59.76% |
| MergeStoreBench.putChars4BU | 8961.671 | 1144.479 | 683.03% |
| MergeStoreBench.putChars4C | 4485.308 | 1133.473 | 295.71% |
| MergeStoreBench.putChars4L | 9013.570 | 5640.893 | 59.79% |
| MergeStoreBench.putChars4LU | 8957.625 | 1142.796 | 683.83% |
| MergeStoreBench.putChars4LV | 4488.698 | 1134.303 | 295.72% |
| MergeStoreBench.putChars4S | 4485.836 | 1133.430 | 295.78% |
| MergeStoreBench.setIntL | 15430.183 | 2113.544 | 630.06% |
| MergeStoreBench.setIntLU | 17361.730 | 4783.040 | 262.99% |
| MergeStoreBench.setIntRL | 16525.068 | 3244.126 | 409.38% |
| MergeStoreBench.setIntRLU | 14401.071 | 5930.149 | 142.85% |
| MergeStoreBench.setLongL | 31353.713 | 5405.189 | 480.07% |
| MergeStoreBench.setLongLU | 26113.756 | 4287.166 | 509.11% |
| MergeStoreBench.setLongRL | 27232.898 | 4523.658 | 502.01% |
| MergeStoreBench.setLongRLU | 26196.973 | 4798.177 | 445.98% |
| MergeStoreBench.setLongU | 4500.659 | 4271.225 | 5.37% |

## List of tests that were not improved

On x64 machines, all tests of get operations have no significant improvement, and BigEndian put performance has not improved. It is also very common to use BigEndian byte content on little-endian machines. For example, most network protocols are big-endian. It is expected that MergeStore can be supported.

| Benchmark | -MergeStores | +MergeStores | delta |
| --- | --- | --- | --- |
| MergeStoreBench.getCharB | 5908.544 | 5903.216 | 0.09% |
| MergeStoreBench.getCharBU | 4853.054 | 4861.850 | -0.18% |
| MergeStoreBench.getCharBV | 3080.971 | 3081.138 | -0.01% |
| MergeStoreBench.getCharC | 2235.832 | 2235.306 | 0.02% |
| MergeStoreBench.getCharL | 6046.201 | 6034.378 | 0.20% |
| MergeStoreBench.getCharLU | 4934.757 | 4494.743 | 9.79% |
| MergeStoreBench.getCharLV | 2221.754 | 2222.086 | -0.01% |
| MergeStoreBench.getIntB | 8002.830 | 8008.578 | -0.07% |
| MergeStoreBench.getIntBU | 9054.151 | 9048.937 | 0.06% |
| MergeStoreBench.getIntBV | 308.274 | 308.438 | -0.05% |
| MergeStoreBench.getIntL | 7885.680 | 7875.204 | 0.13% |
| MergeStoreBench.getIntLU | 8863.323 | 8866.561 | -0.04% |
| MergeStoreBench.getIntLV | 2228.348 | 2228.067 | 0.01% |
| MergeStoreBench.getIntRB | 8636.679 | 8633.762 | 0.03% |
| MergeStoreBench.getIntRBU | 11102.938 | 11105.491 | -0.02% |
| MergeStoreBench.getIntRL | 8975.416 | 8962.822 | 0.14% |
| MergeStoreBench.getIntRLU | 9249.430 | 9258.589 | -0.10% |
| MergeStoreBench.getIntRU | 2510.359 | 2505.505 | 0.19% |
| MergeStoreBench.getIntU | 2493.932 | 2494.808 | -0.04% |
| MergeStoreBench.getLongB | 24811.283 | 24804.034 | 0.03% |
| MergeStoreBench.getLongBU | 14024.209 | 14013.247 | 0.08% |
| MergeStoreBench.getLongBV | 601.852 | 602.426 | -0.10% |
| MergeStoreBench.getLongL | 25073.219 | 25115.247 | -0.17% |
| MergeStoreBench.getLongLU | 14483.618 | 14497.662 | -0.10% |
| MergeStoreBench.getLongLV | 2225.597 | 2225.810 | -0.01% |
| MergeStoreBench.getLongRB | 24832.411 | 24801.799 | 0.12% |
| MergeStoreBench.getLongRBU | 14027.084 | 14026.284 | 0.01% |
| MergeStoreBench.getLongRL | 25008.679 | 25113.927 | -0.42% |
| MergeStoreBench.getLongRLU | 14425.883 | 14493.830 | -0.47% |
| MergeStoreBench.getLongRU | 3059.614 | 3058.726 | 0.03% |
| MergeStoreBench.getLongU | 3049.682 | 3048.266 | 0.05% |
| MergeStoreBench.putBytes4GetBytes | 5880.164 | 5883.995 | -0.07% |
| MergeStoreBench.putChars4BV | 4488.270 | 4486.457 | 0.04% |
| MergeStoreBench.setCharBS | 6088.826 | 6085.857 | 0.05% |
| MergeStoreBench.setCharBV | 3596.210 | 3595.236 | 0.03% |
| MergeStoreBench.setCharC | 4519.981 | 4471.174 | 1.09% |
| MergeStoreBench.setCharLS | 5619.414 | 5618.239 | 0.02% |
| MergeStoreBench.setCharLV | 2248.493 | 2245.939 | 0.11% |
| MergeStoreBench.setIntB | 8039.705 | 8045.113 | -0.07% |
| MergeStoreBench.setIntBU | 17884.223 | 17764.347 | 0.67% |
| MergeStoreBench.setIntBV | 3239.985 | 3227.997 | 0.37% |
| MergeStoreBench.setIntLV | 2128.975 | 2126.187 | 0.13% |
| MergeStoreBench.setIntRB | 13786.186 | 13815.759 | -0.21% |
| MergeStoreBench.setIntRBU | 14747.463 | 14771.017 | -0.16% |
| MergeStoreBench.setIntRU | 5898.169 | 5875.589 | 0.38% |
| MergeStoreBench.setIntU | 4805.170 | 4784.162 | 0.44% |
| MergeStoreBench.setLongB | 31674.058 | 31662.483 | 0.04% |
| MergeStoreBench.setLongBU | 25696.702 | 25674.394 | 0.09% |
| MergeStoreBench.setLongBV | 2168.387 | 2165.313 | 0.14% |
| MergeStoreBench.setLongLV | 2048.737 | 2116.054 | -3.18% |
| MergeStoreBench.setLongRB | 29901.778 | 29909.501 | -0.03% |
| MergeStoreBench.setLongRBU | 24945.914 | 25005.171 | -0.24% |
| MergeStoreBench.setLongRU | 4797.817 | 4795.018 | 0.06% |

## Full tests

| Benchmark | -MergeStores | +MergeStores | delta |
| --- | --- | --- | --- |
| MergeStoreBench.getCharB | 5908.544 | 5903.216 | 0.09% |
| MergeStoreBench.getCharBU | 4853.054 | 4861.850 | -0.18% |
| MergeStoreBench.getCharBV | 3080.971 | 3081.138 | -0.01% |
| MergeStoreBench.getCharC | 2235.832 | 2235.306 | 0.02% |
| MergeStoreBench.getCharL | 6046.201 | 6034.378 | 0.20% |
| MergeStoreBench.getCharLU | 4934.757 | 4494.743 | 9.79% |
| MergeStoreBench.getCharLV | 2221.754 | 2222.086 | -0.01% |
| MergeStoreBench.getIntB | 8002.830 | 8008.578 | -0.07% |
| MergeStoreBench.getIntBU | 9054.151 | 9048.937 | 0.06% |
| MergeStoreBench.getIntBV | 308.274 | 308.438 | -0.05% |
| MergeStoreBench.getIntL | 7885.680 | 7875.204 | 0.13% |
| MergeStoreBench.getIntLU | 8863.323 | 8866.561 | -0.04% |
| MergeStoreBench.getIntLV | 2228.348 | 2228.067 | 0.01% |
| MergeStoreBench.getIntRB | 8636.679 | 8633.762 | 0.03% |
| MergeStoreBench.getIntRBU | 11102.938 | 11105.491 | -0.02% |
| MergeStoreBench.getIntRL | 8975.416 | 8962.822 | 0.14% |
| MergeStoreBench.getIntRLU | 9249.430 | 9258.589 | -0.10% |
| MergeStoreBench.getIntRU | 2510.359 | 2505.505 | 0.19% |
| MergeStoreBench.getIntU | 2493.932 | 2494.808 | -0.04% |
| MergeStoreBench.getLongB | 24811.283 | 24804.034 | 0.03% |
| MergeStoreBench.getLongBU | 14024.209 | 14013.247 | 0.08% |
| MergeStoreBench.getLongBV | 601.852 | 602.426 | -0.10% |
| MergeStoreBench.getLongL | 25073.219 | 25115.247 | -0.17% |
| MergeStoreBench.getLongLU | 14483.618 | 14497.662 | -0.10% |
| MergeStoreBench.getLongLV | 2225.597 | 2225.810 | -0.01% |
| MergeStoreBench.getLongRB | 24832.411 | 24801.799 | 0.12% |
| MergeStoreBench.getLongRBU | 14027.084 | 14026.284 | 0.01% |
| MergeStoreBench.getLongRL | 25008.679 | 25113.927 | -0.42% |
| MergeStoreBench.getLongRLU | 14425.883 | 14493.830 | -0.47% |
| MergeStoreBench.getLongRU | 3059.614 | 3058.726 | 0.03% |
| MergeStoreBench.getLongU | 3049.682 | 3048.266 | 0.05% |
| MergeStoreBench.putBytes4 | 4475.891 | 929.327 | 381.63% |
| MergeStoreBench.putBytes4GetBytes | 5880.164 | 5883.995 | -0.07% |
| MergeStoreBench.putBytes4U | 4479.133 | 928.502 | 382.40% |
| MergeStoreBench.putBytes4X | 4477.133 | 929.183 | 381.84% |
| MergeStoreBench.putChars4B | 9008.350 | 5638.550 | 59.76% |
| MergeStoreBench.putChars4BU | 8961.671 | 1144.479 | 683.03% |
| MergeStoreBench.putChars4BV | 4488.270 | 4486.457 | 0.04% |
| MergeStoreBench.putChars4C | 4485.308 | 1133.473 | 295.71% |
| MergeStoreBench.putChars4L | 9013.570 | 5640.893 | 59.79% |
| MergeStoreBench.putChars4LU | 8957.625 | 1142.796 | 683.83% |
| MergeStoreBench.putChars4LV | 4488.698 | 1134.303 | 295.72% |
| MergeStoreBench.putChars4S | 4485.836 | 1133.430 | 295.78% |
| MergeStoreBench.setCharBS | 6088.826 | 6085.857 | 0.05% |
| MergeStoreBench.setCharBV | 3596.210 | 3595.236 | 0.03% |
| MergeStoreBench.setCharC | 4519.981 | 4471.174 | 1.09% |
| MergeStoreBench.setCharLS | 5619.414 | 5618.239 | 0.02% |
| MergeStoreBench.setCharLV | 2248.493 | 2245.939 | 0.11% |
| MergeStoreBench.setIntB | 8039.705 | 8045.113 | -0.07% |
| MergeStoreBench.setIntBU | 17884.223 | 17764.347 | 0.67% |
| MergeStoreBench.setIntBV | 3239.985 | 3227.997 | 0.37% |
| MergeStoreBench.setIntL | 15430.183 | 2113.544 | 630.06% |
| MergeStoreBench.setIntLU | 17361.730 | 4783.040 | 262.99% |
| MergeStoreBench.setIntLV | 2128.975 | 2126.187 | 0.13% |
| MergeStoreBench.setIntRB | 13786.186 | 13815.759 | -0.21% |
| MergeStoreBench.setIntRBU | 14747.463 | 14771.017 | -0.16% |
| MergeStoreBench.setIntRL | 16525.068 | 3244.126 | 409.38% |
| MergeStoreBench.setIntRLU | 14401.071 | 5930.149 | 142.85% |
| MergeStoreBench.setIntRU | 5898.169 | 5875.589 | 0.38% |
| MergeStoreBench.setIntU | 4805.170 | 4784.162 | 0.44% |
| MergeStoreBench.setLongB | 31674.058 | 31662.483 | 0.04% |
| MergeStoreBench.setLongBU | 25696.702 | 25674.394 | 0.09% |
| MergeStoreBench.setLongBV | 2168.387 | 2165.313 | 0.14% |
| MergeStoreBench.setLongL | 31353.713 | 5405.189 | 480.07% |
| MergeStoreBench.setLongLU | 26113.756 | 4287.166 | 509.11% |
| MergeStoreBench.setLongLV | 2048.737 | 2116.054 | -3.18% |
| MergeStoreBench.setLongRB | 29901.778 | 29909.501 | -0.03% |
| MergeStoreBench.setLongRBU | 24945.914 | 25005.171 | -0.24% |
| MergeStoreBench.setLongRL | 27232.898 | 4523.658 | 502.01% |
| MergeStoreBench.setLongRLU | 26196.973 | 4798.177 | 445.98% |
| MergeStoreBench.setLongRU | 4797.817 | 4795.018 | 0.06% |
| MergeStoreBench.setLongU | 4500.659 | 4271.225 | 5.37% |

------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2475069346 From fyang at openjdk.org Thu Nov 14 00:56:58 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 14 Nov 2024 00:56:58 GMT Subject: RFR:
8344074: RISC-V: C1: More accurate _exception_handler_size and _deopt_handler_size In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 00:38:11 GMT, Fei Yang wrote: > Hi, please review this small change. > > I find that the reserved size for these two handlers are not accurate and are larger than needed. For `_exception_handler_size`, the used size is only 20 bytes for release build and 126 bytes for debug build (with -XX:+VerifyOops). Considering that the exception handler is not trivial, I reserved a little bit more than needed for release build (32 bytes). For `_deopt_handler_size`, `far_jump` will always emit two instructions. > > Testing on linux-riscv64: > - [x] tier1 (release) > - [x] hotspot:tier1 (fastdebug) Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22053#issuecomment-2475125986 From fyang at openjdk.org Thu Nov 14 00:58:26 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 14 Nov 2024 00:58:26 GMT Subject: Integrated: 8344074: RISC-V: C1: More accurate _exception_handler_size and _deopt_handler_size In-Reply-To: References: Message-ID: <4Eh9pp7VRXPC24K0NohhAQ9JUuo6RBiomVrYlr5fvXA=.008cab91-6f96-47ea-b145-5292c9fc6ee8@github.com> On Wed, 13 Nov 2024 00:38:11 GMT, Fei Yang wrote: > Hi, please review this small change. > > I find that the reserved size for these two handlers are not accurate and are larger than needed. For `_exception_handler_size`, the used size is only 20 bytes for release build and 126 bytes for debug build (with -XX:+VerifyOops). Considering that the exception handler is not trivial, I reserved a little bit more than needed for release build (32 bytes). For `_deopt_handler_size`, `far_jump` will always emit two instructions. > > Testing on linux-riscv64: > - [x] tier1 (release) > - [x] hotspot:tier1 (fastdebug) This pull request has now been integrated. 
Changeset: 90e92342 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/90e92342fc26db4876e22e8379a2c803c9de232c Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod 8344074: RISC-V: C1: More accurate _exception_handler_size and _deopt_handler_size Reviewed-by: mli, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/22053 From vlivanov at openjdk.org Thu Nov 14 03:04:57 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 14 Nov 2024 03:04:57 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v4] In-Reply-To: <6fxu6YabwpKc13hCZ7Aw46C02K68kozOCBZY3Rn8R8g=.c42f98dc-c253-4972-b2a5-ea8ff5e6061b@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> <6fxu6YabwpKc13hCZ7Aw46C02K68kozOCBZY3Rn8R8g=.c42f98dc-c253-4972-b2a5-ea8ff5e6061b@github.com> Message-ID: On Wed, 13 Nov 2024 02:43:12 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring the VPMUL[U]DQ instruction for the following IR patterns. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces a 128 bit result, and can be performed by individually multiplying the upper and lower doublewords of the multiplier with the multiplicand and assembling the partial products to compute the full-width result. Targets supporting vector quadword multiplication have separate instructions to compute the upper and lower quadwords of the 128 bit result. Therefore the existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
>> >> If the upper 32 bits of the quadword multiplier and multiplicand are always zero, then the result of the multiplication depends only on the partial product of their lower doublewords and can be computed using an unsigned 32 bit multiplication instruction with quadword saturation. The patch matches this pattern in a target-dependent manner without introducing a new IR node. >> >> The VPMUL[U]DQ instruction performs [unsigned] multiplication between the even-numbered doubleword lanes of two long vectors and produces a 64 bit result. It has much lower latency than the full 64 bit multiplication instruction "VPMULLQ". In addition, non-AVX512DQ targets do not support direct quadword multiplication, so we can save the redundant partial products for the zeroed-out upper 32 bits. This results in throughput improvements on both P and E core Xeons. >> >> Please find below the performance of the [XXH3 hashing benchmark](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html) included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. 
The pull request now contains seven commits: > > - Removing target specific hooks > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - Review resoultions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 > - Handle new I2L pattern, IR tests, Rewiring pattern inputs to MulVL further optimizes JIT code > - Review resolutions > - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction Overall, looks good. Some minor refactoring suggestions follow. src/hotspot/share/opto/vectornode.cpp line 2092: > 2090: n->in(1)->is_Con() && > 2091: n->in(1)->bottom_type()->isa_long() && > 2092: n->in(1)->bottom_type()->is_long()->get_con() <= 4294967295L; Is it clearer to use `0xFFFFFFFFL` representation here? src/hotspot/share/opto/vectornode.cpp line 2114: > 2112: > 2113: static bool has_vector_elements_fit_int(Node* n) { > 2114: auto is_cast_integer_to_long_pattern = [](const Node* n) { I like how you use lambda expressions for node predicates. Please, shape `has_vector_elements_fit_uint()` in a similar fashion. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 51: > 49: > 50: public static final int SIZE = 1024; > 51: public static final Random r = new Random(1024); For reproducibility purposes, it's better to use `jdk.test.lib.Utils.getRandomInstance()`. It reports the seed and supports overriding it. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 105: > 103: LongVector vsrc2 = LongVector.fromArray(LSP, lsrc2, i); > 104: vsrc1.lanewise(VectorOperators.AND, 0xFFFFFFFFL) > 105: .lanewise(VectorOperators.MUL, vsrc2.lanewise(VectorOperators.AND, 0xFFFFFFFFL)) It would be nice to randomize the constants (masks and shifts) to improve test coverage. 
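A minimal scalar sketch of the arithmetic behind the `AndV ... 0xFFFFFFFF` pattern and the constant question raised in the review: once the upper 32 bits of both operands are cleared, the 64 bit product equals an unsigned 32x32 multiplication, and `4294967295L` is the same value as `0xFFFFFFFFL`. The class and method names are hypothetical, and this is plain Java for illustration only, not the Vector API code or the C2 matcher from the patch.

```java
final class MaskedMultiplySketch {
    // The review asks whether 0xFFFFFFFFL reads better than 4294967295L; both denote this value.
    static final long MASK = 0xFFFFFFFFL;

    // Multiply two longs after clearing their upper 32 bits (scalar analogue of the AndV pattern).
    static long mulLowHalves(long a, long b) {
        return (a & MASK) * (b & MASK);
    }

    // Unsigned 32x32 -> 64 multiplication of the low halves, treated as unsigned ints.
    static long mulUnsigned32(int a, int b) {
        return Integer.toUnsignedLong(a) * Integer.toUnsignedLong(b);
    }

    public static void main(String[] args) {
        long a = 0x1234_5678_9ABC_DEF0L;
        long b = 0x0FED_CBA9_8765_4321L;
        // Both notations of the mask compare equal.
        System.out.println(MASK == 4294967295L);
        // The masked 64 bit product matches the unsigned 32 bit product of the low halves.
        System.out.println(mulLowHalves(a, b) == mulUnsigned32((int) a, (int) b));
    }
}
```

Both sides wrap identically in a signed `long` when the product exceeds 2^63, so the comparison holds for all inputs.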
------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2434942773 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1841495626 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1841496794 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1841491826 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1841492208 From swen at openjdk.org Thu Nov 14 05:58:11 2024 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 14 Nov 2024 05:58:11 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> References: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> Message-ID: <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> On Mon, 11 Nov 2024 07:24:24 GMT, Emanuel Peter wrote: >> You can find an example of how to do that easily here: >> https://github.com/openjdk/jdk/pull/19970/files#diff-9072c369f5b541ef9fca3ad8320aa59e88cc72f203c03da58100b1d111ffc324R746-R749 > >> @eme64 Why is there no noticeable difference in the performance of +/-MergeStores > > What did you do to find out yourself? Did you use the trace flags to see if there is a difference in what is optimized / the output assembly code? @eme64 Are there plans to support MergeLoad, and big-endian MergeStore on little-endian machines? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2475478884 From thartmann at openjdk.org Thu Nov 14 06:34:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 14 Nov 2024 06:34:25 GMT Subject: RFR: 8326369: Bimorphic inlining not applied at a call site that was initially monomorphic [v3] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: <_R4wKwGxFzXPVdXRvCbG5XL50DDBZxnodmZMmbxnW9E=.96900e4d-3967-4f1f-8750-5a4a1fa8770d@github.com> On Tue, 12 Nov 2024 12:36:38 GMT, Galder Zamarreño wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Added Jetbrains copyright Looks good to me otherwise. test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java line 27: > 25: /** > 26: * @test > 27: * @bug 8326369 8339299 This is only a regression test for JDK-8339299: Suggestion: * @bug 8339299 ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21920#pullrequestreview-2435156192 PR Review Comment: https://git.openjdk.org/jdk/pull/21920#discussion_r1841636400 From thartmann at openjdk.org Thu Nov 14 06:46:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 14 Nov 2024 06:46:14 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v3] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Wed, 13 Nov 2024 12:19:12 GMT, Christian Hagedorn wrote: >> This patch replaces the uncommon traps used when creating Template Assertion Predicates with Halt nodes. >> >> ### Goal of Assertion Predicates >> #### Initialized Assertion Predicates >> These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. >> >> #### Template Assertion Predicates >> Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptually, it does not matter whether the failing path uses a UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). >> >> ### Why Did we Use UCTs for Template Assertion Predicates? >> When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straightforward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. >> >> ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? 
>> #### Missing UCTs for Predicates above Loops >> Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. >> >> #### Missing UCTs to Create Template Assertion Predicates >> Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that a UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking a UCT or doing other special logic. But this seems rather fragile and could introduce considerable complexity - especially since we conceptually don't even need to use UCTs at all for Te... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > More renaming and comment fixes Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22040#pullrequestreview-2435171664 From thartmann at openjdk.org Thu Nov 14 06:46:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 14 Nov 2024 06:46:14 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v3] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Wed, 13 Nov 2024 12:07:42 GMT, Christian Hagedorn wrote: >> Maybe there should be some sort of `UnswitchingResult`, that has the projections and `old_new` mapping? That could then be passed as a result. What do you think? Can be a separate RFE of course. > > Adventurous indeed. However, I'm planning to get rid of all this predicate code for Loop Unswitching anyway with [JDK-8344035](https://bugs.openjdk.org/browse/JDK-8344035). So, I guess it's fine to not further update the code at this point. It's quite common to pass a pointer by reference in C2 code, unfortunately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22040#discussion_r1841646298 From thartmann at openjdk.org Thu Nov 14 06:46:53 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 14 Nov 2024 06:46:53 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> References: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> Message-ID: On Wed, 13 Nov 2024 14:09:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. 
>> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21596#pullrequestreview-2435175675 From thartmann at openjdk.org Thu Nov 14 06:46:54 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 14 Nov 2024 06:46:54 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v2] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:06:24 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/x86.ad line 2743: >> >>> 2741: case T_BYTE: val->at(i) = con; break; >>> 2742: case T_SHORT: { >>> 2743: jshort c = con; >> >> Why are these casts needed? Isn't `T con` already of the appropriate j-type? > > No, for example when `bt == T_BYTE`, `T` is actually `jint`. As a result, I do this for all the cases for uniformity; also, a mismatch will not result in a crash but may silently write the wrong data, so I'm extra cautious here. Makes sense, thanks for the clarification. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1841648097 From epeter at openjdk.org Thu Nov 14 07:00:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Nov 2024 07:00:13 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> References: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> Message-ID: <0bDxaCchizjbuutDlPOANL0F8f9fftZy-XLJ2XodJlo=.c79d363e-a078-4538-9ff5-47087c60fe01@github.com> On Thu, 14 Nov 2024 05:55:24 GMT, Shaojin Wen wrote: >>> @eme64 Why is there no noticeable difference in the performance of +/-MergeStores >> >> What did you do to find out yourself? Did you use the trace flags to see if there is a difference in what is optimized / the output assembly code? > > @eme64 Are there plans to support MergeLoad, and big-endian MergeStore on little-endian machines? @wenshao I'll look at your results later. > @eme64 Are there plans to support MergeLoad, and big-endian MergeStore on little-endian machines? These are all good ideas, and I already discussed it offline with @cl4es . I have lots of tasks I'm working on, and this is on the lowest tier of priorities for me personally. But if someone else wants to jump on that, then I can coach and review. We could also be interested in "MergeCopy", i.e. load->store patterns. Maybe this just ends up being SuperWord again, but this time for straight line code. 
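A minimal sketch of the two byte-wise store orders that the question about "big-endian MergeStore on little-endian machines" distinguishes. The class and method names are hypothetical and are not taken from `MergeStoreBench`; the sketch only shows the source-level patterns and makes no claim about which of them C2 currently merges into a single wider store.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteWiseStoreSketch {
    // Little-endian order: the least significant byte goes to the lowest index.
    static void storeIntLittleEndian(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }

    // Big-endian order: the most significant byte goes to the lowest index.
    static void storeIntBigEndian(byte[] a, int off, int v) {
        a[off]     = (byte) (v >> 24);
        a[off + 1] = (byte) (v >> 16);
        a[off + 2] = (byte) (v >> 8);
        a[off + 3] = (byte) v;
    }

    public static void main(String[] args) {
        int value = 0x11223344;
        byte[] le = new byte[4];
        byte[] be = new byte[4];
        storeIntLittleEndian(le, 0, value);
        storeIntBigEndian(be, 0, value);
        // Reading each array back with the matching byte order recovers the original value.
        System.out.println(ByteBuffer.wrap(le).order(ByteOrder.LITTLE_ENDIAN).getInt(0) == value);
        System.out.println(ByteBuffer.wrap(be).order(ByteOrder.BIG_ENDIAN).getInt(0) == value);
    }
}
```

Each method is four adjacent byte stores of shifted pieces of one value; on a little-endian machine only the first order matches the native layout of a single 4-byte store, while the second would additionally need a byte swap.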
------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2475556984 From epeter at openjdk.org Thu Nov 14 07:05:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Nov 2024 07:05:59 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException [v2] In-Reply-To: References: Message-ID: > Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: do what shipilev said ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22080/files - new: https://git.openjdk.org/jdk/pull/22080/files/3e95dfbc..ebce52c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22080&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22080&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22080.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22080/head:pull/22080 PR: https://git.openjdk.org/jdk/pull/22080 From epeter at openjdk.org Thu Nov 14 07:05:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Nov 2024 07:05:59 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 15:34:07 GMT, Emanuel Peter wrote: > Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. > `Math.abs(Integer.MIN_VALUE)` strikes _AGAIN_, it is tremendous fun every time. Why not just `RANDOM.nextInt(100)`? Yeah, I thought of that too when falling asleep yesterday 
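A minimal sketch of the overflow described above, assuming an index is derived as `Math.abs(x) % 100`; the class and method names are hypothetical and the code is not taken from `TestMergeStores`. It shows why `Integer.MIN_VALUE` slips through `Math.abs` and how a bounded `nextInt(100)` avoids the problem.

```java
import java.util.Random;

final class AbsMinValueSketch {
    // Fragile: Math.abs(Integer.MIN_VALUE) overflows and stays negative,
    // so the remainder is negative as well.
    static int fragileIndex(int raw) {
        return Math.abs(raw) % 100;
    }

    // Robust: draw the index directly from the range [0, 100).
    static int boundedIndex(Random random) {
        return random.nextInt(100);
    }

    public static void main(String[] args) {
        // abs() of the minimum int is the minimum int itself.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE);
        // The resulting "index" is negative and would be out of bounds for any array.
        System.out.println(fragileIndex(Integer.MIN_VALUE) < 0);
        // The bounded variant always stays within [0, 100).
        Random random = new Random(42);
        int index = boundedIndex(random);
        System.out.println(index >= 0 && index < 100);
    }
}
```

If an arbitrary `int` has to be reduced instead of drawn, `Math.floorMod(raw, 100)` also yields a non-negative result.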
------------- PR Comment: https://git.openjdk.org/jdk/pull/22080#issuecomment-2475560590 From chagedorn at openjdk.org Thu Nov 14 07:16:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Nov 2024 07:16:53 GMT Subject: RFR: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps [v3] In-Reply-To: References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Wed, 13 Nov 2024 12:19:12 GMT, Christian Hagedorn wrote: >> This patch replaces the uncommon traps used when creating Template Assertion Predicates with Halt nodes. >> >> ### Goal of Assertion Predicates >> #### Initialized Assertion Predicates >> These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. >> >> #### Template Assertion Predicates >> Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptually, it does not matter whether the failing path uses a UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). >> >> ### Why Did we Use UCTs for Template Assertion Predicates? >> When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straightforward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. >> >> ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? 
>> #### Missing UCTs for Predicates above Loops >> Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. >> >> #### Missing UCTs to Create Template Assertion Predicates >> Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that a UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking a UCT or doing other special logic. But this seems rather fragile and could introduce considerable complexity - especially since we conceptually don't even need to use UCTs at all for Te... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > More renaming and comment fixes Thanks Tobias for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22040#issuecomment-2475582890 From chagedorn at openjdk.org Thu Nov 14 07:16:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Nov 2024 07:16:53 GMT Subject: Integrated: 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps In-Reply-To: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> References: <3YvrlrUxn0ytaz6AzcgpYP5LqmGuOucwR_1tEpyrX8A=.2af5bcd3-1870-48d9-a336-d06783035d93@github.com> Message-ID: On Tue, 12 Nov 2024 15:06:52 GMT, Christian Hagedorn wrote: > This patch replaces the uncommon traps used when creating Template Assertion Predicates with Halt nodes. > > ### Goal of Assertion Predicates > #### Initialized Assertion Predicates > These predicates ensure that control is properly folded when data is dying. They are **always true by design** and thus can never fail at runtime. We therefore put a halt node on the failing path. > > #### Template Assertion Predicates > Only serve as templates to create Initialized Assertion Predicates from. They are never executed and are always removed after loop opts are over. Conceptually, it does not matter whether the failing path uses a UCT or a halt node (or something else completely - I plan to have a separate "no-op" `TemplateAssertionPredicateNode` at some point which only falls through to the next node and does not have a failing path at all). > > ### Why Did we Use UCTs for Template Assertion Predicates? > When the concept of Assertion Predicates was first introduced, it only covered a few edge cases. It was quite straightforward to reuse existing Loop Predication code which creates new predicates from a Parse Predicate by copying it and merging the UCTs on the failing paths with a region node. This was done with `PhaseIdealLoop::create_new_if_for_predicate()`. > > ### Why Do we Need to Use Halt Nodes for Template Assertion Predicates? 
> #### Missing UCTs for Predicates above Loops > Over time, we found more cases where we need to create Initialized Assertion Predicates from templates - including locations where we do not have Parse Predicates (and thus no safepoints). For example, when peeling one iteration off a loop with Parse Predicates, they will be kept at the peeled iteration and the remaining loop does not have any Parse Predicates anymore. > > #### Missing UCTs to Create Template Assertion Predicates > Whenever we split a loop with Template Assertion Predicates, we also need to ensure that they are copied to all split loop versions. Since they rely on using UCTs, we also need to make sure that a UCT/safepoint is available to be used. However, this is not always the case (for example, after peeling an iteration off as described in the last section). As a result, we cannot easily establish new Template Assertion Predicates anywhere. One could think about faking a UCT or doing other special logic. But this seems rather fragile and could introduce considerable complexity - especially since we conceptually don't even need to use UCTs at all for Template Assertion Predicates. > > There ... This pull request has now been integrated. Changeset: c977ef7b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/c977ef7b45c5ab7be37169d4b673134e49c40a41 Stats: 213 lines in 6 files changed: 30 ins; 77 del; 106 mod 8342047: Create Template Assertion Predicates with Halt nodes only instead of uncommon traps Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/22040 From rkennke at openjdk.org Thu Nov 14 07:24:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 14 Nov 2024 07:24:25 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 23:49:01 GMT, Dean Long wrote: > It looks like this only works for little-endian. Is that documented somewhere? 
I am not sure what you mean. This change is about x86_64 and aarch64, and both are little-endian. The layout of the mark-word is documented in markWord.hpp. Is that what you are looking for? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2475595691 From chagedorn at openjdk.org Thu Nov 14 07:30:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Nov 2024 07:30:40 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v3] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 17:07:33 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). 
>> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split MERGE_MULTIDEFS Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22017#pullrequestreview-2435242969 From chagedorn at openjdk.org Thu Nov 14 07:35:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 14 Nov 2024 07:35:13 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 07:05:59 GMT, Emanuel Peter wrote: >> Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > do what shipilev said Hitting the unlikely but at least it produced a 42 instead, though a negative one :-) ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22080#pullrequestreview-2435247224 From epeter at openjdk.org Thu Nov 14 07:52:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Nov 2024 07:52:20 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> References: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> Message-ID: On Thu, 14 Nov 2024 05:55:24 GMT, Shaojin Wen wrote: >>> @eme64 Why is there no noticeable difference in the performance of +/-MergeStores >> >> What did you do to find out yourself? Did you use the trace flags to see if there is a difference in what is optimized / the output assembly code? > > @eme64 Are there plans to support MergeLoad, and big-endian MergeStore on little-endian machines? @wenshao Ah. I only just realized it: you have a lot of `get` benchmarks... they don't really belong to `MergeStores`... if anything you could put them in a separate `MergeLoads` benchmark! Also: now we have lots of data here. But data alone is kind of pointless. We need analysis to see **what patterns** and **why** they get speedups. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2475640074 From shade at openjdk.org Thu Nov 14 07:56:15 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 14 Nov 2024 07:56:15 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException [v2] In-Reply-To: References: Message-ID: <8UHSRXVy2OR1scdznu0JbdW2WkOl9PD8a1lXDHnZeIY=.257f99a9-97a2-4a75-9a54-a08c74e9a96f@github.com> On Thu, 14 Nov 2024 07:05:59 GMT, Emanuel Peter wrote: >> Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. 
When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > do what shipilev said Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22080#pullrequestreview-2435283199 From tholenstein at openjdk.org Thu Nov 14 08:45:59 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 14 Nov 2024 08:45:59 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV Message-ID: IGV XML already supports defining which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network. ### Add a new option "!" to dump_bfs The option ! sends the printed nodes of dump_bfs to IGV and shows them p find_node(0)->dump_bfs(1,0,"dcmxo+!") dist dump --------------------------------------------- 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] 0 0 Root === 0 51 [[ 0 1 3 26 ]] Method printed over network stream to IGV dump ------------- Commit messages: - Update src/hotspot/share/opto/idealGraphPrinter.hpp - Update src/hotspot/share/opto/idealGraphPrinter.hpp - Update src/hotspot/share/opto/idealGraphPrinter.cpp - Update src/hotspot/share/opto/idealGraphPrinter.cpp - Update src/hotspot/share/opto/idealGraphPrinter.cpp - Update src/hotspot/share/opto/compile.hpp - Update src/hotspot/share/opto/compile.cpp - Update src/hotspot/share/opto/compile.cpp - Update src/hotspot/share/opto/node.cpp - JDK-8344122: IGV: Extends c2 IdealGraphPrinter to send subgraphs to IGV Changes: https://git.openjdk.org/jdk/pull/22076/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344122 Stats: 72 lines in 6 files changed: 51 ins; 2 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/22076.diff Fetch: git fetch 
https://git.openjdk.org/jdk.git pull/22076/head:pull/22076 PR: https://git.openjdk.org/jdk/pull/22076 From epeter at openjdk.org Thu Nov 14 08:46:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Nov 2024 08:46:01 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV In-Reply-To: References: Message-ID: <2cSNg5EUSErn3fXs6GFaFl7YPQuXpJHJ47NhDK0ECrA=.a5b5b46f-51ba-4f59-98a2-544a5b8ba767@github.com> On Wed, 13 Nov 2024 14:41:24 GMT, Tobias Holenstein wrote: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Very nice. I manually verified it, and it works! I just have some code style suggestions. ![image](https://github.com/user-attachments/assets/5515a542-c8e1-487f-a17c-c6a558044cfa) You also need to fix this. 
src/hotspot/share/opto/compile.cpp line 5291: > 5289: } > 5290: > 5291: void Compile::igv_print_graph_to_network(const char *name, Node *node, GrowableArray &visible_nodes) { Suggestion: void Compile::igv_print_graph_to_network(const char* name, Node* node, GrowableArray & visible_nodes) { src/hotspot/share/opto/compile.cpp line 5298: > 5296: } > 5297: tty->print_cr("Method printed over network stream to IGV"); > 5298: _debug_network_printer->print(name, (Node *) Compile::current()->root(), visible_nodes); Suggestion: _debug_network_printer->print(name, (Node*) Compile::current()->root(), visible_nodes); src/hotspot/share/opto/compile.hpp line 721: > 719: void igv_print_method_to_file(const char* phase_name = "Debug", bool append = false); > 720: void igv_print_method_to_network(const char* phase_name = "Debug"); > 721: void igv_print_graph_to_network(const char *name, Node *node, GrowableArray &visible_nodes); Suggestion: void igv_print_graph_to_network(const char* name, Node* node, GrowableArray& visible_nodes); src/hotspot/share/opto/idealGraphPrinter.cpp line 359: > 357: } > 358: > 359: void IdealGraphPrinter::visit_node(Node *n, bool edges) { Suggestion: void IdealGraphPrinter::visit_node(Node* n, bool edges) { src/hotspot/share/opto/idealGraphPrinter.cpp line 825: > 823: ResourceMark rm; > 824: GrowableArray empty_list; > 825: print(name, (Node *) C->root(), empty_list); Suggestion: void IdealGraphPrinter::print_graph(const char* name) { ResourceMark rm; GrowableArray empty_list; print(name, (Node*) C->root(), empty_list); src/hotspot/share/opto/idealGraphPrinter.cpp line 829: > 827: > 828: // Print current ideal graph > 829: void IdealGraphPrinter::print(const char *name, Node *node, GrowableArray &visible_nodes) { Suggestion: void IdealGraphPrinter::print(const char* name, Node* node, GrowableArray& visible_nodes) { src/hotspot/share/opto/idealGraphPrinter.hpp line 117: > 115: ciField* find_source_field_of_array_access(const Node* node, uint& depth); > 116: 
static Node* get_load_node(const Node* node); > 117: void walk_nodes(Node *start, bool edges); Suggestion: void walk_nodes(Node* start, bool edges); src/hotspot/share/opto/idealGraphPrinter.hpp line 148: > 146: void end_method(); > 147: void print_graph(const char *name); > 148: void print(const char *name, Node *root, GrowableArray &hidden_nodes); Suggestion: void print_graph(const char* name); void print(const char* name, Node* root, GrowableArray& hidden_nodes); src/hotspot/share/opto/node.cpp line 2055: > 2053: Compile* C = Compile::current(); > 2054: if (C->should_print_igv(0)) { > 2055: C->igv_print_graph_to_network("PrintBFS", (Node *) Compile::current()->root(), _print_list); Suggestion: C->igv_print_graph_to_network("PrintBFS", (Node*) Compile::current()->root(), _print_list); ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2435251368 PR Comment: https://git.openjdk.org/jdk/pull/22076#issuecomment-2475627084 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841697489 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841697713 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841698138 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841698710 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841699604 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841700130 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841700985 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841701436 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1841701973 From rcastanedalo at openjdk.org Thu Nov 14 08:57:51 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 14 Nov 2024 08:57:51 GMT Subject: RFR: 8343941: IGV: dump graph at different register allocation steps [v3] 
In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 17:07:33 GMT, Roberto Castañeda Lozano wrote: >> This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: >> - Initial liveness: after initial liveness information is computed. >> - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. >> - Initial spilling: after initial round of spilling derived from physical interference graph construction. >> - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). >> - Iterative spilling: after each round of spilling. >> - After iterative spilling: after the main register allocation loop. >> - Post-allocation copy removal: after peephole copy removal. >> - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. >> - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. >> >> The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). >> >> #### Testing >> >> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). >> - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Split MERGE_MULTIDEFS Thanks Christian!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22017#issuecomment-2475763020 From rcastanedalo at openjdk.org Thu Nov 14 08:59:59 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 14 Nov 2024 08:59:59 GMT Subject: Integrated: 8343941: IGV: dump graph at different register allocation steps In-Reply-To: References: Message-ID: On Mon, 11 Nov 2024 14:41:46 GMT, Roberto Castañeda Lozano wrote: > This changeset dumps C2's low-level intermediate representation at the following intermediate register allocation points: > - Initial liveness: after initial liveness information is computed. > - Aggressive coalescing: after aggressively coalescing live ranges and destructing SSA. > - Initial spilling: after initial round of spilling derived from physical interference graph construction. > - Conservative coalescing: after each round of conservative (colorability-preserving) coalescing (if `OptoCoalesce` is enabled). > - Iterative spilling: after each round of spilling. > - After iterative spilling: after the main register allocation loop. > - Post-allocation copy removal: after peephole copy removal. > - Merge multiple definitions: after local merging of equivalent nodes related by the same live range. > - Fix up spills: convert load-store spills into memory operand accesses ("CISC spilling") if allowed by the target platform and `UseCISCSpill` is enabled. > > The new dumps have already proved to be useful in the investigation of [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295). > > #### Testing > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64). > - Tested automatically that dumping, scheduling, and viewing hundreds of the new graphs does not trigger any failure on HotSpot or IGV (by instrumenting IGV to schedule and view graphs eagerly and running `java -Xbatch -XX:PrintIdealGraphLevel=4`). This pull request has now been integrated.
Changeset: a8152bdb Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/a8152bdb9a52d902b8e710626317e0f944cf2769 Stats: 46 lines in 3 files changed: 46 ins; 0 del; 0 mod 8343941: IGV: dump graph at different register allocation steps Reviewed-by: chagedorn, dfenacci, dlunden ------------- PR: https://git.openjdk.org/jdk/pull/22017 From rcastanedalo at openjdk.org Thu Nov 14 09:38:19 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 14 Nov 2024 09:38:19 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 19:45:36 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. >> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Improve opto asm of LoadNKlass While this does not address IMO the conceptual problem of bending the meaning of `oopDesc::klass_offset_in_bytes()` through the C2 code base, it is a clear improvement over the existing model, thanks. An advantage of this model over the one proposed in the JBS issue is that it exposes to C2 the actual address that will be loaded.
This improves the confidence that there will not be any issue when matching complex addressing modes, etc. due to a mismatch between what C2 sees and what finally gets emitted. It would be good to document the overloaded semantics of LoadNKlass for compact headers. Here is a suggestion: https://github.com/openjdk/jdk/commit/042317434d4644ac8f3591c8b1021e5651b5ed6d. If you agree, feel free to incorporate it as-is or edit it to your liking. src/hotspot/cpu/aarch64/aarch64.ad line 6702: > 6700: format %{ > 6701: "ldrw $dst, $mem\t# compressed class ptr, shifted" > 6702: "lsrw $dst, markWord::klass_shift_at_offset" Suggestion: "lsrw $dst, $dst, markWord::klass_shift_at_offset" ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22078#pullrequestreview-2435462969 PR Review Comment: https://git.openjdk.org/jdk/pull/22078#discussion_r1841832635 From rcastanedalo at openjdk.org Thu Nov 14 09:51:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 14 Nov 2024 09:51:30 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 19:45:36 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. 
>> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Improve opto asm of LoadNKlass src/hotspot/cpu/aarch64/aarch64.ad line 6701: > 6699: ins_cost(4 * INSN_COST); > 6700: format %{ > 6701: "ldrw $dst, $mem\t# compressed class ptr, shifted" Suggestion: "ldrw $dst, $mem\t# compressed class ptr, shifted\n\t" src/hotspot/cpu/x86/x86_64.ad line 4372: > 4370: ins_cost(125); // XXX > 4371: format %{ > 4372: "movl $dst, $mem\t# compressed klass ptr, shifted" Suggestion: "movl $dst, $mem\t# compressed klass ptr, shifted\n\t" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22078#discussion_r1841885880 PR Review Comment: https://git.openjdk.org/jdk/pull/22078#discussion_r1841886831 From epeter at openjdk.org Thu Nov 14 10:21:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 14 Nov 2024 10:21:56 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:41:24 GMT, Tobias Holenstein wrote: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Nice Work, looks great :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2435664124 From duke at openjdk.org Thu Nov 14 10:51:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 14 Nov 2024 10:51:03 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v2] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: - Update copyright - Remove useless comment - Append late inline messages to existing inline messages ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/79246784..5a4e0fbe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=00-01 Stats: 72 lines in 6 files changed: 44 ins; 21 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Thu Nov 14 10:51:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 14 Nov 2024 10:51:03 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 08:43:09 GMT, Roland Westrelin wrote: >> I've been trying to understand all these print_inlining_*() functions for a few days now, and I still don't understand the rules about when each can be called, when we should overwrite and when we should append, when the stringStream should be empty or not empty, and how _print_inlining_list works. 
Then there is the parallel InlineTree that we build, and it has a success/fail message attached too. > >> I've been trying to understand all these print_inlining_*() functions for a few days now, and I still don't understand the rules about when each can be called, when we should overwrite and when we should append, when the stringStream should be empty or not empty, and how _print_inlining_list works. Then there is the parallel InlineTree that we build, and it has a success/fail message attached too. > > It is indeed a mess. > The way this work, I think, is that the message for the inlining that's currently happening is accumulated in `_print_inlining_stream`. > `_print_inlining_list` is the list of inlining messages. A single entry of `_print_inlining_list` may contain the aggregated messages for multiple call sites. So once we are done, we simply iterate over the list and output each entry. > If there's no late inlining involved, then `_print_inlining_list` only has a single entry. > When a call site is a candidate for late inlining (i.e. there is a chance that some messages need to be inserted at the current point at a later time), then a new element is added to `_print_inlining_list`. > If late inlining does happen at that call site, the logic iterates over `_print_inlining_list` and finds the entry with the matching `CallGenerator`. When the call site is inlined, it's possible that this will cause some inlining to happen right away (and so messages to be appended to the current `_print_inlining_list` entry) and some more late inlining to happen later on (and so a new entry to be added to `_print_inlining_list` right after the current one, possibly in the middle of the list). > > If I remember correctly I tried using `InlineTree` instead but that didn't work well. I don't remember the details though. I have made changes such that previous information is no longer lost. 
So for @rwestrel's example the output is now: 200 24 n jdk.internal.vm.Continuation::doYield (native) (static) 208 25 b TestLateInlining::test1 (4 bytes) @ 0 TestLateInlining::inlined1 (1 bytes) inline (hot); late inline succeeded I will work on adding a test for this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2476015526 From rkennke at openjdk.org Thu Nov 14 11:26:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 14 Nov 2024 11:26:31 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v3] In-Reply-To: References: Message-ID: > We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. > > However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. 
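The equivalence this description relies on (an 8-byte load at offset 0 followed by a shift, versus a 4-byte load at offset 4 followed by a shift that is 32 smaller) can be checked with a small Java model. This is only a sketch under stated assumptions: it models a little-endian header in a `ByteBuffer`, and the shift value 42 as well as every name in it are illustrative, not taken from HotSpot.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class NarrowKlassLoadDemo {
    // Assumed position of the narrow klass bits inside the 64-bit mark word.
    static final int KLASS_SHIFT = 42;

    // Build an 8-byte little-endian "object header" holding the given mark word.
    static ByteBuffer headerWith(long markWord) {
        ByteBuffer header = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        header.putLong(0, markWord);
        return header;
    }

    // Old scheme: load all 8 bytes from offset 0, then shift by the full amount.
    static int loadViaFullWord(ByteBuffer header) {
        return (int) (header.getLong(0) >>> KLASS_SHIFT);
    }

    // New scheme: load only the upper 4 bytes (offset 4 on little-endian)
    // and shift by 32 less. The shift must be unsigned.
    static int loadViaUpperHalf(ByteBuffer header) {
        return header.getInt(4) >>> (KLASS_SHIFT - 32);
    }

    public static void main(String[] args) {
        // Klass bits in the top 22 bits, unrelated bits below them.
        ByteBuffer header = headerWith((0x2ABCDEL << KLASS_SHIFT) | 0x1234567L);
        System.out.println(loadViaFullWord(header) == loadViaUpperHalf(header));
    }
}
```

The second variant only works when the shift is at least 32, so that none of the wanted bits live in the lower half of the word.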
> > Testing: > - [x] tier1 aarch64 +UCOH > - [x] tier1 x86_64 +UCOH Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Fix aarch64 opto output - Clarify semantics of LoadNKlassNode and effect on C2's type system ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22078/files - new: https://git.openjdk.org/jdk/pull/22078/files/d2010d1e..7d32a835 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=01-02 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22078.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22078/head:pull/22078 PR: https://git.openjdk.org/jdk/pull/22078 From rkennke at openjdk.org Thu Nov 14 11:32:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 14 Nov 2024 11:32:47 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v4] In-Reply-To: References: Message-ID: > We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. > > However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. 
> > Testing: > - [x] tier1 aarch64 +UCOH > - [x] tier1 x86_64 +UCOH Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add missing newline in opto output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22078/files - new: https://git.openjdk.org/jdk/pull/22078/files/7d32a835..441d074a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22078&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22078.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22078/head:pull/22078 PR: https://git.openjdk.org/jdk/pull/22078 From rkennke at openjdk.org Thu Nov 14 11:36:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 14 Nov 2024 11:36:02 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 09:34:02 GMT, Roberto Casta?eda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve opto asm of LoadNKlass > > While this does not address IMO the conceptual problem of bending the meaning of `oopDesc::klass_offset_in_bytes()` through the C2 code base, it is a clear improvement over the existing model, thanks. > > An advantage of this model over the one proposed in the JBS issue is that it exposes to C2 the actual address that will be loaded. This improves the confidence that there will not be any issue when matching complex addressing modes, etc. due to a mismatch between what C2 sees and what finally gets emitted. > > It would be good to document the overloaded semantics of LoadNKlass for compact headers. Here is a suggestion: https://github.com/openjdk/jdk/commit/042317434d4644ac8f3591c8b1021e5651b5ed6d. If you agree, feel free to incorporate it as-is or edit it to your liking. Thanks, @robcasloz ! 
I made the suggested changes. Yeah, I agree, it is still not ideal. But it seems the most-correct way to handle it. Re-shaping LoadNKlass as you suggested would break the semantics of a LoadNode and I am not sure about the subtle or not-so-subtle consequences of that. It seems most-correct to have LoadNKlass use offset 0, but that also sounds scary and potentially affecting all mark-word accesses. Actually using the offset 4 currently seems the sanest approach to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2476121289 From mli at openjdk.org Thu Nov 14 11:50:47 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 14 Nov 2024 11:50:47 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) Message-ID: Hi, Can you help to review the patch? It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. Thanks ## Performance Tests on bananapi, for other platform, please check jbs issue for test data. 
### Before
data

Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units
-- | -- | -- | -- | -- | -- | -- | --
o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op
o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op
o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op
o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op

### After
data

Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units
-- | -- | -- | -- | -- | -- | -- | --
o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op
o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op
o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op
o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op

------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/22102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22102&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334474 Stats: 262 lines in 5 files changed: 0 ins; 261 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22102/head:pull/22102 PR: https://git.openjdk.org/jdk/pull/22102 From fyang at openjdk.org Thu Nov 14 12:20:20 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 14 Nov 2024 12:20:20 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) In-Reply-To: References: Message-ID: <5rvGx0VaBS0UPWmE-YXv3uIhbO7RO2jvuJgOysrv5is=.3fa5b6e6-af89-4540-95ac-b256054ab8d7@github.com> On Thu, 14 Nov 2024 11:45:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression.
> > Thanks > > ## Performance > Tests on bananapi, for other platform, please check jbs issue for test data. > > ### Before > data > > Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op > o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op > o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op > o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op > > > > > ### After > data > > Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op > o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op > o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op > o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op > > Thanks for performing the test. Revert change looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22102#pullrequestreview-2435923360 From mli at openjdk.org Thu Nov 14 12:34:43 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 14 Nov 2024 12:34:43 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v4] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 11:32:47 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. 
We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. >> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add missing newline in opto output Looks good. I'll do a cleanup accordingly on riscv later. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22078#pullrequestreview-2435954278 From rcastanedalo at openjdk.org Thu Nov 14 12:39:57 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 14 Nov 2024 12:39:57 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV In-Reply-To: References: Message-ID: <5REYqEyj_j25lTRZqHyaZuqplfXKsNth8ArhohqGBW0=.7a8d2a29-e94a-4721-a841-6640c5aa1a7d@github.com> On Wed, 13 Nov 2024 14:41:24 GMT, Tobias Holenstein wrote: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Great improvement, thanks Toby! Please add a line to `PrintBFS::print_options_help` documenting the new option, something like: ` _output->print_cr(" !: show nodes on IGV (sent over network stream)");`. Otherwise, I just have a couple of minor suggestions and comments. 
As a follow-up improvement, it would be great if we could similarly extend `igv_print(bool network)`, when `network == true`, to tell IGV to automatically open and focus the graph sent from the debugger. src/hotspot/share/opto/compile.cpp line 5288: > 5286: ResourceMark rm; > 5287: GrowableArray empty_list; > 5288: igv_print_graph_to_network(phase_name, (Node *) C->root(), empty_list); Suggestion: igv_print_graph_to_network(phase_name, (Node*) C->root(), empty_list); src/hotspot/share/opto/compile.cpp line 5291: > 5289: } > 5290: > 5291: void Compile::igv_print_graph_to_network(const char* name, Node* node, GrowableArray & visible_nodes) { Suggestion: void Compile::igv_print_graph_to_network(const char* name, Node* node, GrowableArray& visible_nodes) { src/hotspot/share/opto/node.cpp line 2054: > 2052: if (_print_igv) { > 2053: Compile* C = Compile::current(); > 2054: if (C->should_print_igv(0)) { I guess the reason to call `should_print_igv` here is to initialize `Compile::_igv_printer` if necessary. Would it be possible to factor out the initialization part https://github.com/openjdk/jdk/blob/2145ace384137b1c028a68dc34a8800577c7a43e/src/hotspot/share/opto/compile.cpp#L5214-L5217 into a separate function that only checks if `_igv_printer` is `nullptr` and, if so, initializes it, and only call that function from here? src/hotspot/share/opto/node.cpp line 2055: > 2053: Compile* C = Compile::current(); > 2054: if (C->should_print_igv(0)) { > 2055: C->igv_print_graph_to_network("PrintBFS", (Node*) Compile::current()->root(), _print_list); Suggestion: C->igv_print_graph_to_network("PrintBFS", (Node*) C->root(), _print_list); ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2435915234 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1842118240 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1842121007 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1842136714 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1842130090 From duke at openjdk.org Thu Nov 14 12:55:55 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 14 Nov 2024 12:55:55 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/5a4e0fbe..8a78e358 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=01-02 Stats: 77 lines in 1 file changed: 77 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Thu Nov 14 12:55:55 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 14 Nov 2024 12:55:55 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 10:51:03 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
>> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: > > - Update copyright > - Remove useless comment > - Append late inline messages to existing inline messages The test has been added. 
We should consider to open a RFE to refactor the way inline printing in general works in the future, currently it's really messy. @vnkozlov @chhagedorn @rwestrel Would you like to take another look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2476281326 From syan at openjdk.org Thu Nov 14 12:58:50 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 14 Nov 2024 12:58:50 GMT Subject: RFR: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't exclude from `test/hotspot/jtreg/ProblemList.txt` correctly. The test only contains a single test, so it do not need to set test suffix. > This PR remove the test suffix to make the Problemlist work normally, trivial fix, no risk. Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21968#issuecomment-2476288462 From syan at openjdk.org Thu Nov 14 12:58:51 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 14 Nov 2024 12:58:51 GMT Subject: Integrated: 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt In-Reply-To: References: Message-ID: <3BObw7-fseFOnVLIsCBeAtaizq42wujMW0bkeEQ6NWg=.73731f9b-6897-41ca-ab9a-b0007b0ea359@github.com> On Fri, 8 Nov 2024 06:39:24 GMT, SendaoYan wrote: > Hi all, > The test `test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java` can't exclude from `test/hotspot/jtreg/ProblemList.txt` correctly. The test only contains a single test, so it do not need to set test suffix. > This PR remove the test suffix to make the Problemlist work normally, trivial fix, no risk. This pull request has now been integrated. 
Changeset: 6e28cd3b Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/6e28cd3b795e6538b5b5542595103588dd434559 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8343488: Test VectorRebracket128Test.java can't exclude by test/hotspot/jtreg/ProblemList.txt Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21968 From rcastanedalo at openjdk.org Thu Nov 14 13:50:25 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 14 Nov 2024 13:50:25 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v4] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 11:32:47 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. >> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add missing newline in opto output Thanks for applying the suggestions. I agree that there is no obvious better solution, at least while both original and compact object headers have to coexist. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22078#pullrequestreview-2436131155

From mli at openjdk.org Thu Nov 14 14:05:42 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 14 Nov 2024 14:05:42 GMT
Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2]
In-Reply-To:
References:
Message-ID: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com>

> Hi,
> Can you help to review the patch?
> It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression.
>
> Thanks
>
> ## Performance
> Tests on bananapi, for other platform, please check jbs issue for test data.
>
> ### Before
> data
>
> Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op
> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op
> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op
> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op
>
> ### After
> data
>
> Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op
> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op
> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op
> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op
>

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  fix test typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22102/files
  - new: https://git.openjdk.org/jdk/pull/22102/files/8cd4e1e9..d1546433

Webrevs:
 - full:
https://webrevs.openjdk.org/?repo=jdk&pr=22102&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22102&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22102/head:pull/22102 PR: https://git.openjdk.org/jdk/pull/22102 From mli at openjdk.org Thu Nov 14 14:05:43 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 14 Nov 2024 14:05:43 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: <5rvGx0VaBS0UPWmE-YXv3uIhbO7RO2jvuJgOysrv5is=.3fa5b6e6-af89-4540-95ac-b256054ab8d7@github.com> References: <5rvGx0VaBS0UPWmE-YXv3uIhbO7RO2jvuJgOysrv5is=.3fa5b6e6-af89-4540-95ac-b256054ab8d7@github.com> Message-ID: On Thu, 14 Nov 2024 12:16:39 GMT, Fei Yang wrote: > Thanks for performing the test. Revert change looks good. Thanks for the review. A local typo fix was somehow not committed; I have just added it, please take another look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22102#issuecomment-2476433239 From galder at openjdk.org Thu Nov 14 14:31:54 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 14 Nov 2024 14:31:54 GMT Subject: RFR: 8326369: Add test to verify bimorphic inlining happens after morphism changes [v4] In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: <9_XMv1F4TFlGpV0bYHRZQC7S8g-nHUHPUJeEYtqeAt8=.6677fce1-6350-42a9-bebe-15254b954a52@github.com> > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such an issue doesn't come back inadvertently.
I've added him as co-author and applied some small formatting changes. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21920/files - new: https://git.openjdk.org/jdk/pull/21920/files/9d9909f8..9945e03b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From galder at openjdk.org Thu Nov 14 14:36:16 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 14 Nov 2024 14:36:16 GMT Subject: RFR: 8326369: Add test to verify bimorphic inlining happens after morphism changes [v5] In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into topic.bimorphic-inlining - Update test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java Co-authored-by: Tobias Hartmann - Added Jetbrains copyright - Added copyright and @bug identifiers - Fix formatting - Fix more formatting issues - Fix formatting - Add test that replicates issue Co-authored-by: Filipp Zhinkin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21920/files - new: https://git.openjdk.org/jdk/pull/21920/files/9945e03b..42152a1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21920&range=03-04 Stats: 121452 lines in 3119 files changed: 37834 ins; 72764 del; 10854 mod Patch: https://git.openjdk.org/jdk/pull/21920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21920/head:pull/21920 PR: https://git.openjdk.org/jdk/pull/21920 From galder at openjdk.org Thu Nov 14 14:36:16 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 14 Nov 2024 14:36:16 GMT Subject: RFR: 8326369: Add test to verify bimorphic inlining happens after morphism changes [v3] In-Reply-To: <_R4wKwGxFzXPVdXRvCbG5XL50DDBZxnodmZMmbxnW9E=.96900e4d-3967-4f1f-8750-5a4a1fa8770d@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> <_R4wKwGxFzXPVdXRvCbG5XL50DDBZxnodmZMmbxnW9E=.96900e4d-3967-4f1f-8750-5a4a1fa8770d@github.com> Message-ID: On Thu, 14 Nov 2024 06:31:08 GMT, Tobias Hartmann wrote: >> Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: >> >> Added Jetbrains copyright > > Looks good to me otherwise. @TobiHartmann @eme64 I've fixed title and bug numbers. I've also merged latest master to see if the macos CI issue goes away. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2476504510 From tholenstein at openjdk.org Thu Nov 14 15:01:12 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 14 Nov 2024 15:01:12 GMT Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges Message-ID: Currently IGV layout cuts edges that are longer than 10 layers. Add an option to enable/disable the cutting cut Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction` ------------- Commit messages: - remove hide duplicates - CutEdges working - setCutEdges - button added Changes: https://git.openjdk.org/jdk/pull/22108/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22108&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344204 Stats: 269 lines in 12 files changed: 163 ins; 93 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22108/head:pull/22108 PR: https://git.openjdk.org/jdk/pull/22108 From amitkumar at openjdk.org Thu Nov 14 15:01:22 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 14 Nov 2024 15:01:22 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: <8Nft36Wzpo4w9QJUJWUf2GRwEzxOVeLSi6u_n_0ZxDs=.1fc324d7-4d0b-4ec2-8480-31cfe486cd4f@github.com> On Thu, 7 Nov 2024 03:24:31 GMT, Dean Long wrote: >> Lazy computation of TypeFunc. >> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > If you move all these accessor functions into the .hpp or .inline.hpp file, so they can be inlined, then I think the benefit of a macro will be come more apparent, but I won't insist. Let's see what other reviewers think. @dean-long can I get review for this one ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2476592883 From tholenstein at openjdk.org Thu Nov 14 15:08:00 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 14 Nov 2024 15:08:00 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v2] In-Reply-To: References: Message-ID: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/compile.cpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/node.cpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/compile.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22076/files - new: https://git.openjdk.org/jdk/pull/22076/files/e73b87b7..b7dd22b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22076/head:pull/22076 PR: https://git.openjdk.org/jdk/pull/22076 From tholenstein at openjdk.org Thu Nov 14 15:11:44 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 14 Nov 2024 15:11:44 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send
subgraphs to IGV [v3] In-Reply-To: References: Message-ID: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: added ! to PrintBFS::print_options_help ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22076/files - new: https://git.openjdk.org/jdk/pull/22076/files/b7dd22b4..34b09e47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22076/head:pull/22076 PR: https://git.openjdk.org/jdk/pull/22076 From fyang at openjdk.org Thu Nov 14 15:24:17 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 14 Nov 2024 15:24:17 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: On Thu, 14 Nov 2024 14:05:42 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. 
>>
>> Thanks
>>
>> ## Performance
>> Tests on bananapi, for other platform, please check jbs issue for test data.
>>
>> ### Before
>> data
>>
>> Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | -- | --
>> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op
>> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op
>> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op
>> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op
>>
>> ### After
>> data
>>
>> Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | -- | --
>> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op
>> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op
>> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op
>> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op
>>
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix test typo

Still good.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22102#pullrequestreview-2436396839

From tholenstein at openjdk.org Thu Nov 14 15:33:32 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Thu, 14 Nov 2024 15:33:32 GMT
Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v4]
In-Reply-To:
References:
Message-ID:

> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network
>
> ### Add a new option "!" to dump_bfs
> The option !
send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: create Compile::init_igv() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22076/files - new: https://git.openjdk.org/jdk/pull/22076/files/34b09e47..66720072 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=02-03 Stats: 16 lines in 3 files changed: 10 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22076/head:pull/22076 PR: https://git.openjdk.org/jdk/pull/22076 From tholenstein at openjdk.org Thu Nov 14 15:36:48 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 14 Nov 2024 15:36:48 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v4] In-Reply-To: <5REYqEyj_j25lTRZqHyaZuqplfXKsNth8ArhohqGBW0=.7a8d2a29-e94a-4721-a841-6640c5aa1a7d@github.com> References: <5REYqEyj_j25lTRZqHyaZuqplfXKsNth8ArhohqGBW0=.7a8d2a29-e94a-4721-a841-6640c5aa1a7d@github.com> Message-ID: On Thu, 14 Nov 2024 12:27:47 GMT, Roberto Casta?eda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> create Compile::init_igv() > > src/hotspot/share/opto/node.cpp line 2054: > >> 2052: if (_print_igv) { >> 2053: Compile* C = Compile::current(); >> 2054: if (C->should_print_igv(0)) { > > I guess the reason to call `should_print_igv` here is to initialize `Compile::_igv_printer` if necessary. 
Would it be possible to factor out the initialization part https://github.com/openjdk/jdk/blob/2145ace384137b1c028a68dc34a8800577c7a43e/src/hotspot/share/opto/compile.cpp#L5214-L5217 into a separate function that only checks if `_igv_printer` is `nullptr` and, if so, initializes it, and only call that function from here? sure. done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1842433726 From yzheng at openjdk.org Thu Nov 14 16:49:23 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 14 Nov 2024 16:49:23 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() Message-ID: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can have its instance. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. 
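For readers unfamiliar with the idiom, the following is a minimal sketch of `isArray() || !isAbstract()` written against core reflection (`java.lang.Class` and `java.lang.reflect.Modifier`) rather than the JVMCI `ResolvedJavaType` API, which is not exported to ordinary applications. The class name `ConcreteTypeSketch` and its `isConcrete` helper are hypothetical and only illustrate the shape of the check; they are not the code from the PR. The array case needs its own test because an array class may report the abstract modifier even though array instances can be created.

```java
import java.lang.reflect.Modifier;
import java.util.AbstractList;

// Hypothetical illustration only: mirrors the "isArray() || !isAbstract()"
// idiom using java.lang.Class instead of JVMCI's ResolvedJavaType.
final class ConcreteTypeSketch {

    // A type is treated as concrete if it is an array type, or if it is
    // not marked abstract (interfaces carry the abstract modifier too).
    static boolean isConcrete(Class<?> type) {
        return type.isArray() || !Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("String:       " + isConcrete(String.class));
        System.out.println("int[]:        " + isConcrete(int[].class));
        System.out.println("AbstractList: " + isConcrete(AbstractList.class));
        System.out.println("Runnable:     " + isConcrete(Runnable.class));
    }
}
```

Putting the check behind a single method, as the PR does for `ResolvedJavaType`, avoids repeating the two-part condition at every call site.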
------------- Commit messages: - Override ModifiersProvider.isConcrete in ResolvedJavaType Changes: https://git.openjdk.org/jdk/pull/22111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343693 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From jbhateja at openjdk.org Thu Nov 14 18:24:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 14 Nov 2024 18:24:59 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v5] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
> > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. > > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated. 
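The arithmetic behind the description above can be checked with a small scalar sketch. It is plain Java on `long` values, not the C2 matcher rules or the Vector API code from the patch, and the class and method names (`LowDoublewordMultiplySketch`, `mulViaPartialProducts`, `mulLowUnsigned`, `mulLowSigned`) are hypothetical. It assumes only that the low 64 bits of a product are what a lane-wise long multiply keeps.

```java
// Hypothetical scalar illustration of the partial-product argument;
// it does not use or model any HotSpot or Vector API code.
final class LowDoublewordMultiplySketch {
    private static final long LOW_MASK = 0xFFFFFFFFL;

    // Low 64 bits of a * b, assembled from 32-bit partial products.
    // The hi*hi term only affects bits 64 and above, so it is dropped.
    static long mulViaPartialProducts(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        long cross = aHi * bLo + aLo * bHi; // only its low 32 bits survive the shift
        return aLo * bLo + (cross << 32);
    }

    // What an unsigned 32x32->64 multiply of the low doublewords computes.
    static long mulLowUnsigned(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // What a signed 32x32->64 multiply of the low doublewords computes.
    static long mulLowSigned(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long a = 0x0123456789ABCDEFL, b = 0xFEDCBA9876543210L;
        System.out.println(mulViaPartialProducts(a, b) == a * b);

        // Upper 32 bits zero: the cross terms vanish, one multiply suffices.
        long x = 0xFFFFFFFFL, y = 0xFFFFFFFEL;
        System.out.println(mulLowUnsigned(x, y) == x * y);

        // Sign-extended int inputs: the signed low multiply matches the full product.
        long s = -3L, t = 7L;
        System.out.println(mulLowSigned(s, t) == s * t);
    }
}
```

When both inputs are known to be zero-extended (or sign-extended) 32-bit values, the cross terms contribute nothing, which is the property the pattern match relies on to replace the full quadword multiply with a single doubleword multiply.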
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/43320063..84f2e04f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=03-04 Stats: 44 lines in 2 files changed: 12 ins; 14 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Thu Nov 14 18:24:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 14 Nov 2024 18:24:59 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 12 Nov 2024 21:49:22 GMT, Vladimir Ivanov wrote: >>> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. >>> >>> So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. >> >> Hi @iwanowww , >> Problem occurs only if AndV gets shared; in such a case, matcher will not be able to identify the constrained multiplication pattern and absorb the masking pattern. Specialized IR overrules such limitations and shields the pattern from downstream optimization passes, thereby removing any non-determinism. 
In addition, it facilitates forwarding inputs to the multiplier, the new IR is explicit in its semantics of considering only lower doublewords of quadword lanes for multiplication, hence we can safely save emitting redundant input masking instructions. We already have specialized IR nodes like MulAddVS2VINode and I see these new IR nodes similar to it. > > @jatin-bhateja in case when `AndV` is shared, it can't be eliminated unless all users absorb it. For such cases, matcher can perform adhoc node cloning, but in this particular case it looks like an overkill either way. IMO the pattern is too niche to focus on it (either to justify input forwarding or adhoc handling on matcher side). > > It's good you mentioned `MulAddVS2VI`. On one hand, VNNI operations are more complex (similar to FMA), so such complexity *may* be justified there. On the other hand, it doesn't look like VNNI support in C2 age well. It is tied to auto-vectorizer and, by now, Vector API doesn't benefit from it. So, instead of doubling down on `MulAddVS2VI` path, I'd prefer to leave it aside and reimplement it later in a more maintainable manner. Thanks @iwanowww , your comments have been addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2477119879 From jbhateja at openjdk.org Thu Nov 14 18:25:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 14 Nov 2024 18:25:00 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v4] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> <6fxu6YabwpKc13hCZ7Aw46C02K68kozOCBZY3Rn8R8g=.c42f98dc-c253-4972-b2a5-ea8ff5e6061b@github.com> Message-ID: On Thu, 14 Nov 2024 02:52:26 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. 
The pull request now contains seven commits: >> >> - Removing target specific hooks >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 >> - Review resoultions >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8341137 >> - Handle new I2L pattern, IR tests, Rewiring pattern inputs to MulVL further optimizes JIT code >> - Review resolutions >> - 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction > > test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 105: > >> 103: LongVector vsrc2 = LongVector.fromArray(LSP, lsrc2, i); >> 104: vsrc1.lanewise(VectorOperators.AND, 0xFFFFFFFFL) >> 105: .lanewise(VectorOperators.MUL, vsrc2.lanewise(VectorOperators.AND, 0xFFFFFFFFL)) > > It would be nice to randomize the constants (masks and shifts) to improve test coverage. Pure randomization will ditch the pattern detection since we expect a constant, I have now varied the constant mask in different test points. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1842704791 From vlivanov at openjdk.org Thu Nov 14 19:35:55 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 14 Nov 2024 19:35:55 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v5] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Thu, 14 Nov 2024 18:24:59 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. 
>> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. Looks good. I'll submit it for testing. src/hotspot/cpu/x86/x86.ad line 6176: > 6174: %} > 6175: > 6176: Redundant new line. ------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2437027064 PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1842785430 From vlivanov at openjdk.org Thu Nov 14 19:45:29 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 14 Nov 2024 19:45:29 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v5] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <5cavGY87A5jPQUOnM6aDR1QyceunoZXGqyeu24AxVEU=.df5700cf-19e3-4a8d-84e7-784d3ee0c61d@github.com> On Thu, 14 Nov 2024 18:24:59 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. 
>> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
>> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. test/hotspot/jtreg/compiler/vectorapi/VectorMultiplyOpt.java line 125: > 123: LongVector vsrc1 = LongVector.fromArray(LSP, lsrc1, i); > 124: LongVector vsrc2 = LongVector.fromArray(LSP, lsrc2, i); > 125: vsrc1.lanewise(VectorOperators.AND, 0xFFFFFFL) Alternatively, you could populate the constants in randomized manner and put them into static final fields during class initialization. Then just load them from there in test code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21244#discussion_r1842799093 From dlong at openjdk.org Thu Nov 14 23:32:18 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 14 Nov 2024 23:32:18 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 07:05:59 GMT, Emanuel Peter wrote: >> Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. Wen we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > do what shipilev said Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22080#pullrequestreview-2437392135 From gcao at openjdk.org Fri Nov 15 01:40:02 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 15 Nov 2024 01:40:02 GMT Subject: RFR: 8344265: RISC-V: Remove unused function get_previous_sp_entry Message-ID: Hi, I noticed that there are several unused functions here that are currently used only on the x86 architecture and not on RISC-V. ### Testing - [x] release & fastdebug cross-build linux-riscv64 OK - [x] release & fastdebug build on SOPHON SG2042 - [ ] Run tier1 tests on SOPHON SG2042 (release) ------------- Commit messages: - 8344265: RISC-V: Remove unused function get_previous_sp_entry Changes: https://git.openjdk.org/jdk/pull/22130/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22130&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344265 Stats: 58 lines in 2 files changed: 0 ins; 58 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22130/head:pull/22130 PR: https://git.openjdk.org/jdk/pull/22130 From dlong at openjdk.org Fri Nov 15 02:09:23 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Nov 2024 02:09:23 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 12:55:55 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message?
If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add test I don't see that print_inlining_append_late() with InliningResult::FAILURE is being tested. src/hotspot/share/opto/compile.cpp line 4470: > 4468: } > 4469: old_size--; > 4470: } Isn't the '\n' always at the end? Why do we need to search for it? If we do need to search for it, then we could use `strchr` or `strrchr`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2477803315 PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1843099457 From dlong at openjdk.org Fri Nov 15 02:11:58 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Nov 2024 02:11:58 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 12:55:55 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add test src/hotspot/share/opto/compile.cpp line 4480: > 4478: // we are copying the old contents without the line break. > 4479: > 4480: auto buffer = new PrintInliningBuffer(); This function is a lot more complicated than I expected. I expected it to look more like Compile::print_inlining_commit() but writing into the existing PrintInliningBuffer(). If we didn't terminate lines in the _print_inlining_list with '\n' then we wouldn't need to remove it before appending. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1843101277 From dlong at openjdk.org Fri Nov 15 02:38:18 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Nov 2024 02:38:18 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: <8xaTb0Jib53MCIW91SpNPtMy-TLmiHrqM9L4bEQBz-4=.6704ce8a-98ec-4bcb-951b-6d8f7bd6e41a@github.com> On Thu, 14 Nov 2024 12:55:55 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
>> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add test src/hotspot/share/opto/compile.cpp line 4480: > 4478: // we are copying the old contents without the line break. 
> 4479: > 4480: auto buffer = new PrintInliningBuffer(); Don't we need to set the cg in the new buffer? I would prefer having a constructor that takes a cg rather than having to call set_gc(). Or maybe reuse the old buffer after calling old->reset(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1843115776 From fyang at openjdk.org Fri Nov 15 02:40:14 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 15 Nov 2024 02:40:14 GMT Subject: RFR: 8344265: RISC-V: Remove unused function get_previous_sp_entry In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 01:35:05 GMT, Gui Cao wrote: > Hi, I noticed that there are a several unused functions here that are currently only used in the x86 architecture, not used in RISC-V. > > ### Testing > - [x] release & fastdebug cross-build linux-riscv64 OK > - [x] release & fastdebug build on SOPHON SG2042 > - [ ] Run tier1 tests on SOPHON SG2042 (release) src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 46: > 44: address StubRoutines::riscv::_float_sign_flip = nullptr; > 45: address StubRoutines::riscv::_double_sign_mask = nullptr; > 46: address StubRoutines::riscv::_double_sign_flip = nullptr; Seems that `address StubRoutines::riscv::_large_byte_array_inflate` is also not used? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22130#discussion_r1843116236 From dlong at openjdk.org Fri Nov 15 03:12:14 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Nov 2024 03:12:14 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 12:55:55 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
>> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
> > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add test src/hotspot/share/opto/compile.cpp line 4490: > 4488: > 4489: _print_inlining_list->at_put(_print_inlining_idx, buffer); > 4490: print_inlining_reset(); Reset seems out of place here. We already checked it was empty when we called print_inlining_assert_ready(), and we never printed to that buffer. Maybe this is left over from before the print_inlining_inner_message() refactor? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1843133095 From gcao at openjdk.org Fri Nov 15 03:30:11 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 15 Nov 2024 03:30:11 GMT Subject: RFR: 8344265: RISC-V: Remove unused function get_previous_sp_entry [v2] In-Reply-To: References: Message-ID: > Hi, I noticed that there are a several unused functions here that are currently only used in the x86 architecture, not used in RISC-V. > > ### Testing > - [x] release & fastdebug cross-build linux-riscv64 OK > - [x] release & fastdebug build on SOPHON SG2042 > - [ ] Run tier1 tests on SOPHON SG2042 (release) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Remove large_byte_array_inflate function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22130/files - new: https://git.openjdk.org/jdk/pull/22130/files/446c8c28..f44aabd4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22130&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22130&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22130.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22130/head:pull/22130 PR: https://git.openjdk.org/jdk/pull/22130 From gcao at openjdk.org Fri Nov 15 03:30:11 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 15 Nov 2024 03:30:11 GMT Subject: RFR: 8344265: RISC-V: Remove unused function 
get_previous_sp_entry [v2] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 02:36:32 GMT, Fei Yang wrote: > Seems that `address StubRoutines::riscv::_large_byte_array_inflate` is also not used? Yes, it's not used. Deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22130#discussion_r1843141395 From fyang at openjdk.org Fri Nov 15 04:12:16 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 15 Nov 2024 04:12:16 GMT Subject: RFR: 8344265: RISC-V: Remove unused function get_previous_sp_entry [v2] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 03:30:11 GMT, Gui Cao wrote: >> Hi, I noticed that there are several unused functions here that are currently used only on the x86 architecture and not on RISC-V. >> >> ### Testing >> - [x] release & fastdebug cross-build linux-riscv64 OK >> - [x] release & fastdebug build on SOPHON SG2042 >> - [ ] Run tier1 tests on SOPHON SG2042 (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Remove large_byte_array_inflate function Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22130#pullrequestreview-2437614115 From swen at openjdk.org Fri Nov 15 04:15:36 2024 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 15 Nov 2024 04:15:36 GMT Subject: RFR: 8343629: More MergeStore benchmark [v5] In-Reply-To: References: Message-ID: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences.
Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: seperate MergeStoreBench and MergeLoadBench ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21659/files - new: https://git.openjdk.org/jdk/pull/21659/files/2e88b024..dce66dae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21659&range=03-04 Stats: 943 lines in 2 files changed: 515 ins; 428 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21659.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21659/head:pull/21659 PR: https://git.openjdk.org/jdk/pull/21659 From duke at openjdk.org Fri Nov 15 07:20:19 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 15 Nov 2024 07:20:19 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 03:08:38 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add test > > src/hotspot/share/opto/compile.cpp line 4490: > >> 4488: >> 4489: _print_inlining_list->at_put(_print_inlining_idx, buffer); >> 4490: print_inlining_reset(); > > Reset seems out of place here. We already checked it was empty when we called print_inlining_assert_ready(), and we never printed to that buffer. Maybe this is left over from before the print_inlining_inner_message() refactor? I think you are right and reset can be removed now. An earlier version was using the internal buffer, which made this call necessary. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1843290511 From epeter at openjdk.org Fri Nov 15 07:34:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 15 Nov 2024 07:34:44 GMT Subject: RFR: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException [v2] In-Reply-To: <8UHSRXVy2OR1scdznu0JbdW2WkOl9PD8a1lXDHnZeIY=.257f99a9-97a2-4a75-9a54-a08c74e9a96f@github.com> References: <8UHSRXVy2OR1scdznu0JbdW2WkOl9PD8a1lXDHnZeIY=.257f99a9-97a2-4a75-9a54-a08c74e9a96f@github.com> Message-ID: <7pPVd4MLKwe2-GIsfS5KmHjFAE5L0PKXHT8yYAnn6kY=.056e3a4d-9fe3-47b6-86c2-a9cfaa330bc0@github.com> On Thu, 14 Nov 2024 07:52:46 GMT, Aleksey Shipilev wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> do what shipilev said > > Marked as reviewed by shade (Reviewer). @shipilev @dean-long @chhagedorn thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22080#issuecomment-2478134622 From epeter at openjdk.org Fri Nov 15 07:36:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 15 Nov 2024 07:36:24 GMT Subject: Integrated: 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 15:34:07 GMT, Emanuel Peter wrote: > Test-bug: `RANDOM.nextInt()` would occasionally return a `min_int`. And sadly this overflows: `Math.abs(min_int) == min_int`. When we calculate it `% 100`, it still gives us a negative value, and we end up out of bounds. Fixed with a mask. This pull request has now been integrated.
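The overflow described in this message can be reproduced in isolation. The sketch below is not the test's actual code: the class and method names are hypothetical, and the mask shown (`Integer.MAX_VALUE`) is an assumption about what "fixed with a mask" might look like, since the message does not show the change itself.

```java
public class AbsOverflow {
    // Buggy shape: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE,
    // so the remainder can be negative and is unsafe as an array index.
    static int indexWithAbs(int r) {
        return Math.abs(r) % 100;
    }

    // Clearing the sign bit guarantees a non-negative value before the modulo.
    static int indexWithMask(int r) {
        return (r & Integer.MAX_VALUE) % 100;
    }

    public static void main(String[] args) {
        int r = Integer.MIN_VALUE; // the rare value RANDOM.nextInt() may return
        System.out.println(indexWithAbs(r));  // negative
        System.out.println(indexWithMask(r)); // in the range [0, 100)
    }
}
```

Other ways to stay in range would be `Math.floorMod(r, 100)` or drawing the value with `nextInt(100)` in the first place; which one suits a given test depends on how the random value is used elsewhere.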
Changeset: 21966942 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/21966942b6b5341d0d221d10c3eaa629e543d017 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8344104: TestMergeStores fails with ArrayIndexOutOfBoundException Reviewed-by: shade, chagedorn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/22080 From epeter at openjdk.org Fri Nov 15 07:37:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 15 Nov 2024 07:37:50 GMT Subject: RFR: 8326369: Add test to verify bimorphic inlining happens after morphism changes [v5] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 14 Nov 2024 14:36:16 GMT, Galder Zamarreño wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into topic.bimorphic-inlining > - Update test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java > > Co-authored-by: Tobias Hartmann > - Added Jetbrains copyright > - Added copyright and @bug identifiers > - Fix formatting > - Fix more formatting issues > - Fix formatting > - Add test that replicates issue > > Co-authored-by: Filipp Zhinkin Looks good to me :) ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21920#pullrequestreview-2437866731 From duke at openjdk.org Fri Nov 15 07:41:00 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 15 Nov 2024 07:41:00 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v4] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Remove unnecssary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/8a78e358..f4cad4ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From rrich at openjdk.org Fri Nov 15 07:54:51 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 15 Nov 2024 07:54:51 GMT Subject: RFR: 8344205: [PPC]: failing assertion: sharedRuntime_ppc.cpp:1652: cookie not found Message-ID: This PR removes the bad assertion that fails when leaving a continuation because the cookie value is not found in `frame::common_abi::cr`. This is because after the cookie is stored there when entering the continuation it is overridden by the runtime call to thaw frames. This is compliant with the abi. Strangely the assertion only ever failed on aix. Testing: compiler/codecache/stress/UnexpectedDeoptimizationTest.java always failed since the bad assertion was introduced recently. It succeeds after removal. 
The fix passed our CI testing: Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. Testing was done on the main platforms and also on Linux/PPC64le and AIX. ------------- Commit messages: - Remove bad assertion Changes: https://git.openjdk.org/jdk/pull/22109/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22109&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344205 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22109.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22109/head:pull/22109 PR: https://git.openjdk.org/jdk/pull/22109 From rrich at openjdk.org Fri Nov 15 07:54:51 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 15 Nov 2024 07:54:51 GMT Subject: RFR: 8344205: [PPC]: failing assertion: sharedRuntime_ppc.cpp:1652: cookie not found In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 15:29:02 GMT, Richard Reingruber wrote: > This PR removes the bad assertion that fails when leaving a continuation because the cookie value is not found in `frame::common_abi::cr`. > This is because after the cookie is stored there when entering the continuation it is overridden by the runtime call to thaw frames. This is compliant with the abi. > Strangely the assertion only ever failed on aix. > > Testing: compiler/codecache/stress/UnexpectedDeoptimizationTest.java always failed since the bad assertion was introduced recently. It succeeds after removal. > > The fix passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. The [macos-x64 test failure](https://github.com/reinrich/jdk/actions/runs/11840204418/job/32996332320#step:9:6280) is unrelated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22109#issuecomment-2478157048 From epeter at openjdk.org Fri Nov 15 08:23:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 15 Nov 2024 08:23:29 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v4] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 15:33:32 GMT, Tobias Holenstein wrote: >> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network >> >> ### Add a new option "!" to dump_bfs >> The option ! send the printed nodes of dump_bfs to IGV and shows them >> >> p find_node(0)->dump_bfs(1,0,"dcmxo+!") >> >> dist dump >> --------------------------------------------- >> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] >> 0 0 Root === 0 51 [[ 0 1 3 26 ]] >> Method printed over network stream to IGV >> >> >> dump > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > create Compile::init_igv() Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2437950839 From chagedorn at openjdk.org Fri Nov 15 08:30:07 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 15 Nov 2024 08:30:07 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates Message-ID: This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. There are some places where the verification code is - missing - called twice in a row with different methods - unnecessarily called This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class.
#### Details of this Patch - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. - One can implement the new `BFSActions` interface to define - Whether a node's input should be further visited. - Whether a node is a target node for this BFS. - What action should be performed with the target node. - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: - Verify Template Assertion Predicates: - For init value: Only `OpaqueLoopInit` - For last value: Both `OpaqueLoop*Nodes` - Verify Initialized Assertion Predicates: - No `OpaqueLoop*Nodes` Thanks, Christian ------------- Commit messages: - 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates Changes: https://git.openjdk.org/jdk/pull/22136/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344213 Stats: 275 lines in 7 files changed: 150 ins; 92 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/22136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22136/head:pull/22136 PR: https://git.openjdk.org/jdk/pull/22136 From chagedorn at openjdk.org Fri Nov 15 08:30:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 15 Nov 2024 08:30:08 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates In-Reply-To: References: Message-ID: <_-X8gmWoyJef3Xya1WcQ3aVaUNMXzLUHf2kUriOgi-M=.c8710f14-54ff-4e21-b9e8-3be457ee8394@github.com> On Fri, 15 Nov 2024 08:17:22 GMT, Christian Hagedorn wrote: > This patch cleans up the `OpaqueLoop*Node` verification code
that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in a row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. > > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. > - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. > - Whether a node is a target node for this BFS. > - What action should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. > - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian src/hotspot/share/opto/loopPredicate.cpp line 352: > 350: IfTrueNode* template_success_proj = template_assertion_predicate.clone(unswitched_loop_parse_predicate->in(0), this); > 351: assert(assertion_predicate_has_loop_opaque_node(template_success_proj->in(0)->as_If()), > 352: "must find Assertion Predicate for fast loop"); Verification done in `TemplateAssertionPredicate::clone()`.
src/hotspot/share/opto/loopTransform.cpp line 1377: > 1375: // Create an Initialized Assertion Predicate from the template_assertion_predicate > 1376: IfTrueNode* PhaseIdealLoop::create_initialized_assertion_predicate(IfNode* template_assertion_predicate, Node* new_init, > 1377: Node* new_stride, Node* new_control) { Inlined method in last usage into `CreateAssertionPredicatesVisitor::initialize_from_template()`. src/hotspot/share/opto/loopTransform.cpp line 2764: > 2762: scale_con, int_offset, int_limit, > 2763: AssertionPredicateType::FinalIv); > 2764: assert(!assertion_predicate_has_loop_opaque_node(loop_entry->in(0)->as_If()), "unexpected"); Verification done in `InitializedAssertionPredicateCreator::create()`. src/hotspot/share/opto/loopTransform.cpp line 2772: > 2770: this); > 2771: loop_entry = template_assertion_predicate_creator.create(loop_entry); > 2772: assert(assertion_predicate_has_loop_opaque_node(loop_entry->in(0)->as_If()), "unexpected"); Verification done in `TemplateAssertionPredicateCreator::create()`. src/hotspot/share/opto/loopTransform.cpp line 2778: > 2776: int_offset, int_limit, > 2777: AssertionPredicateType::InitValue); > 2778: assert(!assertion_predicate_has_loop_opaque_node(loop_entry->in(0)->as_If()), "unexpected"); Verification done in `InitializedAssertionPredicateCreator::create()`. src/hotspot/share/opto/loopopts.cpp line 792: > 790: if (bol->is_OpaqueTemplateAssertionPredicate()) { > 791: // Ignore Template Assertion Predicates with OpaqueTemplateAssertionPredicate nodes. > 792: assert(assertion_predicate_has_loop_opaque_node(iff), "must find OpaqueLoop* nodes"); I don't think we are required to do this verification here since we are now always verifying these nodes when creating and cloning them. Here we just want to bail out if we find a Template Assertion Predicate. 
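Editorial sketch of the BFS-plus-actions split described in the patch notes for 8344213 above. This is not the HotSpot `DataNodeBFS`/`BFSActions` code (which is C++ and is not shown in this thread): every name below is hypothetical, nodes are modelled as plain strings, and the graph is an input-adjacency map. It only illustrates the three callbacks the notes list: whether to keep visiting an input, whether a node is a target, and what to do with a target.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a generic breadth-first walk over node inputs,
// parameterised by a small callback interface.
final class DataNodeBfsSketch {
    // Hypothetical stand-in for the three decisions named in the patch notes.
    interface Actions {
        boolean shouldVisit(String node);   // keep walking through this input?
        boolean isTarget(String node);      // is this a node we are looking for?
        void onTarget(String node);         // what to do once a target is found
    }

    // Walks the inputs of 'start' breadth-first, visiting each node at most once.
    static void visit(String start, Map<String, List<String>> inputs, Actions actions) {
        ArrayDeque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String in : inputs.getOrDefault(node, List.of())) {
                if (!seen.add(in)) {
                    continue; // already handled via another path
                }
                if (actions.isTarget(in)) {
                    actions.onTarget(in);
                } else if (actions.shouldVisit(in)) {
                    queue.add(in);
                }
            }
        }
    }

    // Example client: collect every node whose name starts with "OpaqueLoop".
    static List<String> collectOpaqueLoopNodes(String start, Map<String, List<String>> inputs) {
        List<String> found = new ArrayList<>();
        visit(start, inputs, new Actions() {
            @Override public boolean shouldVisit(String node) { return true; }
            @Override public boolean isTarget(String node) { return node.startsWith("OpaqueLoop"); }
            @Override public void onTarget(String node) { found.add(node); }
        });
        return found;
    }
}
```

A verifier in this style would only need a different `Actions` implementation, for example one that counts targets and then checks the count, while the traversal itself stays shared.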
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1843362547 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1843363553 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1843363983 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1843364150 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1843364422 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1843366157 From dnsimon at openjdk.org Fri Nov 15 08:32:22 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Nov 2024 08:32:22 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: <8KuUcHgRLPkS1g3GF6a_l9PwbD0OiAFCxzz1zLpyNio=.b98d35a2-f1df-488d-99f0-b3f5ee887b09@github.com> On Thu, 14 Nov 2024 16:42:31 GMT, Yudi Zheng wrote: > The `isArray() || !isAbstract()` idiom is often used in Graal to express that a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. Please add a test for this in `TestResolvedJavaType.java`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22111#issuecomment-2478221660 From rcastanedalo at openjdk.org Fri Nov 15 08:32:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 15 Nov 2024 08:32:25 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v4] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 15:33:32 GMT, Tobias Holenstein wrote: >> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print...
in C2 to define which nodes should be visible in IGV when sent over the network >> >> ### Add a new option "!" to dump_bfs >> The option ! send the printed nodes of dump_bfs to IGV and shows them >> >> p find_node(0)->dump_bfs(1,0,"dcmxo+!") >> >> dist dump >> --------------------------------------------- >> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] >> 0 0 Root === 0 51 [[ 0 1 3 26 ]] >> Method printed over network stream to IGV >> >> >> dump > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > create Compile::init_igv() Looks good, thanks for addressing my comments! I just have one final comment/suggestion I missed yesterday. src/hotspot/share/opto/compile.cpp line 5306: > 5304: } > 5305: tty->print_cr("Method printed over network stream to IGV"); > 5306: _debug_network_printer->print(name, (Node*) Compile::current()->root(), visible_nodes); Is there any reason why you want to use `Compile::current()` here instead of the readily available `C`? Suggestion: _debug_network_printer->print(name, (Node*)C->root(), visible_nodes); ------------- PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2437966154 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1843374117 From tholenstein at openjdk.org Fri Nov 15 09:22:05 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 15 Nov 2024 09:22:05 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v5] In-Reply-To: References: Message-ID: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! 
send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/compile.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22076/files - new: https://git.openjdk.org/jdk/pull/22076/files/66720072..17765b07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22076/head:pull/22076 PR: https://git.openjdk.org/jdk/pull/22076 From mdoerr at openjdk.org Fri Nov 15 09:59:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 15 Nov 2024 09:59:14 GMT Subject: RFR: 8344205: [PPC]: failing assertion: sharedRuntime_ppc.cpp:1652: cookie not found In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 15:29:02 GMT, Richard Reingruber wrote: > This PR removes the bad assertion that fails when leaving a continuation because the cookie value is not found in `frame::common_abi::cr`. > This is because after the cookie is stored there when entering the continuation it is overridden by the runtime call to thaw frames. This is compliant with the abi. > Strangely the assertion only ever failed on aix. > > Testing: compiler/codecache/stress/UnexpectedDeoptimizationTest.java always failed since the bad assertion was introduced recently. It succeeds after removal. > > The fix passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests.
> Testing was done on the main platforms and also on Linux/PPC64le and AIX. Looks good and trivial. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22109#pullrequestreview-2438173831 From rcastanedalo at openjdk.org Fri Nov 15 10:52:44 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 15 Nov 2024 10:52:44 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v5] In-Reply-To: References: Message-ID: <9qH57AiJm5OJTnI_M_DBOSyFjgM5Ag1ktpxB4rNK2ME=.602b90a7-cddb-4f5e-9e06-884c4f966556@github.com> On Fri, 15 Nov 2024 09:22:05 GMT, Tobias Holenstein wrote: >> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network >> >> ### Add a new option "!" to dump_bfs >> The option ! send the printed nodes of dump_bfs to IGV and shows them >> >> p find_node(0)->dump_bfs(1,0,"dcmxo+!") >> >> dist dump >> --------------------------------------------- >> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] >> 0 0 Root === 0 51 [[ 0 1 3 26 ]] >> Method printed over network stream to IGV >> >> >> dump > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Roberto Castañeda Lozano Thanks Toby! ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2438280689 From rcastanedalo at openjdk.org Fri Nov 15 11:22:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 15 Nov 2024 11:22:19 GMT Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 14:51:00 GMT, Tobias Holenstein wrote: > Currently IGV layout cuts edges that are longer than 10 layers. Add an option to enable/disable the cutting > > cut > > Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction` Looks good, thanks for doing this! Edge cutting does more harm than good in most cases IMO (especially in the CFG view), so agree with disabling it by default. Please remove the leftover icon `hideDuplicates.png`. src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/CutEdgesAction.java line 2: > 1: /* > 2: * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. ------------- PR Review: https://git.openjdk.org/jdk/pull/22108#pullrequestreview-2438289243 PR Review Comment: https://git.openjdk.org/jdk/pull/22108#discussion_r1843589441 From amitkumar at openjdk.org Fri Nov 15 12:44:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 15 Nov 2024 12:44:53 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp Message-ID: This PR adds a `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please see JBS for more info.
------------- Commit messages: - parentheses around || - s390x: fix - Revert "add safety net" - add safety net Changes: https://git.openjdk.org/jdk/pull/22144/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344026 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From mdoerr at openjdk.org Fri Nov 15 12:44:54 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 15 Nov 2024 12:44:54 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 10:04:51 GMT, Amit Kumar wrote: > This PR adds a `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please see JBS for more info. A simpler solution would be to use unsigned arithmetic which avoids UB: `is_power_of_2((juint)c + 1)` > > A simpler solution would be to use unsigned arithmetic which avoids UB: `is_power_of_2((juint)c + 1)` > > On s390x then we have another issue: > > ```c++ > if (tmp->is_valid()) { > if (is_power_of_2(c + 1)) { > __ move(left, tmp); > __ shift_left(left, log2i_exact(c + 1), left); > __ sub(left, tmp, result); > return true; > } else if (is_power_of_2(c - 1)) { > __ move(left, tmp); > __ shift_left(left, log2i_exact(c - 1), left); > __ add(left, tmp, result); > return true; > } > } > ``` > > `__ sub(left, tmp, result);` will then contain `INT_MAX`+1 value. Ok. That would need more adaptations. I'm also ok with `c > 0 && c < max_jint`. The computation would still be correct with `is_power_of_2((juint)c + 1)` and `log2i_exact((juint)c + 1)`. Integer shift left, add, sub and multiply are exactly the same operations regardless of signed or unsigned. (Except regarding flags which are not relevant here.) The only thing we need to fix is UB.
So, both solutions should work. src/hotspot/share/c1/c1_LIRGenerator.cpp line 533: > 531: if (right->is_constant()) { > 532: jint c = right->as_jint(); > 533: if (c > 0 && c < max_jint) { This prevents platform specific optimizations for negative `c`. E.g. multiplication by -1 could be strength reduced to a negation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2478565533 PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2478604647 PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2478627490 PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1843594609 From amitkumar at openjdk.org Fri Nov 15 12:44:54 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 15 Nov 2024 12:44:54 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 11:01:12 GMT, Martin Doerr wrote: > A simpler solution would be to use unsigned arithmetic which avoids UB: `is_power_of_2((juint)c + 1)` On s390x then we have another issue: if (tmp->is_valid()) { if (is_power_of_2(c + 1)) { __ move(left, tmp); __ shift_left(left, log2i_exact(c + 1), left); __ sub(left, tmp, result); return true; } else if (is_power_of_2(c - 1)) { __ move(left, tmp); __ shift_left(left, log2i_exact(c - 1), left); __ add(left, tmp, result); return true; } } `__ sub(left, tmp, result);` will then contain `INT_MAX`+1 value. > src/hotspot/share/c1/c1_LIRGenerator.cpp line 533: > >> 531: if (right->is_constant()) { >> 532: jint c = right->as_jint(); >> 533: if (c > 0 && c < max_jint) { > > This prevents platform specific optimizations for negative `c`. E.g. multiplication by -1 could be strength reduced to a negation. I didn't get it. How will this affect the -1 case?
I see that this is implementation for `strength_reduce_multiply`: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { assert(left != result, "should be different registers"); if (is_power_of_2(c + 1)) { __ shift_left(left, log2i_exact(c + 1), result); __ sub(result, left, result); return true; } else if (is_power_of_2(c - 1)) { __ shift_left(left, log2i_exact(c - 1), result); __ add(result, left, result); return true; } return false; } which will return false in case of `-1`. Basically even without my change, `c = -1` will set `did_strength_reduce` to `false`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2478569235 PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1843606825 From mdoerr at openjdk.org Fri Nov 15 12:44:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 15 Nov 2024 12:44:55 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: Message-ID: <26J4ykOybFmva-dQDbWly6cjyKgQor8Y-wVSmxbnsGg=.b91eb53c-0831-4ba7-9715-b0ace0f28293@github.com> On Fri, 15 Nov 2024 11:10:42 GMT, Amit Kumar wrote: >> src/hotspot/share/c1/c1_LIRGenerator.cpp line 533: >> >>> 531: if (right->is_constant()) { >>> 532: jint c = right->as_jint(); >>> 533: if (c > 0 && c < max_jint) { >> >> This prevents platform specific optimizations for negative `c`. E.g. multiplication by -1 could be strength reduced to a negation. > > I didn't get it. How this will affect -1 case ? 
> > I see that this is implementation for `strength_reduce_multiply`: > > bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { > assert(left != result, "should be different registers"); > if (is_power_of_2(c + 1)) { > __ shift_left(left, log2i_exact(c + 1), result); > __ sub(result, left, result); > return true; > } else if (is_power_of_2(c - 1)) { > __ shift_left(left, log2i_exact(c - 1), result); > __ add(result, left, result); > return true; > } > return false; > } > > which will return false in case of `-1`. > > Basically even without my change, `c = -1` will set `did_strength_reduce` to `false`. I'm not talking about existing code. I'm talking about possibilities which you prevent. With your code `strength_reduce_multiply` will no longer be called with negative `c` preventing possible optimizations inside of it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1843611142 From amitkumar at openjdk.org Fri Nov 15 12:44:55 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 15 Nov 2024 12:44:55 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: <26J4ykOybFmva-dQDbWly6cjyKgQor8Y-wVSmxbnsGg=.b91eb53c-0831-4ba7-9715-b0ace0f28293@github.com> References: <26J4ykOybFmva-dQDbWly6cjyKgQor8Y-wVSmxbnsGg=.b91eb53c-0831-4ba7-9715-b0ace0f28293@github.com> Message-ID: On Fri, 15 Nov 2024 11:14:58 GMT, Martin Doerr wrote: >With your code strength_reduce_multiply will no longer be called with negative c preventing possible optimizations inside of it. My thoughts are that even if `strength_reduce_multiply` is being called with `-ve` values. It's almost doing nothing. 
At the end we are falling back to this code: // we couldn't strength reduce so just emit the multiply if (!did_strength_reduce) { __ mul(left_op, right_op, result_op); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1843629774 From mdoerr at openjdk.org Fri Nov 15 12:44:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 15 Nov 2024 12:44:55 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: <26J4ykOybFmva-dQDbWly6cjyKgQor8Y-wVSmxbnsGg=.b91eb53c-0831-4ba7-9715-b0ace0f28293@github.com> Message-ID: On Fri, 15 Nov 2024 11:27:39 GMT, Amit Kumar wrote: >> I'm not talking about existing code. I'm talking about possibilities which you prevent. With your code `strength_reduce_multiply` will no longer be called with negative `c` preventing possible optimizations inside of it. > >>With your code strength_reduce_multiply will no longer be called with negative c preventing possible optimizations inside of it. > > My thoughts are that even if `strength_reduce_multiply` is being called with `-ve` values. It's almost doing nothing. At the end we are falling back to this code: > > // we couldn't strength reduce so just emit the multiply > if (!did_strength_reduce) { > __ mul(left_op, right_op, result_op); > } I didn't get your point. Some platform owners may want to use `strength_reduce_multiply` for more cases in the future. 
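Editorial sketch related to the unsigned-arithmetic suggestion earlier in this thread (`is_power_of_2((juint)c + 1)`). It is written in Java, where `int` arithmetic wraps instead of being undefined, so it cannot reproduce the C++ UB itself; it only checks the claim that shift, subtract and multiply produce the same bits regardless of signedness, including for `c == Integer.MAX_VALUE`. The class and method names are hypothetical and this is not the C1 code.

```java
// Illustrative only: models the "c + 1 is a power of two" strength reduction
// with c + 1 computed as an unsigned 32-bit value, similar in spirit to
// (juint)c + 1 in the C++ discussion.
final class StrengthReduceSketch {
    // True if v is a positive power of two.
    static boolean isPowerOfTwo(long v) {
        return v > 0 && (v & (v - 1)) == 0;
    }

    // Computes x * c, using shift-and-subtract when (unsigned) c + 1 is a
    // power of two, and a plain multiply otherwise.
    static int multiply(int x, int c) {
        // Unsigned 32-bit c + 1, wrapped to 32 bits so that c == -1 gives 0.
        long cPlusOne = (Integer.toUnsignedLong(c) + 1L) & 0xFFFFFFFFL;
        if (isPowerOfTwo(cPlusOne)) {
            int shift = Long.numberOfTrailingZeros(cPlusOne); // at most 31
            return (x << shift) - x; // x * 2^shift - x == x * c (mod 2^32)
        }
        return x * c; // fallback: no strength reduction
    }
}
```

For `c == Integer.MAX_VALUE` the unsigned `c + 1` is 2^31, so the reduced form is `(x << 31) - x`, which matches `x * c` under 32-bit wraparound. For `c == -1` the unsigned `c + 1` wraps to 0 and the sketch falls back to the multiply.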
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1843652037 From amitkumar at openjdk.org Fri Nov 15 12:44:56 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 15 Nov 2024 12:44:56 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: <26J4ykOybFmva-dQDbWly6cjyKgQor8Y-wVSmxbnsGg=.b91eb53c-0831-4ba7-9715-b0ace0f28293@github.com> Message-ID: On Fri, 15 Nov 2024 11:50:04 GMT, Martin Doerr wrote: >>>With your code strength_reduce_multiply will no longer be called with negative c preventing possible optimizations inside of it. >> >> My thoughts are that even if `strength_reduce_multiply` is being called with `-ve` values. It's almost doing nothing. At the end we are falling back to this code: >> >> // we couldn't strength reduce so just emit the multiply >> if (!did_strength_reduce) { >> __ mul(left_op, right_op, result_op); >> } > > I didn't get your point. Some platform owners may want to use `strength_reduce_multiply` for more cases in the future. That is possible. Thanks for the suggestion. I have reverted the commit :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1843700689 From tholenstein at openjdk.org Fri Nov 15 12:58:34 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 15 Nov 2024 12:58:34 GMT Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges [v2] In-Reply-To: References: Message-ID: > Currently IGV layout cuts edges that are longer than 10 layers. 
Add an option to enable/disable the cutting > > cut > > Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction` Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/View/src/main/java/com/sun/hotspot/igv/view/actions/CutEdgesAction.java Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22108/files - new: https://git.openjdk.org/jdk/pull/22108/files/034ecf99..f9b9ae8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22108&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22108&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22108.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22108/head:pull/22108 PR: https://git.openjdk.org/jdk/pull/22108 From tholenstein at openjdk.org Fri Nov 15 13:01:05 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 15 Nov 2024 13:01:05 GMT Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges [v3] In-Reply-To: References: Message-ID: > Currently IGV layout cuts edges that are longer than 10 layers. 
Add an option to enable/disable the cutting
>
> cut
>
> Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction`

Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:

  remove hideDuplicate.png

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22108/files
  - new: https://git.openjdk.org/jdk/pull/22108/files/f9b9ae8d..0200d636

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22108&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22108&range=01-02

  Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/22108.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22108/head:pull/22108

PR: https://git.openjdk.org/jdk/pull/22108

From rcastanedalo at openjdk.org Fri Nov 15 13:08:43 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 15 Nov 2024 13:08:43 GMT
Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 15 Nov 2024 13:01:05 GMT, Tobias Holenstein wrote:

>> Currently IGV layout cuts edges that are longer than 10 layers. Add an option to enable/disable the cutting
>>
>> cut
>>
>> Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction`
>
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove hideDuplicate.png

Marked as reviewed by rcastanedalo (Reviewer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/22108#pullrequestreview-2438515830

From dlunden at openjdk.org Fri Nov 15 14:08:51 2024
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 15 Nov 2024 14:08:51 GMT
Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v5]
In-Reply-To:
References:
Message-ID: <2vjgYZf_Wu_dBDioo-F0mZk96Pn5YnX6yolKRBtdO58=.eb80f40c-4cac-4723-9975-ff5ec8874f09@github.com>

On Fri, 15 Nov 2024 09:22:05 GMT, Tobias Holenstein wrote:

>> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network
>>
>> ### Add a new option "!" to dump_bfs
>> The option ! send the printed nodes of dump_bfs to IGV and shows them
>>
>> p find_node(0)->dump_bfs(1,0,"dcmxo+!")
>>
>> dist dump
>> ---------------------------------------------
>> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]]
>> 0 0 Root === 0 51 [[ 0 1 3 26 ]]
>> Method printed over network stream to IGV
>>
>>
>> dump
>
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update src/hotspot/share/opto/compile.cpp
>
>   Co-authored-by: Roberto Castañeda Lozano

Very useful feature!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22076#issuecomment-2478928603

From mdoerr at openjdk.org Fri Nov 15 14:29:53 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 15 Nov 2024 14:29:53 GMT
Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp
In-Reply-To:
References:
Message-ID:

On Fri, 15 Nov 2024 10:04:51 GMT, Amit Kumar wrote:

> This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info.
One more remark: Your current solution restricts the scope of the optimization while the unsigned solution (`is_power_of_2((juint)c + 1)`, `log2i_exact((juint)c + 1)`, ...) only fixes UB. I don't think optimizing multiplication by these special values is particularly important. So, I'm ok with either version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2479013713 From duke at openjdk.org Fri Nov 15 14:30:14 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 15 Nov 2024 14:30:14 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v10] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
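
[Editorial sketch, not part of the original message.] The wrapper pattern described in this patch summary can be modelled in a few lines. Everything below is a toy: `ToyIgvn`, `ToyIdealLoop` and the node structs are hypothetical stand-ins that only borrow names from the thread, and the real `PhaseIdealLoop` and `PhaseIterGVN` classes are far richer. The point is only that one call now creates the constant and pins its control to the root, so callers cannot forget the second step.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Toy node hierarchy (hypothetical; not the HotSpot classes).
struct Node { virtual ~Node() = default; };
struct RootNode : Node {};
struct ConINode : Node {
  int32_t value;
  explicit ConINode(int32_t v) : value(v) {}
};

// Toy value-numbering phase that only knows how to create int constants.
struct ToyIgvn {
  std::vector<std::unique_ptr<Node>> nodes;
  ConINode* intcon(int32_t i) {
    nodes.push_back(std::make_unique<ConINode>(i));
    return static_cast<ConINode*>(nodes.back().get());
  }
};

// Toy loop phase that tracks a control node per data node.
struct ToyIdealLoop {
  ToyIgvn _igvn;
  RootNode _root;
  std::map<Node*, Node*> _ctrl;

  void set_ctrl(Node* n, Node* ctrl) { _ctrl[n] = ctrl; }

  Node* get_ctrl(Node* n) const {
    auto it = _ctrl.find(n);
    return it == _ctrl.end() ? nullptr : it->second;
  }

  // The wrapper: create the constant and set its control to root in one step.
  ConINode* intcon(int32_t i) {
    ConINode* node = _igvn.intcon(i);
    set_ctrl(node, &_root);
    return node;
  }
};
```

With the invariant established at creation time, a later pass can assert `get_ctrl(x) == root` for constants instead of setting it again, which is the direction the review discussion takes.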
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Replace set_root_as_ctrl with assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/48ab32f8..44eaf101 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=08-09 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From duke at openjdk.org Fri Nov 15 14:30:14 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 15 Nov 2024 14:30:14 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: <7-rX03iJvNPTk_sjMgAEI4ki96PwMO3jt1YTDuddgkE=.e98d59ff-d768-407a-abb2-ddc2416a3b06@github.com> References: <7-rX03iJvNPTk_sjMgAEI4ki96PwMO3jt1YTDuddgkE=.e98d59ff-d768-407a-abb2-ddc2416a3b06@github.com> Message-ID: On Mon, 11 Nov 2024 11:46:38 GMT, Christian Hagedorn wrote: >> I opened an RFE for this https://bugs.openjdk.org/browse/JDK-8343907 > > If you modify the following code above to use your new `makecon()` (could be done either way), could this then be turned into an assert? By looking at the code, it suggests that we only miss to set ctrl in the `singleton` case which would then be covered. > https://github.com/openjdk/jdk/blob/5ca6698ba418e82ff93471fbb495759850f26f63/src/hotspot/share/opto/loopopts.cpp#L123-L125 > You could also only change `makecon()` above and revisit this code later again to remove the `set_root_as_ctrl()` and add an assert. I added the asserts as @chhagedorn suggested. All tests in the internal testing passed with this new assert. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1843911132 From mli at openjdk.org Fri Nov 15 14:55:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 15 Nov 2024 14:55:01 GMT Subject: RFR: 8344265: RISC-V: Remove unused function get_previous_sp_entry [v2] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 03:30:11 GMT, Gui Cao wrote: >> Hi, I noticed that there are a several unused functions here that are currently only used in the x86 architecture, not used in RISC-V. >> >> ### Testing >> - [x] release & fastdebug cross-build linux-riscv64 OK >> - [x] release & fastdebug build on SOPHON SG2042 >> - [ ] Run tier1 tests on SOPHON SG2042 (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Remove large_byte_array_inflate function Thanks for catching. Looks good! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22130#pullrequestreview-2438821805 From duke at openjdk.org Fri Nov 15 15:09:25 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 15 Nov 2024 15:09:25 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v11] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:

  Add continue

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21836/files
  - new: https://git.openjdk.org/jdk/pull/21836/files/44eaf101..8fd2875d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=09-10

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21836.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836

PR: https://git.openjdk.org/jdk/pull/21836

From chagedorn at openjdk.org Fri Nov 15 15:09:26 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 15 Nov 2024 15:09:26 GMT
Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 15 Nov 2024 14:30:14 GMT, theoweidmannoracle wrote:

>> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for:
>>
>>
>> ConINode* node = _igvn.intcon(i);
>> set_ctrl(node, C->root());
>>
>>
>> and
>>
>>
>> ConLNode* node = _igvn.longcon(i);
>> set_ctrl(node, C->root());
>>
>>
>> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods.
>
> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>
>   Replace set_root_as_ctrl with assert

Changes requested by chagedorn (Reviewer).

src/hotspot/share/opto/loopopts.cpp line 192:

> 190:
> 191:     if (x->is_Con()) {
> 192:       assert(get_ctrl(x) == C->root(), "constant control is not root");

I think we should still execute `continue` here because we want to skip constants. Otherwise, the updates look good!
------------- PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2438842797 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1843960883 From duke at openjdk.org Fri Nov 15 15:09:26 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 15 Nov 2024 15:09:26 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v10] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 15:02:00 GMT, Christian Hagedorn wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace set_root_as_ctrl with assert > > src/hotspot/share/opto/loopopts.cpp line 192: > >> 190: >> 191: if (x->is_Con()) { >> 192: assert(get_ctrl(x) == C->root(), "constant control is not root"); > > I think we should still execute `continue` here because we want to skip constants. Otherwise, the updates look good! Oops, that slipped through. Thanks! I fixed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1843966763 From chagedorn at openjdk.org Fri Nov 15 16:08:50 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 15 Nov 2024 16:08:50 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v11] In-Reply-To: References: Message-ID: <3Ee9iv66ylVfmaCrrxsXKVpMYGNP86hF1kY7mlZP2Gg=.87788693-489f-4934-8cf5-c35f55ff0968@github.com> On Fri, 15 Nov 2024 15:09:25 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
> > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add continue Some last minor comments, otherwise the updates look good to me. Thanks for doing this extended refactoring! src/hotspot/share/opto/loopnode.cpp line 6862: > 6860: } > 6861: > 6862: ConLNode *PhaseIdealLoop::longcon(jlong i) { Suggestion: ConLNode* PhaseIdealLoop::longcon(jlong i) { src/hotspot/share/opto/loopnode.cpp line 6868: > 6866: } > 6867: > 6868: ConNode *PhaseIdealLoop::makecon(const Type* t) { Suggestion: ConNode* PhaseIdealLoop::makecon(const Type* t) { src/hotspot/share/opto/loopnode.cpp line 6880: > 6878: } > 6879: > 6880: ConNode *PhaseIdealLoop::zerocon(BasicType bt) { Suggestion: ConNode* PhaseIdealLoop::zerocon(BasicType bt) { ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2439045051 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1844107647 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1844107789 PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1844107950 From chagedorn at openjdk.org Fri Nov 15 16:33:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 15 Nov 2024 16:33:47 GMT Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges [v3] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 13:01:05 GMT, Tobias Holenstein wrote: >> Currently IGV layout cuts edges that are longer than 10 layers. Add an option to enable/disable the cutting >> >> cut >> >> Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction` > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > remove hideDuplicate.png That's useful, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22108#pullrequestreview-2439112236 From chagedorn at openjdk.org Fri Nov 15 16:53:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 15 Nov 2024 16:53:22 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v5] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 09:22:05 GMT, Tobias Holenstein wrote: >> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network >> >> ### Add a new option "!" to dump_bfs >> The option ! send the printed nodes of dump_bfs to IGV and shows them >> >> p find_node(0)->dump_bfs(1,0,"dcmxo+!") >> >> dist dump >> --------------------------------------------- >> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] >> 0 0 Root === 0 51 [[ 0 1 3 26 ]] >> Method printed over network stream to IGV >> >> >> dump > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Roberto Casta?eda Lozano That's very handy and useful! Looks good to me, too. src/hotspot/share/opto/compile.cpp line 5306: > 5304: } > 5305: tty->print_cr("Method printed over network stream to IGV"); > 5306: _debug_network_printer->print(name, (Node*)C->root(), visible_nodes); Cast is not required I think: Suggestion: _debug_network_printer->print(name, C->root(), visible_nodes); ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2439119135 PR Review Comment: https://git.openjdk.org/jdk/pull/22076#discussion_r1844147417 From rkennke at openjdk.org Fri Nov 15 18:13:51 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 15 Nov 2024 18:13:51 GMT Subject: Integrated: 8340453: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 15:08:03 GMT, Roman Kennke wrote: > We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. > > However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. > > Testing: > - [x] tier1 aarch64 +UCOH > - [x] tier1 x86_64 +UCOH This pull request has now been integrated. 
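
[Editorial sketch, not part of the original message.] The idea in the description above, loading 4 bytes from offset 4 and shifting by 32 less, can be checked with plain integer code. The block assumes, for illustration only, that the narrow klass occupies the top 22 bits of a 64-bit mark word (a shift of 42); that constant is an assumption here rather than something stated in the message, and the helper names are hypothetical. The byte offset of the upper half depends on byte order, which the model makes explicit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed for illustration: narrow klass lives in bits 42..63 of the mark word.
constexpr int kKlassShift = 42;

// Detects the host byte order at run time.
static bool host_is_little_endian() {
  const uint16_t probe = 1;
  uint8_t first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Reference version: load the whole 8-byte header and shift by 42.
uint32_t narrow_klass_full_load(const unsigned char* header) {
  uint64_t mark;
  std::memcpy(&mark, header, sizeof(mark));
  return static_cast<uint32_t>(mark >> kKlassShift);
}

// Shorter version: load only the 4 bytes that hold the upper half of the
// mark word and shift by 42 - 32 = 10. On little-endian hosts those bytes
// sit at offset 4; on big-endian hosts they sit at offset 0.
uint32_t narrow_klass_half_load(const unsigned char* header) {
  const int offset = host_is_little_endian() ? 4 : 0;
  uint32_t upper;
  std::memcpy(&upper, header + offset, sizeof(upper));
  return upper >> (kKlassShift - 32);
}
```

The offset switch in `narrow_klass_half_load` is the part that makes a fixed offset of 4 a little-endian-only choice in this model.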
Changeset: ff12ff53 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/ff12ff534abb2e08d1bb44a83ef4f84b8476f94c Stats: 59 lines in 9 files changed: 17 ins; 33 del; 9 mod 8340453: C2: Improve encoding of LoadNKlass for compact headers Reviewed-by: rcastanedalo, mli ------------- PR: https://git.openjdk.org/jdk/pull/22078 From dlunden at openjdk.org Fri Nov 15 18:36:16 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 15 Nov 2024 18:36:16 GMT Subject: RFR: 8331295: C2: High memory usage reported in PhaseChaitin::Split Message-ID: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation. In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. ### Changeset - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation. One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. - Add a new IR framework regression test. 
### Testing - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. ------------- Commit messages: - Initial fix Changes: https://git.openjdk.org/jdk/pull/22157/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22157&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331295 Stats: 252 lines in 4 files changed: 249 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22157/head:pull/22157 PR: https://git.openjdk.org/jdk/pull/22157 From vlivanov at openjdk.org Fri Nov 15 18:53:46 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 15 Nov 2024 18:53:46 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v5] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <_e4cV3iB485lOoKDn_Bkt2G_Qni8XwL9czQFTOhXAis=.91751b42-23cf-4ed3-acc0-68a97b0b0ab9@github.com> On Thu, 14 Nov 2024 18:24:59 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. 
>> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. 
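
[Editorial sketch, not part of the original message.] The arithmetic behind the explanation above can be written out in scalar form. The first function assembles the low 64 bits of a 64x64 product from 32-bit partial products; the second computes only the product of the low doublewords, which is what a single lane of an unsigned doubleword multiply such as VPMULUDQ is described as producing. Function names are hypothetical, and this is a model of the math, not of the C2 matcher change.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of a * b, built from 32-bit partial products.
// The a_hi * b_hi term only affects bits 64 and above, so it is dropped.
uint64_t mul64_via_partials(uint64_t a, uint64_t b) {
  const uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
  const uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
  return a_lo * b_lo + ((a_lo * b_hi + a_hi * b_lo) << 32);
}

// Product of the low doublewords only, widened to 64 bits.
uint64_t mul_low_dwords(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
}
```

When the upper 32 bits of both operands are zero, `a_hi` and `b_hi` vanish and the two functions agree, which is why the masked and shifted input patterns listed in the message can use the cheaper instruction.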
>> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. Testing results are clean. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2439389187 From dlong at openjdk.org Fri Nov 15 21:09:51 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Nov 2024 21:09:51 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 07:21:22 GMT, Roman Kennke wrote: > > It looks like this only works for little-endian. Is that documented somewhere? > > I am not sure what you mean. This change is about x86_64 and aarch64, and both are little-endian. The layout of the mark-word is documented in markWord.hpp. Is that what you are looking for? Wouldn't klass_offset_in_bytes need to have a different value for big-endian? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2479923097 From rkennke at openjdk.org Fri Nov 15 21:29:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 15 Nov 2024 21:29:11 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v2] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 21:06:19 GMT, Dean Long wrote: > > > It looks like this only works for little-endian. 
Is that documented somewhere? > > > > > > I am not sure what you mean. This change is about x86_64 and aarch64, and both are little-endian. The layout of the mark-word is documented in markWord.hpp. Is that what you are looking for? > > Wouldn't klass_offset_in_bytes need to have a different value for big-endian? I think so, yes. It would be 0, right? Which would not easily work, because offset 0 would conflict with the C2 memory slice for the mark-word. Big endian arches (do we even support any?) would need to use the old fake-offset method like we did before this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2479943053 From dlong at openjdk.org Fri Nov 15 21:47:56 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 15 Nov 2024 21:47:56 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v4] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 11:32:47 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. >> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add missing newline in opto output Yes, I think we still care about big-endian. 
See for example JDK-8314949 and JDK-8312495. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2479982177 From syan at openjdk.org Sat Nov 16 01:05:27 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 16 Nov 2024 01:05:27 GMT Subject: RFR: 8331295: C2: High memory usage reported in PhaseChaitin::Split In-Reply-To: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: On Fri, 15 Nov 2024 18:29:48 GMT, Daniel Lund?n wrote: > On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation. In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. > > ### Changeset > > - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation. One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. > - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. > - Add a new IR framework regression test. 
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. Hi, after this PR integrated, should we backout the change of https://github.com/openjdk/jdk/pull/21586 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22157#issuecomment-2480245057 From dlong at openjdk.org Sat Nov 16 01:05:36 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 16 Nov 2024 01:05:36 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: <-qkA-r13m-wg6I9W7mhtl0PJsTVrichUj5DP6hICRDk=.d67fa3e9-7474-4434-9051-0bef80508384@github.com> On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) This looks good, and should help save work by not generating this data during a compilation. But I'm wondering if we should go further and reduce the static code footprint of all these similar functions. We could make these functions completely data-driven using compact signature strings. For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". 
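
[Editorial sketch, not part of the original message.] A data-driven signature string of the kind suggested above might be decoded as follows. This is purely illustrative: the `Kind` enum, the `Signature` struct, and the convention that the last character names the return type are all assumptions made for this sketch, and the message does not specify an encoding beyond the `"NIN"` example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type kinds: 'N' for a not-null pointer, 'I' for an int.
enum class Kind { NotNull, Int };

struct Signature {
  std::vector<Kind> args;
  Kind ret;
};

static Kind decode(char c) {
  switch (c) {
    case 'N': return Kind::NotNull;
    case 'I': return Kind::Int;
    default: throw std::invalid_argument("unknown signature character");
  }
}

// Assumed convention: every character but the last is an argument,
// and the last character is the return type.
Signature parse_signature(const std::string& s) {
  if (s.empty()) throw std::invalid_argument("empty signature");
  Signature sig;
  for (std::size_t i = 0; i + 1 < s.size(); i++) {
    sig.args.push_back(decode(s[i]));
  }
  sig.ret = decode(s.back());
  return sig;
}
```

Under that convention `parse_signature("NIN")` yields two arguments (not-null, int) and a not-null return, mirroring the `new_array_Type()` example in the message.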
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2480240976 From vlivanov at openjdk.org Sat Nov 16 02:00:51 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 16 Nov 2024 02:00:51 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: <-qkA-r13m-wg6I9W7mhtl0PJsTVrichUj5DP6hICRDk=.d67fa3e9-7474-4434-9051-0bef80508384@github.com> References: <-qkA-r13m-wg6I9W7mhtl0PJsTVrichUj5DP6hICRDk=.d67fa3e9-7474-4434-9051-0bef80508384@github.com> Message-ID: On Sat, 16 Nov 2024 00:51:21 GMT, Dean Long wrote: > For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". There's definitely some room for improvement here, but, frankly speaking, stringy descriptors don't look appealing to me. Why not simply introduce `TypeFunc` factory methods which explicitly accept argument/return `Type`s instead? Probably, variadic functions are a good fit here, but even if it's not the case, there are rather few arities used (single return value - void, 1 slot, or 2 slots, plus up to 8 arguments). And that would eliminate lots of boilerplate code as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2480303644 From vlivanov at openjdk.org Sat Nov 16 02:00:52 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 16 Nov 2024 02:00:52 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) src/hotspot/share/opto/type.cpp line 716: > 714: > 715: LockNode::lock_type_init(); > 716: OptoRuntime::new_instance_Type_init(); I suggest to move the initialization code into `OptoRuntime`. As a benefit, you'll be able to directly access fields from there, so some trivial `init` methods won't be needed anymore. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1844790153 From vlivanov at openjdk.org Sat Nov 16 02:10:52 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 16 Nov 2024 02:10:52 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 06:43:23 GMT, Amit Kumar wrote: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Types are allocated in C2 type arena, so what confused me at first was the lifetime of allocated instances. Unless they live in shared arena, it's not safe to cache them. It would be helpful to assert that during initialization. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2480318532 From amitkumar at openjdk.org Sat Nov 16 05:02:53 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 16 Nov 2024 05:02:53 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 01:45:45 GMT, Vladimir Ivanov wrote: >> Lazy computation of TypeFunc. >> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > src/hotspot/share/opto/type.cpp line 716: > >> 714: >> 715: LockNode::lock_type_init(); >> 716: OptoRuntime::new_instance_Type_init(); > > I suggest to move the initialization code into `OptoRuntime`. As a benefit, you'll be able to directly access fields from there, so some trivial `init` methods won't be needed anymore. "first" call is made from here because of shared space. Otherwise the object-allocation will deleted and VM will crash. That's what I observed. And again that was the reason why the initialization call is made from `Type::Initialize_shared`. 
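The init-once caching scheme discussed in this thread can be sketched roughly as follows. This is an illustration only: `FuncType` and `RuntimeTypes` are hypothetical stand-ins, and a function-local static models "storage that outlives a single compilation" in place of the shared type arena.

```cpp
#include <cassert>

// Hypothetical stand-in for TypeFunc.
struct FuncType {
  int arg_count;
};

class RuntimeTypes {
  // Cached pointer; must refer to storage that is never freed.
  static const FuncType* _new_instance_tf;

 public:
  // Called exactly once, while long-lived (shared) storage is being set up.
  static void new_instance_type_init() {
    assert(_new_instance_tf == nullptr && "should be called once only");
    static const FuncType shared_instance{1};  // models a shared-arena allocation
    _new_instance_tf = &shared_instance;
  }

  // Cheap accessor used afterwards; no allocation, only a sanity check.
  static const FuncType* new_instance_type() {
    assert(_new_instance_tf != nullptr && "should be initialized");
    return _new_instance_tf;
  }
};

const FuncType* RuntimeTypes::_new_instance_tf = nullptr;
```

The point of the sketch is the lifetime constraint: if the cached object lived in per-compilation storage, the accessor would return a dangling pointer after the first compilation ended.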
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1844909261 From amitkumar at openjdk.org Sat Nov 16 05:08:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 16 Nov 2024 05:08:12 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 02:08:26 GMT, Vladimir Ivanov wrote: > Types are allocated in C2 type arena, so what confused me at first was the lifetime of allocated instances. Unless they live in shared arena, it's not safe to cache them. It would be helpful to assert that during initialization. For now there are two asserts which I included: `assert(_multianewarray4_tf == nullptr, "should be called once only");` which will be in the `*_init()` method, and `assert(_multianewarray4_tf != nullptr, "should be initialized");` which does a null check before returning the object. Is there an existing arena-specific check that could be used here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2480406136 From dlunden at openjdk.org Sat Nov 16 09:25:45 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Sat, 16 Nov 2024 09:25:45 GMT Subject: RFR: 8331295: C2: High memory usage reported in PhaseChaitin::Split In-Reply-To: References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: On Sat, 16 Nov 2024 00:57:55 GMT, SendaoYan wrote: >> On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation.
In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. >> >> ### Changeset >> >> - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation. One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. >> - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. >> - Add a new IR framework regression test. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. > > Hi, after this PR integrated, should we backout the change of https://github.com/openjdk/jdk/pull/21586 @sendaoYan Thanks for checking! It has already been reverted in [JDK-8344018](https://bugs.openjdk.org/browse/JDK-8344018) (and also turned out to be a separate issue: [JDK-8343038](https://bugs.openjdk.org/browse/JDK-8343038)). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22157#issuecomment-2480491521 From gcao at openjdk.org Mon Nov 18 00:50:58 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 18 Nov 2024 00:50:58 GMT Subject: RFR: 8344265: RISC-V: Remove unused function get_previous_sp_entry [v2] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 03:30:11 GMT, Gui Cao wrote: >> Hi, I noticed that there are several unused functions here that are currently only used in the x86 architecture, not used in RISC-V. >> >> ### Testing >> - [x] release & fastdebug cross-build linux-riscv64 OK >> - [x] release & fastdebug build on SOPHON SG2042 >> - [x] Run tier1 tests on SOPHON SG2042 (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Remove large_byte_array_inflate function Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22130#issuecomment-2481717019 From gcao at openjdk.org Mon Nov 18 00:50:58 2024 From: gcao at openjdk.org (Gui Cao) Date: Mon, 18 Nov 2024 00:50:58 GMT Subject: Integrated: 8344265: RISC-V: Remove unused function get_previous_sp_entry In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 01:35:05 GMT, Gui Cao wrote: > Hi, I noticed that there are several unused functions here that are currently only used in the x86 architecture, not used in RISC-V. > > ### Testing > - [x] release & fastdebug cross-build linux-riscv64 OK > - [x] release & fastdebug build on SOPHON SG2042 > - [x] Run tier1 tests on SOPHON SG2042 (release) This pull request has now been integrated.
Changeset: 80e37a96 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/80e37a96bbd4167bca44b11b9968949318ee1140 Stats: 64 lines in 2 files changed: 0 ins; 64 del; 0 mod 8344265: RISC-V: Remove unused function get_previous_sp_entry Reviewed-by: fyang, mli ------------- PR: https://git.openjdk.org/jdk/pull/22130 From fyang at openjdk.org Mon Nov 18 01:30:39 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 18 Nov 2024 01:30:39 GMT Subject: RFR: 8344371: RISC-V: compiler/intrinsics/chacha/TestChaCha20.java fails after JDK-8343555 Message-ID: From the error message, the cause of the failure is that 'UseRVV' was made diagnostic in [JDK-8343555](https://bugs.openjdk.org/browse/JDK-8343555), but the test was not updated to reflect this. Instead of adding one extra `-XX:+UnlockDiagnosticVMOptions` option, this change simply removes the use of `-XX:+UseRVV` from the test. The reason is that `-XX:+UseRVV` is auto-detected and enabled, so we will have the RVV extension if we satisfy the test requirement: (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") The same test passes with this fix on linux-riscv64 with the RVV extension.
------------- Commit messages: - 8344371: RISC-V: compiler/intrinsics/chacha/TestChaCha20.java fails after JDK-8343555 Changes: https://git.openjdk.org/jdk/pull/22188/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22188&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344371 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22188/head:pull/22188 PR: https://git.openjdk.org/jdk/pull/22188 From dhanalla at openjdk.org Mon Nov 18 02:23:46 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Mon, 18 Nov 2024 02:23:46 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: > In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. > > When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. > > The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. 
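A minimal sketch of the early-bailout idea described above, under stated assumptions: `GraphBuilder`, `add_nodes` and `scalarize_array` are hypothetical names and do not correspond to the actual C2 code. The sketch only shows the control flow of checking the limit while the graph is being built instead of after it is complete.

```cpp
#include <cstddef>

// Hypothetical stand-in for the compilation state that tracks live nodes.
struct GraphBuilder {
  std::size_t live_nodes = 0;
  std::size_t max_node_limit;
  bool bailed_out = false;

  explicit GraphBuilder(std::size_t limit) : max_node_limit(limit) {}

  // Reserve room for `count` new nodes; record a bailout instead of
  // exceeding the limit.
  bool add_nodes(std::size_t count) {
    if (live_nodes + count > max_node_limit) {
      bailed_out = true;
      return false;
    }
    live_nodes += count;
    return true;
  }
};

// Scalarize array elements one by one and stop as soon as the limit
// would be crossed. Returns the number of elements actually processed.
std::size_t scalarize_array(GraphBuilder& g, std::size_t elements,
                            std::size_t nodes_per_element) {
  std::size_t done = 0;
  for (; done < elements; ++done) {
    if (!g.add_nodes(nodes_per_element)) {
      break;  // bail out early; the caller abandons this compilation
    }
  }
  return done;
}
```

Checking inside the loop keeps debug and release builds on the same path: neither builds an oversized graph first and then fails a later assert or check.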
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: CR comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20504/files - new: https://git.openjdk.org/jdk/pull/20504/files/4c444f10..27aab6b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20504&range=05-06 Stats: 99 lines in 3 files changed: 48 ins; 50 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20504/head:pull/20504 PR: https://git.openjdk.org/jdk/pull/20504 From thartmann at openjdk.org Mon Nov 18 06:04:53 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 18 Nov 2024 06:04:53 GMT Subject: RFR: 8326369: Add test to verify bimorphic inlining happens after morphism changes [v5] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 14 Nov 2024 14:36:16 GMT, Galder Zamarreño wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into topic.bimorphic-inlining > - Update test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java > > Co-authored-by: Tobias Hartmann > - Added Jetbrains copyright > - Added copyright and @bug identifiers > - Fix formatting > - Fix more formatting issues > - Fix formatting > - Add test that replicates issue > > Co-authored-by: Filipp Zhinkin Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21920#pullrequestreview-2441608049 From thartmann at openjdk.org Mon Nov 18 06:53:43 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 18 Nov 2024 06:53:43 GMT Subject: RFR: 8331295: C2: High memory usage reported in PhaseChaitin::Split In-Reply-To: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: On Fri, 15 Nov 2024 18:29:48 GMT, Daniel Lundén wrote: > On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation. In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. > > ### Changeset > > - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation.
One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. > - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. > - Add a new IR framework regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. Nice analysis! The fix looks good to me. Maybe the issue title should be updated to better reflect the root cause? ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22157#pullrequestreview-2441671546 PR Comment: https://git.openjdk.org/jdk/pull/22157#issuecomment-2482095176 From amitkumar at openjdk.org Mon Nov 18 07:18:16 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 18 Nov 2024 07:18:16 GMT Subject: RFR: 8344379: [s390x] build failure due to missing change from JDK-8339466 Message-ID: Trivial change. Adds one missing part from [JDK-8327652](https://bugs.openjdk.org/browse/JDK-8327652) as that is causing build failure on s390x. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/22190/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22190&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344379 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22190/head:pull/22190 PR: https://git.openjdk.org/jdk/pull/22190 From amitkumar at openjdk.org Mon Nov 18 07:22:45 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 18 Nov 2024 07:22:45 GMT Subject: RFR: 8344379: [s390x] build failure due to missing change from JDK-8339466 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:11:45 GMT, Amit Kumar wrote: > Trivial change. Adds one missing part from [JDK-8327652](https://bugs.openjdk.org/browse/JDK-8327652) as that is causing build failure on s390x. @RealLucy a little bit of help here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22190#issuecomment-2482137452 From duke at openjdk.org Mon Nov 18 07:56:31 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 18 Nov 2024 07:56:31 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v12] In-Reply-To: References: Message-ID: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. 
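The wrapper pattern described above can be sketched outside of HotSpot as follows. The types here (`Node`, `Igvn`, `LoopPhase`) are simplified, hypothetical stand-ins for `ConINode`, `PhaseIterGVN` and `PhaseIdealLoop`; only the shape of the refactoring is meant to carry over.

```cpp
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a constant node.
struct Node {
  long value;
};

// Hypothetical stand-in for the GVN phase that creates constants.
struct Igvn {
  std::vector<std::unique_ptr<Node>> nodes;

  Node* intcon(int i) {
    nodes.push_back(std::make_unique<Node>(Node{i}));
    return nodes.back().get();
  }
};

// Hypothetical stand-in for the loop phase that also tracks control.
struct LoopPhase {
  Igvn _igvn;
  Node _root{0};
  std::map<const Node*, const Node*> _ctrl;

  void set_ctrl(Node* n, Node* ctrl) { _ctrl[n] = ctrl; }

  const Node* get_ctrl(const Node* n) const {
    auto it = _ctrl.find(n);
    return it == _ctrl.end() ? nullptr : it->second;
  }

  // The wrapper: create the constant and pin it to the root in one call,
  // so callers cannot forget the second step.
  Node* intcon(int i) {
    Node* node = _igvn.intcon(i);
    set_ctrl(node, &_root);
    return node;
  }
};
```

A caller then writes `phase.intcon(42)` instead of the two-line sequence, and a `longcon` variant would follow the same shape.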
theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21836/files - new: https://git.openjdk.org/jdk/pull/21836/files/8fd2875d..4ed14b2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21836&range=10-11 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21836.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21836/head:pull/21836 PR: https://git.openjdk.org/jdk/pull/21836 From epeter at openjdk.org Mon Nov 18 08:04:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 08:04:35 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). > > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 25 commits: - manual merge - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding - fix whitespace - fix tests and build - fix store-to-load forward IR rules - updates before the weekend ... who knows if they are any good - refactor to iteration threshold - use jvmArgs again, and apply same fix as 8343345 - revert to jvmArgsPrepend - ... and 15 more: https://git.openjdk.org/jdk/compare/543e355b...000f9f13 ------------- Changes: https://git.openjdk.org/jdk/pull/21521/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=01 Stats: 4386 lines in 17 files changed: 4324 ins; 4 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From rrich at openjdk.org Mon Nov 18 08:17:47 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 18 Nov 2024 08:17:47 GMT Subject: RFR: 8344205: [PPC]: failing assertion: sharedRuntime_ppc.cpp:1652: cookie not found In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 15:29:02 GMT, Richard Reingruber wrote: > This PR removes the bad assertion that fails when leaving a continuation because the cookie value is not found in `frame::common_abi::cr`. > This is because after the cookie is stored there when entering the continuation it is overridden by the runtime call to thaw frames. This is compliant with the abi. > Strangely the assertion only ever failed on aix. > > Testing: compiler/codecache/stress/UnexpectedDeoptimizationTest.java always failed since the bad assertion was introduced recently. It succeeds after removal. > > The fix passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. Thanks for reviewing, Martin. 
I'll integrate this as a trivial change now. Cheers, Richard. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22109#issuecomment-2482230854 From rrich at openjdk.org Mon Nov 18 08:21:13 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 18 Nov 2024 08:21:13 GMT Subject: Integrated: 8344205: [PPC]: failing assertion: sharedRuntime_ppc.cpp:1652: cookie not found In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 15:29:02 GMT, Richard Reingruber wrote: > This PR removes the bad assertion that fails when leaving a continuation because the cookie value is not found in `frame::common_abi::cr`. > This is because after the cookie is stored there when entering the continuation it is overridden by the runtime call to thaw frames. This is compliant with the abi. > Strangely the assertion only ever failed on aix. > > Testing: compiler/codecache/stress/UnexpectedDeoptimizationTest.java always failed since the bad assertion was introduced recently. It succeeds after removal. > > The fix passed our CI testing: > Tier 1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance Suite and SAP specific tests. > Testing was done on the main platforms and also on Linux/PPC64le and AIX. This pull request has now been integrated. 
Changeset: 4a7ce1d7 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/4a7ce1d7c1bd4b751063b98cf8bedcd27055760b Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod 8344205: [PPC]: failing assertion: sharedRuntime_ppc.cpp:1652: cookie not found Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/22109 From tholenstein at openjdk.org Mon Nov 18 08:37:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 18 Nov 2024 08:37:58 GMT Subject: RFR: 8344204: IGV: Button to enable/disable cutting of long edges [v3] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 13:05:15 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> remove hideDuplicate.png > > Marked as reviewed by rcastanedalo (Reviewer). thanks @robcasloz and @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22108#issuecomment-2482272083 From tholenstein at openjdk.org Mon Nov 18 08:37:59 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 18 Nov 2024 08:37:59 GMT Subject: Integrated: 8344204: IGV: Button to enable/disable cutting of long edges In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 14:51:00 GMT, Tobias Holenstein wrote: > Currently IGV layout cuts edges that are longer than 10 layers. Add an option to enable/disable the cutting > > cut > > Because the Toolbar gets crowded, I removed the non-functioning button for `HideDuplicatesAction` This pull request has now been integrated.
Changeset: 6c2ae44c Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/6c2ae44c052bdabbfc2fd15e133b30849580b4a6 Stats: 269 lines in 13 files changed: 163 ins; 93 del; 13 mod 8344204: IGV: Button to enable/disable cutting of long edges Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22108 From chagedorn at openjdk.org Mon Nov 18 08:40:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 18 Nov 2024 08:40:43 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v12] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:56:31 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2441861659 From tholenstein at openjdk.org Mon Nov 18 08:40:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 18 Nov 2024 08:40:46 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v6] In-Reply-To: References: Message-ID: <1iLDX0GPYHEp4KcV8rF4JqTz8tGirW-KDuNBS6F2Vvs=.f8bcbaa3-6a49-4b58-8cc0-da522ed97afb@github.com> > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/compile.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22076/files - new: https://git.openjdk.org/jdk/pull/22076/files/17765b07..7b9664ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22076&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22076.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22076/head:pull/22076 PR: https://git.openjdk.org/jdk/pull/22076 From tholenstein at openjdk.org Mon Nov 18 08:40:47 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 18 Nov 2024 08:40:47 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV In-Reply-To: 
<2cSNg5EUSErn3fXs6GFaFl7YPQuXpJHJ47NhDK0ECrA=.a5b5b46f-51ba-4f59-98a2-544a5b8ba767@github.com> References: <2cSNg5EUSErn3fXs6GFaFl7YPQuXpJHJ47NhDK0ECrA=.a5b5b46f-51ba-4f59-98a2-544a5b8ba767@github.com> Message-ID: On Thu, 14 Nov 2024 07:40:25 GMT, Emanuel Peter wrote: >> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network >> >> ### Add a new option "!" to dump_bfs >> The option ! send the printed nodes of dump_bfs to IGV and shows them >> >> p find_node(0)->dump_bfs(1,0,"dcmxo+!") >> >> dist dump >> --------------------------------------------- >> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] >> 0 0 Root === 0 51 [[ 0 1 3 26 ]] >> Method printed over network stream to IGV >> >> >> dump > > ![image](https://github.com/user-attachments/assets/5515a542-c8e1-487f-a17c-c6a558044cfa) > > You also need to fix this. thanks for the reviews @eme64 , @chhagedorn and @robcasloz ------------- PR Comment: https://git.openjdk.org/jdk/pull/22076#issuecomment-2482277808 From lucy at openjdk.org Mon Nov 18 08:57:42 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 18 Nov 2024 08:57:42 GMT Subject: RFR: 8344379: [s390x] build failure due to missing change from JDK-8339466 In-Reply-To: References: Message-ID: <0Ls6FNyumBHFWjL3e3rWKiXc5pn-VYU_A0Rnk6F-VNY=.1505df02-7e7f-4feb-a405-517092f8868e@github.com> On Mon, 18 Nov 2024 07:11:45 GMT, Amit Kumar wrote: > Trivial change. Adds one missing part from [JDK-8327652](https://bugs.openjdk.org/browse/JDK-8327652) as that is causing build failure on s390x. Looks good. And trivial. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22190#pullrequestreview-2441896609 From chagedorn at openjdk.org Mon Nov 18 09:01:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 18 Nov 2024 09:01:45 GMT Subject: RFR: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV [v6] In-Reply-To: <1iLDX0GPYHEp4KcV8rF4JqTz8tGirW-KDuNBS6F2Vvs=.f8bcbaa3-6a49-4b58-8cc0-da522ed97afb@github.com> References: <1iLDX0GPYHEp4KcV8rF4JqTz8tGirW-KDuNBS6F2Vvs=.f8bcbaa3-6a49-4b58-8cc0-da522ed97afb@github.com> Message-ID: On Mon, 18 Nov 2024 08:40:46 GMT, Tobias Holenstein wrote: >> IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... in C2 to define which nodes should be visible in IGV when sent over the network >> >> ### Add a new option "!" to dump_bfs >> The option ! send the printed nodes of dump_bfs to IGV and shows them >> >> p find_node(0)->dump_bfs(1,0,"dcmxo+!") >> >> dist dump >> --------------------------------------------- >> 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] >> 0 0 Root === 0 51 [[ 0 1 3 26 ]] >> Method printed over network stream to IGV >> >> >> dump > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/compile.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22076#pullrequestreview-2441909560 From tholenstein at openjdk.org Mon Nov 18 09:38:55 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 18 Nov 2024 09:38:55 GMT Subject: Integrated: 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:41:24 GMT, Tobias Holenstein wrote: > IGV XML already support to define which graphs are visible when opened. Extend the IdealGraphPrinter::print... 
in C2 to define which nodes should be visible in IGV when sent over the network > > ### Add a new option "!" to dump_bfs > The option ! send the printed nodes of dump_bfs to IGV and shows them > > p find_node(0)->dump_bfs(1,0,"dcmxo+!") > > dist dump > --------------------------------------------- > 1 51 Return === 46 6 47 8 9 returns 39 [[ 0 ]] > 0 0 Root === 0 51 [[ 0 1 3 26 ]] > Method printed over network stream to IGV > > > dump This pull request has now been integrated. Changeset: b9c6ce90 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/b9c6ce900b512adfcaccd2341be3eb0003a28b87 Stats: 85 lines in 6 files changed: 61 ins; 3 del; 21 mod 8344122: IGV: Extend c2 IdealGraphPrinter to send subgraphs to IGV Reviewed-by: chagedorn, epeter, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22076 From shade at openjdk.org Mon Nov 18 09:52:49 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 18 Nov 2024 09:52:49 GMT Subject: RFR: 8344379: [s390x] build failure due to missing change from JDK-8339466 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:11:45 GMT, Amit Kumar wrote: > Trivial change. Adds one missing part from [JDK-8327652](https://bugs.openjdk.org/browse/JDK-8327652) as that is causing build failure on s390x. Just noticed this in my builds too. Looks good. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22190#pullrequestreview-2442032277 From dlunden at openjdk.org Mon Nov 18 10:03:54 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 18 Nov 2024 10:03:54 GMT Subject: RFR: 8331295: C2: Do not clone address computations that are indirect memory input to at least one load/store In-Reply-To: References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: On Mon, 18 Nov 2024 06:51:24 GMT, Tobias Hartmann wrote: >> On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation. In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. >> >> ### Changeset >> >> - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation. One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. >> - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. >> - Add a new IR framework regression test.
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. > > Maybe the issue title should be updated to better reflect the root cause? @TobiHartmann Thanks, title updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22157#issuecomment-2482490257 From mdoerr at openjdk.org Mon Nov 18 10:16:02 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 18 Nov 2024 10:16:02 GMT Subject: RFR: 8340453: C2: Improve encoding of LoadNKlass for compact headers [v4] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 11:32:47 GMT, Roman Kennke wrote: >> We currently use the offset 4 as a placeholder in LoadNKlass, when running with compact headers. In reality, we are loading from offset 0, but we want to keep LoadNKlass on a separate memory slice from other mark-word-accesses, because LoadNKlass is essentially immutable memory. The consequence is that we need to figure out the address of the mark-word in the backend, and this is ugly. >> >> However, we can do better. We can just as well load 4 bytes from offset 4, and shift by a 32 smaller shift. This has previously not been possible because we needed to check for the monitor bit in the markWord, but this is no longer necessary. This simplifies the code and even makes the instructions encoding a bit smaller. >> >> Testing: >> - [x] tier1 aarch64 +UCOH >> - [x] tier1 x86_64 +UCOH > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add missing newline in opto output AIX on PPC64 and linux on s390 are supported Big Endian platforms. We can't do the same change for these platforms without endianness sensitive adaptations. 
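To make the endianness point concrete, here is a small standalone sketch (not HotSpot code). It assumes, purely for illustration, that the narrow klass occupies the top bits of the 64-bit mark word starting at bit 42; `kKlassShift`, `klass_from_full_word` and `klass_from_half_word` are hypothetical names. The upper 32-bit half of the word lives at byte offset 4 on little-endian hosts but at offset 0 on big-endian hosts, so a fixed "load 4 bytes from offset 4" only works on the former.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed shift of the klass bits within the 64-bit word (illustration only).
constexpr int kKlassShift = 42;

// Detects the host byte order at run time.
static bool host_is_little_endian() {
  const std::uint32_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Reference version: load all 8 bytes and shift by the full amount.
std::uint32_t klass_from_full_word(const unsigned char* header) {
  std::uint64_t mark;
  std::memcpy(&mark, header, sizeof mark);
  return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Narrow version: load only the upper 32-bit half and shift by 32 less.
// The byte offset of that half depends on the host byte order.
std::uint32_t klass_from_half_word(const unsigned char* header) {
  const std::size_t upper_half_offset = host_is_little_endian() ? 4 : 0;
  std::uint32_t half;
  std::memcpy(&half, header + upper_half_offset, sizeof half);
  return half >> (kKlassShift - 32);
}
```

Both functions return the same value for any 8-byte header; the only platform-specific part is `upper_half_offset`, which is roughly the kind of adaptation a big-endian port would need.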
------------- PR Comment: https://git.openjdk.org/jdk/pull/22078#issuecomment-2482535428 From thartmann at openjdk.org Mon Nov 18 10:16:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 18 Nov 2024 10:16:29 GMT Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic Message-ID: The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. Thanks, Tobias ------------- Commit messages: - 8344199: Incorrect excluded field value set by getEventWriter intrinsic Changes: https://git.openjdk.org/jdk/pull/22195/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22195&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344199 Stats: 11 lines in 2 files changed: 6 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22195.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22195/head:pull/22195 PR: https://git.openjdk.org/jdk/pull/22195 From amitkumar at openjdk.org Mon Nov 18 10:44:08 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 18 Nov 2024 10:44:08 GMT Subject: RFR: 8344379: [s390x] build failure due to missing change from JDK-8339466 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:11:45 GMT, Amit Kumar wrote: > Trivial change. Adds one missing part from [JDK-8327652](https://bugs.openjdk.org/browse/JDK-8327652) as that is causing build failure on s390x. Thanks for quick review Lutz, Aleksey. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22190#issuecomment-2482644997 From amitkumar at openjdk.org Mon Nov 18 10:44:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 18 Nov 2024 10:44:09 GMT Subject: Integrated: 8344379: [s390x] build failure due to missing change from JDK-8339466 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:11:45 GMT, Amit Kumar wrote: > Trivial change. 
Adds one missing part from [JDK-8327652](https://bugs.openjdk.org/browse/JDK-8327652) as that is causing build failure on s390x. This pull request has now been integrated. Changeset: b8b70c8b Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/b8b70c8b4efd97ae6a57a880b03a4bf26d79acc4 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8344379: [s390x] build failure due to missing change from JDK-8339466 Reviewed-by: lucy, shade ------------- PR: https://git.openjdk.org/jdk/pull/22190 From syan at openjdk.org Mon Nov 18 11:10:57 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 18 Nov 2024 11:10:57 GMT Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. > > Thanks, > Tobias The test passed after applying the patch from this PR. ------------- Marked as reviewed by syan (Committer). PR Review: https://git.openjdk.org/jdk/pull/22195#pullrequestreview-2442318379 From mgronlun at openjdk.org Mon Nov 18 11:25:44 2024 From: mgronlun at openjdk.org (Markus Grönlund) Date: Mon, 18 Nov 2024 11:25:44 GMT Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. > > Thanks, > Tobias Thanks for finding and fixing this. ------------- Marked as reviewed by mgronlun (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22195#pullrequestreview-2442347674 From thartmann at openjdk.org Mon Nov 18 11:30:56 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 18 Nov 2024 11:30:56 GMT Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. > > Thanks, > Tobias Thanks for the quick reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22195#issuecomment-2482771701 From mli at openjdk.org Mon Nov 18 11:38:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 18 Nov 2024 11:38:42 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers Message-ID: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Hi, Can you help to review this patch? This is a follow-up of 8340453 on riscv. Thanks! 
------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/22203/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22203&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344387 Stats: 18 lines in 3 files changed: 4 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22203/head:pull/22203 PR: https://git.openjdk.org/jdk/pull/22203 From mli at openjdk.org Mon Nov 18 11:47:56 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 18 Nov 2024 11:47:56 GMT Subject: RFR: 8344371: RISC-V: compiler/intrinsics/chacha/TestChaCha20.java fails after JDK-8343555 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 01:16:37 GMT, Fei Yang wrote: > From the error message, the cause of the failure is that 'UseRVV' was made diagnostic in [JDK-8343555](https://bugs.openjdk.org/browse/JDK-8343555). But the test was not updated to reflect this. Instead of adding one extra `-XX:+UnlockDiagnosticVMOptions` option, this change simply removes the use of `-XX:+UseRVV` from the test. The reason is that `-XX:+UseRVV` is auto-detected and enabled, so we will have the RVV extension if we satisfy the test requirement: > > (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") > > > The same test passes with this fix on linux-riscv64 with the RVV extension. Looks good. Thanks for catching this. ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22188#pullrequestreview-2442392962 From swen at openjdk.org Mon Nov 18 11:58:49 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 18 Nov 2024 11:58:49 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: References: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> Message-ID: <_FmA2KMB09g9ylaAlfudrSZ6oB2BLnmfUFyetvBJQdo=.27c71f1a-11b7-4285-b081-6b96e49f45e2@github.com> On Thu, 14 Nov 2024 07:48:38 GMT, Emanuel Peter wrote: >> @eme64 Are there plans to support MergeLoad, and big-endian MergeStore on little-endian machines? > > @wenshao Ah. I only just realized it: you have a lot of `get` benchmarks... they don't really belong to `MergeStores`... if anything you could put them in a separate `MergeLoads` benchmark! > > Also: now we have lots of data here. But data alone is kind of pointless. We need analysis to see **what patterns** and **why** they get speedups. @eme64 I have updated the benchmark numbers above, removing the getXXX part. I have also added some analysis. I found that the test using VarHandle is the fastest, whether it is BigEndian or LittleEndian. All the BigEndian tests are not MergeStored, including the Reverse combination scenario. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2482831411 From fyang at openjdk.org Mon Nov 18 12:57:43 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 18 Nov 2024 12:57:43 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: On Mon, 18 Nov 2024 11:32:27 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? 
This is a follow-up of 8340453 on riscv. > Thanks! src/hotspot/cpu/riscv/riscv.ad line 4824: > 4822: ins_encode %{ > 4823: __ lwu(as_Register($dst$$reg), Address(as_Register($mem$$base), $mem$$disp)); > 4824: __ srli(as_Register($dst$$reg), as_Register($dst$$reg), (unsigned) markWord::klass_shift_at_offset); Do we need this explicit unsigned type conversion? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22203#discussion_r1846535954 From qamai at openjdk.org Mon Nov 18 13:08:44 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 13:08:44 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) Marked as reviewed by qamai (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/22038#pullrequestreview-2442568149 From mdoerr at openjdk.org Mon Nov 18 16:55:49 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 18 Nov 2024 16:55:49 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 10:04:51 GMT, Amit Kumar wrote: > This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. I'm thinking about improving it to optimize more cases:

diff --git a/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp
index 7973e9d0545..aa948facba1 100644
--- a/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp
@@ -296,14 +296,19 @@ void LIRGenerator::cmp_reg_mem(LIR_Condition condition, LIR_Opr reg, LIR_Opr bas

 bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) {
   assert(left != result, "should be different registers");
-  if (is_power_of_2(c + 1)) {
-    __ shift_left(left, log2i_exact(c + 1), result);
+  // Using unsigned arithmetics to avoid undefined behavior due to integer overflow.
+  // The involved operations are not sensitive to signedness.
+  if (is_power_of_2((juint)c + 1)) {
+    __ shift_left(left, log2i_exact((juint)c + 1), result);
     __ sub(result, left, result);
     return true;
-  } else if (is_power_of_2(c - 1)) {
-    __ shift_left(left, log2i_exact(c - 1), result);
+  } else if (is_power_of_2((juint)c - 1)) {
+    __ shift_left(left, log2i_exact((juint)c - 1), result);
     __ add(result, left, result);
     return true;
+  } else if (c == -1) {
+    __ negate(left, result);
+    return true;
   }
   return false;
 }

The same should work for s390, too. Opinions?
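As a rough standalone illustration of why the unsigned casts help (this is not HotSpot code; `is_pow2`, `log2_exact`, `classify` and `evaluate` are hypothetical stand-ins for `is_power_of_2`, `log2i_exact` and the emitted LIR): for `c == max_jint`, `c + 1` overflows as a signed 32-bit value, which is undefined behavior, while `(juint)c + 1` is well defined and yields 2^31.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-ins for HotSpot's jint/juint typedefs (assumed to be 32-bit).
using jint  = std::int32_t;
using juint = std::uint32_t;

// True if exactly one bit is set; 0 is not a power of two.
static bool is_pow2(juint x) { return x != 0 && (x & (x - 1)) == 0; }

// Index of the single set bit; the argument must be a power of two.
static int log2_exact(juint x) {
  assert(is_pow2(x));
  int n = 0;
  while ((x >>= 1) != 0) ++n;
  return n;
}

enum class Reduction { ShiftSub, ShiftAdd, Negate, None };

struct Plan {
  Reduction kind;
  int shift;
};

// Mirrors the branch structure of the proposed patch, using only unsigned math.
Plan classify(jint c) {
  const juint uc = static_cast<juint>(c);
  if (is_pow2(uc + 1)) return {Reduction::ShiftSub, log2_exact(uc + 1)};
  if (is_pow2(uc - 1)) return {Reduction::ShiftAdd, log2_exact(uc - 1)};
  if (c == -1) return {Reduction::Negate, 0};
  return {Reduction::None, 0};
}

// Evaluates a plan with wrapping 32-bit arithmetic, as generated code would.
// The final conversion back to jint is modular on all mainstream compilers.
jint evaluate(Plan p, jint left) {
  const juint l = static_cast<juint>(left);
  switch (p.kind) {
    case Reduction::ShiftSub: return static_cast<jint>((l << p.shift) - l);
    case Reduction::ShiftAdd: return static_cast<jint>((l << p.shift) + l);
    case Reduction::Negate:   return static_cast<jint>(0u - l);
    case Reduction::None:     break;
  }
  return 0;  // no reduction applies; a real caller would emit a multiply
}
```

For example, `classify(std::numeric_limits<jint>::max())` selects the shift-and-subtract form with a shift of 31 without ever evaluating `c + 1` in signed arithmetic, and `classify(-1)` falls through to the negate case because `(juint)-1 + 1` wraps to 0, which is not a power of two.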
------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2483587955 From amitkumar at openjdk.org Mon Nov 18 17:28:52 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 18 Nov 2024 17:28:52 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 16:52:37 GMT, Martin Doerr wrote: >The same should work for s390, too. Opinions? Okay these changes seem good-to-have. Just curious to know, if `c == -1` is worth for optimisations, looking at real world applications ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2483668422 From dlong at openjdk.org Mon Nov 18 19:19:14 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 18 Nov 2024 19:19:14 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: <-qkA-r13m-wg6I9W7mhtl0PJsTVrichUj5DP6hICRDk=.d67fa3e9-7474-4434-9051-0bef80508384@github.com> Message-ID: On Sat, 16 Nov 2024 01:57:00 GMT, Vladimir Ivanov wrote: > > For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". > > There's definitely some room for improvement here, but, frankly speaking, stringy descriptors don't look appealing to me. Why not simply introduce `TypeFunc` factory methods which explicitly accept argument/return `Type`s instead? Probably, variadic functions are a good fit here, but even if it's not the case, there are rather few arities used (single return value - void, 1 slot, or 2 slots, plus up to 8 arguments). And that would eliminate lots of boilerplate code as well. Good idea. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2483900585 From vlivanov at openjdk.org Mon Nov 18 20:03:53 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 18 Nov 2024 20:03:53 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 04:59:54 GMT, Amit Kumar wrote: >> src/hotspot/share/opto/type.cpp line 716: >> >>> 714: >>> 715: LockNode::lock_type_init(); >>> 716: OptoRuntime::new_instance_Type_init(); >> >> I suggest to move the initialization code into `OptoRuntime`. As a benefit, you'll be able to directly access fields from there, so some trivial `init` methods won't be needed anymore. > > "first" call is made from here because of shared space. Otherwise the object-allocation will deleted and VM will crash. That's what I observed. And again that was the reason why the initialization call is made from `Type::Initialize_shared`. My suggestion is about refactoring the code, so initialization is performed in `OptoRuntime` code (e.g., in `OptoRuntime::initialize_types()`). Then you call it from here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1847179850 From vlivanov at openjdk.org Mon Nov 18 20:19:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 18 Nov 2024 20:19:12 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 05:05:12 GMT, Amit Kumar wrote: > Is there some arena-specific check that exists, which could be used here ? `_type_arena != &_Compile_types` on current `Compile` should reliably detect when shared type dictionary is being populated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2484041505 From never at openjdk.org Mon Nov 18 20:20:49 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 18 Nov 2024 20:20:49 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: On Thu, 14 Nov 2024 16:42:31 GMT, Yudi Zheng wrote: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. I think you need to add a unit test for isConcrete now: java.lang.AssertionError: test missing for public default boolean jdk.vm.ci.meta.ResolvedJavaType.isConcrete() src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ModifiersProvider.java line 140: > 138: > 139: /** > 140: * Checks that this element is concrete and not abstract. It might be worth clarifying that we don't mean `isAbstract()` here. We specifically mean that it corresponds to a method with a real implementation or a type which can be instantiated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22111#issuecomment-2484044896 PR Review Comment: https://git.openjdk.org/jdk/pull/22111#discussion_r1847215563 From mdoerr at openjdk.org Mon Nov 18 20:37:59 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 18 Nov 2024 20:37:59 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 10:04:51 GMT, Amit Kumar wrote: > This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. 
Well, `strength_reduce_multiply` in C1 probably has no real benefit because we have C2. Nevertheless, I think it's nice to have some simple optimizations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2484074611 From fyang at openjdk.org Tue Nov 19 01:52:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 19 Nov 2024 01:52:50 GMT Subject: Integrated: 8344371: RISC-V: compiler/intrinsics/chacha/TestChaCha20.java fails after JDK-8343555 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 01:16:37 GMT, Fei Yang wrote: > From the error message, the cause of the failure is that 'UseRVV' was made diagnostic in [JDK-8343555](https://bugs.openjdk.org/browse/JDK-8343555). But the test was not updated to reflect this. Instead of adding one extra `-XX:+UnlockDiagnosticVMOptions` option, this change simply removes the use of `-XX:+UseRVV` from the test. The reason is that `-XX:+UseRVV` is auto-detected and enabled, so we will have the RVV extension if we satisfy the test requirement: > > (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") > > > The same test passes with this fix on linux-riscv64 with the RVV extension. This pull request has now been integrated. Changeset: 37298844 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/37298844c9504fbafb08c593cb6eec70184e308b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8344371: RISC-V: compiler/intrinsics/chacha/TestChaCha20.java fails after JDK-8343555 Reviewed-by: mli ------------- PR: https://git.openjdk.org/jdk/pull/22188 From fyang at openjdk.org Tue Nov 19 01:52:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 19 Nov 2024 01:52:50 GMT Subject: RFR: 8344371: RISC-V: compiler/intrinsics/chacha/TestChaCha20.java fails after JDK-8343555 In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 11:44:51 GMT, Hamlin Li wrote: >> From the error message, the cause of the failure is that 'UseRVV' was made diagnostic in [JDK-8343555](https://bugs.openjdk.org/browse/JDK-8343555).
But the test was not updated to reflect this. Instead of adding one extra `-XX:+UnlockDiagnosticVMOptions` option, this change simply removes the use of `-XX:+UseRVV` from the test. The reason is that `-XX:+UseRVV` is auto-detected and enabled, so we will have the RVV extension if we satisfy the test requirement: >> >> (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*") >> >> >> The same test passes with this fix on linux-riscv64 with the RVV extension. > > Looks good. Thanks for catching this. @Hamlin-Li : Thanks for the review! Moving on. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22188#issuecomment-2484536564 From dhanalla at openjdk.org Tue Nov 19 03:36:51 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 19 Nov 2024 03:36:51 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v4] In-Reply-To: <4t14KRimdrYG3dPJ4FgeeX0oz1xwGDNfuParbwVIL68=.ea5c0137-ffad-41f7-9ac0-e95daecf09ea@github.com> References: <4oWQ5tScx2i8xp1XO-q7R-SczbUZT_Klq757GyFkmlY=.2907afac-3872-4ff6-88e0-2a05144ff21b@github.com> <4t14KRimdrYG3dPJ4FgeeX0oz1xwGDNfuParbwVIL68=.ea5c0137-ffad-41f7-9ac0-e95daecf09ea@github.com> Message-ID: On Mon, 4 Nov 2024 16:17:54 GMT, Vladimir Kozlov wrote: >> Okay thanks for investigating again. A bailout makes sense for this edge case. > > Yes, bailout with recompilation is preferable. The graph could already be partially modified with some field access nodes for the scalarized object. > > If the bailout check and code are the same as in `escape.cpp`, consider factoring them into one function to use in both places.
Thanks for reviewing this PR @vnkozlov ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1847591390 From amitkumar at openjdk.org Tue Nov 19 04:44:00 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 19 Nov 2024 04:44:00 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v2] In-Reply-To: References: Message-ID: > This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: - s390x update - ppc changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/39e16cd3..a08e4fdb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=00-01 Stats: 12 lines in 2 files changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From amitkumar at openjdk.org Tue Nov 19 05:20:45 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 19 Nov 2024 05:20:45 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 04:44:00 GMT, Amit Kumar wrote: >> This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - s390x update > - ppc changes I kept the negative check condition `c > 0 && c < max_jint` for s390x for two key reasons: 1. If c is negative, there is no possibility that is_power_of_2 will allow it to pass through for optimised division, so we return false early. 2. 
Treating negative numbers as inherently negative feels logically correct in this context. As a result, if `c == -1`, both implementations (for s390x and PPC) will return `true` by performing the negate operation. For cases where `c < -1`, both will return `false`. However, on s390x, the condition will avoid performing two extra checks for determining if the value is a power of 2, because it can't be. Let me know if you would like to modify the PPC implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2484729604 From kbarrett at openjdk.org Tue Nov 19 05:35:50 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 19 Nov 2024 05:35:50 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 04:44:00 GMT, Amit Kumar wrote: >> This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. > > Amit Kumar has updated the pull request incrementally with two additional commits since the last revision: > > - s390x update > - ppc changes It looks to me that aarch64 and arm have exactly the same issue. As mentioned in JBS, x86 and riscv already have similar checking as being proposed here. It would be nice if all platforms had exactly the same check, rather than some in one order and some in a different order. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2484744881 From rcastanedalo at openjdk.org Tue Nov 19 07:05:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 19 Nov 2024 07:05:50 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: <3O2qOBqRKItBOZbVZWwBNjheOciIH5RGP-Rz0sLNiQA=.5fdaf58e-bdde-4d27-ad9e-85bc381d89bd@github.com> On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) Thanks for reviewing, Quân Anh!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2484850909 From syan at openjdk.org Tue Nov 19 07:29:55 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 07:29:55 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize Message-ID: Hi all, Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. Additional testing - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 fastdebug build ------------- Commit messages: - use rscratch2 instead of rscratch1, and use interpreter_frame_mirror_offset instead interpreter_frame_initial_sp_offset - Use 64-bit arithmetic, use sub instead subw - use subw instead of sub - revert change of test/jdk/ProblemList-Xcomp.txt - use br(Assembler::GE, L) - use r0 as Rtemp - 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize - 8344199: Problemlist jdk/jfr/jvm/TestVirtualThreadExclusion.java before JDK-8344199 resolved Changes: https://git.openjdk.org/jdk/pull/22181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22181&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344356 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22181/head:pull/22181 PR: https://git.openjdk.org/jdk/pull/22181 From aph at openjdk.org Tue Nov 19 07:29:55 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Nov 2024 07:29:55 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: Message-ID: <9to3ci-oAluMHFelVhfNvlly-ujX8fs6r3wfrTTCcQw=.edcde771-14c1-45ef-b551-1bad89c44c14@github.com> On Sun, 17 
Nov 2024 09:06:37 GMT, SendaoYan wrote: > Hi all, > Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 fastdebug build src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 397: > 395: if (VerifyActivationFrameSize) { > 396: Label L; > 397: subw(rscratch1, rfp, esp); Use 64-bit arithmetic here, or you'll miss some kinds of error. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1846712858 From syan at openjdk.org Tue Nov 19 07:29:55 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 07:29:55 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize In-Reply-To: <9to3ci-oAluMHFelVhfNvlly-ujX8fs6r3wfrTTCcQw=.edcde771-14c1-45ef-b551-1bad89c44c14@github.com> References: <9to3ci-oAluMHFelVhfNvlly-ujX8fs6r3wfrTTCcQw=.edcde771-14c1-45ef-b551-1bad89c44c14@github.com> Message-ID: On Mon, 18 Nov 2024 14:45:48 GMT, Andrew Haley wrote: >> Hi all, >> Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build >> - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 fastdebug build > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 397: > >> 395: if (VerifyActivationFrameSize) { >> 396: Label L; >> 397: subw(rscratch1, rfp, esp); > > Use 64-bit arithmetic here, or you'll miss some kinds of error. Okey, Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1846826087 From syan at openjdk.org Tue Nov 19 07:29:56 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 07:29:56 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: <9to3ci-oAluMHFelVhfNvlly-ujX8fs6r3wfrTTCcQw=.edcde771-14c1-45ef-b551-1bad89c44c14@github.com> Message-ID: On Mon, 18 Nov 2024 15:43:51 GMT, SendaoYan wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 397: >> >>> 395: if (VerifyActivationFrameSize) { >>> 396: Label L; >>> 397: subw(rscratch1, rfp, esp); >> >> Use 64-bit arithmetic here, or you'll miss some kinds of error. > > Okey, Thanks. `subw` has been replaced as `sub` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1847749966 From dlong at openjdk.org Tue Nov 19 07:56:15 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 19 Nov 2024 07:56:15 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 08:04:35 GMT, Emanuel Peter wrote: >> **History** >> This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): >> On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: >> `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` >> >> I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. >> >> **Summary of Problem** >> >> As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. >> >> **Benchmark** >> >> I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). 
>> >> The benchmarks look different on different machines, but they all have a pattern similar to this: >> ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) >> ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) >> ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) >> ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) >> >> We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offse... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - manual merge > - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding > - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding > - fix whitespace > - fix tests and build > - fix store-to-load forward IR rules > - updates before the weekend ... who knows if they are any good > - refactor to iteration threshold > - use jvmArgs again, and apply same fix as 8343345 > - revert to jvmArgsPrepend > - ... and 15 more: https://git.openjdk.org/jdk/compare/543e355b...000f9f13 Why does the benchmark need to have so many methods, to make sure the different values are treated as constants? I'm not sure, but JMH might turn @Param values into constants. If so, then your benchmark can be greatly simplified. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2484953605 From rrich at openjdk.org Tue Nov 19 08:16:55 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 19 Nov 2024 08:16:55 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() Message-ID: This change removes the ResourceMark from `PhaseChaitin::merge_multidefs()` because it frees memory that is used in the caller method `PhaseChaitin::Register_Allocate`.
[My comment](https://bugs.openjdk.org/browse/JDK-8328085?focusedId=14723086&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14723086) on the JBS item explains the details.

#### Testing

I was able to reproduce the issue on ppc64le but not on x86_64 running applications/ctw/modules/java_desktop.java. The issue didn't reproduce with this pr.

#### ResourceArea Sizes

I've traced maximum ResourceArea size after returning from `PhaseChaitin::merge_multidefs()` (see [first commit](https://github.com/openjdk/jdk/pull/22200/commits/ffbe6dee05a5a66c2965f4ff7e4cd466605cf89d)). I haven't found a significant difference. Below you can see the last trace line from each run.

##### x86_64: 3 Runs Dacapo Tomcat 5 Iterations

###### Baseline

Run 1: [24.222s][info][newcode] New maximum for resource area size: 3274 KB
Run 2: [21.317s][info][newcode] New maximum for resource area size: 3274 KB
Run 3: [37.400s][info][newcode] New maximum for resource area size: 3336 KB

###### PR

Run 1: [35.002s][info][newcode] New maximum for resource area size: 3363 KB
Run 2: [21.332s][info][newcode] New maximum for resource area size: 3274 KB
Run 3: [36.050s][info][newcode] New maximum for resource area size: 3286 KB

##### x86_64: 3 Runs applications/ctw/modules/java_desktop.java

###### Baseline

Run 1: [29.876s][info][newcode] New maximum for resource area size: 3143 KB
Run 2: [29.631s][info][newcode] New maximum for resource area size: 3111 KB
Run 3: [29.227s][info][newcode] New maximum for resource area size: 3142 KB

###### PR

Run 1: [29.755s][info][newcode] New maximum for resource area size: 3175 KB
Run 2: [28.964s][info][newcode] New maximum for resource area size: 3143 KB
Run 3: [28.863s][info][newcode] New maximum for resource area size: 3143 KB

##### PPC: 3 Runs Dacapo Tomcat 5 Iterations

###### Baseline

Run 1: [20.041s][info][newcode] New maximum for resource area size: 3474 KB
Run 2: [20.581s][info][newcode] New maximum for resource area size: 3474 KB
Run 3: [20.367s][info][newcode] New maximum for resource area size: 3474 KB

###### PR

Run 1: [20.520s][info][newcode] New maximum for resource area size: 3506 KB
Run 2: [20.918s][info][newcode] New maximum for resource area size: 3506 KB
Run 3: [20.994s][info][newcode] New maximum for resource area size: 3505 KB

##### PPC: 3 Runs applications/ctw/modules/java_desktop.java

###### Baseline

Run 1: [71.992s][info][newcode] New maximum for resource area size: 3483 KB
Run 2: [55.808s][info][newcode] New maximum for resource area size: 3483 KB
Run 3: [29.252s][info][newcode] New maximum for resource area size: 1684 KB

###### PR

Run 1: [55.996s][info][newcode] New maximum for resource area size: 3515 KB
Run 2: [30.384s][info][newcode] New maximum for resource area size: 2849 KB
Run 3: [65.671s][info][newcode] New maximum for resource area size: 3547 KB

------------- Commit messages: - Revert trace code - Remove ResourceMark from PhaseChaitin::merge_multidefs - Log max Resourcearea size after merge_multidefs Changes: https://git.openjdk.org/jdk/pull/22200/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22200&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328085 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22200/head:pull/22200 PR: https://git.openjdk.org/jdk/pull/22200 From epeter at openjdk.org Tue Nov 19 08:20:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 08:20:48 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 07:53:26 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 25 commits: >> >> - manual merge >> - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding >> - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding >> - fix whitespace >> - fix tests and build >> - fix store-to-load forward IR rules >> - updates before the weekend ... who knows if they are any good >> - refactor to iteration threshold >> - use jvmArgs again, and apply same fix as 8343345 >> - revert to jvmArgsPrepend >> - ... and 15 more: https://git.openjdk.org/jdk/compare/543e355b...000f9f13 > > Why does the benchmark need to have so many methods, to make sure the different values are treated as constants? I'm not sure, but JMH might turn @Param values into constants. If so, then your benchmark can be greatly simplified. @dean-long I tried that with `@param`, but then they are not constants... sadly. And they need to be constants. Let me know if you find some better way though ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2484997967 From aph at openjdk.org Tue Nov 19 09:23:50 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Nov 2024 09:23:50 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: Message-ID: <2RyGpzYi_QoExhfPwB48y_HJHar6UcZ-itiC8XAv0_g=.fefbc4fd-381a-4d90-a72b-f37b0f5146d4@github.com> On Sun, 17 Nov 2024 09:06:37 GMT, SendaoYan wrote: > Hi all, > Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build > - [ ] jtreg tests(include tier1/2/3 etc.)
on linux-aarch64 fastdebug build src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 399: > 397: sub(rscratch2, rfp, esp); > 398: int32_t min_frame_size = (frame::link_offset - frame::interpreter_frame_mirror_offset) * wordSize; > 399: cmpw(rscratch2, min_frame_size); Here too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1847963527 From mli at openjdk.org Tue Nov 19 09:36:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 19 Nov 2024 09:36:53 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: On Mon, 18 Nov 2024 12:53:52 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? This is a follow-up of 8340453 on riscv. >> Thanks! > > src/hotspot/cpu/riscv/riscv.ad line 4824: > >> 4822: ins_encode %{ >> 4823: __ lwu(as_Register($dst$$reg), Address(as_Register($mem$$base), $mem$$disp)); >> 4824: __ srli(as_Register($dst$$reg), as_Register($dst$$reg), (unsigned) markWord::klass_shift_at_offset); > > Do we need this explicit unsigned type conversion? > (PS: I didn't see any issue when building without this type conversion) Yes, you're right here, the `int` will be converted to `unsigned int` implicitly. I added the explicit conversion because it makes the code more clear, and in `riscv.ad` similar conversions are explicit (e.g. 
`LShiftI` and so on) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22203#discussion_r1847985329 From thartmann at openjdk.org Tue Nov 19 10:05:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 19 Nov 2024 10:05:01 GMT Subject: Integrated: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: <219p_K01F6d3a_AXn1VekrcxgUiKZlDFpaseniUcEIM=.78bbcc6b-ec4f-469f-a348-a3315c21c24f@github.com> On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 9d60300f Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/9d60300feea12d353fcd6c806b196ace2df02d05 Stats: 11 lines in 2 files changed: 6 ins; 1 del; 4 mod 8344199: Incorrect excluded field value set by getEventWriter intrinsic Co-authored-by: Patricio Chilano Mateo Reviewed-by: syan, mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/22195 From syan at openjdk.org Tue Nov 19 10:06:30 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 10:06:30 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v2] In-Reply-To: References: Message-ID: > Hi all, > Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build > - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 fastdebug build SendaoYan has updated the pull request incrementally with one additional commit since the last revision: use cmp instead cmpw ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22181/files - new: https://git.openjdk.org/jdk/pull/22181/files/960e7c27..29e0d3ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22181&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22181&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22181/head:pull/22181 PR: https://git.openjdk.org/jdk/pull/22181 From syan at openjdk.org Tue Nov 19 10:06:30 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 10:06:30 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v2] In-Reply-To: <2RyGpzYi_QoExhfPwB48y_HJHar6UcZ-itiC8XAv0_g=.fefbc4fd-381a-4d90-a72b-f37b0f5146d4@github.com> References: <2RyGpzYi_QoExhfPwB48y_HJHar6UcZ-itiC8XAv0_g=.fefbc4fd-381a-4d90-a72b-f37b0f5146d4@github.com> Message-ID: On Tue, 19 Nov 2024 09:20:47 GMT, Andrew Haley wrote: >> SendaoYan has updated the pull request incrementally with one additional commit since the last revision: >> >> use cmp instead cmpw > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 399: > >> 397: sub(rscratch2, rfp, esp); >> 398: int32_t min_frame_size = (frame::link_offset - frame::interpreter_frame_mirror_offset) * wordSize; >> 399: cmpw(rscratch2, min_frame_size); > > Here too. Thanks, `cmpw` has replace as `cmp`. 
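To illustrate the review point about 32-bit versus 64-bit arithmetic, here is a minimal Java sketch (not HotSpot code). It models the frame-size check as a signed comparison of `fp - sp` against a minimum. The class name, the method names, and the `MIN_FRAME_SIZE` value are assumptions made for illustration only. When just the low 32 bits of the difference are compared, as the `subw`/`cmpw` variants would do, a badly corrupted pointer pair can still look like a valid frame.

```java
public class FrameSizeCheckDemo {
    // Hypothetical minimum frame size in bytes (illustrative value only).
    static final long MIN_FRAME_SIZE = 0x50;

    // Models the 64-bit variant: the full pointer difference is compared.
    static boolean check64(long fp, long sp) {
        return fp - sp >= MIN_FRAME_SIZE;
    }

    // Models the 32-bit variant: the difference is truncated to its low
    // 32 bits before the signed comparison.
    static boolean check32(long fp, long sp) {
        return (int) (fp - sp) >= (int) MIN_FRAME_SIZE;
    }

    public static void main(String[] args) {
        long fp = 0x0000_7fff_0000_1000L;

        // A plausible frame: sp is 0x60 bytes below fp. Both checks accept it.
        long goodSp = fp - 0x60;
        System.out.println(check64(fp, goodSp) + " " + check32(fp, goodSp));

        // A corrupted frame: sp is almost 4 GiB above fp, so fp - sp is
        // hugely negative, but its low 32 bits are 0x60.
        long badSp = fp + 0xFFFF_FFA0L;
        System.out.println(check64(fp, badSp) + " " + check32(fp, badSp));
    }
}
```

In this model the 64-bit check rejects the corrupted pair while the truncated check accepts it, which is one kind of error the narrower arithmetic would miss.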
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1848034420 From fyang at openjdk.org Tue Nov 19 10:26:42 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 19 Nov 2024 10:26:42 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: On Mon, 18 Nov 2024 11:32:27 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is a follow-up of 8340453 on riscv. > Thanks! Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22203#pullrequestreview-2444997647 From fyang at openjdk.org Tue Nov 19 10:26:43 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 19 Nov 2024 10:26:43 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: <6eNkFtKI1hondFC_bAkfAZ0i0RdyE2avhG11Se1Xol8=.a1bb33b4-4b34-4612-b780-ff512c794f74@github.com> On Tue, 19 Nov 2024 09:34:33 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 4824: >> >>> 4822: ins_encode %{ >>> 4823: __ lwu(as_Register($dst$$reg), Address(as_Register($mem$$base), $mem$$disp)); >>> 4824: __ srli(as_Register($dst$$reg), as_Register($dst$$reg), (unsigned) markWord::klass_shift_at_offset); >> >> Do we need this explicit unsigned type conversion? >> (PS: I didn't see any issue when building without this type conversion) > > Yes, you're right here, the `int` will be converted to `unsigned int` implicitly. > I added the explicit conversion because it makes the code more clear, and in `riscv.ad` similar conversions are explicit (e.g. 
`LShiftI` and so on) All right then ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22203#discussion_r1848064450 From chagedorn at openjdk.org Tue Nov 19 10:45:56 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 10:45:56 GMT Subject: RFR: 8331295: C2: Do not clone address computations that are indirect memory input to at least one load/store In-Reply-To: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: On Fri, 15 Nov 2024 18:29:48 GMT, Daniel Lundén wrote: > On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation. In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. > > ### Changeset > > - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation. One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. > - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. > - Add a new IR framework regression test.
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22157#pullrequestreview-2445046713 From aph at openjdk.org Tue Nov 19 10:49:55 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Nov 2024 10:49:55 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v2] In-Reply-To: References: Message-ID: <90AXceKglZdrUp-bJgkKYcTc2lBVi-YMhPZGTITW9Rc=.f4202eac-e969-4898-93fe-c45cd440c2c4@github.com> On Tue, 19 Nov 2024 10:06:30 GMT, SendaoYan wrote: >> Hi all, >> Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build >> - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 fastdebug build > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > use cmp instead cmpw src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 399: > 397: sub(rscratch2, rfp, esp); > 398: unsigned char min_frame_size = (frame::link_offset - frame::interpreter_frame_mirror_offset) * wordSize; > 399: cmp(rscratch2, min_frame_size); Suggestion: int min_frame_size = (frame::link_offset - frame::interpreter_frame_mirror_offset) * wordSize; subs(rscratch2, min_frame_size); The use of `subs` here is a bit odd, but it's less odd than defining `min_frame_size` as unsigned char. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1848105099 From enikitin at openjdk.org Tue Nov 19 10:56:13 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 19 Nov 2024 10:56:13 GMT Subject: RFR: 8344533: CTW: Add option to remove clinits before loading Message-ID: This PR adds an option-controlled (off by default) removal of methods before loading them with CTW ClassLoader. The main purpose is to prevent `static { ... }` blocks execution (along with static fields initialization). Testing: manual CTW runs. 
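As background for why such an option can be useful, the sketch below shows the behavior it aims to avoid: initializing a class runs its `static { ... }` block, whereas merely loading it does not. This is plain Java using `Class.forName`, not the PR's mechanism (which, per the description above, removes methods from the class before loading). The class and method names are made up for illustration.

```java
public class ClinitDemo {
    // Counts how many times the nested class's static initializer has run.
    static int initializerRuns = 0;

    static class WithStaticBlock {
        static {
            // Side effect that runs once, when the class is initialized.
            ClinitDemo.initializerRuns++;
        }
    }

    // Loads the nested class, optionally initializing it, and returns the
    // counter value afterwards. The class literal used to obtain the name
    // does not itself trigger initialization.
    static int loadAndCount(boolean initialize) {
        try {
            Class.forName(WithStaticBlock.class.getName(), initialize,
                          ClinitDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return initializerRuns;
    }

    public static void main(String[] args) {
        System.out.println(loadAndCount(false)); // load only: static block has not run
        System.out.println(loadAndCount(true));  // initialize: static block runs once
        System.out.println(loadAndCount(true));  // already initialized: no second run
    }
}
```

A tool that needs classes to be initialized cannot simply pass `initialize = false`, which is presumably why the option works on the class contents instead; that reading is an inference, not something the message states.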
------------- Commit messages: - 8344533: CTW: Add option to remove clinits before loading Changes: https://git.openjdk.org/jdk/pull/22235/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22235&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344533 Stats: 20 lines in 1 file changed: 18 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22235.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22235/head:pull/22235 PR: https://git.openjdk.org/jdk/pull/22235 From thartmann at openjdk.org Tue Nov 19 11:08:47 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 19 Nov 2024 11:08:47 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) That looks reasonable to me.
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22038#pullrequestreview-2445113179 From rcastanedalo at openjdk.org Tue Nov 19 11:18:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 19 Nov 2024 11:18:16 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: <8sY3H0mc3f41Hd8UmY53ykQ_i2RCMs6P_4TuSB_gtBw=.71571365-e83c-4f6b-b486-b0f0ed31c46d@github.com> On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) Thanks for reviewing, Tobias!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2485425713 From epeter at openjdk.org Tue Nov 19 11:51:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 11:51:54 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark Message-ID: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 It will be important for the efforts in: [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count I ran the plots for `byte, int, long`. We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. --------------------------------------------------- **Results** red: normal -> saw-tooth green: randomized offsets -> more "smooth" linux_x64 ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) linux_aarch64 ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) macosx_x64 ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) macosx_aarch64 ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) windows_x64 ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) ------------- Commit messages: - whitespace - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into 
JDK-8344118-VectorThroughputForIterationCount-benchmark - JDK-8344118 Changes: https://git.openjdk.org/jdk/pull/22070/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22070&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344118 Stats: 436 lines in 1 file changed: 436 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22070.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22070/head:pull/22070 PR: https://git.openjdk.org/jdk/pull/22070 From syan at openjdk.org Tue Nov 19 12:33:56 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 12:33:56 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: > Hi all, > Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build > - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 fastdebug build SendaoYan has updated the pull request incrementally with one additional commit since the last revision: use subs instead of cmp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22181/files - new: https://git.openjdk.org/jdk/pull/22181/files/29e0d3ec..6e7a2fa2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22181&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22181&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22181/head:pull/22181 PR: https://git.openjdk.org/jdk/pull/22181 From syan at openjdk.org Tue Nov 19 12:37:47 2024 From: syan at openjdk.org (SendaoYan) Date: Tue, 19 Nov 2024 12:37:47 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v2] In-Reply-To: <90AXceKglZdrUp-bJgkKYcTc2lBVi-YMhPZGTITW9Rc=.f4202eac-e969-4898-93fe-c45cd440c2c4@github.com> References: <90AXceKglZdrUp-bJgkKYcTc2lBVi-YMhPZGTITW9Rc=.f4202eac-e969-4898-93fe-c45cd440c2c4@github.com> Message-ID: <5LVVJu3bdnP-iTw6m3k6nkXXNVip7iaWgiYY5OV9P18=.cf186b8e-c7e5-43f7-b06a-82e3e5a12c23@github.com> On Tue, 19 Nov 2024 10:47:17 GMT, Andrew Haley wrote: >> SendaoYan has updated the pull request incrementally with one additional commit since the last revision: >> >> use cmp instead cmpw > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 399: > >> 397: sub(rscratch2, rfp, esp); >> 398: unsigned char min_frame_size = (frame::link_offset - frame::interpreter_frame_mirror_offset) * wordSize; >> 399: cmp(rscratch2, min_frame_size); > > Suggestion: > > int min_frame_size = (frame::link_offset - frame::interpreter_frame_mirror_offset) * wordSize; > subs(rscratch2, rscratch2, min_frame_size); > > The use of `subs` here is a bit odd, but it's less odd than defining `min_frame_size` as unsigned char. Thanks your patient review and advice. The `cmp` has been replaced as `subs`. 
The `subs` sequence was translated to 3 instructions, as expected: sub x9, x29, x20 subs x9, x9, #0x50 b.ge L ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22181#discussion_r1848280473 From qamai at openjdk.org Tue Nov 19 12:49:45 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 19 Nov 2024 12:49:45 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: <8_5vFJyJBm-hWnbvYrLbtA7Tdip7XNplNZ3d0wxt9YM=.255c6528-5cd9-4520-8508-13103b61cf63@github.com> On Tue, 19 Nov 2024 08:17:18 GMT, Emanuel Peter wrote: >> Why does the benchmark need to have so many methods, to make sure the different values are treated as constants? I'm not sure, but JMH might turn @Param values into constants. If so, then your benchmark can be greatly simplified. > > @dean-long I tried that with `@param`, but then they are not constants... sadly. And they need to be constants. Let me know if you find some better way though ;) @eme64 FYI you can make param a constant using this pattern: static final MutableCallSite MUTABLE_CONSTANT = new MutableCallSite(MethodType.methodType(int.class)); static final MethodHandle MUTABLE_CONSTANT_HANDLE = MUTABLE_CONSTANT.dynamicInvoker(); static { MethodHandle init = MethodHandles.constant(int.class, 1); MUTABLE_CONSTANT.setTarget(init); } @Param({"1", "2"}) int size; @Setup(Level.Iteration) public void setup() throws Throwable { if (size != (int) MUTABLE_CONSTANT_HANDLE.invokeExact()) { MethodHandle constant = MethodHandles.constant(int.class, size); MUTABLE_CONSTANT.setTarget(constant); } } @CompilerControl(CompilerControl.Mode.DONT_INLINE) private int test() throws Throwable { return (int) MUTABLE_CONSTANT_HANDLE.invokeExact(); } @Benchmark public void run() throws Throwable { test(); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2485621523 From epeter at openjdk.org Tue Nov 19 13:29:48 2024 From: epeter at openjdk.org 
(Emanuel Peter) Date: Tue, 19 Nov 2024 13:29:48 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 07:53:26 GMT, Dean Long wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - manual merge >> - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding >> - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding >> - fix whitespace >> - fix tests and build >> - fix store-to-load forward IR rules >> - updates before the weekend ... who knows if they are any good >> - refactor to iteration threshold >> - use jvmArgs again, and apply same fix as 8343345 >> - revert to jvmArgsPrepend >> - ... and 15 more: https://git.openjdk.org/jdk/compare/543e355b...000f9f13 > > Why does the benchmark need to have so many methods, to make sure the different values are treated as constants? I'm not sure, but JMH might turn @Param values into constants. If so, then your benchmark can be greatly simplified. @dean-long @merykitty I'm trying the `MethodHandles.constant` trick. Any chance you'd review the VM fix itself? ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2485711765 From epeter at openjdk.org Tue Nov 19 13:58:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 13:58:14 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. 
This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). 
> > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use constant method handle to make the benchmark smaller ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21521/files - new: https://git.openjdk.org/jdk/pull/21521/files/000f9f13..e89f4b03 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=01-02 Stats: 3615 lines in 1 file changed: 25 ins; 3576 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Tue Nov 19 13:58:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 13:58:14 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: <8_5vFJyJBm-hWnbvYrLbtA7Tdip7XNplNZ3d0wxt9YM=.255c6528-5cd9-4520-8508-13103b61cf63@github.com> References: <8_5vFJyJBm-hWnbvYrLbtA7Tdip7XNplNZ3d0wxt9YM=.255c6528-5cd9-4520-8508-13103b61cf63@github.com> Message-ID: <8KCxrboBeH9m7Xtg5P37uIDU2BY_uAlgO-fHmtmxUhA=.d0cfce20-33ec-45c0-8d61-face77c811c3@github.com> On Tue, 19 Nov 2024 12:46:16 GMT, Quan Anh Mai wrote: >> @dean-long I tried that with 
`@param`, but then they are not constants... sadly. And they need to be constants. Let me know if you find some better way though ;) > > @eme64 FYI you can make param a constant using this pattern: > > static final MutableCallSite MUTABLE_CONSTANT = new MutableCallSite(MethodType.methodType(int.class)); > static final MethodHandle MUTABLE_CONSTANT_HANDLE = MUTABLE_CONSTANT.dynamicInvoker(); > > static { > MethodHandle init = MethodHandles.constant(int.class, 1); > MUTABLE_CONSTANT.setTarget(init); > } > > @Param({"1", "2"}) > int size; > > @Setup(Level.Iteration) > public void setup() throws Throwable { > if (size != (int) MUTABLE_CONSTANT_HANDLE.invokeExact()) { > MethodHandle constant = MethodHandles.constant(int.class, size); > MUTABLE_CONSTANT.setTarget(constant); > } > } > > @CompilerControl(CompilerControl.Mode.DONT_INLINE) > private int test() throws Throwable { > return (int) MUTABLE_CONSTANT_HANDLE.invokeExact(); > } > > @Benchmark > public void run() throws Throwable { > test(); > } @merykitty nice trick. It seems to work :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2485789391 From aph at openjdk.org Tue Nov 19 14:41:44 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Nov 2024 14:41:44 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 12:33:56 GMT, SendaoYan wrote: >> Hi all, >> Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [ ] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build >> - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 fastdebug build > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > use subs instead of cmp Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22181#pullrequestreview-2445674920 From chagedorn at openjdk.org Tue Nov 19 14:54:55 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 14:54:55 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 13:58:14 GMT, Emanuel Peter wrote: >> **History** >> This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): >> On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: >> `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` >> >> I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. >> >> **Summary of Problem** >> >> As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. 
Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. >> >> **Benchmark** >> >> I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). >> >> The benchmarks look different on different machines, but they all have a pattern similar to this: >> ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) >> ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) >> ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) >> ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) >> >> We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offse... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use constant method handle to make the benchmark smaller Nice blog post, benchmarks, summary and explanation of the problem! The new heuristic looks reasonable and safer/simpler to use for JDK 24. Would be interesting to see if you could come up with a throughput and latency based heuristic/cost model at some point in the future. I have some comments, mostly minor things. src/hotspot/share/opto/vtransform.cpp line 154: > 152: // Helper-class for VTransformGraph::has_store_to_load_forwarding_failure. 
> 153: // It represents a memory region: [ptr, ptr + memory_size) > 154: class VPointerRecord : public StackObj { Not so sure about the name of this class. When first reading the code below (which I reviewed first), I found it difficult to understand its purpose without reading this class comment. How about `VMemoryRegion` or something like that to better show that it's about regions of memory and not single pointers? test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 28: > 26: * @test > 27: * @bug 8298935 > 28: * @summary Writing forward on array creates cyclic dependency You could probably add the bug number of this bug to the `@bug` in the line above this one. test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 36: > 34: * TestCyclicDependency > 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+AlignVector -XX:+VerifyAlignVector > 36: * TestCyclicDependency These flags will be ignored for the test VM. You should pass them as separate runs in `main()` with `TestFramework.runWithFlags()`. test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java line 137: > 135: }) > 136: public static class NoVectorization extends VectorStoreToLoadForwarding {} > 137: Suggestion: ------------- Changes requested by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21521#pullrequestreview-2445346045 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848489712 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848509941 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848508701 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848499954 From chagedorn at openjdk.org Tue Nov 19 14:55:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 14:55:04 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 08:04:35 GMT, Emanuel Peter wrote: >> **History** >> This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): >> On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: >> `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` >> >> I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. >> >> **Summary of Problem** >> >> As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. 
Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. >> >> **Benchmark** >> >> I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). >> >> The benchmarks look different on different machines, but they all have a pattern similar to this: >> ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) >> ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) >> ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) >> ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) >> >> We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offse... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - manual merge > - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding > - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding > - fix whitespace > - fix tests and build > - fix store-to-load forward IR rules > - updates before the weekend ... who knows if they are any good > - refactor to iteration threshold > - use jvmArgs again, and apply same fix as 8343345 > - revert to jvmArgsPrepend > - ... 
and 15 more: https://git.openjdk.org/jdk/compare/543e355b...000f9f13 src/hotspot/share/opto/c2_globals.hpp line 361: > 359: "if >0, auto-vectorization detects possible store-to-load" \ > 360: "forwarding failures. The number specifies over how many" \ > 361: "loop iterations this detection spans.") \ You should add whitespaces at the end of the line to avoid connected words: Suggestion: "if >0, auto-vectorization detects possible store-to-load " \ "forwarding failures. The number specifies over how many " \ "loop iterations this detection spans.") \ src/hotspot/share/opto/vtransform.cpp line 162: > 160: uint _memory_size; > 161: bool _is_load; // load or store? > 162: uint _order; // order in schedule You could rename this to `_schedule_order`. Then you can remove the comment. src/hotspot/share/opto/vtransform.cpp line 188: > 186: int r1_inva_idx = r1->invar() == nullptr ? 0 : r1->invar()->_idx; > 187: int r2_inva_idx = r2->invar() == nullptr ? 0 : r2->invar()->_idx; > 188: RETURN_CMP_VALUE_IF_NOT_EQUAL(r1_inva_idx, r2_inva_idx); Suggestion: int r1_invar_idx = r1->invar() == nullptr ? 0 : r1->invar()->_idx; int r2_invar_idx = r2->invar() == nullptr ? 0 : r2->invar()->_idx; RETURN_CMP_VALUE_IF_NOT_EQUAL(r1_invar_idx, r2_invar_idx); src/hotspot/share/opto/vtransform.cpp line 234: > 232: // its value from the store-buffer, rather than from the L1 cache. This is many CPU cycles > 233: // faster. However, this optimization comes with some restrictions, depending on the CPU. > 234: // Generally, Store-to-load forwarding works if the load and store memory regions match For consistency: Suggestion: // Generally, store-to-load-forwarding works if the load and store memory regions match src/hotspot/share/opto/vtransform.cpp line 235: > 233: // faster. However, this optimization comes with some restrictions, depending on the CPU. > 234: // Generally, Store-to-load forwarding works if the load and store memory regions match > 235: // exactly (same start and width). 
Generally problematic are partial overlaps - though Should we also mention here that it also works when the loaded data is fully contained in the stored data. (taken from your blog post). Maybe you can also add some examples from your blog post which helped to understand this optimization better when reading the first time about it. src/hotspot/share/opto/vtransform.cpp line 237: > 235: // exactly (same start and width). Generally problematic are partial overlaps - though > 236: // some CPU's can handle even some subsets of these cases. We conservatively assume that > 237: // all such partial overlaps lead to a store-to-load-forwarding failure, which means the Suggestion: // all such partial overlaps lead to a store-to-load-forwarding failures, which means the src/hotspot/share/opto/vtransform.cpp line 238: > 236: // some CPU's can handle even some subsets of these cases. We conservatively assume that > 237: // all such partial overlaps lead to a store-to-load-forwarding failure, which means the > 238: // load has to stall until the store goes from the store-buffer into the L1 cache, incuring Suggestion: // load has to stall until the store goes from the store-buffer into the L1 cache, incurring src/hotspot/share/opto/vtransform.cpp line 247: > 245: // } > 246: // > 247: // Assume we have a 2-element vectors (2*4 = 8 bytes). This gives us this machine code: Suggestion: // Assume we have 2-element vectors (2*4 = 8 bytes). This gives us this machine code: src/hotspot/share/opto/vtransform.cpp line 259: > 257: // be forwarded because there is some partial overlap. 
> 258: // > 259: // Preferrably, we would have some latency-based cost-model that accounts for such forwarding Suggestion: // Preferably, we would have some latency-based cost-model that accounts for such forwarding src/hotspot/share/opto/vtransform.cpp line 260: > 258: // > 259: // Preferrably, we would have some latency-based cost-model that accounts for such forwarding > 260: // failures, and decides if vectorization with forwarding failures is still profitable. For Suggestion: // failures, and decide if vectorization with forwarding failures is still profitable. For src/hotspot/share/opto/vtransform.cpp line 261: > 259: // Preferrably, we would have some latency-based cost-model that accounts for such forwarding > 260: // failures, and decides if vectorization with forwarding failures is still profitable. For > 261: // now we go with a simpler huristic: we simply forbid vectorization if we can PROVE that Suggestion: // now we go with a simpler heuristic: we simply forbid vectorization if we can PROVE that src/hotspot/share/opto/vtransform.cpp line 263: > 261: // now we go with a simpler huristic: we simply forbid vectorization if we can PROVE that > 262: // there will be a forwarding failure. This approach has at least 2 possible weaknesses: > 263: // (1) There may be forwarding failures in cases where we cannot prove it. Maybe add a new line here Suggestion: // // (1) There may be forwarding failures in cases where we cannot prove it. src/hotspot/share/opto/vtransform.cpp line 271: > 269: // We do not know if aI and bI refer to the same array or not. However, it is reasonable > 270: // to assume that if we have two different array references, that they most likely refer > 271: // to different arrays, where we would have no forwarding failures. For completeness: Suggestion: // to different arrays (i.e. no aliasing), where we would have no forwarding failures. 
src/hotspot/share/opto/vtransform.cpp line 278: > 276: // Performance measurements with the JMH benchmark StoreToLoadForwarding.java have indicated > 277: // that there is some iteration threshold: if the failure happens between a store and load that > 278: // have an iteration distance below this threshold, the latency is the limiting factor, and we It's probably clear what you mean by "iteration distance" but maybe to be sure, you can add at your example above that the "iteration distance" is 3 there. src/hotspot/share/opto/vtransform.cpp line 292: > 290: // To detect store-to-load-forwarding failures at the iteration threshold or below, we > 291: // simulate a super-unrolling to reach SuperWordStoreToLoadForwardingFailureDetection > 292: // iterations at least. The comment could be a bit misleading when `simulated_unrolling_count` < `unrolled_count`. Looks like you then just do the analysis with the current `unrolled_count`. Can you extend this comment to mention that? src/hotspot/share/opto/vtransform.cpp line 297: > 295: uint simulated_super_unrolling_count = MAX2(1, simulated_unrolling_count / unrolled_count); > 296: int iv_stride = vloop_analyzer.vloop().iv_stride(); > 297: int order = 0; To make it more clear what this order is: Suggestion: int schedule_order = 0; src/hotspot/share/opto/vtransform.cpp line 307: > 305: VTransformVectorNode* vector = vtn->isa_Vector(); > 306: uint vector_length = vector != nullptr ? vector->nodes().length() : 1; > 307: records.push(VPointerRecord(p, iv_offset, vector_length, order++)); Suggestion: records.push(VPointerRecord(p, iv_offset, vector_length, schedule_order++)); src/hotspot/share/opto/vtransform.cpp line 313: > 311: } > 312: > 313: // Sort the pointers by group (same base, invar and stride), and by offset. Suggestion: // Sort the pointers by group (same base, invar and stride), and then by offset. 
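Editorial note: the detection idea discussed in the review comments above (record each load and store as a memory region, simulate several loop iterations, then look for a store followed by a load that overlaps it only partially) can be illustrated with a small standalone sketch. This is not the HotSpot code under review. The class `ForwardingFailureSketch`, the helper `Access`, the method `hasForwardingFailure`, and the assumed loop shape `a[i + offset] = a[i] + 1` are all hypothetical and exist only for this illustration. The `iterations` parameter only loosely mirrors the role of `SuperWordStoreToLoadForwardingFailureDetection`, and the sketch conservatively treats every overlap that is not an exact match as a failure, even though the thread notes that some CPUs can forward in more cases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from HotSpot.
class ForwardingFailureSketch {

    // One memory access covering the byte range [start, start + size).
    static final class Access {
        final long start;
        final long size;
        final boolean isLoad;
        final int scheduleOrder; // position in the simulated instruction schedule

        Access(long start, long size, boolean isLoad, int scheduleOrder) {
            this.start = start;
            this.size = size;
            this.isLoad = isLoad;
            this.scheduleOrder = scheduleOrder;
        }

        // True if the two byte ranges share at least one byte.
        boolean overlaps(Access other) {
            return start < other.start + other.size && other.start < start + size;
        }

        // True if both ranges have the same start and the same width.
        boolean sameRegion(Access other) {
            return start == other.start && size == other.size;
        }
    }

    // Assumed loop: for (i = 0; ...; i++) { a[i + offset] = a[i] + 1; }
    // vectorized with vectorLength elements of elementBytes bytes each.
    // Returns true if, within the simulated iterations, some store is followed
    // by a load that overlaps it without matching it exactly.
    static boolean hasForwardingFailure(int offset, int vectorLength,
                                        int elementBytes, int iterations) {
        List<Access> accesses = new ArrayList<>();
        long size = (long) vectorLength * elementBytes;
        int order = 0;
        for (int k = 0; k < iterations; k++) {
            long i = (long) k * vectorLength;
            // Each vector iteration first loads a[i ..], then stores a[i + offset ..].
            accesses.add(new Access(i * elementBytes, size, true, order++));
            accesses.add(new Access((i + offset) * elementBytes, size, false, order++));
        }
        for (Access store : accesses) {
            if (store.isLoad) {
                continue;
            }
            for (Access load : accesses) {
                if (!load.isLoad || load.scheduleOrder < store.scheduleOrder) {
                    continue; // only loads that come after the store can be forwarded to
                }
                if (store.overlaps(load) && !store.sameRegion(load)) {
                    return true; // partial overlap: assume forwarding fails
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasForwardingFailure(3, 2, 4, 8)); // true: partial overlaps
        System.out.println(hasForwardingFailure(2, 2, 4, 8)); // false: exact matches only
        System.out.println(hasForwardingFailure(3, 1, 4, 8)); // false: scalar accesses
    }
}
```

With scalar accesses (vector length 1) every pair of regions is either identical or disjoint, which matches the observation in the PR description that the problem only appears once the loop is vectorized.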
src/hotspot/share/opto/vtransform.cpp line 328: > 326: #endif > 327: > 328: // For all pairs of pointers in the same group, check if they have partial overlap. Suggestion: // For all pairs of pointers in the same group, check if they have a partial overlap. src/hotspot/share/opto/vtransform.cpp line 332: > 330: VPointerRecord& record1 = records.at(i); > 331: > 332: for(int j = i + 1; j < records.length(); j++) { Suggestion: for (int j = i + 1; j < records.length(); j++) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848281955 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848409352 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848416569 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848289128 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848295567 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848296424 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848296769 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848297537 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848307762 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848308419 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848308782 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848309441 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848311419 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848397758 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848422595 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848408057 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848408767 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848436839 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21521#discussion_r1848437407 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848464296 From epeter at openjdk.org Tue Nov 19 15:00:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:00:47 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 14:40:20 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use constant method handle to make the benchmark smaller > > src/hotspot/share/opto/vtransform.cpp line 154: > >> 152: // Helper-class for VTransformGraph::has_store_to_load_forwarding_failure. >> 153: // It represents a memory region: [ptr, ptr + memory_size) >> 154: class VPointerRecord : public StackObj { > > Not so sure about the name of this class. When first reading the code below (which I reviewed first), I found it difficult to understand its purpose without reading this class comment. How about `VMemoryRegion` or something like that to better show that it's about regions of memory and not single pointers? @chhagedorn all `VPointer` are in fact memory-regions, and not just zero-length pointers. They have a `MemNode` which gives them a size. That is how we can compute the overlapping / aliasing queries with it... 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848522853 From luhenry at openjdk.org Tue Nov 19 15:04:04 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 19 Nov 2024 15:04:04 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: On Mon, 18 Nov 2024 11:32:27 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is a follow-up of 8340453 on riscv. > Thanks! Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22203#pullrequestreview-2445743937 From luhenry at openjdk.org Tue Nov 19 15:05:58 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 19 Nov 2024 15:05:58 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: On Thu, 14 Nov 2024 14:05:42 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. >> >> Thanks >> >> ## Performance >> Tests on bananapi, for other platform, please check jbs issue for test data. 
>> >> ### Before >> data >> >> Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op >> >> >> >> >> ### After >> data >> >> Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test typo Instead of removing it, can we put it behind a flag that is disabled by default? We clearly don't want to keep something that's slower for the current generation of hardware, but we could expect the next generation of hardware to go faster. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22102#issuecomment-2485962930 From epeter at openjdk.org Tue Nov 19 15:07:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:07:33 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v4] In-Reply-To: References: Message-ID: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. 
This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). 
> > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Christian's suggestions Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21521/files - new: https://git.openjdk.org/jdk/pull/21521/files/e89f4b03..3bdd9477 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=02-03 Stats: 20 lines in 3 files changed: 1 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Tue Nov 19 15:07:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:07:34 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 14:50:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use constant method handle to make the benchmark smaller > > test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 36: > >> 34: * TestCyclicDependency >> 35: * @run main/othervm 
-XX:+IgnoreUnrecognizedVMOptions -XX:+AlignVector -XX:+VerifyAlignVector >> 36: * TestCyclicDependency > > These flags will be ignored for the test VM. You should pass them as separate runs in `main()` with `TestFramework.runWithFlags()`. Oh, is that true? That's scary. I'm sure we are doing this elsewhere too. Is there not a way to forward these kinds of flags from the `@run` to the IR Framework test VM? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848531873 From epeter at openjdk.org Tue Nov 19 15:11:58 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:11:58 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 14:58:31 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.cpp line 154: >> >>> 152: // Helper-class for VTransformGraph::has_store_to_load_forwarding_failure. >>> 153: // It represents a memory region: [ptr, ptr + memory_size) >>> 154: class VPointerRecord : public StackObj { >> >> Not so sure about the name of this class. When first reading the code below (which I reviewed first), I found it difficult to understand its purpose without reading this class comment. How about `VMemoryRegion` or something like that to better show that it's about regions of memory and not single pointers? > > @chhagedorn all `VPointer` are in fact memory-regions, and not just zero-length pointers. They have a `MemNode` which gives them a size. That is how we can compute the overlapping / aliasing queries with it... Actually, it is supposed to be a `VPointer`, but one where we can give some `iv_offset`, i.e. mutate the offset.
But ok, I could rename it to `VMemoryRegion`, it would not hurt :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848540452 From epeter at openjdk.org Tue Nov 19 15:18:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:18:14 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v5] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21521/files - new: https://git.openjdk.org/jdk/pull/21521/files/3bdd9477..244e1f8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=03-04 Stats: 38 lines in 1 file changed: 0 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Tue Nov 19 15:18:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:18:15 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: <8TKyptVs0wRKKQ5FLzvRhTy2Xu-Jk1TpZxdGwUT9TC0=.033d2d39-a578-4e21-b9e8-194374c48806@github.com> On Tue, 19 Nov 2024 15:08:47 GMT, Emanuel Peter wrote: >> @chhagedorn all `VPointer` are in fact memory-regions, and not just zero-length pointers. They have a `MemNode` which gives them a size. That is how we can compute the overlapping / aliasing queries with it... > > Actually, it is supposed to be a `VPointer`, but one where we can give some `iv_offset`, i.e. mutate the offset. 
But ok, I could rename it to `VMemoryRegion`, it would not hurt :) done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848550495 From epeter at openjdk.org Tue Nov 19 15:18:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:18:18 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 13:54:21 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - manual merge >> - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding >> - Merge branch 'master' into JDK-8334431-V-store-to-load-forwarding >> - fix whitespace >> - fix tests and build >> - fix store-to-load forward IR rules >> - updates before the weekend ... who knows if they are any good >> - refactor to iteration threshold >> - use jvmArgs again, and apply same fix as 8343345 >> - revert to jvmArgsPrepend >> - ... and 15 more: https://git.openjdk.org/jdk/compare/543e355b...000f9f13 > > src/hotspot/share/opto/vtransform.cpp line 162: > >> 160: uint _memory_size; >> 161: bool _is_load; // load or store? >> 162: uint _order; // order in schedule > > You could rename this to `_schedule_order`. Then you can remove the comment. 
done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848550596 From chagedorn at openjdk.org Tue Nov 19 15:24:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 15:24:05 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 15:03:40 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/TestCyclicDependency.java line 36: >> >>> 34: * TestCyclicDependency >>> 35: * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+AlignVector -XX:+VerifyAlignVector >>> 36: * TestCyclicDependency >> >> These flags will be ignored for the test VM. You should pass them as separate runs in `main()` with `TestFramework.runWithFlags()`. > > Oh, is that true? That's scary. I'm sure we are doing this elsewhere too. Is there not a way to forward these kinds of flags from the `@run` to the IR Framework test VM? Good point, maybe we should check at some point that other places are not passing flags like that. The main reason to go with `driver` + `runWithFlags()` is that we do not want to stress the driver VM that calls your `main()` method, prepares the test VM and does the IR matching. When you only pass the flags with `runWithFlags()`, the IR framework makes sure that only the test VM runs with the additional flags, which could otherwise impact the overall test performance.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848562790 From epeter at openjdk.org Tue Nov 19 15:24:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:24:04 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 12:45:03 GMT, Christian Hagedorn wrote: > Should we also mention here that it also works when the loaded data is fully contained in the stored data. fully contained, as in `strict subset`? I mentioned that already... and sadly it works on some platforms, but not others... quite complex. That is why I make the "conservative assumption". > src/hotspot/share/opto/vtransform.cpp line 278: > >> 276: // Performance measurements with the JMH benchmark StoreToLoadForwarding.java have indicated >> 277: // that there is some iteration threshold: if the failure happens between a store and load that >> 278: // have an iteration distance below this threshold, the latency is the limiting factor, and we > > It's probably clear what you mean by "iteration distance" but maybe to be sure, you can add at your example above that the "iteration distance" is 3 there. done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848557462 PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848561078 From chagedorn at openjdk.org Tue Nov 19 15:32:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 15:32:12 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: Message-ID: <4It017EEWWcnTUs-XO1sWZ1OOTEMkMSx3enrqDhE3uY=.2ac738d9-dc1b-4aa1-bdc4-e4a167a22647@github.com> On Tue, 19 Nov 2024 15:18:18 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vtransform.cpp line 235: >> >>> 233: // faster. 
However, this optimization comes with some restrictions, depending on the CPU. >>> 234: // Generally, Store-to-load forwarding works if the load and store memory regions match >>> 235: // exactly (same start and width). Generally problematic are partial overlaps - though >> >> Should we also mention here that it also works when the loaded data is fully contained in the stored data. (taken from your blog post). Maybe you can also add some examples from your blog post which helped to understand this optimization better when reading the first time about it. > >> Should we also mention here that it also works when the loaded data is fully contained in the stored data. > > fully contained, as in `strict subset`? I mentioned that already... and sadly it works on some platforms, but not others... quite complex. That is why I make the "conservative assumption". Ah, I thought that as long as the starting addresses match, then all platforms will do the optimization when we store more bytes than we load. But that's not the case then? But of course for the analysis we do in Superword, we only assume that exact matches will work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848579487 From amitkumar at openjdk.org Tue Nov 19 15:42:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 19 Nov 2024 15:42:09 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v3] In-Reply-To: References: Message-ID: > This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. 
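To illustrate the kind of guard this PR describes, here is a minimal Java sketch. It is not the HotSpot code: the class `StrengthReduceGuard` and its methods are hypothetical, and it assumes the overflow arises when a multiply constant `c` equal to `max_jint` is incremented (`c + 1`) before a power-of-two test. In C++ that signed overflow is undefined behavior; in Java it wraps silently, so the guard here only demonstrates the control flow.

```java
public class StrengthReduceGuard {
    // True if x is a positive power of two.
    static boolean isPowerOfTwo(int x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    // Hypothetical helper: decides how x * c could be rewritten with a shift.
    // The range check mirrors the `c > 0 && c < max_jint` condition from the PR,
    // so that c + 1 is never computed for c == Integer.MAX_VALUE.
    static String classify(int c) {
        if (c > 0 && c < Integer.MAX_VALUE) {
            if (isPowerOfTwo(c + 1)) {
                return "shift-sub"; // x * c == (x << log2(c + 1)) - x
            }
            if (isPowerOfTwo(c - 1)) {
                return "shift-add"; // x * c == (x << log2(c - 1)) + x
            }
        }
        return "none"; // fall back to a regular multiply
    }

    public static void main(String[] args) {
        System.out.println(classify(7));
        System.out.println(classify(9));
        System.out.println(classify(Integer.MAX_VALUE));
    }
}
```

With the guard in place, `Integer.MAX_VALUE` takes the fallback path without the increment ever being evaluated.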
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: adds change for arm & aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/a08e4fdb..9820b9cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=01-02 Stats: 47 lines in 3 files changed: 29 ins; 2 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From epeter at openjdk.org Tue Nov 19 15:42:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:42:21 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v6] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: even more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21521/files - new: https://git.openjdk.org/jdk/pull/21521/files/244e1f8a..2d98fd1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=04-05 Stats: 21 lines in 2 files changed: 11 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From chagedorn at openjdk.org Tue Nov 19 15:42:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 15:42:22 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: <4It017EEWWcnTUs-XO1sWZ1OOTEMkMSx3enrqDhE3uY=.2ac738d9-dc1b-4aa1-bdc4-e4a167a22647@github.com> Message-ID: On Tue, 19 Nov 2024 15:36:32 GMT, Emanuel Peter wrote: >> Ah, I thought that as long as the starting addresses match, then all platforms will do the optimization when we store more bytes than we load. But that's not the case then? But of course for the analysis we do in Superword, we only assume that exact matches will work. > > Yes, exactly. In general the CPU can be smarter, but we assume only exact matches are successes - all others failure if they overlap in any way. Got it, thanks for the explanation! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848595159 From epeter at openjdk.org Tue Nov 19 15:42:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:42:22 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: <4It017EEWWcnTUs-XO1sWZ1OOTEMkMSx3enrqDhE3uY=.2ac738d9-dc1b-4aa1-bdc4-e4a167a22647@github.com> References: <4It017EEWWcnTUs-XO1sWZ1OOTEMkMSx3enrqDhE3uY=.2ac738d9-dc1b-4aa1-bdc4-e4a167a22647@github.com> Message-ID: On Tue, 19 Nov 2024 15:29:30 GMT, Christian Hagedorn wrote: >>> Should we also mention here that it also works when the loaded data is fully contained in the stored data. >> >> fully contained, as in `strict subset`? I mentioned that already... and sadly it works on some platforms, but not others... quite complex. That is why I make the "conservative assumption". > > Ah, I thought that as long as the starting addresses match, then all platforms will do the optimization when we store more bytes than we load. But that's not the case then? But of course for the analysis we do in Superword, we only assume that exact matches will work. Yes, exactly. In general the CPU can be smarter, but we assume only exact matches are successes - all others failure if they overlap in any way. 
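The conservative assumption discussed here can be sketched in Java as follows. This is only an illustration, not the actual analysis in `vtransform.cpp`; the class `ForwardingCheck`, the enum `Outcome` and the method `classify` are hypothetical names. Memory regions are modeled as half-open byte ranges `[start, start + size)`.

```java
public class ForwardingCheck {
    public enum Outcome { FORWARDS, NO_OVERLAP, FAILURE }

    // Conservative model: only an exact match (same start, same size) is assumed
    // to forward. Any other overlap, including a load that is a strict subset of
    // the store, is counted as a store-to-load-forwarding failure.
    static Outcome classify(long storeStart, int storeSize, long loadStart, int loadSize) {
        if (storeStart == loadStart && storeSize == loadSize) {
            return Outcome.FORWARDS;
        }
        // Two half-open ranges overlap if each one starts before the other ends.
        boolean overlap = storeStart < loadStart + loadSize
                       && loadStart < storeStart + storeSize;
        return overlap ? Outcome.FAILURE : Outcome.NO_OVERLAP;
    }

    public static void main(String[] args) {
        System.out.println(classify(0, 16, 0, 16));  // exact match
        System.out.println(classify(0, 16, 16, 16)); // adjacent, no overlap
        System.out.println(classify(0, 16, 4, 16));  // partial overlap
        System.out.println(classify(0, 16, 0, 8));   // strict subset, treated conservatively
    }
}
```

A real CPU may forward in more cases (for example the strict-subset one), which is why this model is pessimistic by design.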
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848591954 From qamai at openjdk.org Tue Nov 19 15:42:22 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 19 Nov 2024 15:42:22 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v5] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 15:18:14 GMT, Emanuel Peter wrote: > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more for Christian test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java line 91: > 89: } > 90: > 91: @CompilerControl(CompilerControl.Mode.DONT_INLINE) Err are you sure this works I think this should be `FORCE_INLINE` instead. I see you want to have different `SIZE`s, too. Then you can make a `MutableCallSite` for each parameter.
The magic here is that the compiler will treat the call target as a constant and force a recompilation each time you call `setTarget` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848587591 From epeter at openjdk.org Tue Nov 19 15:42:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 15:42:23 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v5] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 15:34:01 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more for Christian > > test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java line 91: > >> 89: } >> 90: >> 91: @CompilerControl(CompilerControl.Mode.DONT_INLINE) > > Err are you sure this works I think this should be `FORCE_INLINE` instead. I see you want to have different `SIZE`s, too. Then you can make a `MutableCallSite` for each parameter. The magic here is that the compiler will treat the call target as a constant and force a recompilation each time you call `setTarget` Hmm, somehow it looked like it worked... but it cannot with this. I'll have another look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848593471 From amitkumar at openjdk.org Tue Nov 19 15:48:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 19 Nov 2024 15:48:49 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v4] In-Reply-To: References: Message-ID: > This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: remove dummy code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/9820b9cf..1823bfc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=02-03 Stats: 18 lines in 1 file changed: 0 ins; 18 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From amitkumar at openjdk.org Tue Nov 19 15:48:50 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 19 Nov 2024 15:48:50 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 05:32:36 GMT, Kim Barrett wrote: > It looks to me that aarch64 and arm have exactly the same issue. As mentioned in JBS, x86 and riscv already have similar checking as being proposed here. It would be nice if all platforms had exactly the same check, rather than some in one order and some in a different order. @kimbarrett I have added changes for `aarch64` and `arm`. 
Please have a look :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2486076583 From mli at openjdk.org Tue Nov 19 16:03:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 19 Nov 2024 16:03:05 GMT Subject: RFR: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: <8AfXb7Cwh9Izce2JILWl6JwFgrC4x6gWAu6b6v3PKp0=.65140986-d6a4-4675-be5e-e0d5b1734184@github.com> On Mon, 18 Nov 2024 11:32:27 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is a follow-up of 8340453 on riscv. > Thanks! Thanks for your reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22203#issuecomment-2486110469 From mli at openjdk.org Tue Nov 19 16:03:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 19 Nov 2024 16:03:05 GMT Subject: Integrated: 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers In-Reply-To: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> References: <1BxUf0HFCf1MyZts3fHUNPMpmLL6l7YVv-u_InB6xqo=.fe6a392c-d1b0-42f0-ba38-aa1677caf2a0@github.com> Message-ID: On Mon, 18 Nov 2024 11:32:27 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? This is a follow-up of 8340453 on riscv. > Thanks! This pull request has now been integrated. 
Changeset: dc940ec8 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/dc940ec8afcd3cd12ed3785d547f4cd602f65c15 Stats: 18 lines in 3 files changed: 4 ins; 12 del; 2 mod 8344387: RISC-V: C2: Improve encoding of LoadNKlass for compact headers Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/22203 From epeter at openjdk.org Tue Nov 19 16:03:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 16:03:26 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v7] In-Reply-To: References: Message-ID: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. 
When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). > > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... 
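As a rough illustration of the access pattern described above, here is a minimal scalar sketch. It is not taken from the micro-benchmark in the PR; the class name, method name and array contents are assumptions made up for this example. Each iteration stores `a[i]`, and that value is loaded again `offset` iterations later, so in scalar code every load lines up exactly with one earlier store. Once such a loop is vectorized, a vector load may only partially overlap an earlier vector store, which is the case the description says cannot be forwarded from the store buffer.

```java
import java.util.Arrays;

// Hypothetical sketch; not the code of the benchmark discussed in this thread.
final class StoreToLoadKernelSketch {
    // Each iteration stores a[i]; the stored value is loaded again
    // `offset` iterations later via a[i - offset].
    static void kernel(byte[] a, int offset) {
        for (int i = offset; i < a.length; i++) {
            a[i] = (byte) (a[i - offset] + 1);
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        kernel(a, 3);
        // prints [0, 0, 0, 1, 1, 1, 2, 2]
        System.out.println(Arrays.toString(a));
    }
}
```

The sketch only shows the dependency between stores and later loads; whether a given JIT actually vectorizes it depends on the offset and the platform.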
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more examples for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21521/files - new: https://git.openjdk.org/jdk/pull/21521/files/2d98fd1c..7a8f365e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=05-06 Stats: 50 lines in 1 file changed: 43 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Tue Nov 19 16:03:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 16:03:26 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2] In-Reply-To: References: <4It017EEWWcnTUs-XO1sWZ1OOTEMkMSx3enrqDhE3uY=.2ac738d9-dc1b-4aa1-bdc4-e4a167a22647@github.com> Message-ID: On Tue, 19 Nov 2024 15:38:22 GMT, Christian Hagedorn wrote: >> Yes, exactly. In general the CPU can be smarter, but we assume only exact matches are successes - all others failure if they overlap in any way. > > Got it, thanks for the explanation! I added some more examples. 
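The simplified model from this exchange (only an exact match counts as a forwarding success, any other overlap counts as a failure) could be sketched roughly as below. This is an assumption-laden illustration, not the heuristic implemented in the PR; the class, enum and method names are hypothetical, and all ranges are byte ranges.

```java
// Hypothetical helper; names and structure are illustrative only.
final class ForwardingOverlapSketch {
    enum Kind { NO_OVERLAP, EXACT_MATCH, PARTIAL_OVERLAP }

    // Classifies a store to [storeStart, storeStart + storeSize) against a
    // later load from [loadStart, loadStart + loadSize).
    static Kind classify(long storeStart, int storeSize, long loadStart, int loadSize) {
        long storeEnd = storeStart + storeSize;
        long loadEnd = loadStart + loadSize;
        if (storeEnd <= loadStart || loadEnd <= storeStart) {
            return Kind.NO_OVERLAP;      // independent accesses
        }
        if (storeStart == loadStart && storeSize == loadSize) {
            return Kind.EXACT_MATCH;     // treated as forwarding success
        }
        return Kind.PARTIAL_OVERLAP;     // treated as forwarding failure
    }

    public static void main(String[] args) {
        System.out.println(classify(0, 32, 0, 32));   // EXACT_MATCH
        System.out.println(classify(0, 32, 4, 32));   // PARTIAL_OVERLAP
        System.out.println(classify(0, 32, 32, 32));  // NO_OVERLAP
    }
}
```

As noted in the discussion, real CPUs can be smarter than this model, so it is deliberately pessimistic.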
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848628162 From epeter at openjdk.org Tue Nov 19 16:03:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 16:03:26 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v5] In-Reply-To: References: Message-ID: <5Faii6xD3GEMF1hYVolTxmNLDdEn8Ow_bU7Eyv4IEvE=.5f9019d3-9e57-42cf-96f1-fc536c9ec2b2@github.com> On Tue, 19 Nov 2024 15:37:21 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/vm/compiler/VectorStoreToLoadForwarding.java line 91: >> >>> 89: } >>> 90: >>> 91: @CompilerControl(CompilerControl.Mode.DONT_INLINE) >> >> Err are you sure this works I think this should be `FORCE_INLINE` instead. I see you want to have different `SIZE`s, too. Then you can make a `MutableCallSite` for each parameter. The magic here is that the compiler will treat the call target as a constant and force a recompilation each time you call `setTarget` > > Hmm, somehow it looked like it worked... but it cannot with this. I'll have another look. I don't need different sizes though. Just the OFFSET is ok - those really need to be constant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21521#discussion_r1848633084 From mli at openjdk.org Tue Nov 19 16:04:49 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 19 Nov 2024 16:04:49 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: On Tue, 19 Nov 2024 15:02:43 GMT, Ludovic Henry wrote: > Instead of removing, can we put it behind a flag that disabled by default? We clearly don't want to keep something that's slower for the current generation of hardware but we could expect that the next generation of hardware to go faster. 
Not quite sure, as the test result shows it's too slow, and we need to introduce another vm option. How's your opinion? @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/22102#issuecomment-2486115268 From epeter at openjdk.org Tue Nov 19 16:09:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 16:09:31 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8] In-Reply-To: References: Message-ID: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. 
When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). > > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix benchmark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21521/files - new: https://git.openjdk.org/jdk/pull/21521/files/7a8f365e..25c44757 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21521&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21521/head:pull/21521 PR: https://git.openjdk.org/jdk/pull/21521 From chagedorn at openjdk.org Tue Nov 19 16:38:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 16:38:05 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 16:47:55 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Looks reasonable. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21853#pullrequestreview-2446014859

From epeter at openjdk.org Tue Nov 19 16:46:08 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 19 Nov 2024 16:46:08 GMT
Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v2]
In-Reply-To: <8_5vFJyJBm-hWnbvYrLbtA7Tdip7XNplNZ3d0wxt9YM=.255c6528-5cd9-4520-8508-13103b61cf63@github.com>
References: <8_5vFJyJBm-hWnbvYrLbtA7Tdip7XNplNZ3d0wxt9YM=.255c6528-5cd9-4520-8508-13103b61cf63@github.com>
Message-ID:

On Tue, 19 Nov 2024 12:46:16 GMT, Quan Anh Mai wrote:

>> @dean-long I tried that with `@param`, but then they are not constants... sadly. And they need to be constants. Let me know if you find some better way though ;)
>
> @eme64 FYI you can make param a constant using this pattern:
>
>     static final MutableCallSite MUTABLE_CONSTANT = new MutableCallSite(MethodType.methodType(int.class));
>     static final MethodHandle MUTABLE_CONSTANT_HANDLE = MUTABLE_CONSTANT.dynamicInvoker();
>
>     static {
>         MethodHandle init = MethodHandles.constant(int.class, 1);
>         MUTABLE_CONSTANT.setTarget(init);
>     }
>
>     @Param({"1", "2"})
>     int size;
>
>     @Setup(Level.Iteration)
>     public void setup() throws Throwable {
>         if (size != (int) MUTABLE_CONSTANT_HANDLE.invokeExact()) {
>             MethodHandle constant = MethodHandles.constant(int.class, size);
>             MUTABLE_CONSTANT.setTarget(constant);
>         }
>     }
>
>     @CompilerControl(CompilerControl.Mode.DONT_INLINE)
>     private int test() throws Throwable {
>         return (int) MUTABLE_CONSTANT_HANDLE.invokeExact();
>     }
>
>     @Benchmark
>     public void run() throws Throwable {
>         test();
>     }

@merykitty The benchmark is now fixed, the results look good. Thanks for the help!

Benchmark                                  (OFFSET)  (SIZE)  (seed)  Mode  Cnt     Score      Error  Units
VectorStoreToLoadForwarding.Default.bytes         0   10000       0  avgt    3    84.419 ±    2.318  ns/op
VectorStoreToLoadForwarding.Default.bytes         1   10000       0  avgt    3  3929.075 ± 1010.049  ns/op
VectorStoreToLoadForwarding.Default.bytes         2   10000       0  avgt    3  7330.787 ±  312.536  ns/op
VectorStoreToLoadForwarding.Default.bytes         3   10000       0  avgt    3  4892.922 ±  670.867  ns/op
VectorStoreToLoadForwarding.Default.bytes         4   10000       0  avgt    3   640.643 ±   13.942  ns/op
VectorStoreToLoadForwarding.Default.bytes         5   10000       0  avgt    3  2507.479 ±   63.858  ns/op
VectorStoreToLoadForwarding.Default.bytes         6   10000       0  avgt    3  2381.243 ±  993.419  ns/op
VectorStoreToLoadForwarding.Default.bytes         7   10000       0  avgt    3  1977.162 ±  277.043  ns/op
VectorStoreToLoadForwarding.Default.bytes         8   10000       0  avgt    3   399.015 ±   18.180  ns/op
VectorStoreToLoadForwarding.Default.bytes         9   10000       0  avgt    3  1920.202 ±  135.682  ns/op
VectorStoreToLoadForwarding.Default.bytes        10   10000       0  avgt    3  1874.257 ±  411.567  ns/op
VectorStoreToLoadForwarding.Default.bytes        11   10000       0  avgt    3  1890.358 ± 1538.414  ns/op
VectorStoreToLoadForwarding.Default.bytes        12   10000       0  avgt    3  1701.539 ± 2105.819  ns/op
VectorStoreToLoadForwarding.Default.bytes        13   10000       0  avgt    3  1612.812 ±   58.573  ns/op
VectorStoreToLoadForwarding.Default.bytes        14   10000       0  avgt    3  1442.488 ±   44.108  ns/op
VectorStoreToLoadForwarding.Default.bytes        15   10000       0  avgt    3  1414.342 ±   57.398  ns/op
VectorStoreToLoadForwarding.Default.bytes        16   10000       0  avgt    3   277.813 ±   11.511  ns/op
VectorStoreToLoadForwarding.Default.bytes        17   10000       0  avgt    3  1385.329 ±  419.368  ns/op
VectorStoreToLoadForwarding.Default.bytes        18   10000       0  avgt    3  1368.331 ±   49.108  ns/op
VectorStoreToLoadForwarding.Default.bytes        19   10000       0  avgt    3  1366.278 ±   12.408  ns/op
VectorStoreToLoadForwarding.Default.bytes        20   10000       0  avgt    3  1372.812 ±   51.706  ns/op
VectorStoreToLoadForwarding.Default.bytes        21   10000       0  avgt    3  1398.275 ±   64.086  ns/op
VectorStoreToLoadForwarding.Default.bytes        22   10000       0  avgt    3  1361.567 ±   47.301  ns/op
VectorStoreToLoadForwarding.Default.bytes        23   10000       0  avgt    3  1521.131 ±  372.578  ns/op
VectorStoreToLoadForwarding.Default.bytes        24   10000       0  avgt    3  1508.359 ±  656.543  ns/op
VectorStoreToLoadForwarding.Default.bytes        25   10000       0  avgt    3  1488.101 ±  972.030  ns/op
VectorStoreToLoadForwarding.Default.bytes        26   10000       0  avgt    3  1464.272 ±  314.889  ns/op
VectorStoreToLoadForwarding.Default.bytes        27   10000       0  avgt    3  1557.113 ±   71.264  ns/op
VectorStoreToLoadForwarding.Default.bytes        28   10000       0  avgt    3  1546.363 ±  115.719  ns/op
VectorStoreToLoadForwarding.Default.bytes        29   10000       0  avgt    3  1564.489 ±   23.133  ns/op
VectorStoreToLoadForwarding.Default.bytes        30   10000       0  avgt    3  1571.730 ±  123.272  ns/op
VectorStoreToLoadForwarding.Default.bytes        31   10000       0  avgt    3  1595.116 ±  578.300  ns/op
VectorStoreToLoadForwarding.Default.bytes        32   10000       0  avgt    3   246.158 ±    2.173  ns/op
VectorStoreToLoadForwarding.Default.bytes        33   10000       0  avgt    3  1572.533 ±  188.633  ns/op
VectorStoreToLoadForwarding.Default.bytes        34   10000       0  avgt    3  1586.926 ±  290.448  ns/op
VectorStoreToLoadForwarding.Default.bytes        35   10000       0  avgt    3  1553.085 ±  132.149  ns/op
VectorStoreToLoadForwarding.Default.bytes        36   10000       0  avgt    3  1559.736 ±  125.902  ns/op
VectorStoreToLoadForwarding.Default.bytes        37   10000       0  avgt    3  1594.768 ±  832.743  ns/op
VectorStoreToLoadForwarding.Default.bytes        38   10000       0  avgt    3  1509.641 ±  326.219  ns/op
VectorStoreToLoadForwarding.Default.bytes        39   10000       0  avgt    3  1479.121 ±  164.986  ns/op
VectorStoreToLoadForwarding.Default.bytes        40   10000       0  avgt    3  1425.943 ±   46.541  ns/op
VectorStoreToLoadForwarding.Default.bytes        41   10000       0  avgt    3  1461.884 ±  453.731  ns/op
VectorStoreToLoadForwarding.Default.bytes        42   10000       0  avgt    3  1437.846 ±   41.903  ns/op
VectorStoreToLoadForwarding.Default.bytes        43   10000       0  avgt    3  1483.152 ±  303.466  ns/op
VectorStoreToLoadForwarding.Default.bytes        44   10000       0  avgt    3  1447.585 ±  200.255  ns/op
VectorStoreToLoadForwarding.Default.bytes        45   10000       0  avgt    3  1446.681 ±   21.455  ns/op
VectorStoreToLoadForwarding.Default.bytes        46   10000       0  avgt    3  1475.594 ±  149.059  ns/op
VectorStoreToLoadForwarding.Default.bytes        47   10000       0  avgt    3  1463.778 ±  380.469  ns/op
VectorStoreToLoadForwarding.Default.bytes        48   10000       0  avgt    3  1469.632 ±   10.665  ns/op
VectorStoreToLoadForwarding.Default.bytes        49   10000       0  avgt    3  1478.896 ±   14.962  ns/op
VectorStoreToLoadForwarding.Default.bytes        50   10000       0  avgt    3  1500.971 ±  898.381  ns/op
VectorStoreToLoadForwarding.Default.bytes        51   10000       0  avgt    3  1519.968 ±  815.377  ns/op
VectorStoreToLoadForwarding.Default.bytes        52   10000       0  avgt    3  1522.245 ±  354.020  ns/op
VectorStoreToLoadForwarding.Default.bytes        53   10000       0  avgt    3  1519.103 ±   21.608  ns/op
VectorStoreToLoadForwarding.Default.bytes        54   10000       0  avgt    3  1506.415 ±   10.809  ns/op
VectorStoreToLoadForwarding.Default.bytes        55   10000       0  avgt    3  1531.535 ±  453.600  ns/op
VectorStoreToLoadForwarding.Default.bytes        56   10000       0  avgt    3  1517.761 ±  216.394  ns/op
VectorStoreToLoadForwarding.Default.bytes        57   10000       0  avgt    3  1518.809 ±   76.599  ns/op
VectorStoreToLoadForwarding.Default.bytes        58   10000       0  avgt    3  1534.455 ±  362.115  ns/op
VectorStoreToLoadForwarding.Default.bytes        59   10000       0  avgt    3  1521.613 ±   49.548  ns/op
VectorStoreToLoadForwarding.Default.bytes        60   10000       0  avgt    3  1531.424 ±   15.598  ns/op
VectorStoreToLoadForwarding.Default.bytes        61   10000       0  avgt    3  1545.331 ±   72.731  ns/op
VectorStoreToLoadForwarding.Default.bytes        62   10000       0  avgt    3  1544.233 ±   25.841  ns/op
VectorStoreToLoadForwarding.Default.bytes        63   10000       0  avgt    3  1546.748 ±   88.799  ns/op
VectorStoreToLoadForwarding.Default.bytes        64   10000       0  avgt    3   103.986 ±   18.561  ns/op
VectorStoreToLoadForwarding.Default.bytes        65   10000       0  avgt    3   739.338 ±   70.809  ns/op
VectorStoreToLoadForwarding.Default.bytes        66   10000       0  avgt    3   710.806 ±    2.989  ns/op
VectorStoreToLoadForwarding.Default.bytes        67   10000       0  avgt    3   710.522 ±    1.608  ns/op
VectorStoreToLoadForwarding.Default.bytes        68   10000       0  avgt    3   731.133 ±   13.279  ns/op
VectorStoreToLoadForwarding.Default.bytes        69   10000       0  avgt    3   731.297 ±   28.622  ns/op
VectorStoreToLoadForwarding.Default.bytes        70   10000       0  avgt    3   733.355 ±   22.273  ns/op
VectorStoreToLoadForwarding.Default.bytes        71   10000       0  avgt    3   738.980 ±  154.371  ns/op
VectorStoreToLoadForwarding.Default.bytes        72   10000       0  avgt    3   729.717 ±    3.865  ns/op
VectorStoreToLoadForwarding.Default.bytes        73   10000       0  avgt    3   708.800 ±    5.073  ns/op
VectorStoreToLoadForwarding.Default.bytes        74   10000       0  avgt    3   710.764 ±   11.598  ns/op
VectorStoreToLoadForwarding.Default.bytes        75   10000       0  avgt    3   723.889 ±    3.341  ns/op
VectorStoreToLoadForwarding.Default.bytes        76   10000       0  avgt    3   728.995 ±  154.982  ns/op
VectorStoreToLoadForwarding.Default.bytes        77   10000       0  avgt    3   710.761 ±   48.943  ns/op
VectorStoreToLoadForwarding.Default.bytes        78   10000       0  avgt    3   717.265 ±  132.054  ns/op
VectorStoreToLoadForwarding.Default.bytes        79   10000       0  avgt    3   734.528 ±  269.623  ns/op
VectorStoreToLoadForwarding.Default.bytes        80   10000       0  avgt    3   709.711 ±   42.097  ns/op
VectorStoreToLoadForwarding.Default.bytes        81   10000       0  avgt    3   706.456 ±    3.155  ns/op
VectorStoreToLoadForwarding.Default.bytes        82   10000       0  avgt    3   715.795 ±   69.245  ns/op
VectorStoreToLoadForwarding.Default.bytes        83   10000       0  avgt    3   703.538 ±    2.055  ns/op
VectorStoreToLoadForwarding.Default.bytes        84   10000       0  avgt    3   717.157 ±   23.538  ns/op
VectorStoreToLoadForwarding.Default.bytes        85   10000       0  avgt    3   703.222 ±   12.425  ns/op
VectorStoreToLoadForwarding.Default.bytes        86   10000       0  avgt    3   739.261 ±  136.588  ns/op
VectorStoreToLoadForwarding.Default.bytes        87   10000       0  avgt    3   706.857 ±    1.111  ns/op
VectorStoreToLoadForwarding.Default.bytes        88   10000       0  avgt    3   704.209 ±    3.364  ns/op
VectorStoreToLoadForwarding.Default.bytes        89   10000       0  avgt    3   715.131 ±  105.452  ns/op
VectorStoreToLoadForwarding.Default.bytes        90   10000       0  avgt    3   707.352 ±    8.930  ns/op
VectorStoreToLoadForwarding.Default.bytes        91   10000       0  avgt    3   702.664 ±    0.936  ns/op
VectorStoreToLoadForwarding.Default.bytes        92   10000       0  avgt    3   709.437 ±  101.651  ns/op
VectorStoreToLoadForwarding.Default.bytes        93   10000       0  avgt    3   706.809 ±    4.217  ns/op
VectorStoreToLoadForwarding.Default.bytes        94   10000       0  avgt    3   731.476 ±   10.675  ns/op
VectorStoreToLoadForwarding.Default.bytes        95   10000       0  avgt    3   706.546 ±    2.279  ns/op
VectorStoreToLoadForwarding.Default.bytes        96   10000       0  avgt    3   705.334 ±   38.863  ns/op
VectorStoreToLoadForwarding.Default.bytes        97   10000       0  avgt    3   725.917 ±   52.017  ns/op
VectorStoreToLoadForwarding.Default.bytes        98   10000       0  avgt    3   732.445 ±  187.455  ns/op
VectorStoreToLoadForwarding.Default.bytes        99   10000       0  avgt    3   713.681 ±  156.426  ns/op
VectorStoreToLoadForwarding.Default.bytes       100   10000       0  avgt    3   707.998 ±  162.045  ns/op
VectorStoreToLoadForwarding.Default.bytes       101   10000       0  avgt    3   702.803 ±    0.578  ns/op
VectorStoreToLoadForwarding.Default.bytes       102   10000       0  avgt    3   707.133 ±    3.472  ns/op
VectorStoreToLoadForwarding.Default.bytes       103   10000       0  avgt    3   706.983 ±   12.320  ns/op
VectorStoreToLoadForwarding.Default.bytes       104   10000       0  avgt    3   710.192 ±  119.045  ns/op
VectorStoreToLoadForwarding.Default.bytes       105   10000       0  avgt    3   704.997 ±   59.079  ns/op
VectorStoreToLoadForwarding.Default.bytes       106   10000       0  avgt    3   703.934 ±    4.299  ns/op
VectorStoreToLoadForwarding.Default.bytes       107   10000       0  avgt    3   703.291 ±    7.547  ns/op
VectorStoreToLoadForwarding.Default.bytes       108   10000       0  avgt    3   707.445 ±    9.157  ns/op
VectorStoreToLoadForwarding.Default.bytes       109   10000       0  avgt    3   713.612 ±  158.228  ns/op
VectorStoreToLoadForwarding.Default.bytes       110   10000       0  avgt    3   708.522 ±  172.037  ns/op
VectorStoreToLoadForwarding.Default.bytes       111   10000       0  avgt    3   706.644 ±    8.504  ns/op
VectorStoreToLoadForwarding.Default.bytes       112   10000       0  avgt    3   706.487 ±    2.299  ns/op
VectorStoreToLoadForwarding.Default.bytes       113   10000       0  avgt    3   735.559 ±    4.990  ns/op
VectorStoreToLoadForwarding.Default.bytes       114   10000       0  avgt    3   736.984 ±   49.354  ns/op
VectorStoreToLoadForwarding.Default.bytes       115   10000       0  avgt    3   735.442 ±    0.737  ns/op
VectorStoreToLoadForwarding.Default.bytes       116   10000       0  avgt    3   732.078 ±   29.918  ns/op
VectorStoreToLoadForwarding.Default.bytes       117   10000       0  avgt    3   740.796 ±  176.607  ns/op
VectorStoreToLoadForwarding.Default.bytes       118   10000       0  avgt    3   702.843 ±    2.376  ns/op
VectorStoreToLoadForwarding.Default.bytes       119   10000       0  avgt    3   739.625 ±   66.250  ns/op
VectorStoreToLoadForwarding.Default.bytes       120   10000       0  avgt    3   731.305 ±    7.676  ns/op
VectorStoreToLoadForwarding.Default.bytes       121   10000       0  avgt    3   741.294 ±  163.888  ns/op
VectorStoreToLoadForwarding.Default.bytes       122   10000       0  avgt    3   705.320 ±   46.094  ns/op
VectorStoreToLoadForwarding.Default.bytes       123   10000       0  avgt    3   731.293 ±    8.593  ns/op
VectorStoreToLoadForwarding.Default.bytes       124   10000       0  avgt    3   731.404 ±    9.028  ns/op
VectorStoreToLoadForwarding.Default.bytes       125   10000       0  avgt    3   735.768 ±  107.554  ns/op
VectorStoreToLoadForwarding.Default.bytes       126   10000       0  avgt    3   729.665 ±   10.149  ns/op
VectorStoreToLoadForwarding.Default.bytes       127   10000       0  avgt    3   729.545 ±    7.208  ns/op
VectorStoreToLoadForwarding.Default.bytes       128   10000       0  avgt    3   167.031 ±   14.520  ns/op
VectorStoreToLoadForwarding.Default.bytes       129   10000       0  avgt    3   361.999 ±    1.724  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2486224321

From epeter at openjdk.org Tue Nov 19 16:46:08 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 19 Nov 2024 16:46:08 GMT
Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 19 Nov 2024 14:51:47 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> use constant method handle to make the benchmark smaller
>
> Nice blog post, benchmarks, summary and explanation of the problem! The new heuristic looks reasonable and safer/simpler to use for JDK 24. Would be interesting to see if you could come up with a throughput and latency based heuristic/cost model at some point in the future.
>
> I have some comments, mostly minor things.

@chhagedorn I think I addressed all your comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2486225491 From epeter at openjdk.org Tue Nov 19 16:48:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 16:48:49 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 16:47:55 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Ok, thanks for the explanation. Sounds reasonable. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21853#pullrequestreview-2446046463 From chagedorn at openjdk.org Tue Nov 19 16:56:58 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 16:56:58 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8] In-Reply-To: References: Message-ID: <5u5EVs-ykQcF9eNZiNdDIEFOW7IYMKLQLgcb66o7PiI=.c9fb768e-a54a-4004-81ef-1f561005b18c@github.com> On Tue, 19 Nov 2024 16:09:31 GMT, Emanuel Peter wrote: >> **History** >> This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): >> On machines that do not support sha intrinsics, we execute the sha code in java code. 
This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: >> `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` >> >> I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. >> >> **Summary of Problem** >> >> As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. >> >> **Benchmark** >> >> I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). 
>> >> The benchmarks look different on different machines, but they all have a pattern similar to this: >> ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) >> ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) >> ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) >> ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) >> >> We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offse... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix benchmark Thanks for the updates and the improved comments. That looks good to me. Nice compression of the initial large benchmark :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21521#pullrequestreview-2446063525 From dhanalla at openjdk.org Tue Nov 19 17:43:39 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Tue, 19 Nov 2024 17:43:39 GMT Subject: RFR: 8341293: Split field loads through Nested Phis Message-ID: As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation. 
Here is the sequence of Ideal graph transformations for nested phis:

![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569)
![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81)
![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961)

-------------

Commit messages:
 - Fix trailing whitespaces
 - Split load fields through nestead phi nodes

Changes: https://git.openjdk.org/jdk/pull/21270/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8341293
Stats: 1687 lines in 7 files changed: 1598 ins; 38 del; 51 mod
Patch: https://git.openjdk.org/jdk/pull/21270.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21270/head:pull/21270

PR: https://git.openjdk.org/jdk/pull/21270

From acobbs at openjdk.org Tue Nov 19 17:47:05 2024
From: acobbs at openjdk.org (Archie Cobbs)
Date: Tue, 19 Nov 2024 17:47:05 GMT
Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 13 Nov 2024 16:47:55 GMT, Archie Cobbs wrote:

>> Please review this patch which removes unnecessary `@SuppressWarnings` annotations.
>
> Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
> - Merge branch 'master' into SuppressWarningsCleanup-hotspot
> - Merge branch 'master' into SuppressWarningsCleanup-hotspot
> - Update copyright years.
> - Merge branch 'master' into SuppressWarningsCleanup-hotspot
> - Merge branch 'master' into SuppressWarningsCleanup-graal
> - Remove unnecessary @SuppressWarnings annotations.

Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2486358645

From acobbs at openjdk.org Tue Nov 19 17:47:05 2024
From: acobbs at openjdk.org (Archie Cobbs)
Date: Tue, 19 Nov 2024 17:47:05 GMT
Subject: Integrated: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot)
In-Reply-To:
References:
Message-ID:

On Sat, 2 Nov 2024 15:51:21 GMT, Archie Cobbs wrote:

> Please review this patch which removes unnecessary `@SuppressWarnings` annotations.

This pull request has now been integrated.

Changeset: 087a07b5
Author: Archie Cobbs
URL: https://git.openjdk.org/jdk/commit/087a07b5ededc6381d3d12cad045d3522434709e
Stats: 8 lines in 3 files changed: 0 ins; 6 del; 2 mod

8343479: Remove unnecessary @SuppressWarnings annotations (hotspot)

Reviewed-by: chagedorn, epeter

-------------

PR: https://git.openjdk.org/jdk/pull/21853

From mgronlun at openjdk.org Tue Nov 19 18:55:48 2024
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Tue, 19 Nov 2024 18:55:48 GMT
Subject: RFR: 8318098: Update jfr tests with corresponding requires flags [v2]
In-Reply-To: <3lP5QAVmGNrN8J-A-1BuhbRSf_xjEDfVETuEMug0eUs=.2b8103e2-8bd2-47d8-ac41-f05e217c4be8@github.com>
References: <8s_n2j62lexzKj2qM6w7JMMAngbXqstCW-ta5iGBgEU=.013471ef-46ce-40de-b4fb-312f333a138c@github.com> <3lP5QAVmGNrN8J-A-1BuhbRSf_xjEDfVETuEMug0eUs=.2b8103e2-8bd2-47d8-ac41-f05e217c4be8@github.com>
Message-ID:

On Tue, 19 Nov 2024 18:11:39 GMT, Leonid Mesnik wrote:

> The jfr keyword has been used only by an internal Oracle system, because there was no way to control VM flags through jtreg 10 years ago. This is why I haven't mentioned it in the PR.
>
> A requires tag has since been introduced to mark tests that don't accept certain, or any, VM flags. It is used in many areas to mark flag-sensitive tests so that they are not executed for some or any combinations.
>
> So anyone who executes the openjdk jtreg tests, either ad hoc/locally or in their CI, expects that tests are correctly configured and that incompatible options are not selected. The only exception is the JFR tests. The 'jfr' keyword is not documented, and there is no formal and easy way of saying "run all SVC tests with ZGC".
>
> So using the requires tag just makes the jfr tests consistent with all other JDK tests.
>
> Most JFR tests don't set very specific combinations and can be executed with all VM flags, so we only need to mark the incompatible combinations. There is no requirement to spend a lot of time trying to improve tests for all Compiler/GC combinations; just mark the tests that are too specific. And as I mentioned in the description, there are real Hotspot issues that were found by running jfr tests with different options.
>
> That said, I believe it makes sense to change how we run JFR tests, so that we have the same way for ALL JDK tests.

Why not put vm.flagless on everything then?

Else we find ourselves in situations like the following:

TestPromotionEventWithG1.java @requires vm.compMode != "Xcomp"

Why is vm.compMode now all of a sudden a requirement for this test? How was it determined that Xcomp should be excluded for this test that tests a PromotionEvent for G1?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22249#issuecomment-2486493883

From lmesnik at openjdk.org Tue Nov 19 19:11:46 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Tue, 19 Nov 2024 19:11:46 GMT
Subject: RFR: 8318098: Update jfr tests with corresponding requires flags [v2]
In-Reply-To:
References: <8s_n2j62lexzKj2qM6w7JMMAngbXqstCW-ta5iGBgEU=.013471ef-46ce-40de-b4fb-312f333a138c@github.com> <3lP5QAVmGNrN8J-A-1BuhbRSf_xjEDfVETuEMug0eUs=.2b8103e2-8bd2-47d8-ac41-f05e217c4be8@github.com>
Message-ID: <6GjLaoTx7O7dfjJfZmTp2AAzCnBSXRO2Ge9N65GPP2Q=.c51cf3f6-d64b-438f-9c21-6cd77b922143@github.com>

On Tue, 19 Nov 2024 18:51:46 GMT, Markus Grönlund wrote:

> Why not put vm.flagless on everything then?

Although it makes the test compliant, it significantly reduces coverage. I provided examples of the issues that were already found. The issue https://bugs.openjdk.org/browse/JDK-8344199 (Incorrect excluded field value set by getEventWriter intrinsic) is observed only when we run tests with Xcomp. Otherwise none of the JFR code is compiled and the C2 path is not executed. Unfortunately, there are no tests for this, and the easiest way to test is just to run all existing tests with forced C2.

> Else we find ourselves in situations like the following:
>
> TestPromotionEventWithG1.java @requires vm.compMode != "Xcomp"
>
> Why is vm.compMode now all of a sudden a requirement for this test? How was it determined that Xcomp should be excluded for this test that tests a PromotionEvent for G1?

The requirement for JDK tests is to reject incompatible options used during testing. I excluded the test TestPromotionEventWithG1 because it fails with Xcomp. Looking at the test, which configures

    * @run main/othervm -Xmx32m -Xms32m -Xmn12m -XX:+UseG1GC -XX:-UseStringDeduplication -XX:MaxTenuringThreshold=5 -XX:InitialTenuringThreshold=5 jdk.jfr.event.gc.detailed.TestPromotionEventWithG1
    * @run main/othervm -Xmx32m -Xms32m -Xmn12m -XX:AllocatePrefetchLines=1 -XX:AllocateInstancePrefetchLines=1 -XX:AllocatePrefetchStepSize=16 -XX:AllocatePrefetchDistance=1 -XX:+UseG1GC
    * -XX:-UseStringDeduplication -Xlog:os+cpu=info -XX:MaxTenuringThreshold=5 -XX:InitialTenuringThreshold=5 -XX:MinTLABSize=768 -XX:TLABSize=768 jdk.jfr.event.gc.detailed.TestPromotionEventWithG1

it is clear that the test might be too specific to use any additional VM flags. Such tests are indeed good candidates to be marked as flagless from the beginning. You might just use 'vm.flagless' for unit tests; however, for tests that don't need a very specific setup it is better to allow running different combinations. It helps to verify that JFR is not broken in different modes.
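As a rough illustration of the tagging approach described above, a jtreg test header might look like the sketch below. The class name, summary and body are hypothetical and not taken from the PR; only the `vm.compMode != "Xcomp"` expression and the `vm.flagless` property come from this thread.

```java
/*
 * @test
 * @summary Hypothetical test, shown only to illustrate @requires tagging
 * @requires vm.compMode != "Xcomp"
 * @run main/othervm HypotheticalJfrTest
 */
// For a test that cannot tolerate any extra VM flags, the thread suggests
// "@requires vm.flagless" instead of excluding one specific mode.
public class HypotheticalJfrTest {

    // Reports the VM mode string exposed by HotSpot, or "unknown" if the
    // property is not set on the running VM.
    static String vmInfo() {
        return System.getProperty("java.vm.info", "unknown");
    }

    public static void main(String[] args) {
        // A real test would record and verify JFR events here.
        System.out.println("Running with VM info: " + vmInfo());
    }
}
```

With a header like this, jtreg is expected to skip the test when the run is started with -Xcomp, while still selecting it for other flag combinations.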
No need to test all those things before pushing, unless you have reason to believe that a test is specific to some modes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22249#issuecomment-2486530617

From jbhateja at openjdk.org Tue Nov 19 19:57:09 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 19 Nov 2024 19:57:09 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations
Message-ID:

Hi All,

This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)

Following is the summary of changes included with this patch:-

1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
3. Float16 SQRT and FMA operations are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
   - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details.
7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate over floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa.
8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
6. Auto-vectorization of newly supported scalar operations.
7. X86 and AARCH64 backend implementation for all supported intrinsics.
9. Functional and Performance validation tests.

**Missing Pieces:-**
**- AARCH64 Backend.**

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - Code styling changes
 - Review comments resoultion.
 - Jcheck and build fixes
 - New halffloat type 'TypeH' and associated changes
 - Merge branch 'master' of http://github.com/openjdk/jdk into float16_support
 - Jcheck cleanup
 - Review comments and tests cleanup.
 - Annotating Float16 as a ValueBased class
 - Merge branch 'master' of http://github.com/openjdk/jdk into float16_support
 - Merge branch 'master' of http://github.com/openjdk/jdk into float16_support
 - ... and 6 more: https://git.openjdk.org/jdk/compare/2c509a15...132878ba

Changes: https://git.openjdk.org/jdk/pull/21490/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8342103
Stats: 3055 lines in 58 files changed: 2974 ins; 0 del; 81 mod
Patch: https://git.openjdk.org/jdk/pull/21490.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490

PR: https://git.openjdk.org/jdk/pull/21490

From bkilambi at openjdk.org Tue Nov 19 19:57:13 2024
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 19 Nov 2024 19:57:13 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations
In-Reply-To:
References:
Message-ID:

On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote:

> Hi All,
>
> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>
> Following is the summary of changes included with this patch:-
>
> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
> 3. Float16 SQRT and FMA operations are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>    - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details.
> 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate over floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
>
> **Missing Pieces:-**
> **- AARCH64 Backend.**
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Can we add the JMH micro benchmark that you added recently for FP16 as well? Or has it intentionally not been included?

Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR?

src/hotspot/share/opto/addnode.hpp line 445:

> 443: MinHFNode(Node* in1, Node* in2) : MaxNode(in1, in2) {}
> 444: virtual int Opcode() const;
> 445: virtual const Type *add_ring(const Type*, const Type*) const;

`Type*`? To align with the style used in the constructor.

src/hotspot/share/opto/divnode.cpp line 752:

> 750: //=============================================================================
> 751: //------------------------------Value------------------------------------------
> 752: // An DivFNode divides its inputs. The third input is a Control input, used to

DivHFNode?

src/hotspot/share/opto/divnode.cpp line 775:

> 773: }
> 774:
> 775: if( t2 == TypeH::ONE )

Should the if condition be styled as `if ()`? Or is this to align with the already existing float routines?

src/hotspot/share/opto/mulnode.cpp line 558:

> 556: }
> 557:
> 558: // Compute the product type of two double ranges into this node.

of two *half-float* ranges?

src/hotspot/share/opto/node.cpp line 1600:

> 1598:
> 1599: // Get a half float constant from a ConstNode.
> 1600: // Returns the constant if it is a float ConstNode

half float ConstNode?

src/hotspot/share/opto/type.hpp line 530:

> 528: };
> 529:
> 530: // Class of Float-Constant Types.

Class of Half-float constant Types?

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 122:

> 120: public static final String VECTOR_SIZE_64 = VECTOR_SIZE + "64";
> 121:
> 122: private static final String TYPE_BYTE = "byte";

Hi Jatin, why have these changes been made? The PrintIdeal output still prints the vector size of the node in this format - `#vectord`. This test - `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java` was failing due to this mismatch.

test/jdk/java/lang/Float/FP16ReductionOperations.java line 25:

> 23:
> 24: /*
> 25: * @test

Hi Jatin, is there any reason why these have been kept under the `Float` folder and not a separate `Float16` folder?

test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java line 334:

> 332:
> 333: @Test(dataProvider = "ternaryOpProvider")
> 334: public static void minTest(Object input1, Object input2, Object input3) {

`fmaTest`?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2411381410
PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2411607884
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848152453
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848128281
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848135401
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848112186
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848195342
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847971311
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1803209988
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1802767337
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848388981

From darcy at openjdk.org Tue Nov 19 19:57:14 2024
From: darcy at openjdk.org (Joe Darcy)
Date: Tue, 19 Nov 2024 19:57:14 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations
In-Reply-To:
References:
Message-ID:

On Mon, 14 Oct 2024 16:42:24 GMT, Paul Sandoz wrote:

> We should move the `Float16` class to `jdk.incubator.vector` and relevant intrinsic stuff to `jdk.internal.vm.vector`, and we don't need the changes to `BigDecimal` and `BigInteger`.

To expand on that point, a few weeks back I took a look at what porting Float16 from java.lang in the lworld+fp16 branch of Valhalla to the jdk.incubator.vector package in JDK 24 would look like: the results were favorable and the diffs are attached to JDK-8341260.

Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2420616927 From psandoz at openjdk.org Tue Nov 19 19:57:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. 
> > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin We should move the `Float16` class to `jdk.incubator.vector` and relevant intrinsic stuff to `jdk.internal.vm.vector`, and we don't need the changes to `BigDecimal` and `BigInteger`. make/modules/jdk.incubator.vector/Java.gmk line 30: > 28: DOCLINT += -Xdoclint:all/protected > 29: > 30: JAVAC_FLAGS += --add-exports=java.base/jdk.internal=jdk.incubator.vector Please remove this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2411758902 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1845208651 From psandoz at openjdk.org Tue Nov 19 19:57:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:33:03 GMT, Joe Darcy wrote: > > Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days. Good point, we should separate the Java changes from the intrinsic + HotSpot changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2420632074 From darcy at openjdk.org Tue Nov 19 19:57:14 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:35:40 GMT, Paul Sandoz wrote: > > Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days. 
> > Good point, we should separate the Java changes from the intrinsic + HotSpot changes. PS Along those lines, see https://github.com/openjdk/jdk/pull/21574 for a non-intrinsified port of Float16 to the vector API. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2420926866 From jrose at openjdk.org Tue Nov 19 19:57:14 2024 From: jrose at openjdk.org (John R Rose) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. 
New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin As I noted on Joe's PR, I like the fact that the intrinsics are decoupled from the box class. I'm now wondering if there is another simplification possible (as I claimed to Joe!) which is to reduce the number of intrinsics, ideally down to conversions (to and from HF). For example, `sqrt_float16` is an intrinsic, but I think it could be just an invisible IR node. After inlining the Java definition, you start with an IR graph that mentions `sqrtD` and is surrounded by conversion nodes. Then you refactor the IR graph to use `sqrt_float16` directly, presumably with fewer conversions (and/or reinterprets). Same argument for max, min, add, mul, etc. I'm not saying the current PR is wrong, but I would like to know if it could be simplified, either now or later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2424373685 From jbhateja at openjdk.org Tue Nov 19 19:57:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. 
Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin Extending on John's thoughts. 
![image](https://github.com/user-attachments/assets/c795e79f-a857-4991-9b8a-c36d8525ba73)
![image](https://github.com/user-attachments/assets/264eeeea-86a0-43ed-a365-88b91e85d9cc)

There are two possibilities for a pattern match here, one rooted at node **A** and the other at **B**.

With the pattern match rooted at **A**, we would need to inject an additional ConvHF2F after replacing AddF with AddHF to preserve the type semantics of the IR graph. The [significand bit preservation constraints](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Float.java#L1103) for NaN values imposed by the Float.float16ToFloat API make the idealization toward the end infeasible, thereby reducing the operating vector size for the FP16 operation to half of what is possible, as depicted by the following Ideal graph fragment.

![image](https://github.com/user-attachments/assets/0094e613-2c11-40db-b2bb-84ddf6b251f2)

Thus the only feasible match is the one rooted at node **B**.

![image](https://github.com/user-attachments/assets/22576617-9533-40e2-94f0-dd6048e295dd)

Please consider the Java-side implementation of Float16.sqrt:

    Float16 sqrt(Float16 radicand) {
        return valueOf(Math.sqrt(radicand.doubleValue()));
    }

Here, the radicand is first upcast to a double value. Following the 2P+2 rule of IEEE 754, a square root computed at double precision is not subject to a double-rounding penalty when the final result is downcast to a Float16 value.

Following is the C2 IR for the above Java implementation.

    T0 = Param0 (TypeInt::SHORT)
    T1 = CastHF2F T0
    T2 = CastF2D T1
    T3 = SqrtD T2
    T4 = ConvD2F T3
    T5 = CastF2HF T4

To replace SqrtD with SqrtHF, we need the following IR modifications.

    T0 = Param0 (TypeInt::SHORT)
    // Replacing IR T1-T3 in original fragment with following IR T1-T6.
    T1 = ReinterpretS2HF T0
    T3 = SqrtHF T1
    T4 = ReinterpretHF2S T3
    T5 = ConvHF2F T4
    T6 = ConvF2D T5
    T7 = ConvD2F T6
    T8 = CastF2HF T7

Simplified IR after applying Identity rules:

    T0 = Param0 (TypeInt::SHORT)
    // Replacing IR T1-T3 in original fragment with following IR T1-T6.
    T1 = ReinterpretS2HF T0
    T3 = SqrtHF T1
    T4 = ReinterpretHF2S T3

While the above transformations are valid replacements for the current intrinsic approach, which uses explicit entry points in the newly defined Float16Math helper class, they deviate from the implementation of several j.l intrinsified methods which could be replaced by pattern matches, e.g.
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2022
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2116

I think we need to carefully pick pattern matching over intrinsification if the former handles more general cases. If our intention is to capture various Float16 operation patterns in user code which does not directly use the Float16 API, then pattern matching looks appealing. But APIs like SQRT and FMA are very carefully drafted with the rounding impact in view, and such patterns will be hard to find, so it should be OK to take the intrinsic route for them; simpler cases like add / sub / mul / div / max / min can be handled through a pattern matching approach.

There are also some issues around VM symbol creation for intrinsic entries defined in non-java.base modules, which did not surface when Float16 and Float16Math were part of the java.base module.

For this PR, taking a hybrid approach comprising both pattern matching and intrinsification looks reasonable to me.

Please let me know if you have any comments.

Some FAQs on the newly added ideal type for half-float IR nodes:-

Q. Why do we not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type named TypeH is special, as its basic type is T_SHORT while its ideal type is RegF. Thus, the C2 type system views its associated IR node as a 16-bit short value while the register allocator assigns it a floating-point register.

Q. Problem with ConF?
A. During auto-vectorization, ConF replication constrains the operational vector lane count to half of what can otherwise be used for a regular Float16 operation, i.e. only 16 floats can be accommodated in a 512-bit vector, thereby limiting the lane count of vectors in its use-def chain. One possible way to address this is through a kludge in the auto-vectorizer that casts them to a 16-bit constant by analyzing the context. The newly defined Float16 constant nodes 'ConH' are inherently 16-bit encoded IEEE 754 FP16 values and can be efficiently packed to leverage the full target vector width.

All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, thus we no longer need special handling in the auto-vectorizer to prune their container type to short.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2425873278
PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2482867818

From jbhateja at openjdk.org Tue Nov 19 19:57:14 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 19 Nov 2024 19:57:14 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations
In-Reply-To:
References:
Message-ID:

On Mon, 14 Oct 2024 15:32:41 GMT, Bhavana Kilambi wrote:

> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR?

Hi @Bhavana-Kilambi,
I am in the process of refining the existing patch, tests and benchmark, and will update the PR.
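For readers following the `Float16.sqrt` discussion earlier in this thread, the "widen, take the square root at double precision, narrow back to binary16" sequence can be approximated in plain Java on raw short bits. This is only a sketch under assumptions: the class and method names are hypothetical, it relies on `Float.float16ToFloat` and `Float.floatToFloat16` (available since JDK 20), and it narrows through float because those library methods work on float, whereas the implementation quoted in the thread converts from double directly.

```java
public class Float16SqrtSketch {

    // Hypothetical helper: square root of an IEEE 754 binary16 value that is
    // carried as raw bits in a short, mirroring the "unwrapped short
    // arguments" convention mentioned in the PR description.
    static short sqrt(short radicandBits) {
        float widened = Float.float16ToFloat(radicandBits); // binary16 -> float
        double root = Math.sqrt((double) widened);          // sqrt at double precision
        return Float.floatToFloat16((float) root);          // narrow back to binary16 bits
    }

    public static void main(String[] args) {
        short four = Float.floatToFloat16(4.0f);
        short two = sqrt(four);
        System.out.println(Float.float16ToFloat(two));
    }
}
```

The short-in, short-out shape is the point of the sketch: the value lives in an integer-typed container at the Java level, which is why the compiler side needs reinterpretation nodes to move it into a floating-point register.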
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2436821263 From darcy at openjdk.org Tue Nov 19 19:57:14 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. 
>
> **Missing Pieces:-**
> **- AARCH64 Backend.**
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

FYI, https://github.com/openjdk/jdk/pull/21574 has been pushed, adding Float16 to the incubating vector package.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2475035058

From psandoz at openjdk.org Tue Nov 19 19:57:14 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Tue, 19 Nov 2024 19:57:14 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations
In-Reply-To:
References:
Message-ID:

On Fri, 25 Oct 2024 04:46:52 GMT, Jatin Bhateja wrote:

>> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR?
>
>> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR?
>
> Hi @Bhavana-Kilambi ,
> I am in process of refining existing patch, tests and benchmark, will update the PR.

@jatin-bhateja I commented directly on the code in the commit entitled "Annotating Float16 as a ValueBased class" but I don't see it. This is not the right way to do it; see my [comment](https://github.com/openjdk/jdk/pull/21574#discussion_r1841020576) related to this on Joe's Float16 PR. We should address it as a separate PR for ease of review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2476891427 From sviswanathan at openjdk.org Tue Nov 19 19:57:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. 
Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/assembler_x86.cpp line 3481: > 3479: void Assembler::vmovw(XMMRegister dst, Register src) { > 3480: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 3481: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false". src/hotspot/cpu/x86/assembler_x86.cpp line 3483: > 3481: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 3482: attributes.set_is_evex_instruction(); > 3483: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes); I think we need to change this to: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true); Please note the last argument for APX encoding when src is in higher register bank. src/hotspot/cpu/x86/assembler_x86.cpp line 3489: > 3487: void Assembler::vmovw(Register dst, XMMRegister src) { > 3488: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 3489: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false". 
src/hotspot/cpu/x86/assembler_x86.cpp line 3491: > 3489: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 3490: attributes.set_is_evex_instruction(); > 3491: int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes); I think we need to change this to: int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true); Please note the last argument for APX encoding when dst is in higher register bank. src/hotspot/cpu/x86/assembler_x86.cpp line 8464: > 8462: void Assembler::evaddph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8463: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8464: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false". 
src/hotspot/cpu/x86/assembler_x86.cpp line 8483: > 8481: void Assembler::evsubph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8482: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8483: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8502: > 8500: void Assembler::evmulph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8501: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8502: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8521: > 8519: void Assembler::evminph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8520: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8521: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8540: > 8538: void Assembler::evmaxph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8539: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8540: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8559: > 8557: void Assembler::evdivph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8558: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8559: InstructionAttr attributes(vector_len, false, 
/* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8576: > 8574: } > 8575: > 8576: void Assembler::evsqrtph(XMMRegister dst, XMMRegister src1, int vector_len) { A nitpick src1 could be src :). src/hotspot/cpu/x86/assembler_x86.cpp line 8614: > 8612: } > 8613: > 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vaddsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8616: > 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8615: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8616: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8622: > 8620: } > 8621: > 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vsubsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8624: > 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8623: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8624: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8630: > 8628: } > 8629: > 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vdivsh. 
src/hotspot/cpu/x86/assembler_x86.cpp line 8632: > 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8631: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8632: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8638: > 8636: } > 8637: > 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vmulsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8640: > 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8639: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8640: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8646: > 8644: } > 8645: > 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vmaxsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8648: > 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8647: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8648: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8654: > 8652: } > 8653: > 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vminsh. 
src/hotspot/cpu/x86/assembler_x86.cpp line 8656: > 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8655: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8656: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8662: > 8660: } > 8661: > 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) { This should be vsqrtsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8664: > 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) { > 8663: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8664: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974: > 3972: generate_libm_stubs(); > 3973: > 3974: StubRoutines::_fmod = generate_libmFmod(); // from stubGenerator_x86_64_fmod.cpp Good to retain the is_intrinsic_available checks. src/hotspot/cpu/x86/x86.ad line 4518: > 4516: #ifdef _LP64 > 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{ > 4518: predicate(VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT); I have a question about the predicate for ReplS_imm. What happens if the predicate is false? There doesn't seem to be any other instruct rule to cover that situation. Also I don't see any check in match rule supported on Replicate node. src/hotspot/cpu/x86/x86.ad line 10895: > 10893: format %{ "esqrtsh $dst, $src" %} > 10894: ins_encode %{ > 10895: int opcode = this->ideal_Opcode(); opcode is unused. 
src/hotspot/cpu/x86/x86.ad line 10936: > 10934: ins_encode %{ > 10935: int vlen_enc = vector_length_encoding(this); > 10936: int opcode = this->ideal_Opcode(); opcode unused later. src/hotspot/cpu/x86/x86.ad line 10949: > 10947: ins_encode %{ > 10948: int vlen_enc = vector_length_encoding(this); > 10949: int opcode = this->ideal_Opcode(); opcode unused later. src/hotspot/cpu/x86/x86.ad line 10964: > 10962: match(Set dst (SubVHF src1 src2)); > 10963: format %{ "evbinopfp16_reg $dst, $src1, $src2" %} > 10964: ins_cost(450); Why ins_cost 450 here for reg version and 150 for mem version of binOps? Whereas sqrt above has 150 cost for both reg and mem version. Good to be consistent. src/hotspot/cpu/x86/x86.ad line 11012: > 11010: effect(DEF dst); > 11011: format %{ "evfmaph_reg $dst, $src1, $src2\t# $dst = $dst * $src1 + $src2 fma packedH" %} > 11012: ins_cost(450); Good to be consistent with ins_cost for reg vs mem version. src/hotspot/cpu/x86/x86.ad line 11015: > 11013: ins_encode %{ > 11014: int vlen_enc = vector_length_encoding(this); > 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc); Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e. match(Set dst (FmaVHF dst (Binary src1 src2))); which then leads to fmadd231. src/hotspot/share/adlc/output_h.cpp line 1298: > 1296: case Form::idealD: type = "Type::DOUBLE"; break; > 1297: case Form::idealL: type = "TypeLong::LONG"; break; > 1298: case Form::idealH: type = "Type::HALF_LONG"; break; This should be Type::HALF_FLOAT src/hotspot/share/classfile/vmSymbols.hpp line 143: > 141: template(java_util_DualPivotQuicksort, "java/util/DualPivotQuicksort") \ > 142: template(jdk_internal_misc_Signal, "jdk/internal/misc/Signal") \ > 143: template(jdk_internal_math_Float16Math, "jdk/internal/math/Float16Math") \ This seems to be leftover template. 
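Editorial sketch on the FMA operand-order question raised above for x86.ad line 11015: the three x86 FMA forms differ only in which operand is overwritten by the result. The scalar model below is an illustration, not code from the patch. `float` stands in for FP16, and the helper names (`fmadd132`, `fmadd213`, `fmadd231`, `accumulate_132`, `accumulate_231`) are hypothetical.

```cpp
#include <cmath>

// Scalar models of the three x86 FMA operand orders. The first parameter is
// the old value of the destination register; the return value is what the
// instruction writes back to it. float is a stand-in for FP16 (assumption).
inline float fmadd132(float dst, float op2, float op3) { return std::fma(dst, op3, op2); } // dst = dst * op3 + op2
inline float fmadd213(float dst, float op2, float op3) { return std::fma(op2, dst, op3); } // dst = op2 * dst + op3
inline float fmadd231(float dst, float op2, float op3) { return std::fma(op2, op3, dst); } // dst = op2 * op3 + dst

// Hypothetical accumulation step acc = acc + a * b.
// With the 231 form the accumulator is already the destination operand.
inline float accumulate_231(float acc, float a, float b) {
    return fmadd231(acc, a, b);
}

// With the 132 form the destination must hold a multiplicand, so the register
// holding 'a' is overwritten; the copy models a reg/reg move that is needed
// only when 'a' is still live afterwards.
inline float accumulate_132(float acc, float a, float b) {
    float tmp = a;                 // models the reg/reg move
    return fmadd132(tmp, acc, b);  // tmp * b + acc
}
```

Both helpers compute the same value; whether the extra move is really emitted depends on which inputs are still live after the operation.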
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870304 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843899813 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870852 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843902337 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843871328 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843906656 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843908957 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843910609 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843912897 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843914392 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843916999 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922125 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922490 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843923239 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843924299 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925126 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925319 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926551 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926789 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928252 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928447 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929519 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929686 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843930969 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21490#discussion_r1843931641 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847403451 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847400518 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844234786 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844237825 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844238487 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844244532 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847443990 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847448109 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847470619 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847475384 From bkilambi at openjdk.org Tue Nov 19 19:57:14 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 04:46:52 GMT, Jatin Bhateja wrote: >> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? > >> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? > > Hi @Bhavana-Kilambi , > I am in process of refining existing patch, tests and benchmark, will update the PR. Hi @jatin-bhateja , could you also please merge my patch which adds aarch64 backend for these operations here - https://github.com/jatin-bhateja/jdk/pull/6 If you feel there needs to be any changes made before you'd like to merge it, please do let me know and I'll do it. Thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2476695747 From sviswanathan at openjdk.org Tue Nov 19 19:57:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 08:43:06 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 11015: >> >>> 11013: ins_encode %{ >>> 11014: int vlen_enc = vector_length_encoding(this); >>> 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc); >> >> Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e. >> match(Set dst (FmaVHF dst (Binary src1 src2))); >> which then leads to fmadd231. > > ISA supports multiple flavors, the current scheme is in line with the wiring of inputs done before matching. You could save some reg/reg movs with 231 flavor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848290834 From jbhateja at openjdk.org Tue Nov 19 19:57:19 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 23:11:20 GMT, Sandhya Viswanathan wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
>> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974: > >> 3972: generate_libm_stubs(); >> 3973: >> 3974: StubRoutines::_fmod = generate_libmFmod(); // from stubGenerator_x86_64_fmod.cpp > > Good to retain the is_intrinsic_available checks. I reinstantiated it, it was an artifact of my commit. > src/hotspot/cpu/x86/x86.ad line 4518: > >> 4516: #ifdef _LP64 >> 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{ >> 4518: predicate(VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT); > > I have a question about the predicate for ReplS_imm. What happens if the predicate is false? There doesn't seem to be any other instruct rule to cover that situation. Also I don't see any check in match rule supported on Replicate node. We only create Half Float constants (ConH) if the target supports FP16 ISA. 
These constants are generated by Value transforms associated with FP16-specific IR, whose creation is guarded by target-specific match rule supported checks. > src/hotspot/cpu/x86/x86.ad line 10964: > >> 10962: match(Set dst (SubVHF src1 src2)); >> 10963: format %{ "evbinopfp16_reg $dst, $src1, $src2" %} >> 10964: ins_cost(450); > > Why ins_cost 450 here for reg version and 150 for mem version of binOps? Whereas sqrt above has 150 cost for both reg and mem version. Good to be consistent. Cost does not play much of a role here; I removed it for consistency. The matching algorithm is a BURS-style two-pass algorithm. The binary state tree is constructed during a bottom-up walk of the expressions, and each state captures the cost associated with the different reductions. The actual selection is done through a top-down walk of the state tree: at this stage we pick the reduction with the minimum cost from the set of reductions that generate the same kind of result operand. Once a reduction is selected, the matcher follows the low-cost path of the state tree, so the associated costs guide the selector in choosing from the set of active reductions. In general, it is advisable to assign a lower cost to memory-variant patterns on CISC targets, since this way we can avoid emitting an explicit load. > src/hotspot/cpu/x86/x86.ad line 11015: > >> 11013: ins_encode %{ >> 11014: int vlen_enc = vector_length_encoding(this); >> 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc); > > Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e. > match(Set dst (FmaVHF dst (Binary src1 src2))); > which then leads to fmadd231. ISA supports multiple flavors, the current scheme is in line with the wiring of inputs done before matching.
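Editorial sketch of the two-pass, cost-driven selection described in the reply on x86.ad line 10964 above: a bottom-up pass records the cheapest reduction per node, and a top-down pass emits the chosen rules. This is a toy model and not the ADLC-generated matcher. The node kinds, the rule names (`add_reg_reg`, `add_reg_mem`, `load_mem`) and every cost value are hypothetical.

```cpp
#include <climits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR: a register leaf, a memory-load leaf, and a binary add.
enum class Op { Reg, Load, Add };

struct Node {
    Op op;
    std::unique_ptr<Node> lhs, rhs;
    int cost = INT_MAX;  // cheapest known cost of producing this value in a register
    std::string rule;    // reduction that achieved that cost
};

// Hypothetical rule costs; none of these numbers come from x86.ad.
struct Costs { int add_reg_reg; int add_reg_mem; int load; };

std::unique_ptr<Node> make(Op op, std::unique_ptr<Node> l = nullptr,
                           std::unique_ptr<Node> r = nullptr) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Pass 1 (bottom-up): label each node with its minimum-cost reduction.
void label(Node& n, const Costs& c) {
    if (n.lhs) label(*n.lhs, c);
    if (n.rhs) label(*n.rhs, c);
    switch (n.op) {
    case Op::Reg:  n.cost = 0;      n.rule = "reg";      break;
    case Op::Load: n.cost = c.load; n.rule = "load_mem"; break;
    case Op::Add:
        n.cost = c.add_reg_reg + n.lhs->cost + n.rhs->cost;
        n.rule = "add_reg_reg";
        if (n.rhs->op == Op::Load && c.add_reg_mem + n.lhs->cost < n.cost) {
            n.cost = c.add_reg_mem + n.lhs->cost;  // the load is folded into the add
            n.rule = "add_reg_mem";
        }
        break;
    }
}

// Pass 2 (top-down): follow the recorded rules and emit them in order.
void reduce(const Node& n, std::vector<std::string>& out) {
    if (n.op == Op::Reg) return;  // nothing to emit for a value already in a register
    if (n.lhs) reduce(*n.lhs, out);
    if (n.rhs && n.rule != "add_reg_mem") reduce(*n.rhs, out);  // a folded load emits nothing
    out.push_back(n.rule);
}

std::vector<std::string> select_instructions(Node& root, const Costs& c) {
    label(root, c);
    std::vector<std::string> out;
    reduce(root, out);
    return out;
}
```

In this model the memory variant wins whenever its cost is below the register variant plus the separate load, which is the effect the reply attributes to giving memory patterns a lower cost on CISC targets.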
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847906271 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847906153 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847907028 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847906530 From sviswanathan at openjdk.org Tue Nov 19 19:57:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <3vPdEXbVVSjvDf_JAaLRwBTsYCBuD631lPgFz6pIkV4=.65022b33-9275-41ba-83e0-64df0b07f31b@github.com> On Tue, 19 Nov 2024 00:29:42 GMT, Sandhya Viswanathan wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > src/hotspot/share/classfile/vmSymbols.hpp line 143: > >> 141: template(java_util_DualPivotQuicksort, "java/util/DualPivotQuicksort") \ >> 142: template(jdk_internal_misc_Signal, "jdk/internal/misc/Signal") \ >> 143: template(jdk_internal_math_Float16Math, "jdk/internal/math/Float16Math") \ > > This seems to be leftover template. I don't see use of this one, you have another one with jdk_internal_vm_vector_Float16Math which is being used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848295979 From bkilambi at openjdk.org Tue Nov 19 19:57:19 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 14:19:40 GMT, Bhavana Kilambi wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. 
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 122: > >> 120: public static final String VECTOR_SIZE_64 = VECTOR_SIZE + "64"; >> 121: >> 122: private static final String TYPE_BYTE = "byte"; > > Hi Jatin, why have these changes been made? The PrintIdeal output still prints the vector size of the node in this format - `#vectord`. This test - `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java` was failing due to this mismatch. In fact, many tests under test/hotspot fail due to this change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1804557172 From jbhateja at openjdk.org Tue Nov 19 20:39:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 20:39:00 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 12:41:46 GMT, Sandhya Viswanathan wrote: >> ISA supports multiple flavors, the current scheme is in line with the wiring of inputs done before matching. > > You could save some reg/reg movs with 231 flavor. It will depend on the live ranges of the three inputs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1849024128 From sviswanathan at openjdk.org Tue Nov 19 22:16:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 22:16:23 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v5] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <3LxHrzC6cOgSadtyQP7g_3yHYNUev0PUXNTege2qdDI=.3e661cb5-b54d-462a-9482-d6c06394ad9b@github.com> On Thu, 14 Nov 2024 18:24:59 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. 
Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. >> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. Changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
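Editorial sketch of the arithmetic behind the quoted description: when the upper 32 bits of both quadword operands are zero, the low 64 bits of the full product equal a widening 32x32 multiply of the low doublewords, and no bits are lost because that product always fits in 64 bits. The functions below are scalar models of a single vector lane, not code from the patch, and their names are hypothetical.

```cpp
#include <cstdint>

// Low 64 bits of a 64x64 product: a scalar model of one MulVL lane
// (unsigned arithmetic wraps modulo 2^64).
inline uint64_t mul64_lo(uint64_t a, uint64_t b) { return a * b; }

// Widening unsigned 32x32 -> 64 multiply of the low doublewords: a scalar
// model of one VPMULUDQ lane.
inline uint64_t mul32x32_unsigned(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>(static_cast<uint32_t>(a)) * static_cast<uint32_t>(b);
}

// Widening signed 32x32 -> 64 multiply of the low doublewords: a scalar model
// of one VPMULDQ lane. Callers here pass values that already fit in int32_t.
inline int64_t mul32x32_signed(int64_t a, int64_t b) {
    return static_cast<int64_t>(static_cast<int32_t>(a)) * static_cast<int32_t>(b);
}
```

Masking both inputs with 0xFFFFFFFF or logically shifting both right by 32 guarantees zero upper halves, which is why those IR shapes can map to the unsigned form, while sign-extended inputs (arithmetic shift by 32 or an int-to-long cast) correspond to the signed form.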
PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2446759294 From sviswanathan at openjdk.org Tue Nov 19 23:25:20 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 23:25:20 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <7pJQGLP9E-cCKTxiOJTIxdbGaUjRtbNWYOb-NlymDfI=.fed0c520-4406-4ca0-90a1-3cdd9565aa7d@github.com> On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. 
X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin x86 changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21490#pullrequestreview-2446932147 From fyang at openjdk.org Wed Nov 20 00:52:16 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Nov 2024 00:52:16 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: On Tue, 19 Nov 2024 15:59:59 GMT, Hamlin Li wrote: > > Instead of removing, can we put it behind a flag that is disabled by default? We clearly don't want to keep something that's slower for the current generation of hardware but we could expect the next generation of hardware to be faster. > > Not quite sure, as the test result shows it's too slow, and we need to introduce another vm option. What's your opinion? @RealFYang I would suggest we revert them for maintainability. As I remember, this code was added when we didn't have any RVV hardware to verify the performance benefit. So we just let it sit there, hoping it would do well on real hardware. Now that both RVV 128 and 256 hardware is available and the code doesn't do anything good for us there, I don't think we would have wanted it if we had known this initially. We can still add it back if we see a performance improvement on future hardware.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22102#issuecomment-2487070994 From syan at openjdk.org Wed Nov 20 01:39:28 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 01:39:28 GMT Subject: RFR: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: <21Gvfbexo7r4Ab3rrbASFBi26lC95sDjSkG9cX3LCJA=.0ef9eaef-4ef2-41b4-bb7e-9b3b4346fbec@github.com> On Tue, 19 Nov 2024 12:33:56 GMT, SendaoYan wrote: >> Hi all, >> Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build >> - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 fastdebug build > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > use subs instead of cmp GHA reports one test failure: 1. Test `java/lang/Thread/virtual/stress/GetStackTraceALotWhenBlocking.java#id0` timed out, which is already tracked by [JDK-8344577](https://bugs.openjdk.org/browse/JDK-8344577); it is unrelated to this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22181#issuecomment-2487132520 From syan at openjdk.org Wed Nov 20 01:39:29 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 01:39:29 GMT Subject: Integrated: 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: Message-ID: On Sun, 17 Nov 2024 09:06:37 GMT, SendaoYan wrote: > Hi all, > Currently on aarch64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose.
Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 release build > - [x] jtreg tests(include tier1/2/3 etc.) on linux-aarch64 fastdebug build This pull request has now been integrated. Changeset: 4ddd3dec Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/4ddd3dec2d0b232d48646ca89b16591b3026aa5c Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8344356: Aarch64: implement -XX:+VerifyActivationFrameSize Reviewed-by: aph ------------- PR: https://git.openjdk.org/jdk/pull/22181 From syan at openjdk.org Wed Nov 20 03:10:23 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 03:10:23 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize Message-ID: Hi all, Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. Additional testing - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 release build ------------- Commit messages: - Use sub instead of add - 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize Changes: https://git.openjdk.org/jdk/pull/22264/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344526 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22264.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22264/head:pull/22264 PR: https://git.openjdk.org/jdk/pull/22264 From fyang at openjdk.org Wed Nov 20 03:10:24 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Nov 2024 03:10:24 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: Message-ID: <5ewxZaYdVGAoEgCMjvrvz29UX-6sw1T7Dgq3WFlv8Yg=.df6b3e3f-9d20-47c8-b9c9-7a10972443c0@github.com> On Wed, 20 Nov 2024 02:43:42 GMT, SendaoYan wrote: > Hi all, > Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 release build src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 447: > 445: sub(t1, fp, esp); > 446: int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset) * wordSize; > 447: add(t1, t1, -min_frame_size); Suggestion: `sub(t1, t1, min_frame_size);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1849429091 From syan at openjdk.org Wed Nov 20 03:10:24 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 03:10:24 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize In-Reply-To: <5ewxZaYdVGAoEgCMjvrvz29UX-6sw1T7Dgq3WFlv8Yg=.df6b3e3f-9d20-47c8-b9c9-7a10972443c0@github.com> References: <5ewxZaYdVGAoEgCMjvrvz29UX-6sw1T7Dgq3WFlv8Yg=.df6b3e3f-9d20-47c8-b9c9-7a10972443c0@github.com> Message-ID: On Wed, 20 Nov 2024 02:52:40 GMT, Fei Yang wrote: >> Hi all, >> Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-aarch64 release build > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 447: > >> 445: sub(t1, fp, esp); >> 446: int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset) * wordSize; >> 447: add(t1, t1, -min_frame_size); > > Suggestion: `sub(t1, t1, min_frame_size);` Thanks, the `add` has been replaced with `sub`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1849438199 From fyang at openjdk.org Wed Nov 20 04:15:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Nov 2024 04:15:15 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 02:43:42 GMT, SendaoYan wrote: > Hi all, > Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-riscv64 release build src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 445: > 443: if (VerifyActivationFrameSize) { > 444: Label L; > 445: sub(t1, fp, esp); Note that the RISC-V frame structure is a bit different from that of other CPU platforms. I think we should exclude the 2 frame metadata words after this `sub`, like: sub(t1, t1, frame::metadata_words * wordSize); // Exclude 2 frame metadata words ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1849476179 From syan at openjdk.org Wed Nov 20 05:00:29 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 05:00:29 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v2] In-Reply-To: References: Message-ID: > Hi all, > Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented.
I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. > > Additional testing > > - [ ] jtreg tests(include tier1/2/3 etc.) on linux-riscv64 release build > - [ ] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize SendaoYan has updated the pull request incrementally with one additional commit since the last revision: Exclude 2 frame metadata words before compare to min_frame_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22264/files - new: https://git.openjdk.org/jdk/pull/22264/files/4f73e9eb..15856d8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22264.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22264/head:pull/22264 PR: https://git.openjdk.org/jdk/pull/22264 From syan at openjdk.org Wed Nov 20 05:00:29 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 05:00:29 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v2] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 04:57:39 GMT, SendaoYan wrote: >> Hi all, >> Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [ ] jtreg tests(include tier1/2/3 etc.) 
on linux-riscv64 release build >> - [ ] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > Exclude 2 frame metadata words before compare to min_frame_size Thanks, I found that `sp -esp` equals 96, larger than `min_frame_size` 16. I think the extra 16 bytes is the `2 frame metadata words`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1849530410 From syan at openjdk.org Wed Nov 20 06:48:58 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 06:48:58 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: > Hi all, > Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. 
> > Additional testing > > - [ ] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option SendaoYan has updated the pull request incrementally with one additional commit since the last revision: make exclude 2 frame metadata after "fp -esp" ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22264/files - new: https://git.openjdk.org/jdk/pull/22264/files/15856d8b..bc90d67d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22264.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22264/head:pull/22264 PR: https://git.openjdk.org/jdk/pull/22264 From fyang at openjdk.org Wed Nov 20 07:12:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Nov 2024 07:12:15 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: <4P50kpWvYsJQVWmp_aVN2-ABJgeJuockx-nR0zFp2w8=.3c5f97bf-8bc2-42b8-a898-d4b110aa4e3a@github.com> On Wed, 20 Nov 2024 06:48:58 GMT, SendaoYan wrote: >> Hi all, >> Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [x] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > make exclude 2 frame metadata after "fp -esp" Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22264#pullrequestreview-2447615079 From qamai at openjdk.org Wed Nov 20 07:42:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 20 Nov 2024 07:42:18 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8] In-Reply-To: References: Message-ID: <-Xzg-TpLL-k7BvzBkJfaDEsksm0RhZcJRyJPFmP2XeQ=.ef2a5e2e-43d7-4935-ab73-151bccb91dcc@github.com> On Tue, 19 Nov 2024 16:09:31 GMT, Emanuel Peter wrote: >> **History** >> This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): >> On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: >> `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` >> >> I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. >> >> **Summary of Problem** >> >> As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. 
When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. >> >> **Benchmark** >> >> I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). >> >> The benchmarks look different on different machines, but they all have a pattern similar to this: >> ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) >> ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) >> ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) >> ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) >> >> We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offse... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix benchmark I have not looked too deeply but may I ask why the result of `offset == 12` is worse than that of `offset == 4`, similarly for other multiples such as `24`, `40` and `8`? I would assume the former should not be worse than the latter since a vectorization for the latter would also work for the former. 
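The partial-overlap condition from the problem summary quoted above can be sketched with a little index arithmetic. This is only an illustration, not SuperWord code: it assumes a loop shaped like `a[i + offset] = f(a[i])` that is vectorized with `width`-element vectors, and the function names `has_partial_overlap` and `round_down_to_power_of_two` are hypothetical.

```cpp
#include <cassert>

// Hypothetical model (an assumption, not HotSpot code): vector stores cover
// the element ranges [offset + k*width, offset + (k+1)*width) and later vector
// loads cover [m*width, (m+1)*width). A load lines up exactly with an earlier
// store only when offset is a multiple of width; otherwise it straddles two
// stores and the value cannot be forwarded from a single store-buffer entry.
bool has_partial_overlap(int offset, int width) {
    assert(offset > 0 && width > 0 && width <= offset);
    return offset % width != 0;
}

// Largest power of two that is <= offset. This models choosing the vector
// length by rounding the offset down to a power of two.
int round_down_to_power_of_two(int offset) {
    assert(offset > 0);
    int width = 1;
    while (width * 2 <= offset) {
        width *= 2;
    }
    return width;
}
```

Under this model an offset of 4 with 4-element vectors has no partial overlap, while an offset of 12 rounded down to 8-element vectors does, even though 4-element vectors would also be clean for an offset of 12.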
------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2487741312 From epeter at openjdk.org Wed Nov 20 08:08:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Nov 2024 08:08:17 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8] In-Reply-To: <-Xzg-TpLL-k7BvzBkJfaDEsksm0RhZcJRyJPFmP2XeQ=.ef2a5e2e-43d7-4935-ab73-151bccb91dcc@github.com> References: <-Xzg-TpLL-k7BvzBkJfaDEsksm0RhZcJRyJPFmP2XeQ=.ef2a5e2e-43d7-4935-ab73-151bccb91dcc@github.com> Message-ID: On Wed, 20 Nov 2024 07:39:20 GMT, Quan Anh Mai wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix benchmark > > I have not looked too deeply but may I ask why the result of `offset == 12` is worse than that of `offset == 4`, similarly for other multiples such as `24`, `40` and `8`? I would assume the former should not be worse than the latter since a vectorization for the latter would also work for the former. @merykitty > I have not looked too deeply but may I ask why the result of offset == 12 is worse than that of offset == 4 `offset=4` -> take 4-element vectors. Forwarding works perfectly. `offset=12` -> round down to next power-of-2 -> take 8-element vectors -> forwarding failures. I suppose there could be an alternative solution here: If we detect a forwarding failure, then split the vectors in half, and try again. That would mean we'd try 8-element vectors for the `offset==12`, detect a failure. Then retry with 4-element vectors and detect no failures. The issue with that solution: we would first have to schedule the VTransform graph, so that we get all the store-load dependencies for the vectors. And then we would have to go back to SuperWord, and change the packs, and build a new VTransform. Maybe there are other ways... but they all seem more complicated. Maybe in the future I will add such a "retry with shorter vectors" mechanic. 
But it would mean that SuperWord/VTransform may run multiple times, and that could be expensive at compile-time. We would have to do this carefully and with plenty of performance testing. At any rate: this is a simple solution here, it works - surely not optimally - but it works. It fixes the regression I introduced earlier in JDK24 with [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). We are close to the JDK24/JDK25 fork, and it would be nice to get this JDK24 regression integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2487823977 From epeter at openjdk.org Wed Nov 20 10:14:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Nov 2024 10:14:27 GMT Subject: RFR: 8341293: Split field loads through Nested Phis In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 16:23:05 GMT, Dhamoder Nalla wrote: > As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation. > > > Here are the sequence of Ideal graph transformations for Nested phi: > > > ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) > > ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) > > ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) Looks interesting. We are getting close to JDK25 fork, so we should make sure to wait until then, because these kinds of changes have long bug-tails. Can you show the benchmark numbers from your micro-benchmark? test/micro/org/openjdk/bench/vm/compiler/AllocationMergesNestedPhi.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. Drive-by comment: fix copyright header date ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21270#pullrequestreview-2448152603 PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1850013901 From thartmann at openjdk.org Wed Nov 20 10:39:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 10:39:24 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:53:41 GMT, Richard Reingruber wrote: > This change removes the ResourceMark from `PhaseChaitin::merge_multidefs()` because it frees memory that is used in the caller method `PhaseChaitin::Register_Allocate`. > [My comment](https://bugs.openjdk.org/browse/JDK-8328085?focusedId=14723086&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14723086) on the JBS item explains the details. > > #### Testing > I was able to reproduce the issue on ppc64le but not on x86_64 running applications/ctw/modules/java_desktop.java. The issue didn't reproduce with this pr. > > #### ResourceArea Sizes > > I've traced maximum ResourceArea size after returning from `PhaseChaitin::merge_multidefs()` (see [first commit](https://github.com/openjdk/jdk/pull/22200/commits/ffbe6dee05a5a66c2965f4ff7e4cd466605cf89d)). > I haven't found a significant difference. > Below you can see the last trace line from each run. 
> > ##### x86_64: 3 Runs Dacapo Tomcat 5 Iterations > > ###### Baseline > Run 1: [24.222s][info][newcode] New maximum for resource area size: 3274 KB > Run 2: [21.317s][info][newcode] New maximum for resource area size: 3274 KB > Run 3: [37.400s][info][newcode] New maximum for resource area size: 3336 KB > > ###### PR > Run 1: [35.002s][info][newcode] New maximum for resource area size: 3363 KB > Run 2: [21.332s][info][newcode] New maximum for resource area size: 3274 KB > Run 3: [36.050s][info][newcode] New maximum for resource area size: 3286 KB > > ##### x86_64: 3 Runs applications/ctw/modules/java_desktop.java > > ###### Baseline > Run 1: [29.876s][info][newcode] New maximum for resource area size: 3143 KB > Run 2: [29.631s][info][newcode] New maximum for resource area size: 3111 KB > Run 3: [29.227s][info][newcode] New maximum for resource area size: 3142 KB > > ###### PR > Run 1: [29.755s][info][newcode] New maximum for resource area size: 3175 KB > Run 2: [28.964s][info][newcode] New maximum for resource area size: 3143 KB > Run 3: [28.863s][info][newcode] New maximum for resource area size: 3143 KB > > ##### PPC: 3 Runs Dacapo Tomcat 5 Iterations > > ###### Baseline > Run 1: [20.041s][info][newcode] New maximum for resource area size: 3474 KB > Run 2: [20.581s][info][newcode] New maximum for resource area size: 3474 KB > Run 3: [20.367s][info][newcode] New maximum for resource area size: 3474 KB > > ###### PR > Run 1: [20.520s][info][newcode] New maximum for resource area size: 3506 KB > Run 2: [20.918s][info][newcode] New maximum for resource area size: 3506 KB > Run 3: [20.994s][info][newcode] New maximum for resource area size: 3505 KB > > ##### PPC: 3 Runs ... Nice analysis! The fix looks reasonable to me but I'm a bit worried that such removals of ResourceMarks will lead to an increase in peak memory consumption because memory is only released much later now. 
And I would assume there is a reason for the ResourceMark placement, i.e., the code below doing significant temporary allocations. Kind of related: [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22200#pullrequestreview-2448215994 From mli at openjdk.org Wed Nov 20 10:41:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 20 Nov 2024 10:41:21 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 06:48:58 GMT, SendaoYan wrote: >> Hi all, >> Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [x] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option > > SendaoYan has updated the pull request incrementally with one additional commit since the last revision: > > make exclude 2 frame metadata after "fp -esp" Looks good, just one minor comment. src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 447: > 445: sub(t1, fp, esp); > 446: sub(t1, t1, frame::metadata_words * wordSize); // Exclude 2 frame metadata words > 447: int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset) * wordSize; Could we also remove the above `sub`, and add `frame::metadata_words * wordSize` in this statement? It's not necessary to calculate the value in the generated code, and it seems to me the code will be more straightforward.
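The equivalence behind this suggestion can be sketched in plain C++. This is a minimal model, not the interpreter code: it assumes 8-byte words and a "greater or equal" comparison against the minimum frame size, and the parameters `min_words` and `metadata_words` are hypothetical stand-ins for the `frame::` constants quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8-byte words, as on a 64-bit target.
constexpr int64_t kWordSize = 8;

// Variant A: subtract the metadata words at run time, then compare against
// the minimum frame size (two subtractions in the generated code).
bool frame_ok_two_subs(int64_t fp, int64_t esp,
                       int64_t min_words, int64_t metadata_words) {
    assert(fp >= esp);  // model assumption: esp sits at or below fp
    int64_t t1 = fp - esp;
    t1 -= metadata_words * kWordSize;
    const int64_t min_frame_size = min_words * kWordSize;
    return t1 >= min_frame_size;
}

// Variant B: fold the metadata words into the constant, so only one
// subtraction remains at run time.
bool frame_ok_folded(int64_t fp, int64_t esp,
                     int64_t min_words, int64_t metadata_words) {
    assert(fp >= esp);  // same model assumption as above
    const int64_t t1 = fp - esp;
    const int64_t min_frame_size = (min_words + metadata_words) * kWordSize;
    return t1 >= min_frame_size;
}
```

Both variants accept and reject the same frames, because subtracting a constant from the left side of the comparison is the same as adding it to the right side.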
------------- PR Review: https://git.openjdk.org/jdk/pull/22264#pullrequestreview-2448220536 PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1850056604 From fyang at openjdk.org Wed Nov 20 10:49:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 20 Nov 2024 10:49:15 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 10:38:39 GMT, Hamlin Li wrote: >> SendaoYan has updated the pull request incrementally with one additional commit since the last revision: >> >> make exclude 2 frame metadata after "fp -esp" > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 447: > >> 445: sub(t1, fp, esp); >> 446: sub(t1, t1, frame::metadata_words * wordSize); // Exclude 2 frame metadata words >> 447: int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset) * wordSize; > > Could we also remove the above `sub`, and add `frame::metadata_words * wordSize` in this statement? it's not necessary to calculate the value in the generated code and seems to me the code will be more straight. Yeah, I agree that will be cleaner. int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset + frame::metadata_words) * wordSize; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1850072839 From thartmann at openjdk.org Wed Nov 20 11:16:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 11:16:20 GMT Subject: RFR: 8341293: Split field loads through Nested Phis In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 16:23:05 GMT, Dhamoder Nalla wrote: > As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation. 
> > > Here are the sequence of Ideal graph transformations for Nested phi: > > > ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) > > ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) > > ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) `AllocationMergesNestedPhiTests.java` fails on Linux AArch64: Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "int compiler.c2.irTests.scalarReplacement.AllocationMergesNestedPhiTests.testGlobalEscapeInThread_C2(boolean,int,int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={ITER_GVN_AFTER_EA}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#ALLOC#_", ">=5"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "Iter GVN after EA": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(Allocate\\b.*)+(\\s){2}===.*)" - Failed comparison: [found] 4 >= 5 [given] - Matched nodes (4): * 243 Allocate === 1899 6 7 8 1 (241 225 29 1 1 10 11 12 13 14 10 11 12 13 14 1976 1956 1 1 1 11 13 12 14 184 184 11 13 12 14 1 1 184 184 13 14 ) [[ 244 245 246 253 254 255 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) DirectMethodHandle::allocateInstance @ bci:12 (line 506) 0x0000ffff54b0c408::newInvokeSpecial @ bci:1 0x0000ffff54b0d388::linkToTargetMethod @ bci:9 AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:26 (line 247) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) !jvms: DirectMethodHandle::allocateInstance @ bci:12 (line 506) 0x0000ffff54b0c408::newInvokeSpecial @ bci:1 0x0000ffff54b0d388::linkToTargetMethod @ bci:9 AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:26 (line 247) 
AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) * 380 Allocate === 368 254 369 8 1 (378 377 29 1 1 10 11 12 13 14 10 11 12 13 14 1974 1954 260 1 1 13 14 ) [[ 381 382 383 390 391 392 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:33 (line 251) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) !jvms: AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:33 (line 251) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) * 465 Allocate === 394 391 427 8 1 (109 126 29 1 1 10 11 12 13 14 10 11 12 13 14 1971 1951 260 1 1 397 397 260 397 411 411 29 260 412 1 1 1 1 397 13 14 ) [[ 466 467 468 475 476 477 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Thread:: @ bci:5 (line 319) Thread:: @ bci:6 (line 1088) AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:39 (line 251) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) !jvms: Thread:: @ bci:5 (line 319) Thread:: @ bci:6 (line 1088) AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:39 (line 251) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) * 911 Allocate === 903 476 904 8 1 (909 908 29 1 1 10 11 12 13 14 10 11 12 13 14 1965 1945 260 1 1 397 397 260 397 881 411 29 260 412 1 761 29 907 397 13 14 ) [[ 912 913 914 921 922 923 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top, bool ) Thread:: @ bci:98 (line 675) Thread:: @ bci:6 (line 1088) AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:39 (line 251) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) !jvms: Thread:: @ bci:98 (line 675) Thread:: @ bci:6 (line 1088) AllocationMergesNestedPhiTests::testGlobalEscapeInThread @ bci:39 (line 251) AllocationMergesNestedPhiTests::testGlobalEscapeInThread_C2 @ bci:6 (line 268) >>> 
Check stdout for compilation output of the failed methods ------------- PR Comment: https://git.openjdk.org/jdk/pull/21270#issuecomment-2488300819 From thartmann at openjdk.org Wed Nov 20 11:38:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 11:38:22 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 09:45:37 GMT, theoweidmannoracle wrote: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly Looks good to me otherwise. Nice tests! src/hotspot/share/opto/divnode.cpp line 1158: > 1156: > 1157: template > 1158: Node* unsigned_mod_ideal(PhaseGVN* phase, bool can_reshape, Node* mod) { Should this method be static to limit visibility? src/hotspot/share/opto/divnode.cpp line 1202: > 1200: > 1201: template > 1202: const Type* unsigned_mod_value(PhaseGVN* phase, const Node* mod) { Should this method be static to limit visibility? test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 62: > 60: Asserts.assertFalse(shouldThrow, "Expected an exception to be thrown."); > 61: } catch (ArithmeticException e) { > 62: Asserts.assertTrue(shouldThrow, "Did not expected an exception to be thrown."); Suggestion: Asserts.assertTrue(shouldThrow, "Did not expect an exception to be thrown."); Same in other tests. ------------- Marked as reviewed by thartmann (Reviewer). 
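The power-of-two cases listed in the PR description rest on two well-known identities for unsigned integers, sketched here in plain C++. This is an illustration only and not the C2 idealization code; the helper names `udiv_pow2` and `umod_pow2` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// For unsigned x and 0 <= k < 32: x / 2^k equals a logical right shift by k.
uint32_t udiv_pow2(uint32_t x, unsigned k) {
    assert(k < 32);
    return x >> k;
}

// For unsigned x and 0 <= k < 32: x % 2^k keeps only the low k bits.
uint32_t umod_pow2(uint32_t x, unsigned k) {
    assert(k < 32);
    return x & ((uint32_t{1} << k) - 1);
}
```

With `k == 0` these reduce to the `x / 1` and `x % 1` cases. The `x / x` and `x % x` cases differ in that they are only defined for a non-zero `x`.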
PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2448340110 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1850146911 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1850147036 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1850141218 From thartmann at openjdk.org Wed Nov 20 11:45:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 11:45:20 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates In-Reply-To: References: Message-ID: <4XE8XEOl9fVGxiwsyCP2_JYpFkcADziN9GMtIyp67PQ=.ad8f3e9f-caa8-4973-9623-641452c0693f@github.com> On Fri, 15 Nov 2024 08:17:22 GMT, Christian Hagedorn wrote: > This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in a row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. > > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. > - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. > - Whether a node is a target node for this BFS. > - What action should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. 
> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian Looks reasonable to me otherwise. src/hotspot/share/opto/node.hpp line 2118: > 2116: class BFSActions : public StackObj { > 2117: public: > 2118: // Should a node's inputs further be visit in the BFS traversal? By default, we visit all data inputs. Override this Suggestion: // Should a node's inputs further be visited in the BFS traversal? By default, we visit all data inputs. Override this src/hotspot/share/opto/node.hpp line 2119: > 2117: public: > 2118: // Should a node's inputs further be visit in the BFS traversal? By default, we visit all data inputs. Override this > 2119: // method to provide a costum filter. Suggestion: // method to provide a custom filter. src/hotspot/share/opto/node.hpp line 2126: > 2124: > 2125: // Is the visited node a target node that we are looking for in the BFS traversal? We do not visit its inputs further > 2126: // but the BFS will continue to visited all unvisited nodes in the queue. Suggestion: // but the BFS will continue to visit all unvisited nodes in the queue. 
src/hotspot/share/opto/predicates.cpp line 243: > 241: void target_node_action(Node* target_node) override { > 242: if (target_node->is_OpaqueLoopInit()) { > 243: assert(!_found_init, "can only found one OpaqueLoopInitNode"); Suggestion: assert(!_found_init, "should only find one OpaqueLoopInitNode"); src/hotspot/share/opto/predicates.cpp line 247: > 245: } else { > 246: assert(target_node->is_OpaqueLoopStride(), "unexpected Opaque1 node"); > 247: assert(!_found_stride, "can only found one OpaqueLoopStrideNode"); Suggestion: assert(!_found_stride, "should only find one OpaqueLoopStrideNode"); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22136#pullrequestreview-2448370726 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1850160256 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1850160472 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1850161016 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1850165293 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1850166032 From thartmann at openjdk.org Wed Nov 20 11:48:17 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 11:48:17 GMT Subject: RFR: 8344533: CTW: Add option to remove clinits before loading In-Reply-To: References: Message-ID: <1G6hYK4ucRwpKDFqnAKf7J2OXTUhLbNAu1oPHxQEk6Q=.96fa71db-77ec-4969-b635-7565cf52160f@github.com> On Tue, 19 Nov 2024 10:50:48 GMT, Evgeny Nikitin wrote: > This PR adds an option-controlled (off by default) removal of methods before loading them with CTW ClassLoader. > The main purpose is to prevent `static { ... }` blocks execution (along with static fields initialization). > Testing: manual CTW runs. Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). 
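To make the `DataNodeBFS/BFSActions` split from the PR description more concrete, here is a minimal standalone sketch of a BFS over input edges that is driven by a small callback interface. It is not the HotSpot implementation: the toy `Node` and `Kind` types, the free function `data_node_bfs`, the `OpaqueCounter` class and the method names `should_visit` and `is_target_node` are assumptions made for this example (only `target_node_action` appears in the reviewed code above).

```cpp
#include <cassert>
#include <queue>
#include <unordered_set>
#include <vector>

// Toy stand-in for a C2 node: a kind tag plus its input edges.
enum class Kind { Data, OpaqueInit, OpaqueStride };

struct Node {
  Kind kind;
  std::vector<Node*> inputs;
};

// Hypothetical callback interface answering the three questions from the PR text.
class BFSActions {
 public:
  virtual ~BFSActions() = default;
  // Should the inputs of this node be visited further? Default: yes.
  virtual bool should_visit(const Node* /*node*/) const { return true; }
  // Is this node one of the nodes the traversal is looking for?
  virtual bool is_target_node(const Node* node) const = 0;
  // What to do with a target node once it is found.
  virtual void target_node_action(Node* target_node) = 0;
};

// Breadth-first walk over input edges, starting at 'start'.
inline void data_node_bfs(Node* start, BFSActions& actions) {
  std::queue<Node*> worklist;
  std::unordered_set<Node*> seen{start};
  worklist.push(start);
  while (!worklist.empty()) {
    Node* next = worklist.front();
    worklist.pop();
    for (Node* input : next->inputs) {
      if (input == nullptr || !seen.insert(input).second) {
        continue;  // missing edge or already handled
      }
      if (actions.is_target_node(input)) {
        actions.target_node_action(input);  // inputs of a target are not visited
      } else if (actions.should_visit(input)) {
        worklist.push(input);
      }
    }
  }
}

// Example action: record which opaque nodes are reachable, at most one per kind.
class OpaqueCounter : public BFSActions {
 public:
  bool found_init = false;
  bool found_stride = false;

  bool is_target_node(const Node* node) const override {
    return node->kind != Kind::Data;
  }

  void target_node_action(Node* target_node) override {
    if (target_node->kind == Kind::OpaqueInit) {
      assert(!found_init && "should only find one init node");
      found_init = true;
    } else {
      assert(!found_stride && "should only find one stride node");
      found_stride = true;
    }
  }
};
```

With this shape, a verifier and a node replacer can share the traversal and differ only in the `BFSActions` implementation they pass in.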
PR Review: https://git.openjdk.org/jdk/pull/22235#pullrequestreview-2448394839 From chagedorn at openjdk.org Wed Nov 20 11:51:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 20 Nov 2024 11:51:32 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v2] In-Reply-To: References: Message-ID: > This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in a row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. > > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. > - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. > - Whether a node is a target node for this BFS. > - What action should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. 
> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22136/files - new: https://git.openjdk.org/jdk/pull/22136/files/94abb686..5ae3a4fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22136/head:pull/22136 PR: https://git.openjdk.org/jdk/pull/22136 From chagedorn at openjdk.org Wed Nov 20 11:51:33 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 20 Nov 2024 11:51:33 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 08:17:22 GMT, Christian Hagedorn wrote: > This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in a row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. > > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. 
> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. > - Whether a node is a target node for this BFS. > - What action should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. > - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22136#issuecomment-2488375020 From thartmann at openjdk.org Wed Nov 20 12:14:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 12:14:18 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v2] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 11:51:32 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. >> >> There are some places where the verification code is >> - missing >> - called twice in a row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. >> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. 
>> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action should be performed with the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. >> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22136#pullrequestreview-2448450925 From chagedorn at openjdk.org Wed Nov 20 12:49:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 20 Nov 2024 12:49:04 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order Message-ID: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/22136 which is not fully reviewed, yet, but I'd like to already send this PR out for review since I'm away for the rest of the week) This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". 
#### Current State: Mostly "reverse-order" for Assertion Predicates
We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following:

                                     old target loop entry
                                               |
          x                        Cloned Template Assertion
          |                               Predicate 2
  Template Assertion                           |
     Predicate 1                     Initialized Assertion
          |               ==>             Predicate 2
  Template Assertion                           |
     Predicate 2                   Cloned Template Assertion
          |                               Predicate 1
     source loop                               |
                                     Initialized Assertion
                                          Predicate 1
                                               |
                                          target loop

I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing:

                                     old target loop entry
                                               |
          x                        Cloned Template Assertion
          |                               Predicate 1
  Template Assertion                           |
     Predicate 1                     Initialized Assertion
          |               ==>             Predicate 1
  Template Assertion                           |
     Predicate 2                   Cloned Template Assertion
          |                               Predicate 2
     source loop                               |
                                     Initialized Assertion
                                          Predicate 2
                                               |
                                          target loop

This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in-order cloning.

#### Why Does Loop Unswitching Use In-Order?
The main reason was that we can reuse `create_new_if_for_predicate()` which allowed us to keep the UCT on the false path. Now that [JDK-8342047](https://bugs.openjdk.org/browse/JDK-8342047) is in, we no longer use this method because we are only using halt nodes on the false path and no UCTs anymore. We could have flipped the order in Loop Unswitching to a reverse-order now. However, this is only possible for Assertion Predicates. Parse Predicates, which are also cloned during Loop Unswitching, still must keep their relative order.
To do the cloning of Parse Predicates with the same predicate visitor and simplify the reasoning about a graph, I propose to switch to an in-order cloning/initialization for all the Assertion Predicates. Thanks, Christian ------------- Depends on: https://git.openjdk.org/jdk/pull/22136 Commit messages: - Update comment - 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order Changes: https://git.openjdk.org/jdk/pull/22275/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344171 Stats: 87 lines in 4 files changed: 41 ins; 33 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22275/head:pull/22275 PR: https://git.openjdk.org/jdk/pull/22275 From chagedorn at openjdk.org Wed Nov 20 12:49:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 20 Nov 2024 12:49:05 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order In-Reply-To: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Wed, 20 Nov 2024 12:40:58 GMT, Christian Hagedorn wrote: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/22136 which is not fully reviewed, yet, but I'd like to already send this PR out for review since I'm away for the rest of the week) > > This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". > > #### Current State: Mostly "reverse-order" for Assertion Predicates > We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). 
This means that we do the following:
>
>                                      old target loop entry
>                                                |
>           x                        Cloned Template Assertion
>           |                               Predicate 2
>   Template Assertion                           |
>      Predicate 1                     Initialized Assertion
>           |               ==>             Predicate 2
>   Template Assertion                           |
>      Predicate 2                   Cloned Template Assertion
>           |                               Predicate 1
>      source loop                               |
>                                      Initialized Assertion
>                                           Predicate 1
>                                                |
>                                           target loop
>
> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing:
>
>                                      old target loop entry
>                                                |
>           x                        Cloned Template Assertion
>           |                               Predicate 1
>   Template Assertion                           |
>      Predicate 1                     Initialized Assertion
>           |               ==>             Predicate 1
>   Template Assertion                           |
>      Predicate 2                   Cloned Template Assertion
>           |                               Predicate 2
>      source loop                               |
>                                      Initialized Assertion
>                                           Predicate 2
>                                                |
>                                           target loop
>
> This will also align all cloni...

src/hotspot/share/opto/loopTransform.cpp line 1769: > 1767: _igvn.replace_input_of(target_outer_loop_head, LoopNode::EntryControl, last_created_predicate_success_proj); > 1768: set_idom(target_outer_loop_head, last_created_predicate_success_proj, dom_depth(target_outer_loop_head)); > 1769: } Workflow before was: Reverse-order cloning, so the very last clone needs to be connected to the loop head which was done here. With in-order, we can take care of the rewiring in the predicate visitor itself. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22275#discussion_r1850251858 From syan at openjdk.org Wed Nov 20 12:52:39 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 12:52:39 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v4] In-Reply-To: References: Message-ID: > Hi all, > Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. 
This only affects debug builds run with the option `-XX:+VerifyActivationFrameSize`; the change has been verified locally and the risk is low. > > Additional testing > > - [x] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option SendaoYan has updated the pull request incrementally with two additional commits since the last revision: - delete comment "// Exclude 2 frame metadata words" - Avoid calculate the compare value in the generated code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22264/files - new: https://git.openjdk.org/jdk/pull/22264/files/bc90d67d..fbfb75f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22264&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22264.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22264/head:pull/22264 PR: https://git.openjdk.org/jdk/pull/22264 From syan at openjdk.org Wed Nov 20 12:52:39 2024 From: syan at openjdk.org (SendaoYan) Date: Wed, 20 Nov 2024 12:52:39 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v3] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 10:46:33 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 447: >> >>> 445: sub(t1, fp, esp); >>> 446: sub(t1, t1, frame::metadata_words * wordSize); // Exclude 2 frame metadata words >>> 447: int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset) * wordSize; >> >> Could we also remove the above `sub`, and add `frame::metadata_words * wordSize` in this statement? It's not necessary to calculate the value in the generated code, and it seems to me the code would be more straightforward. > > Yeah, I agree that will be cleaner. > > int min_frame_size = (frame::link_offset - frame::interpreter_frame_initial_sp_offset + frame::metadata_words) * wordSize; Thanks, the second `sub` calculation has been merged into the `min_frame_size` statement. 
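The folding discussed in this review thread can be checked with plain integer arithmetic. The sketch below compares the two forms of the check in standalone C++; it is not the generated RISC-V code. The constants `kLinkOffset` and `kInitialSpOffset` are placeholder values chosen for illustration (not taken from the RISC-V `frame` definitions), the `>=` comparison is an assumption about how `t1` is tested against `min_frame_size`, and all function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder layout constants for illustration only; they are NOT
// read from the real frame definitions.
constexpr std::int64_t kWordSize = 8;          // 64-bit words
constexpr std::int64_t kMetadataWords = 2;     // "2 frame metadata words"
constexpr std::int64_t kLinkOffset = -2;       // hypothetical
constexpr std::int64_t kInitialSpOffset = -10; // hypothetical

// Before: two subtractions are emitted into the generated code.
inline bool frame_ok_before(std::int64_t fp, std::int64_t esp) {
  std::int64_t t1 = fp - esp;
  t1 -= kMetadataWords * kWordSize;
  const std::int64_t min_frame_size = (kLinkOffset - kInitialSpOffset) * kWordSize;
  return t1 >= min_frame_size;
}

// After: the metadata words are folded into the constant, which is
// computed once when the code is generated.
inline bool frame_ok_after(std::int64_t fp, std::int64_t esp) {
  const std::int64_t t1 = fp - esp;
  const std::int64_t min_frame_size =
      (kLinkOffset - kInitialSpOffset + kMetadataWords) * kWordSize;
  return t1 >= min_frame_size;
}

// Checks that both forms agree for frame sizes 0..max_bytes in word steps.
inline void verify_equivalence(std::int64_t max_bytes) {
  const std::int64_t esp = 0x10000;
  for (std::int64_t size = 0; size <= max_bytes; size += kWordSize) {
    assert(frame_ok_before(esp + size, esp) == frame_ok_after(esp + size, esp));
  }
}
```

Moving the subtraction to the other side of the comparison leaves the result unchanged as long as nothing overflows, and it saves one instruction in the emitted check.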
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22264#discussion_r1850260693 From mli at openjdk.org Wed Nov 20 13:22:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 20 Nov 2024 13:22:21 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v4] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 12:52:39 GMT, SendaoYan wrote: >> Hi all, >> Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. This only affects debug builds run with the option `-XX:+VerifyActivationFrameSize`; the change has been verified locally and the risk is low. >> >> Additional testing >> >> - [x] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option > > SendaoYan has updated the pull request incrementally with two additional commits since the last revision: > > - delete comment "// Exclude 2 frame metadata words" > - Avoid calculate the compare value in the generated code Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22264#pullrequestreview-2448599976 From qamai at openjdk.org Wed Nov 20 13:56:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 20 Nov 2024 13:56:28 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 16:09:31 GMT, Emanuel Peter wrote: >> **History** >> This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): >> On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: >> `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` >> >> I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. >> >> **Summary of Problem** >> >> As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. >> >> **Benchmark** >> >> I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). 
>> >> The benchmarks look different on different machines, but they all have a pattern similar to this: >> ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) >> ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) >> ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) >> ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) >> >> We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offse... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix benchmark Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21521#pullrequestreview-2448687797 From mdoerr at openjdk.org Wed Nov 20 14:27:16 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Nov 2024 14:27:16 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:53:41 GMT, Richard Reingruber wrote: > This change removes the ResourceMark from `PhaseChaitin::merge_multidefs()` because it frees memory that is used in the caller method `PhaseChaitin::Register_Allocate`. > [My comment](https://bugs.openjdk.org/browse/JDK-8328085?focusedId=14723086&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14723086) on the JBS item explains the details. > > #### Testing > I was able to reproduce the issue on ppc64le but not on x86_64 running applications/ctw/modules/java_desktop.java. The issue didn't reproduce with this pr. > > #### ResourceArea Sizes > > I've traced maximum ResourceArea size after returning from `PhaseChaitin::merge_multidefs()` (see [first commit](https://github.com/openjdk/jdk/pull/22200/commits/ffbe6dee05a5a66c2965f4ff7e4cd466605cf89d)). 
> I haven't found a significant difference. > Below you can see the last trace line from each run. > > ##### x86_64: 3 Runs Dacapo Tomcat 5 Iterations > > ###### Baseline > Run 1: [24.222s][info][newcode] New maximum for resource area size: 3274 KB > Run 2: [21.317s][info][newcode] New maximum for resource area size: 3274 KB > Run 3: [37.400s][info][newcode] New maximum for resource area size: 3336 KB > > ###### PR > Run 1: [35.002s][info][newcode] New maximum for resource area size: 3363 KB > Run 2: [21.332s][info][newcode] New maximum for resource area size: 3274 KB > Run 3: [36.050s][info][newcode] New maximum for resource area size: 3286 KB > > ##### x86_64: 3 Runs applications/ctw/modules/java_desktop.java > > ###### Baseline > Run 1: [29.876s][info][newcode] New maximum for resource area size: 3143 KB > Run 2: [29.631s][info][newcode] New maximum for resource area size: 3111 KB > Run 3: [29.227s][info][newcode] New maximum for resource area size: 3142 KB > > ###### PR > Run 1: [29.755s][info][newcode] New maximum for resource area size: 3175 KB > Run 2: [28.964s][info][newcode] New maximum for resource area size: 3143 KB > Run 3: [28.863s][info][newcode] New maximum for resource area size: 3143 KB > > ##### PPC: 3 Runs Dacapo Tomcat 5 Iterations > > ###### Baseline > Run 1: [20.041s][info][newcode] New maximum for resource area size: 3474 KB > Run 2: [20.581s][info][newcode] New maximum for resource area size: 3474 KB > Run 3: [20.367s][info][newcode] New maximum for resource area size: 3474 KB > > ###### PR > Run 1: [20.520s][info][newcode] New maximum for resource area size: 3506 KB > Run 2: [20.918s][info][newcode] New maximum for resource area size: 3506 KB > Run 3: [20.994s][info][newcode] New maximum for resource area size: 3505 KB > > ##### PPC: 3 Runs ... LGTM. Thank you for fixing this annoying bug! Would it make sense to add a comment somewhere explaining why we can't have a `ResourceMark`? ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22200#pullrequestreview-2448774495 From epeter at openjdk.org Wed Nov 20 14:27:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Nov 2024 14:27:24 GMT Subject: RFR: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures [v8] In-Reply-To: <5u5EVs-ykQcF9eNZiNdDIEFOW7IYMKLQLgcb66o7PiI=.c9fb768e-a54a-4004-81ef-1f561005b18c@github.com> References: <5u5EVs-ykQcF9eNZiNdDIEFOW7IYMKLQLgcb66o7PiI=.c9fb768e-a54a-4004-81ef-1f561005b18c@github.com> Message-ID: On Tue, 19 Nov 2024 16:52:54 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix benchmark > > Thanks for the updates and the improved comments. That looks good to me. Nice compression of the initial large benchmark :-) @chhagedorn @merykitty thank you very much for the reviews and great suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21521#issuecomment-2488715760 From epeter at openjdk.org Wed Nov 20 14:27:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Nov 2024 14:27:25 GMT Subject: Integrated: 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 11:33:04 GMT, Emanuel Peter wrote: > **History** > This issue became apparent with https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155): > On machines that do not support sha intrinsics, we execute the sha code in java code. This java code has a loop that previously did not vectorize, but it now does since https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
It turns out that that kind of loop is actually slower when vectorized - this led to a regression, reported originally as: > `8334431: Regression 18-20% on Mac x64 on Crypto.signverify` > > I then investigated the issue thoroughly, and discovered that it was even an issue before https://github.com/openjdk/jdk/pull/21521 / [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I wrote a [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html) about the issue. > > **Summary of Problem** > > As described in the [blog-post ](https://eme64.github.io/blog/2024/06/24/Auto-Vectorization-and-Store-to-Load-Forwarding.html), vectorization can introduce store-to-load failures that were not present in the scalar loop code. Where in scalar code, the loads and stores were all exactly overlapping or non-overlapping, in vectorized code they can now be partially overlapping. When a store and a later load are partially overlapping, the store value cannot be directly forwarded from the store-buffer to the load (would be fast), but has to first go to L1 cache. This incurs a higher latency on the dependency edge from the store to the load. > > **Benchmark** > > I introduced a new micro-benchmark in https://github.com/openjdk/jdk/pull/19880, and now further expanded it in this PR. You can see the extensive results in [this comment below](https://github.com/openjdk/jdk/pull/21521#issuecomment-2458938698). 
> > The benchmarks look different on different machines, but they all have a pattern similar to this: > ![image](https://github.com/user-attachments/assets/3366f7fa-af44-44d4-a476-8cd0466fe937) > ![image](https://github.com/user-attachments/assets/1c1408c2-053e-4a8a-ad46-32b75b836161) > ![image](https://github.com/user-attachments/assets/d392c8cf-fb62-4593-93c7-a0d85ad5885e) > ![image](https://github.com/user-attachments/assets/3a79601f-4015-4f71-a510-cab7d7b59ed8) > > We see that the `scalar` loop is faster for low `offset`, and the `vectorized` loop is faster for high offsets (and power-of-w offsets). > > The reason is that for low offsets, th... This pull request has now been integrated. Changeset: 75420e93 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/75420e9314c54adc5b45f9b274a87af54dd6b5a8 Stats: 885 lines in 17 files changed: 822 ins; 4 del; 59 mod 8334431: C2 SuperWord: fix performance regression due to store-to-load-forwarding failures Reviewed-by: chagedorn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/21521 From epeter at openjdk.org Wed Nov 20 14:35:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 20 Nov 2024 14:35:36 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark [v2] In-Reply-To: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Message-ID: > Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 > > It will be important for the efforts in: > [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count > > I ran the plots for `byte, int, long`. > We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. 
> > We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. > > --------------------------------------------------- > **Results** > > red: normal -> saw-tooth > green: randomized offsets -> more "smooth" > > linux_x64 > ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) > > linux_aarch64 > ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) > > macosx_x64 > ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) > > macosx_aarch64 > ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) > > windows_x64 > ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - whitespace - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - JDK-8344118 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22070/files - new: https://git.openjdk.org/jdk/pull/22070/files/cb9c6fd1..c3930c4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22070&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22070&range=00-01 Stats: 9229 lines in 369 files changed: 3856 ins; 3824 del; 1549 mod Patch: https://git.openjdk.org/jdk/pull/22070.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22070/head:pull/22070 PR: https://git.openjdk.org/jdk/pull/22070 From rrich at openjdk.org Wed Nov 20 14:40:18 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 20 Nov 2024 14:40:18 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 10:36:38 GMT, Tobias Hartmann wrote: > Nice analysis! The fix looks reasonable to me but I'm a bit worried that such removals of ResourceMarks will lead to an increase in peak memory consumption because memory is only released much later now. And I would assume there is a reason for the ResourceMark placement, i.e., below code doing significant temporary allocations. Kind of related: [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). I was also worried about increased memory consumption. There's at least the minimal testing I'm reporting about in the pr synopsis. Do you know about work loads that lead to higher memory consumption? 
Maybe you've got some internal tests. For that purpose I put the trace code into the pr. Should be easy to use. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22200#issuecomment-2488755471 From rrich at openjdk.org Wed Nov 20 14:49:19 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 20 Nov 2024 14:49:19 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 14:24:18 GMT, Martin Doerr wrote: > LGTM. Thank you for fixing this annoying bug! Would it make sense to add a comment somewhere explaining why we can't have a `ResourceMark`? Thanks for reviewing! IMHO we're good if https://github.com/openjdk/jdk/pull/22269 is accepted. Let me know if you prefer an additional comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22200#issuecomment-2488779186 From bkilambi at openjdk.org Wed Nov 20 14:49:29 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Nov 2024 14:49:29 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. 
New Ideal type for constant and non-constant Float16 IR nodes. Please refer to the [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details.
> 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
>
> **Missing Pieces:-**
> **- AARCH64 Backend.**
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 44:

> 42: @Test
> 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})
> 44: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})

Wouldn't the Ideal transforms convert the IR for this test case to -

    ReinterpretS2HF   ReinterpretS2HF
                  \   /
                  AddHF
                    |
             ReinterpretHF2S
                    |
                 ConvHF2F

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1850449500 From eastigeevich at openjdk.org Wed Nov 20 14:55:25 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 20 Nov 2024 14:55:25 GMT Subject: RFR: 8335662: [AArch64] C1: guarantee(val < (1ULL << nbits)) failed: Field too big for insn [v2] In-Reply-To: References: 
<8RFDZNLexe8UeSllb743kSrzqFoZYQA1nMNTOrQD_tQ=.1c08fd3b-5d9e-4c42-a094-0fb2b4cbe08f@github.com> Message-ID: On Tue, 15 Oct 2024 19:32:27 GMT, Andrew Haley wrote:

> One thing for you to think about if you are interested in some further work in this area..
>
> This is a generic problem. It might be very beneficial to look for every base + immediate offset instruction, see if there is a possibility that there may be an overflow, and insert a `form_address()`.

Hi @theRealAph, https://bugs.openjdk.org/browse/JDK-8342736

We found there are around 400 ldr calls and 180 str calls that would need to be manually updated. I found that we use offsets in C++ classes a lot:

    $ cpu/aarch64 % grep _offset() *.* | grep ldr | wc -l
    250
    $ cpu/aarch64 % grep _offset() *.* | grep str | wc -l
    94
    $ cpu/aarch64 % grep _offset() *.* | grep lea | wc -l
    27

IMO, we can use `static_assert` for them. The problem is that the macro `offset_of` is not `constexpr`. Making it `constexpr` is not simple. The standard macro `offsetof` requires classes to have the standard layout. Most of our classes don't have the standard layout. We'll get warnings about this. As you wrote in the macro comments, we can disable warnings. I think we can have something like this:

    ldr(dst, Address(rmethod, create_mem_op_imm(Method, const_offset)));

    #define create_mem_op_imm(klass, field_offset_func) \
      ([]() { \
        constexpr size_t max_possible_offset = sizeof(klass); \
        static_assert(Address::offset_ok_for_immed(max_possible_offset, 0)); \
        return klass::field_offset_func(); \
      }())

If the size of a class fits into the immediate of a memory instruction, then any offset in it will fit. Class sizes greater than 32760 look insane to me.
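The compile-time check proposed above can be sketched in standalone C++ roughly as follows. This is only an illustration under stated assumptions: `max_scaled_uimm12`, `fits_scaled_uimm12`, `SmallKlass` and `checked_field_offset` are hypothetical names, not HotSpot APIs, and the sketch models only the unsigned, scaled 12-bit immediate form of an AArch64 load/store (which is where the 32760 limit for 8-byte accesses comes from). It does not reproduce `Address::offset_ok_for_immed`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: largest byte offset encodable as an unsigned 12-bit
// immediate that the instruction scales by 2^shift (shift 3 = 8-byte access).
constexpr std::int64_t max_scaled_uimm12(unsigned shift) {
  return std::int64_t{4095} << shift;
}

// Hypothetical helper: true if `offset` is non-negative, in range, and a
// multiple of the access size, so it can be encoded in the scaled form.
constexpr bool fits_scaled_uimm12(std::int64_t offset, unsigned shift) {
  return offset >= 0 &&
         offset <= max_scaled_uimm12(shift) &&
         offset % (std::int64_t{1} << shift) == 0;
}

// Hypothetical standard-layout class standing in for a VM type.
struct SmallKlass {
  void*         _holder;
  std::uint64_t _const;
  std::uint32_t _flags;
};

// Verifies once, at compile time, that the whole object is within reach of
// the immediate. Every field offset is smaller than sizeof(Klass), so the
// range part of the check then holds for all fields (alignment of the
// individual field is a separate concern and is not checked here).
template <typename Klass, unsigned Shift>
constexpr std::size_t checked_field_offset(std::size_t field_offset) {
  static_assert(static_cast<std::int64_t>(sizeof(Klass)) <= max_scaled_uimm12(Shift),
                "class too large for a scaled 12-bit immediate offset");
  return field_offset;
}

// 4095 * 8 = 32760, the largest offset for an 8-byte access.
static_assert(max_scaled_uimm12(3) == 32760, "unexpected immediate range");
```

A call site would then pass the field offset through `checked_field_offset<SmallKlass, 3>(...)` before building the address; a class that outgrows the immediate range fails to compile instead of hitting a `guarantee` at run time.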
------------- PR Comment: https://git.openjdk.org/jdk/pull/21473#issuecomment-2488792792 From jbhateja at openjdk.org Wed Nov 20 15:00:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 20 Nov 2024 15:00:38 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 14:46:46 GMT, Bhavana Kilambi wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. 
Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 44: > >> 42: @Test >> 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >> 44: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) > > Wouldn't the Ideal transforms convert the IR for this test case to - > > ReinterpretS2HF ReinterpretS2HF > \ / > AddHF > | > ReinterpretHF2S > | > ConvHF2F > > in which case, ConvF2HF won't match? New transforms are guarded by target features checks, the IR test rules are enforced only on non AVX512_FP16 targets. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1850469049 From bkilambi at openjdk.org Wed Nov 20 15:05:20 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Nov 2024 15:05:20 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 14:57:11 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 44: >> >>> 42: @Test >>> 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >>> 44: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >> >> Wouldn't the Ideal transforms convert the IR for this test case to - 
>> >> ReinterpretS2HF ReinterpretS2HF >> \ / >> AddHF >> | >> ReinterpretHF2S >> | >> ConvHF2F >> >> in which case, ConvF2HF won't match? > > New transforms are guarded by target features checks, the IR test rules are enforced only on non AVX512_FP16 targets. Oh right! Sorry misread the IR test rules. Got it now. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1850477856 From dlunden at openjdk.org Wed Nov 20 15:40:22 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 20 Nov 2024 15:40:22 GMT Subject: RFR: 8331295: C2: Do not clone address computations that are indirect memory input to at least one load/store In-Reply-To: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: <4f8YHVSn3FzNIKJBAKWmtee9jx2o3mdPfvZrc2uW65A=.15a29e93-1d31-45b9-bcd9-830648f321f6@github.com> On Fri, 15 Nov 2024 18:29:48 GMT, Daniel Lundén wrote: > On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation. In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. > > ### Changeset > > - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation.
One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. > - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. > - Add a new IR framework regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22157#issuecomment-2488914316 From dlunden at openjdk.org Wed Nov 20 15:40:23 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 20 Nov 2024 15:40:23 GMT Subject: Integrated: 8331295: C2: Do not clone address computations that are indirect memory input to at least one load/store In-Reply-To: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> References: <9i0WCVzqSbY98ETkLhhurRluzNPQ0gei4ZKnh70LQjc=.0bf483e2-d1d8-4e07-97ab-83f908064012@github.com> Message-ID: On Fri, 15 Nov 2024 18:29:48 GMT, Daniel Lundén wrote: > On aarch64, the C2 instruction matcher often clones addressing expressions, expecting them to be subsumed (during later stages of matching) into complex load/store instructions. However, volatile aarch64 load and store instructions have indirect memory inputs and therefore cannot subsume the addressing computation.
In one case that we investigated, the result is a very large number of cloned identical instructions for address computations that, in combination with how the instruction scheduler currently hoists instructions, create major difficulties for the register allocator. > > ### Changeset > > - Add a guard that ensures the instruction matcher does not clone addressing expressions that have at least one successor load/store that cannot subsume the addressing computation. One could argue that, in cases where there is at least one such successor, other successors may be able to subsume the computation and we should therefore still clone the expression. The benefit of subsuming in such a case is unclear, however, as we in any case need to generate at least one separate instruction for the addressing computation. > - Remove temporary `-XX:CompileCommand=memlimit,...,0` for tests that previously failed. > - Add a new IR framework regression test. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/11859255022) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance benchmarks: DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No clear regressions. This pull request has now been integrated. 
Changeset: 7d4c3fd0 Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/7d4c3fd0915cfa8b279f42494625ec6afda338af Stats: 252 lines in 4 files changed: 249 ins; 1 del; 2 mod 8331295: C2: Do not clone address computations that are indirect memory input to at least one load/store Co-authored-by: Roberto Castañeda Lozano Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22157 From thartmann at openjdk.org Wed Nov 20 16:10:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 20 Nov 2024 16:10:24 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() In-Reply-To: References: Message-ID: <4SYVjMwZwMAY4sYABUHR1hZRL7DGulsRrKCbkPh6IeY=.c9979612-e8d8-43e0-885e-5ccd9b6762fb@github.com> On Wed, 20 Nov 2024 14:37:14 GMT, Richard Reingruber wrote: > I was also worried about increased memory consumption. There's at least the minimal testing I'm reporting about in the pr synopsis. Do you know about work loads that lead to higher memory consumption? Maybe you've got some internal tests. For that purpose I put the trace code into the pr. Should be easy to use. Right, I think your testing is good enough for now. Maybe we can do a more thorough investigation with [JDK-8337015](https://bugs.openjdk.org/browse/JDK-8337015). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22200#issuecomment-2488990812 From mdoerr at openjdk.org Wed Nov 20 23:34:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 20 Nov 2024 23:34:56 GMT Subject: RFR: 8344026: [s390x] ubsan failure: signed integer overflow in c1_LIRGenerator_s390.cpp [v4] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 15:48:49 GMT, Amit Kumar wrote: >> This PR adds `c > 0 && c < max_jint` check in c1_LIRGenerator_s390.cpp. Please look JBS for more info.
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > remove dummy code I have looked further into the PPC64 code and figured out that `load_nonconstant()` loads all values which are not simm16 into a register: https://github.com/openjdk/jdk/blob/b9bf447209db5d7f6bb16a0310421dbe4170500c/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp#L487C11-L487C29 https://github.com/openjdk/jdk/blob/b9bf447209db5d7f6bb16a0310421dbe4170500c/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp#L59 only accepts simm16 values. So, `strength_reduce_multiply` will never get a value which overflows on PPC64. That's the reason why PPC64 is not affected by the UB bug. In your comment above, I guess you missed that `is_power_of_2((juint)c + 1)` returns true for c = MAX_INT. So, the optimization can be used for multiplications with MAX_INT if the UB is fixed. I also think that having a more similar solution for all platforms would be nice. For PPC64 and some other platforms, this may only be a cleanup, not a bug fix. In addition, the title is no longer up to date. You're changing more than s390 code. If you prefer to fix only UB on the affected platforms, this will also be fine with me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2489747680 From psandoz at openjdk.org Thu Nov 21 00:42:27 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 21 Nov 2024 00:42:27 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. 
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
> 3. Float16 SQRT and FMA operations are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to the [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details.
> 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
>
> **Missing Pieces:-**
> **- AARCH64 Backend.**
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

To make it easier to review this large change, I recommend that the aarch64 changes be separated into a dependent PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2489828972 From syan at openjdk.org Thu Nov 21 02:07:25 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 21 Nov 2024 02:07:25 GMT Subject: RFR: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize [v4] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 12:52:39 GMT, SendaoYan wrote: >> Hi all, >> Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. >> >> Additional testing >> >> - [x] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option > > SendaoYan has updated the pull request incrementally with two additional commits since the last revision: > > - delete comment "// Exclude 2 frame metadata words" > - Avoid calculate the compare value in the generated code Thanks all for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22264#issuecomment-2489914062 From syan at openjdk.org Thu Nov 21 02:07:25 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 21 Nov 2024 02:07:25 GMT Subject: Integrated: 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize In-Reply-To: References: Message-ID: <3SzgHsmAI6Cn12b5H6qa6pNZJzQcjRGxII0Wqx071X4=.a05179eb-1516-452d-9f47-7ca26428db9b@github.com> On Wed, 20 Nov 2024 02:43:42 GMT, SendaoYan wrote: > Hi all, > Currently on linux-riscv64 platform the debug VM option `-XX:+VerifyActivationFrameSize` is Unimplemented. I want to implement `-XX:+VerifyActivationFrameSize` to make JVM easier to diagnose. Only effect debug build plus with option `-XX:+VerifyActivationFrameSize`, the change has been verified locally, the risk is low. 
> > Additional testing > > - [x] Run SPECjbb2015 with -XX:+VerifyActivationFrameSize option This pull request has now been integrated. Changeset: 4fbf2720 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/4fbf272017d2f6933e66f8a67cb88e3ffc42339e Stats: 8 lines in 1 file changed: 7 ins; 0 del; 1 mod 8344526: RISC-V: implement -XX:+VerifyActivationFrameSize Co-authored-by: Fei Yang Reviewed-by: mli, fyang ------------- PR: https://git.openjdk.org/jdk/pull/22264 From haosun at openjdk.org Thu Nov 21 02:44:22 2024 From: haosun at openjdk.org (Hao Sun) Date: Thu, 21 Nov 2024 02:44:22 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. 
Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
>
> **Missing Pieces:-**
> **- AARCH64 Backend.**
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Hi. Better to update the copyright year to 2024 for the following modified files:

    src/hotspot/share/adlc/output_h.cpp
    src/hotspot/share/opto/connode.cpp
    src/hotspot/share/opto/connode.hpp
    src/hotspot/share/opto/constantTable.cpp
    src/hotspot/share/opto/divnode.cpp
    src/hotspot/share/opto/divnode.hpp
    src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java
    test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java

------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2489949430 From duke at openjdk.org Thu Nov 21 04:54:22 2024 From: duke at openjdk.org (duke) Date: Thu, 21 Nov 2024 04:54:22 GMT Subject: RFR: 8326369: Add test to verify bimorphic inlining happens after morphism changes [v5] In-Reply-To: References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Thu, 14 Nov 2024 14:36:16 GMT, Galder Zamarreño wrote: >> This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently.
I've added him as co-author and applied some small formatting changes. > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into topic.bimorphic-inlining > - Update test/hotspot/jtreg/compiler/inlining/InlineBimorphicVirtualCallAfterMorphismChanged.java > > Co-authored-by: Tobias Hartmann > - Added Jetbrains copyright > - Added copyright and @bug identifiers > - Fix formatting > - Fix more formatting issues > - Fix formatting > - Add test that replicates issue > > Co-authored-by: Filipp Zhinkin @galderz Your change (at version 42152a1fcc7c985b505902870ee73e2038e3936c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21920#issuecomment-2490074679 From rrich at openjdk.org Thu Nov 21 06:49:16 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 21 Nov 2024 06:49:16 GMT Subject: RFR: 8328085: C2: Use after free in PhaseChaitin::Register_Allocate() In-Reply-To: References: Message-ID: <5LEKuaTi6KRWBz7UjrD_jY--kldQSDbDIdviqQnnToo=.d4e6647a-341d-4398-9a4c-100d9ea18c6a@github.com> On Mon, 18 Nov 2024 10:53:41 GMT, Richard Reingruber wrote: > This change removes the ResourceMark from `PhaseChaitin::merge_multidefs()` because it frees memory that is used in the caller method `PhaseChaitin::Register_Allocate`. > [My comment](https://bugs.openjdk.org/browse/JDK-8328085?focusedId=14723086&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14723086) on the JBS item explains the details. > > #### Testing > I was able to reproduce the issue on ppc64le but not on x86_64 running applications/ctw/modules/java_desktop.java. The issue didn't reproduce with this pr.
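The general hazard named in the PR summary above (a scoped mark in a callee releasing arena memory that the caller still uses) can be sketched with a toy bump allocator. This is a hedged illustration only: `ToyArena`, `ToyMark` and `ToyList` are hypothetical stand-ins, not HotSpot's `ResourceArea`, `ResourceMark` or any structure used by `PhaseChaitin`, and the growth scenario is just one plausible way such a use-after-free can arise; the actual mechanism is the one described in the linked JBS comment. Offsets are used instead of pointers so that the example itself stays well-defined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bump allocator: hands out offsets and only moves `_top`.
class ToyArena {
 public:
  std::size_t allocate(std::size_t bytes) {
    std::size_t at = _top;
    _top += bytes;
    return at;
  }
  std::size_t top() const { return _top; }
  void rollback_to(std::size_t mark) { _top = mark; }
  // An allocation is still valid only if it starts below the current top.
  bool is_live(std::size_t offset) const { return offset < _top; }

 private:
  std::size_t _top = 0;
};

// Hypothetical scoped mark: on destruction, releases everything allocated
// from the arena since the mark was created.
class ToyMark {
 public:
  explicit ToyMark(ToyArena& arena) : _arena(arena), _saved_top(arena.top()) {}
  ~ToyMark() { _arena.rollback_to(_saved_top); }

 private:
  ToyArena&   _arena;
  std::size_t _saved_top;
};

// Caller-owned growable list whose backing store lives in the arena.
struct ToyList {
  std::size_t data_offset;
  std::size_t capacity;
  void grow(ToyArena& arena) {
    capacity *= 2;
    data_offset = arena.allocate(capacity);  // new backing store
  }
};

// The caller's list grows while the callee's mark is active, so the new
// backing store is released when the mark goes out of scope.
void callee_with_mark(ToyArena& arena, ToyList& list) {
  ToyMark mark(arena);
  list.grow(arena);
}

// Without the mark, the new backing store outlives the callee.
void callee_without_mark(ToyArena& arena, ToyList& list) {
  list.grow(arena);
}
```

After `callee_with_mark` returns, the list's backing store is no longer live in the arena, which is the dangling state; without the mark it stays live, at the cost of the memory being released later. That trade-off is what the resource-area measurements in this thread are checking.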
> > #### ResourceArea Sizes > > I've traced maximum ResourceArea size after returning from `PhaseChaitin::merge_multidefs()` (see [first commit](https://github.com/openjdk/jdk/pull/22200/commits/ffbe6dee05a5a66c2965f4ff7e4cd466605cf89d)). > I haven't found a significant difference. > Below you can see the last trace line from each run. > > ##### x86_64: 3 Runs Dacapo Tomcat 5 Iterations > > ###### Baseline > Run 1: [24.222s][info][newcode] New maximum for resource area size: 3274 KB > Run 2: [21.317s][info][newcode] New maximum for resource area size: 3274 KB > Run 3: [37.400s][info][newcode] New maximum for resource area size: 3336 KB > > ###### PR > Run 1: [35.002s][info][newcode] New maximum for resource area size: 3363 KB > Run 2: [21.332s][info][newcode] New maximum for resource area size: 3274 KB > Run 3: [36.050s][info][newcode] New maximum for resource area size: 3286 KB > > ##### x86_64: 3 Runs applications/ctw/modules/java_desktop.java > > ###### Baseline > Run 1: [29.876s][info][newcode] New maximum for resource area size: 3143 KB > Run 2: [29.631s][info][newcode] New maximum for resource area size: 3111 KB > Run 3: [29.227s][info][newcode] New maximum for resource area size: 3142 KB > > ###### PR > Run 1: [29.755s][info][newcode] New maximum for resource area size: 3175 KB > Run 2: [28.964s][info][newcode] New maximum for resource area size: 3143 KB > Run 3: [28.863s][info][newcode] New maximum for resource area size: 3143 KB > > ##### PPC: 3 Runs Dacapo Tomcat 5 Iterations > > ###### Baseline > Run 1: [20.041s][info][newcode] New maximum for resource area size: 3474 KB > Run 2: [20.581s][info][newcode] New maximum for resource area size: 3474 KB > Run 3: [20.367s][info][newcode] New maximum for resource area size: 3474 KB > > ###### PR > Run 1: [20.520s][info][newcode] New maximum for resource area size: 3506 KB > Run 2: [20.918s][info][newcode] New maximum for resource area size: 3506 KB > Run 3: [20.994s][info][newcode] New maximum for resource area 
size: 3505 KB > > ##### PPC: 3 Runs ... Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22200#issuecomment-2490203592 From epeter at openjdk.org Thu Nov 21 07:41:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 07:41:46 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers Message-ID: This is a followup to: https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. **The problem is this:** Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. -------------------------------- First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. ------------------- Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. 
We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. ------------- Commit messages: - fix LoopCombinedOpTest.java - same case again - fixup for Christian - manual merge - fix sizes for LoopCombinedOpTest.java - fix IRExample.java - fix TestFloatConversionsVector.java - fix TestUnorderedReductionPartialVectorization.java - fix TestSplitPacks.java - fix 2 more tests - ... and 17 more: https://git.openjdk.org/jdk/compare/75420e93...b56c88cb Changes: https://git.openjdk.org/jdk/pull/22199/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22199&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340010 Stats: 922 lines in 15 files changed: 782 ins; 23 del; 117 mod Patch: https://git.openjdk.org/jdk/pull/22199.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22199/head:pull/22199 PR: https://git.openjdk.org/jdk/pull/22199 From rkennke at openjdk.org Thu Nov 21 07:41:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 21 Nov 2024 07:41:46 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. 
But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. Looks good to me, thanks a lot for fixing this. ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22199#pullrequestreview-2445355327 From mli at openjdk.org Thu Nov 21 07:41:46 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 21 Nov 2024 07:41:46 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. 
We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. Looks good, thanks! Just FYI, we met the alignment issue on riscv, so also fixed `AlignVector` on riscv (https://github.com/openjdk/jdk/pull/21974). ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22199#pullrequestreview-2446078775 From chagedorn at openjdk.org Thu Nov 21 07:41:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 21 Nov 2024 07:41:47 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. 
> > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. Looks good to me so far. Will have another look when it's no longer in draft. test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java line 427: > 425: applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"}, > 426: // UNSAFE.ARRAY_BYTE_BASE_OFFSET = 16, but with compact object headers UNSAFE.ARRAY_BYTE_BASE_OFFSET=12. > 427: // If AlignVector=true, we need the offset to be 8-aligned, else the vectors are filtered out. Suggestion: // If AlignVector=true, we need the offset to be 8-byte aligned, else the vectors are filtered out. Some more cases below. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22199#pullrequestreview-2448328567 PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1850133912 From epeter at openjdk.org Thu Nov 21 07:41:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 07:41:47 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: <_XnvtAaYOSJqgAKMJpSvOcpJioEE3itoX_rIT3cNnXY=.73125a08-2e74-4a4b-809d-c7b986fd46cb@github.com> On Tue, 19 Nov 2024 12:39:44 GMT, Roman Kennke wrote: >> This is a followup to: >> https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) >> >> @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. >> >> **The problem is this:** >> Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. >> >> -------------------------------- >> >> First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. >> >> ------------------- >> >> Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. 
>> >> I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. >> >> To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. >> >> In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. > > Looks good to me, thanks a lot for fixing this.

@rkennke bad news: many more vectorization tests are failing with `+AlignVector` and `+UseCompactObjectHeaders`:

compiler/vectorization/TestFloatConversionsVector.java
compiler/c2/irTests/TestVectorConditionalMove.java
compiler/loopopts/superword/TestScheduleReordersScalarMemops.java
compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
compiler/c2/TestCastX2NotProcessedIGVN.java
compiler/loopopts/superword/TestSplitPacks.java
compiler/loopopts/superword/TestUnorderedReductionPartialVectorization.java
testlibrary_tests/ir_framework/examples/IRExample.java

and some in our closed repository as well. Ouff, that's going to be a bit of work...
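The alignment arithmetic behind these failures can be illustrated with a small model. This is only a sketch, not HotSpot code: it assumes 8-byte object alignment, assumes `byte[]` and `int[]` share the same base offset (16 without compact headers, 12 with them, as discussed in this thread), and uses the hypothetical helper names `isAligned` and `existsCommonAlignedIndex`. It asks whether any loop index lets a `byte[]` access and an `int[]` access at the same index both be 8-byte aligned, which is roughly what a mixed-type loop needs under `+AlignVector`.

```java
public class AlignVectorOffsetSketch {
    // Assumption: objects are 8-byte aligned, so an access counts as aligned
    // when its byte offset inside the array object is a multiple of 8.
    static final int ALIGNMENT = 8;

    // Byte offset of element `index` is baseOffset + elementSize * index.
    static boolean isAligned(int baseOffset, int elementSize, int index) {
        return (baseOffset + elementSize * index) % ALIGNMENT == 0;
    }

    // True if some index in [0, limit) aligns both a byte[] access (1-byte
    // elements) and an int[] access (4-byte elements) at that same index.
    static boolean existsCommonAlignedIndex(int baseOffset, int limit) {
        for (int i = 0; i < limit; i++) {
            if (isAligned(baseOffset, 1, i) && isAligned(baseOffset, 4, i)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Base offset 16: index 0 already aligns both accesses.
        System.out.println("base 16: " + existsCommonAlignedIndex(16, 64));
        // Base offset 12: byte[] needs i % 8 == 4 (even), int[] needs i odd.
        System.out.println("base 12: " + existsCommonAlignedIndex(12, 64));
    }
}
```

In this model a base offset of 16 always admits a common aligned index, while a base offset of 12 never does, which matches the observation that some mixed-type loops cannot be vectorized under `+AlignVector` once the offset changes from 16 to 12.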
------------- PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2485816237 From epeter at openjdk.org Thu Nov 21 07:41:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 07:41:47 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. 
> > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. It could also be that some simply fail because of `AlignVector`... I'll have to see. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2485819747 From rkennke at openjdk.org Thu Nov 21 07:41:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 21 Nov 2024 07:41:47 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: <_i26J_v6BTgFuSy16U2HXldXQi1KF_lnsgJXbsbON9g=.4a90a31b-a8be-443a-925e-7981e5868678@github.com> On Tue, 19 Nov 2024 14:07:49 GMT, Emanuel Peter wrote: > It could also be that some simply fail because of `AlignVector`... I'll have to see. Could it be a solution to simply disable UCOH (with a warning) when AlignVector is requested together with UCOH? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2485823790 From epeter at openjdk.org Thu Nov 21 07:41:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 07:41:47 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: <_i26J_v6BTgFuSy16U2HXldXQi1KF_lnsgJXbsbON9g=.4a90a31b-a8be-443a-925e-7981e5868678@github.com> References: <_i26J_v6BTgFuSy16U2HXldXQi1KF_lnsgJXbsbON9g=.4a90a31b-a8be-443a-925e-7981e5868678@github.com> Message-ID: On Tue, 19 Nov 2024 14:09:22 GMT, Roman Kennke wrote: > Could it be a solution to simply disable UCOH (with a warning) when AlignVector is requested together with UCOH? That could be a solution, yes. Would that be acceptable for all the people that reviewed the JEP? @rkennke @Hamlin-Li hold your horses... this is still a draft and many more changes are coming. @rkennke @Hamlin-Li @chhagedorn I think I am there now. Can I please have a re-review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2485831364 PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2486268618 PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2490276259 From bkilambi at openjdk.org Thu Nov 21 08:32:23 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Nov 2024 08:32:23 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <5JP6jPC2kBjgbzZa1397E5ROgo5xY9QpusWzUDMN6jg=.c4735599-b1d0-4a02-a5e6-d5f7eeefce8e@github.com> On Thu, 21 Nov 2024 02:41:47 GMT, Hao Sun wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2.
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Hi, > > Better to update the copyright year to 2024 for the following modified files: > > > src/hotspot/share/adlc/output_h.cpp > src/hotspot/share/opto/connode.cpp > src/hotspot/share/opto/connode.hpp > src/hotspot/share/opto/constantTable.cpp > src/hotspot/share/opto/divnode.cpp > src/hotspot/share/opto/divnode.hpp > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > > > I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. > If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. > > > One or more @IR rules failed: > > Failed IR Rules (9) of Methods (9) ---------------------------------- > 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" > }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - https://github.com/jatin-bhateja/jdk/pull/6. Do you think it's ok to keep the code as is for some more time until my patch is rebased and merged? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2490369622 From jbhateja at openjdk.org Thu Nov 21 09:06:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Nov 2024 09:06:18 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v6] In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: <08UceifyOK3Ybhgtsa0PoYYLUXfNNh0yr4cMRd6EsZQ=.6d7c8ab6-b78c-463e-8be0-dcd14a443528@github.com> > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. 
> > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments + extending IR tests with instruction level checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21244/files - new: https://git.openjdk.org/jdk/pull/21244/files/84f2e04f..c128291c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21244&range=04-05 Stats: 40 lines in 2 files changed: 19 ins; 1 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/21244.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21244/head:pull/21244 PR: https://git.openjdk.org/jdk/pull/21244 From jbhateja at openjdk.org Thu Nov 21 09:06:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Nov 2024 09:06:20 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2] In-Reply-To: References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Tue, 12 Nov 2024 21:49:22 GMT, Vladimir Ivanov wrote: >>> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I don't see a compelling reason to do so. IMO matcher functionality is more than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + `has_int_inputs()` predicate. For `MulUIVL` you additionally do input rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate 0xFFFFFFFF))` operands can be easily detected on matcher side (with an extra AD instruction); and (2) such optimization is limited because it is valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C <= 0xFFFFFFFF`. >>> >>> So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without improving situation during matching. 
>> >> Hi @iwanowww , >> Problem occurs only if AndV gets shared; in such a case, matcher will not be able to identify the constrained multiplication pattern and absorb the masking pattern. Specialized IR overrules such limitations and shields the pattern from downstream optimization passes, thereby removing any non-determinism. In addition, it facilitates forwarding inputs to the multiplier, the new IR is explicit in its semantics of considering only lower doublewords of quadword lanes for multiplication, hence we can safely save emitting redundant input masking instructions. We already have specialized IR nodes like MulAddVS2VINode and I see these new IR nodes similar to it. > > @jatin-bhateja in case when `AndV` is shared, it can't be eliminated unless all users absorb it. For such cases, matcher can perform adhoc node cloning, but in this particular case it looks like an overkill either way. IMO the pattern is too niche to focus on it (either to justify input forwarding or adhoc handling on matcher side). > > It's good you mentioned `MulAddVS2VI`. On one hand, VNNI operations are more complex (similar to FMA), so such complexity *may* be justified there. On the other hand, it doesn't look like VNNI support in C2 age well. It is tied to auto-vectorizer and, by now, Vector API doesn't benefit from it. So, instead of doubling down on `MulAddVS2VI` path, I'd prefer to leave it aside and reimplement it later in a more maintainable manner. Hi @iwanowww , your closing comments addressed, kindly re-approve, the change is mainly in newly added test file. 
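The arithmetic identity this PR relies on can be checked in scalar Java. The following is a sketch for illustration only, not code from the patch; the class and method names are hypothetical. It shows that the low 64 bits of a 64x64 multiply can be assembled from 32-bit partial products, and that when the upper doublewords of both inputs are zero (the `AndV SRC, 0xFFFFFFFF` shape) or are sign extensions of the lower doublewords (the `VectorCastI2X` shape), a single 32x32 -> 64 multiply of the lower doublewords gives the same result, which is the per-lane operation VPMULUDQ and VPMULDQ perform.

```java
public class LowDwordMultiplySketch {
    static final long LOW_MASK = 0xFFFFFFFFL;

    // Low 64 bits of a * b assembled from 32-bit partial products.
    // The aHi * bHi term is dropped because it only affects bits >= 64.
    static long mulViaPartialProducts(long a, long b) {
        long aLo = a & LOW_MASK, aHi = a >>> 32;
        long bLo = b & LOW_MASK, bHi = b >>> 32;
        long cross = aHi * bLo + aLo * bHi; // contributes to the upper 32 bits only
        return aLo * bLo + (cross << 32);
    }

    // Unsigned 32x32 -> 64 multiply of the lower doublewords
    // (scalar analogue of one VPMULUDQ lane).
    static long mulLowDwordsUnsigned(long a, long b) {
        return (a & LOW_MASK) * (b & LOW_MASK);
    }

    // Signed 32x32 -> 64 multiply of the lower doublewords
    // (scalar analogue of one VPMULDQ lane).
    static long mulLowDwordsSigned(long a, long b) {
        return (long) (int) a * (long) (int) b;
    }

    public static void main(String[] args) {
        long a = 0x123456789ABCDEF0L;
        long b = 0x0FEDCBA987654321L;
        System.out.println(mulViaPartialProducts(a, b) == a * b);

        // Upper doublewords zeroed: the full multiply equals the unsigned low-dword multiply.
        long am = a & LOW_MASK, bm = b & LOW_MASK;
        System.out.println(am * bm == mulLowDwordsUnsigned(a, b));

        // Inputs that are sign-extended ints: the full multiply equals the signed low-dword multiply.
        long x = -7, y = 123456789;
        System.out.println(x * y == mulLowDwordsSigned(x, y));
    }
}
```

When the cross term is known to be zero, or is fully determined by sign extension, the two extra multiplies and the shift become redundant, which is where the saving over a full quadword multiply comes from.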
------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2490437406 From jbhateja at openjdk.org Thu Nov 21 09:08:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Nov 2024 09:08:22 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 02:41:47 GMT, Hao Sun wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. 
Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Hi, > > Better to update the copyright year to 2024 for the following modified files: > > > src/hotspot/share/adlc/output_h.cpp > src/hotspot/share/opto/connode.cpp > src/hotspot/share/opto/connode.hpp > src/hotspot/share/opto/constantTable.cpp > src/hotspot/share/opto/divnode.cpp > src/hotspot/share/opto/divnode.hpp > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > > > I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. > If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. > > > One or more @IR rules failed: > > Failed IR Rules (9) of Methods (9) ---------------------------------- > 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" > }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... > Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? 
Hi @Bhavana-Kilambi , As @PaulSandoz suggested, please file a follow-up PR on top of these changes with AARCH64 backend changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2490445899 From galder at openjdk.org Thu Nov 21 09:51:24 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 21 Nov 2024 09:51:24 GMT Subject: Integrated: 8326369: Add test to verify bimorphic inlining happens after morphism changes In-Reply-To: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> References: <2RMk4ZDxheJK0fkQRZwJt2MJLq01oqCQzq21mV7vJpE=.5bc22b53-a747-4f76-9c78-81fe52e8eed6@github.com> Message-ID: On Wed, 6 Nov 2024 09:06:47 GMT, Galder Zamarreño wrote: > This issue was fixed by JDK-8339299 indirectly (see [PR](https://github.com/openjdk/jdk/pull/20786)), but the PR for 8339299 didn't include any new tests, so I'm sending this PR with the test that Filipp added in 8326369 so that such issue doesn't come back inadvertently. I've added him as co-author and applied some small formatting changes. This pull request has now been integrated. 
Changeset: 5ccd5106 Author: Galder Zamarreño Committer: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/5ccd5106e023dbb47473e8914035c811e0cc6ee1 Stats: 115 lines in 1 file changed: 115 ins; 0 del; 0 mod 8326369: Add test to verify bimorphic inlining happens after morphism changes Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21920 From bkilambi at openjdk.org Thu Nov 21 09:54:24 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Nov 2024 09:54:24 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 09:05:23 GMT, Jatin Bhateja wrote: >> Hi, >> >> Better to update the copyright year to 2024 for the following modified files: >> >> >> src/hotspot/share/adlc/output_h.cpp >> src/hotspot/share/opto/connode.cpp >> src/hotspot/share/opto/connode.hpp >> src/hotspot/share/opto/constantTable.cpp >> src/hotspot/share/opto/divnode.cpp >> src/hotspot/share/opto/divnode.hpp >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java >> >> >> I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. >> If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. 
>> >> >> One or more @IR rules failed: >> >> Failed IR Rules (9) of Methods (9) ---------------------------------- >> 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" >> }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... > >> Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? > > Hi @Bhavana-Kilambi , As @PaulSandoz suggested, please file a follow-up PR on top of these changes with AARCH64 backend changes. Hi @jatin-bhateja , I am resolving some errors on an aarch64 machine and if I have to raise a separate PR for aarch64, would you please remove all the aarch64 related IR checks until I have added the aarch64 backend? I might take some time to put the changes up for review. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2490607729 From duke at openjdk.org Thu Nov 21 10:10:58 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 10:10:58 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v5] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. 
This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Refactor print inline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/f4cad4ee..ba8bb63e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=03-04 Stats: 1010 lines in 14 files changed: 328 ins; 320 del; 362 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From mli at openjdk.org Thu Nov 21 10:19:25 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 21 Nov 2024 10:19:25 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). 
I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. Thanks for the patch and the comments in the code, they are very helpful. Looks good, just some minor comments. test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 184: > 182: @IR(counts = { IRNode.LOAD_VECTOR_L, ">=1", IRNode.STORE_VECTOR, ">=1" }, > 183: // This test fails with compact headers, but only with UseSSE<=3. > 184: applyIf = { "UseCompactObjectHeaders", "false" }, Is this removed intentionally, or should it be `applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"},`? test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 196: > 194: @IR(counts = { IRNode.LOAD_VECTOR_L, ">=1", IRNode.STORE_VECTOR, ">=1" }, > 195: // This test fails with compact headers, but only with UseSSE<=3. > 196: applyIf = { "UseCompactObjectHeaders", "false" }, Same as the above comment. 
test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 111: > 109: // ---------------- Integer Extension ---------------- > 110: @Test > 111: @IR(failOn = {IRNode.STORE_VECTOR}) Not sure if it's necessary to fail on the Node, I'm not an expert, so just a question, not a review comment. Similar question for the below `failOn`s ------------- PR Review: https://git.openjdk.org/jdk/pull/22199#pullrequestreview-2450860349 PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851736762 PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851737491 PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851755129 From epeter at openjdk.org Thu Nov 21 10:31:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 10:31:23 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 10:06:13 GMT, Hamlin Li wrote: >> This is a followup to: >> https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) >> >> @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. >> >> **The problem is this:** >> Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. >> >> -------------------------------- >> >> First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). 
I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. >> >> ------------------- >> >> Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. >> >> I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. >> >> To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. >> >> In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 184: > >> 182: @IR(counts = { IRNode.LOAD_VECTOR_L, ">=1", IRNode.STORE_VECTOR, ">=1" }, >> 183: // This test fails with compact headers, but only with UseSSE<=3. >> 184: applyIf = { "UseCompactObjectHeaders", "false" }, > > Is this removed intentiionally? or it should be `applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"},`? Well, it turns out that this case always vectorizes. Did you see the comments below? It is a strange exception, because this test manually sets a large "base-to-payload-offset" of 64. > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 196: > >> 194: @IR(counts = { IRNode.LOAD_VECTOR_L, ">=1", IRNode.STORE_VECTOR, ">=1" }, >> 195: // This test fails with compact headers, but only with UseSSE<=3. 
>> 196: applyIf = { "UseCompactObjectHeaders", "false" }, > > same as above comment. And same response as above ;) > test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 111: > >> 109: // ---------------- Integer Extension ---------------- >> 110: @Test >> 111: @IR(failOn = {IRNode.STORE_VECTOR}) > > Not sure if it's necessary to fail on the Node, I'm not an expert, so just a question, not a review comment. Similar question for the below `failOn`s I think it makes sense to add a `failOn` here. We have an RFE that tracks this, and when it is implemented, we should expect this test to "fail", i.e. the vectorization succeeds. Then it would be nice if we can enable a positive IR rule. So I put in a negative now, so we don't forget to put in a positive later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851786764 PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851787075 PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851783591 From rcastanedalo at openjdk.org Thu Nov 21 11:10:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 21 Nov 2024 11:10:18 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 10:26:38 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java line 111: >> >>> 109: // ---------------- Integer Extension ---------------- >>> 110: @Test >>> 111: @IR(failOn = {IRNode.STORE_VECTOR}) >> >> Not sure if it's necessary to fail on the Node, I'm not an expert, so just a question, not a review comment. Similar question for the below `failOn`s > > I think it makes sense to add a `failOn` here. We have an RFE that tracks this, and when it is implemented, we should expect this test to "fail", i.e. the vectorization succeeds. Then it would be nice if we can enable a positive IR rule. 
So I put in a negative now, so we don't forget to put in a positive later. @eme64 I agree with having the negative check but please make the intention clearer in the comment. E.g. "Subword vector casts do not work currently. Assert the vectorization failure so that we are reminded to update the test when this limitation is addressed in the future." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851851295 From mli at openjdk.org Thu Nov 21 11:17:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 21 Nov 2024 11:17:20 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. 
Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. Thanks, looks good! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22199#pullrequestreview-2451059594 From mli at openjdk.org Thu Nov 21 11:17:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 21 Nov 2024 11:17:21 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 10:28:39 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 184: >> >>> 182: @IR(counts = { IRNode.LOAD_VECTOR_L, ">=1", IRNode.STORE_VECTOR, ">=1" }, >>> 183: // This test fails with compact headers, but only with UseSSE<=3. >>> 184: applyIf = { "UseCompactObjectHeaders", "false" }, >> >> Is this removed intentiionally? or it should be `applyIfOr = {"UseCompactObjectHeaders", "false", "AlignVector", "false"},`? > > Well, it turns out that this case always vectorizes. Did you see the comments below? It is a strange exception, because this test manually sets a large "base-to-payload-offset" of 64. 
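Editorial note: the alignment argument in this thread can be made concrete with a little arithmetic. The sketch below is only an illustration, not code from the PR. It assumes an 8-byte alignment requirement and an 8-byte-aligned array base (both assumptions of this note), and asks whether any loop index `i` lets a `byte[]` access and an `int[]` access be aligned at the same time, as a mixed-type conversion loop would need. The class and method names are hypothetical; the offsets 16, 12 and 64 are the ones mentioned in the discussion.

```java
public class AlignmentSketch {
    // Assumed alignment requirement in bytes (an assumption of this sketch).
    static final int ALIGN = 8;

    /**
     * Returns true if some index i makes both base + offset + 1*i (byte access)
     * and base + offset + 4*i (int access) multiples of ALIGN, assuming the
     * array base itself is ALIGN-aligned.
     */
    static boolean canAlignBoth(int payloadOffset) {
        // Both conditions are periodic in i with period ALIGN, so checking
        // i in [0, ALIGN) covers every case.
        for (int i = 0; i < ALIGN; i++) {
            boolean byteAligned = (payloadOffset + 1L * i) % ALIGN == 0;
            boolean intAligned = (payloadOffset + 4L * i) % ALIGN == 0;
            if (byteAligned && intAligned) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 16: classic header offset, 12: compact headers, 64: the manual offset
        // used by the mismatched-access test discussed above.
        for (int offset : new int[] {16, 12, 64}) {
            System.out.println("offset " + offset + " -> " + canAlignBoth(offset));
        }
    }
}
```

Under these assumptions, offset 12 requires `i % 8 == 4` for the byte access but an odd `i` for the int access, so no common index exists, while offsets 16 and 64 are both satisfied by `i = 0`. That is consistent with the observation that the test using a payload offset of 64 still vectorizes.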
Ah, it makes sense to me, sorry I did not read the code below carefully. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851860890 From epeter at openjdk.org Thu Nov 21 11:17:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 11:17:22 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 11:07:49 GMT, Roberto Castañeda Lozano wrote: >> I think it makes sense to add a `failOn` here. We have an RFE that tracks this, and when it is implemented, we should expect this test to "fail", i.e. the vectorization succeeds. Then it would be nice if we can enable a positive IR rule. So I put in a negative now, so we don't forget to put in a positive later. > > @eme64 I agree with having the negative check but please make the intention clearer in the comment. E.g. "Subword vector casts do not work currently. Assert the vectorization failure so that we are reminded to update the test when this limitation is addressed in the future." That is why I wrote this comment: `// Subword vector casts do not work currently, see JDK-8342095.` But I can extend the comment if you like. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851858597 From mli at openjdk.org Thu Nov 21 11:17:22 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 21 Nov 2024 11:17:22 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 11:12:29 GMT, Emanuel Peter wrote: >> @eme64 I agree with having the negative check but please make the intention clearer in the comment. E.g. "Subword vector casts do not work currently. Assert the vectorization failure so that we are reminded to update the test when this limitation is addressed in the future." 
> > That is why I wrote this comment: > `// Subword vector casts do not work currently, see JDK-8342095.` > But I can extend the comment if you like. I see, thanks for explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851859528 From epeter at openjdk.org Thu Nov 21 11:26:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 21 Nov 2024 11:26:59 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers [v2] In-Reply-To: References: Message-ID: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudamentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. > > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. 
I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: make failOf comment explicit for Roberto ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22199/files - new: https://git.openjdk.org/jdk/pull/22199/files/b56c88cb..40f1858c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22199&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22199&range=00-01 Stats: 16 lines in 1 file changed: 16 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22199.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22199/head:pull/22199 PR: https://git.openjdk.org/jdk/pull/22199 From bkilambi at openjdk.org Thu Nov 21 11:55:24 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Nov 2024 11:55:24 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <28lnx2GvWiVFGMw9LSjjwMSeUPNjvqVGpVVQd_WluGI=.f50647e4-0895-462f-9d12-2050a3368088@github.com> On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. 
Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/share/opto/convertnode.cpp line 260: > 258: in(1)->in(1)->Opcode() == Op_ConvHF2F && > 259: in(1)->in(2)->Opcode() == Op_ConvHF2F) { > 260: if (Matcher::match_rule_supported(in(1)->Opcode()) && Here `match_rule_supported()` is being called on the floating point IR (AddF etc.), but it should be called on the half float IR (AddHF, for example). Maybe add another routine to return the opcode for the half float IR and then check if it's supported? 
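Editorial note: the suggestion above (map the float opcode to its half-float counterpart first, then ask whether that opcode is supported) can be sketched in a few lines. This is a toy model in Java, not HotSpot code. The `Op` enum, the `FLOAT_TO_HALF` table, and both method names are hypothetical and only stand in for the C++ opcodes and `Matcher::match_rule_supported()` mentioned in the review comment.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class HalfFloatOpcodeSketch {
    // Hypothetical opcodes, loosely modeled on the node names in the thread.
    enum Op { ADD_F, SUB_F, MUL_F, ADD_HF, SUB_HF, MUL_HF }

    // Hypothetical table from a float opcode to its half-float counterpart.
    static final Map<Op, Op> FLOAT_TO_HALF = Map.of(
            Op.ADD_F, Op.ADD_HF,
            Op.SUB_F, Op.SUB_HF,
            Op.MUL_F, Op.MUL_HF);

    /** The "another routine" from the review: float opcode to half-float opcode, if any. */
    static Optional<Op> halfFloatOpcode(Op floatOp) {
        return Optional.ofNullable(FLOAT_TO_HALF.get(floatOp));
    }

    /** Folding is allowed only if the half-float opcode (not the float one) is supported. */
    static boolean canFoldToHalfFloat(Op floatOp, Set<Op> supported) {
        return halfFloatOpcode(floatOp).map(supported::contains).orElse(false);
    }

    public static void main(String[] args) {
        // A made-up backend: all float ops are supported, but only ADD_HF among half-float ops.
        Set<Op> supported = EnumSet.of(Op.ADD_F, Op.SUB_F, Op.MUL_F, Op.ADD_HF);
        System.out.println(canFoldToHalfFloat(Op.ADD_F, supported)); // ADD_HF is supported
        System.out.println(canFoldToHalfFloat(Op.MUL_F, supported)); // MUL_HF is not
    }
}
```

The point of the sketch is the second case: checking support for `MUL_F` itself would succeed on this made-up backend, even though the node the transformation would actually create (`MUL_HF`) has no match rule.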
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1851914140 From rcastanedalo at openjdk.org Thu Nov 21 12:13:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 21 Nov 2024 12:13:19 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers [v2] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 11:13:12 GMT, Hamlin Li wrote: >> That is why I wrote this comment: >> `// Subword vector casts do not work currently, see JDK-8342095.` >> But I can extend the comment if you like. > > I see, thanks for explanation! Thanks for extending the comment, Emanuel! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22199#discussion_r1851942008 From duke at openjdk.org Thu Nov 21 12:49:02 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 12:49:02 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v6] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? 
> > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Allocate in comp arena ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/ba8bb63e..93f0caec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=04-05 Stats: 16 lines in 3 files changed: 4 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Thu Nov 21 13:03:45 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 13:03:45 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v7] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). 
> > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/93f0caec..3e767dcf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=05-06 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Thu Nov 21 13:11:22 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 13:11:22 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 02:03:56 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> 
Add test > > I don't see that print_inlining_append_late() with InliningResult::FAILURE is being tested. @dean-long I've refactored how print inlining works as the old code was unmaintainable. This should also address the concerns you had about the patch previously. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2491104518 From duke at openjdk.org Thu Nov 21 13:24:38 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 13:24:38 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v8] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator.
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add precompiled header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/3e767dcf..ac7d6c2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=06-07 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Thu Nov 21 13:37:59 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 13:37:59 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v9] In-Reply-To: References: Message-ID: <2WUfu2iMtzrM8NDW5mYNhnLIKpEm44O5odCLm50ReoM=.16216e69-df30-463c-b9aa-5fa0be7fc270@github.com> > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. 
This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add another missing header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/ac7d6c2b..8b9bab9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=07-08 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From rehn at openjdk.org Thu Nov 21 14:15:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 21 Nov 2024 14:15:19 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: On Thu, 14 Nov 2024 14:05:42 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. >> >> Thanks >> >> ## Performance >> Tests on bananapi, for other platform, please check jbs issue for test data. 
>> >> ### Before >> data >> >> Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op >> >> >> >> >> ### After >> data >> >> Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test typo Thank you! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22102#pullrequestreview-2451592881 From bulasevich at openjdk.org Thu Nov 21 14:17:48 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 21 Nov 2024 14:17:48 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache Message-ID: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. 
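As a rough illustration of the layout idea described above (executable bytes stay in the code cache, mutable side data lives on the C heap), here is a minimal sketch. `ToyBlob`, `code_cache_footprint` and the field names are hypothetical stand-ins chosen for this example; this is not the HotSpot `CodeBlob` class and does not reflect how the PR implements the change.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a code blob; NOT the HotSpot CodeBlob class.
struct ToyBlob {
  std::vector<uint8_t> code;   // models the executable part kept in the code cache
  uint8_t* mutable_data;       // models relocations/oops/metadata on the C heap
  int      mutable_data_size;

  ToyBlob(const std::vector<uint8_t>& insts, const void* data, int data_size)
      : code(insts), mutable_data(nullptr), mutable_data_size(data_size) {
    if (data_size > 0) {
      // Side data gets its own malloc'ed buffer, separate from the code bytes.
      mutable_data = static_cast<uint8_t*>(std::malloc(data_size));
      if (mutable_data == nullptr) std::abort();
      std::memcpy(mutable_data, data, data_size);
    }
  }
  ~ToyBlob() { std::free(mutable_data); }
  ToyBlob(const ToyBlob&) = delete;
  ToyBlob& operator=(const ToyBlob&) = delete;

  // Only the code bytes count against the (toy) code cache.
  size_t code_cache_footprint() const { return code.size(); }
};
```

In this toy model, patching `mutable_data` never touches the bytes in `code`, which is the property the density argument relies on.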
Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): - nmethod_count:134000, total_compilation_time: 510460ms - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB Functional testing: jtreg on arm/aarch/x86. Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. ------------- Commit messages: - remove _code_end_offset - update jvm.hotspot.code.CodeBlob class - update: mutable data for all CodeBlobs with relocations - fix stat printout - 8343789: Move mutable nmethod data out of CodeCache Changes: https://git.openjdk.org/jdk/pull/21276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343789 Stats: 193 lines in 8 files changed: 102 ins; 41 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From kvn at openjdk.org Thu Nov 21 14:17:49 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 21 Nov 2024 14:17:49 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote: > This change relocates mutable data (such as
relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. First, thank you for doing this work. Main question is: why you did it only for `nmethod`? Second question: do you see any performance effects with this change? My concern is that we iterate relocation info data from a different memory space to patch code. Note, with https://bugs.openjdk.org/browse/JDK-8334691 and other changes I am moving in the direction of making relocation info data immutable. It is already "immutable" in mainline JDK after https://bugs.openjdk.org/browse/JDK-8333819. But it is still mutable in Leyden because we have to patch indexes during publishing nmethod. My idea was to move relocation info data (which has big size) into `immutable` data section of nmethod.
And leave mutable `_oops` and `_metadata` together with code since they are relatively small and we need to patch them together with code.

Mutable sizes % do not add up:

 mutable data = 6071648 (9.396180%)
   relocation = 3437176 (12.846409%)
   oops = 239488 (0.895084%)
   metadata = 2394984 (8.951227%)

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5273:

> 5271: } else {
> 5272: address dummy = address(uintptr_t(pc()) & -wordSize); // A nearby aligned address
> 5273: mov(dst, Address(dummy, rspec));

Why this is needed?

src/hotspot/share/code/codeBlob.cpp line 85:

> 83: CodeBlob::CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
> 84: int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments,
> 85: bool external_mutable_data) :

I suggest adding `assert(!external_mutable_data || (kind == CodeBlobKind::Nmethod)`

src/hotspot/share/code/codeBlob.hpp line 129:

> 127: address _mutable_data;
> 128: int _mutable_data_size;
> 129:

Should we add special CodeBlob subclass for nmethod to avoid increase of size for all blobs and stubs?

src/hotspot/share/runtime/vmStructs.cpp line 596:

> 594: nonstatic_field(nmethod, _immutable_data_size, int) \
> 595: nonstatic_field(nmethod, _mutable_data, address) \
> 596: nonstatic_field(nmethod, _mutable_data_size, int) \

They are fields in CodeBlob and not in nmethod.
------------- PR Review: https://git.openjdk.org/jdk/pull/21276#pullrequestreview-2422325042 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2463441925 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2463443197 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1833476281 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1833478853 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1833482205 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1833483559 From bulasevich at openjdk.org Thu Nov 21 14:17:49 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 21 Nov 2024 14:17:49 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod.
Example (statistics collected on the CodeCacheStress benchmark):
> - nmethod_count:134000, total_compilation_time: 510460ms
> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>
> Functional testing: jtreg on arm/aarch/x86.
> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
>
> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.

@vnkozlov Hi Vladimir,
What do you think about the idea of moving relocInfo data out of nmethod, in addition to the recent [Move immutable nmethod data from CodeCache](https://github.com/openjdk/jdk/pull/18984)? It would reduce the CodeHeap fill by 5%.

Performance update. On an aarch machine the CodeCacheStress benchmark shows a 1-2% performance improvement with this change.

Statistics on the CodeCacheStress benchmark with high numberOfClasses-instanceCount-rangeOfClasses parameter values:
- nmethod_count:134000, total_compilation_time: 510460ms
- total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
- total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB

-XX:+PrintNMethodStatistics

Statistics for 21032 bytecoded nmethods for C1:
 total size = 120722408 (100%)
 in CodeCache = 79358808 (65.736603%)
   header = 5215936 (6.572599%)
   constants = 320 (0.000403%)
   main code = 69017912 (86.969444%)
   stub code = 5124640 (6.457557%)
 mutable data = 10488856 (8.688409%)
   relocation = 6573064 (62.667118%)
   oops = 515680 (4.916456%)
   metadata = 3400112 (32.416424%)
 immutable data = 30874744 (25.574991%)
   dependencies = 636240 (2.060714%)
   nul chk table = 756920 (2.451583%)
   handler table = 180456 (0.584478%)
   scopes pcs = 16052608 (51.992683%)
   scopes data = 13248520 (42.910542%)

Statistics for 8171 bytecoded nmethods for C2:
 total size = 64948664 (100%)
 in CodeCache = 25580504 (39.385727%)
   header = 2026408 (7.921689%)
   constants = 448 (0.001751%)
   main code = 20925472 (81.802422%)
   stub code = 2628176 (10.274137%)
 mutable data = 6572064 (10.118859%)
   relocation = 3406216 (51.828709%)
   oops = 305912 (4.654733%)
   metadata = 2859936 (43.516556%)
 immutable data = 32796096 (50.495411%)
   dependencies = 926992 (2.826532%)
   nul chk table = 537024 (1.637463%)
   handler table = 1695568 (5.170030%)
   scopes pcs = 15451968 (47.115265%)
   scopes data = 14184544 (43.250710%)

------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2391711138 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2476420926 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2491100267

From bulasevich at openjdk.org Thu Nov 21 14:17:49 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 21 Nov 2024 14:17:49 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 7 Nov 2024 23:18:03 GMT, Vladimir Kozlov wrote:

> Main question is: why you did it only for nmethod?

Yes. I did it symmetrically to a separate immutable data storage for nmethod. Now I see I do not like an implementation where for some blobs relocation info is local, but for others it stays aside. I am going to rework that.

> any performance effects with this change?

On the aarch machine I see a slight improvement on the big benchmark caused by the code sparsity improvement. Though I need to do more benchmarking to make sure I am not making things worse for others.

Benchmark Mode Cnt Score Error Units | Benchmark Mode Cnt Score Error Units
JmhDotty.runOperation ss 999 861.717 ± 1.543 ms/op | JmhDotty.runOperation ss 999 840.959 ± 1.473 ms/op
|
34555411781 cache-misses:u | 34343012187 cache-misses:u
2913869717708 cpu-cycles:u | 2863838151745 cpu-cycles:u
4185324759051 instructions:u | 4209616523046 instructions:u
1460914744576 L1-icache-loads:u | 1452066316397 L1-icache-loads:u
97806845375 L1-icache-load-misses:u | 93815390496 L1-icache-load-misses:u
1191854820746 iTLB-loads:u | 1169231847276 iTLB-loads:u
10591067761 iTLB-load-misses:u | 10134696419 iTLB-load-misses:u
838964735227 branch-loads:u | 838353168582 branch-loads:u
25829615231 branch-load-misses:u | 24361474411 branch-load-misses:u
836291984964 br_pred:u | 838153583659 br_pred:u
25733552818 br_mis_pred:u | 24353396612 br_mis_pred:u
562168308 group0-code_sparsity:u | 449848707 group0-code_sparsity:u

> Mutable sizes % do not add up:

Thanks. The correct sizes:

Statistics for 21032 bytecoded nmethods for C1:
 mutable data = 10488856 (8.688409%)
   relocation = 6573064 (62.667118%)
   oops = 515680 (4.916456%)
   metadata = 3400112 (32.416424%)

Statistics for 8171 bytecoded nmethods for C2:
 mutable data = 6572064 (10.118859%)
   relocation = 3406216 (51.828709%)
   oops = 305912 (4.654733%)
   metadata = 2859936 (43.516556%)

> My idea was to move relocation info data (which has big size) into `immutable` data section of nmethod. And leave mutable `_oops` and `_metadata` together with code since they are relatively small and we need to patch them together with code.

Hmm. If relocation info goes to an immutable blob, oops+metadata hardly deserves a separate blob.

> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5273:
>
>> 5271: } else {
>> 5272: address dummy = address(uintptr_t(pc()) & -wordSize); // A nearby aligned address
>> 5273: mov(dst, Address(dummy, rspec));
>
> Why this is needed?

- it is not a load from a Constant Pool, so calling ldr_constant here seems incorrect
- the ldr_constant function utilizes either ldr (with a range limit of ±1MB) or, when -XX:-NearCpool is enabled, adrp (range limit of ±2GB) followed by ldr,
both of which may fall short when mutable data is allocated on the C heap. > src/hotspot/share/code/codeBlob.hpp line 129: > >> 127: address _mutable_data; >> 128: int _mutable_data_size; >> 129: > > Should we add special CodeBlob subclass for nmethod to avoid increase of size for all blobs and stubs? I am not sure. All CodeBlobs with relocation info needs a mutable data. Let me know if you think it must be a separate subclass. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2466182799 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2466183939 PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2466185286 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1835344697 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1852207740 From duke at openjdk.org Thu Nov 21 14:26:56 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 14:26:56 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v10] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? 
> > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Undo accidental style changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/8b9bab9b..5620e114 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=08-09 Stats: 164 lines in 1 file changed: 8 ins; 35 del; 121 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Thu Nov 21 14:49:42 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 21 Nov 2024 14:49:42 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v11] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). 
> > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix more style issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/5620e114..4fc57670 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=09-10 Stats: 29 lines in 2 files changed: 3 ins; 11 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From simonis at openjdk.org Thu Nov 21 16:40:26 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 21 Nov 2024 16:40:26 GMT Subject: RFR: 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control Message-ID: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> Truffle compilations run in "hosted" mode, i.e. 
the Truffle runtime triggers compilations independently of HotSpot's [`CompileBroker`](https://github.com/openjdk/jdk/blob/8f22db23a50fe537d8ef369e92f0d5f9970d98f0/src/hotspot/share/compiler/compileBroker.hpp). But the results of Truffle compilations are still stored as ordinary nmethods in HotSpot's code cache (with the help of the JVMCI method `jdk.vm.ci.hotspot.HotSpotCodeCacheProvider::installCode()`). The regular JIT compilers are controlled by the `CompileBroker` which is aware of the code cache occupancy. If the code cache runs full, the `CompileBroker` temporarily pauses any subsequent JIT compilations until the code cache gets swept (if running with `-XX:+UseCodeCacheFlushing -XX:+MethodFlushing` which is the default) or completely shuts down the JIT compilers if running with `-XX:-UseCodeCacheFlushing`. Truffle compiled methods can contribute significantly to the overall code cache occupancy and they can trigger JIT compilation stalls if they fill the code cache up. But the Truffle framework itself is neither aware of the current code cache occupancy, nor of the compilation activity of the `CompileBroker`. If Truffle tries to install a compiled method through JVMCI and the code cache is full, it will silently fail. Currently Truffle interprets such failures as transient errors and basically ignores them. Whenever the corresponding method gets hot again (usually immediately at the next invocation), Truffle will recompile it just to fail again in the nmethod installation step, if the code cache is still full. When the code cache is tight, this can lead to situations where Truffle is unnecessarily and repeatedly compiling methods that can't be installed in the code cache but still produce a significant CPU load.
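To make the cost of that retry loop concrete, here is a toy model that counts compilations thrown away while the code cache stays full, with and without a check of a compilation activity mode. The mode names follow the ones used in this PR description; `Activity`, `ToyCompiler` and `wasted_compiles` are hypothetical names for illustration only and are not HotSpot, JVMCI or Truffle APIs.

```cpp
// Mode names follow the PR description; the enum itself is an illustrative stand-in.
enum class Activity { stop_compilation, run_compilation, shutdown_compilation };

// Hypothetical compiler model: every compile costs work, install fails when the cache is full.
struct ToyCompiler {
  int compiles = 0;    // compilations actually performed
  int installed = 0;   // results that fit into the code cache

  bool compile_and_install(bool cache_full) {
    ++compiles;
    if (cache_full) return false;  // installation fails, the work is lost
    ++installed;
    return true;
  }
};

// Simulates `hot_invocations` compile requests while the code cache stays full.
// With poll_activity set, requests are skipped unless the mode is run_compilation.
inline int wasted_compiles(int hot_invocations, bool poll_activity, Activity mode) {
  ToyCompiler c;
  for (int i = 0; i < hot_invocations; ++i) {
    if (poll_activity && mode != Activity::run_compilation) {
      continue;  // pause instead of compiling something that cannot be installed
    }
    c.compile_and_install(/*cache_full=*/true);
  }
  return c.compiles - c.installed;
}
```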
Instead, Truffle should poll HotSpot's `CompileBroker` compilation activity and pause compilations for the time the `CompileBroker` is pausing JIT compilations (or completely shutdown Truffle compilations if the `CompileBroker` shut down the JIT compilers). In order to make this possible, JVMCI should export the CompileBroker compilation activity mode (i.e. `stop_compilation`, `run_compilation` or `shutdown_compilation`). The corresponding Truffle change is tracked under [#10133: Implement Truffle compiler control based on HotSpot's CompileBroker compilation activity](https://github.com/oracle/graal/issues/10133). ------------- Commit messages: - 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control Changes: https://git.openjdk.org/jdk/pull/22295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22295&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344727 Stats: 19 lines in 3 files changed: 19 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22295/head:pull/22295 PR: https://git.openjdk.org/jdk/pull/22295 From lmesnik at openjdk.org Thu Nov 21 16:50:20 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 21 Nov 2024 16:50:20 GMT Subject: RFR: 8344533: CTW: Add option to remove clinits before loading In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 10:50:48 GMT, Evgeny Nikitin wrote: > This PR adds an option-controlled (off by default) removal of methods before loading them with CTW ClassLoader. > The main purpose is to prevent `static { ... }` blocks execution (along with static fields initialization). > Testing: manual CTW runs. Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22235#pullrequestreview-2452075866 From vlivanov at openjdk.org Thu Nov 21 18:12:25 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 21 Nov 2024 18:12:25 GMT Subject: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v6] In-Reply-To: <08UceifyOK3Ybhgtsa0PoYYLUXfNNh0yr4cMRd6EsZQ=.6d7c8ab6-b78c-463e-8be0-dcd14a443528@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> <08UceifyOK3Ybhgtsa0PoYYLUXfNNh0yr4cMRd6EsZQ=.6d7c8ab6-b78c-463e-8be0-dcd14a443528@github.com> Message-ID: <342FtKSYnO0pDLHx3TOInsQS2PnqrP_KdkD2p2v3Pvc=.e30f2f1f-f40e-4572-8889-9a95ac55698b@github.com> On Thu, 21 Nov 2024 09:06:18 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. >> >> >> MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) >> MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) >> MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) >> MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) >> MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) >> MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) >> >> >> >> A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. 
>> >> If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. >> >> VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. >> >> Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- >> >> >> Sierra Forest :- >> ============ >> Baseline:- >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms >> VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms >> >> With Optimizati... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments + extending IR tests with instruction level checks Marked as reviewed by vlivanov (Reviewer). 
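The arithmetic claim in the quoted description (when the upper 32 bits of both operands are known, the 64-bit product depends only on the low doublewords) can be checked with a scalar sketch of a single 64-bit lane. This is not HotSpot code: the function names are made up for illustration, and the two `mul_lo32_*` helpers only model what one lane of an unsigned or signed 32x32->64 multiply would produce.

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of a full 64x64 multiply, i.e. what a plain long multiply
// yields per lane (wraps modulo 2^64).
inline uint64_t mul64_low(uint64_t a, uint64_t b) { return a * b; }

// Unsigned multiply of the low doublewords only, widened to 64 bits.
// Illustrative model of one lane of an unsigned 32x32->64 multiply.
inline uint64_t mul_lo32_unsigned(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>(static_cast<uint32_t>(a)) * static_cast<uint32_t>(b);
}

// Signed multiply of the low doublewords only, widened to 64 bits.
// Illustrative model of one lane of a signed 32x32->64 multiply.
inline int64_t mul_lo32_signed(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<int32_t>(a)) * static_cast<int32_t>(b);
}

// Checks the operand shapes from the pattern list: masked with 0xFFFFFFFF,
// logically shifted right by 32, a mix of both, and sign-extended ints.
inline void check_patterns(uint64_t x, uint64_t y) {
  const uint64_t mask = 0xFFFFFFFFull;
  assert(mul64_low(x & mask, y & mask) == mul_lo32_unsigned(x, y));
  assert(mul64_low(x >> 32, y >> 32) == mul_lo32_unsigned(x >> 32, y >> 32));
  assert(mul64_low(x >> 32, y & mask) == mul_lo32_unsigned(x >> 32, y));

  const int64_t sx = static_cast<int32_t>(x);  // sign-extended low doubleword
  const int64_t sy = static_cast<int32_t>(y);
  assert(static_cast<int64_t>(mul64_low(static_cast<uint64_t>(sx),
                                        static_cast<uint64_t>(sy))) ==
         mul_lo32_signed(sx, sy));
}
```

The product of two 32-bit values always fits in 64 bits, so no partial products from the upper halves are needed in any of these cases.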
------------- PR Review: https://git.openjdk.org/jdk/pull/21244#pullrequestreview-2452281311 From jbhateja at openjdk.org Thu Nov 21 18:16:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Nov 2024 18:16:32 GMT Subject: Integrated: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction In-Reply-To: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> References: <9ce1Y2QVr-uGEPquCA1wytF7Sn4px-wQx5tuUQYQNb8=.253ecb32-0976-42ba-bfaa-1903168fdfe6@github.com> Message-ID: On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ instruction for following IR pallets. > > > MulVL ( AndV SRC1, 0xFFFFFFFF) ( AndV SRC2, 0xFFFFFFFF) > MulVL (URShiftVL SRC1 , 32) (URShiftVL SRC2, 32) > MulVL (URShiftVL SRC1 , 32) ( AndV SRC2, 0xFFFFFFFF) > MulVL ( AndV SRC1, 0xFFFFFFFF) (URShiftVL SRC2 , 32) > MulVL (VectorCastI2X SRC1) (VectorCastI2X SRC2) > MulVL (RShiftVL SRC1 , 32) (RShiftVL SRC2, 32) > > > > A 64x64 bit multiplication produces 128 bit result, and can be performed by individually multiplying upper and lower double word of multiplier with multiplicand and assembling the partial products to compute full width result. Targets supporting vector quadword multiplication have separate instructions to compute upper and lower quadwords for 128 bit result. Therefore existing VectorAPI multiplication operator expects shape conformance between source and result vectors. > > If upper 32 bits of quadword multiplier and multiplicand is always set to zero then result of multiplication is only dependent on the partial product of their lower double words and can be performed using unsigned 32 bit multiplication instruction with quadword saturation. Patch matches this pattern in a target dependent manner without introducing new IR node. 
> > VPMUL[U]DQ instruction performs [unsigned] multiplication between even numbered doubleword lanes of two long vectors and produces 64 bit result. It has much lower latency compared to full 64 bit multiplication instruction "VPMULLQ", in addition non-AVX512DQ targets does not support direct quadword multiplication, thus we can save redundant partial product for zeroed out upper 32 bits. This results into throughput improvements on both P and E core Xeons. > > Please find below the performance of [XXH3 hashing benchmark ](https://mail.openjdk.org/pipermail/panama-dev/2024-July/020557.html)included with the patch:- > > > Sierra Forest :- > ============ > Baseline:- > Benchmark (SIZE) Mode Cnt Score Error Units > VectorXXH3HashingBenchmark.hashingKernel 1024 thrpt 2 806.228 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 2048 thrpt 2 403.044 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 4096 thrpt 2 200.641 ops/ms > VectorXXH3HashingBenchmark.hashingKernel 8192 thrpt 2 100.664 ops/ms > > With Optimization:- > Benchmark (SIZE) Mode ... This pull request has now been integrated. 
Changeset: dc9a6ef6 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/dc9a6ef6100d73a431cd0cfa2c252acf7743f8a3 Stats: 544 lines in 7 files changed: 543 ins; 0 del; 1 mod 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction Co-authored-by: Vladimir Ivanov Reviewed-by: vlivanov, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/21244 From dnsimon at openjdk.org Thu Nov 21 18:50:16 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 21 Nov 2024 18:50:16 GMT Subject: RFR: 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control In-Reply-To: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> References: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> Message-ID: On Thu, 21 Nov 2024 16:34:12 GMT, Volker Simonis wrote: > Truffle compilations run in "hosted" mode, i.e. the Truffle runtimes triggers compilations independently of HotSpot's [`CompileBroker`](https://github.com/openjdk/jdk/blob/8f22db23a50fe537d8ef369e92f0d5f9970d98f0/src/hotspot/share/compiler/compileBroker.hpp). But the results of Truffle compilations are still stored as ordinary nmethods in HotSpot's code cache (with the help of the JVMCI method `jdk.vm.ci.hotspot.HotSpotCodeCacheProvider::installCode()`). The regular JIT compilers are controlled by the `CompileBroker` which is aware of the code cache occupancy. If the code cache runs full, the `CompileBroker` temporary pauses any subsequent JIT compilations until the code cache gets swept (if running with `-XX:+UseCodeCacheFlushing -XX:+MethodFlushing` which is the default) or completely shuts down the JIT compilers if running with `-XX:+UseCodeCacheFlushing`. > > Truffle compiled methods can contribute significantly to the overall code cache occupancy and they can trigger JIT compilation stalls if they fill the code cache up. 
But the Truffle framework itself is neither aware of the current code cache occupancy, nor of the compilation activity of the `CompileBroker`. If Truffle tries to install a compiled method through JVMCI and the code cache is full, it will silently fail. Currently Truffle interprets such failures as transient errors and basically ignores it. Whenever the corresponding method gets hot again (usually immediately at the next invocation), Truffle will recompile it again just to fail again in the nmethod installation step, if the code cache is still full. > > When the code cache is tight, this can lead to situations, where Truffle is unnecessarily and repeatedly compiling methods which can't be installed in the code cache but produce a significant CPU load. Instead, Truffle should poll HotSpot's `CompileBroker` compilation activity and pause compilations for the time the `CompileBroker` is pausing JIT compilations (or completely shutdown Truffle compilations if the `CompileBroker` shut down the JIT compilers). In order to make this possible, JVMCI should export the CompileBroker compilation activity mode (i.e. `stop_compilation`, `run_compilation` or `shutdown_compilation`). > > The corresponding Truffle change is tracked under [#10133: Implement Truffle compiler control based on HotSpot's CompileBroker compilation activity](https://github.com/oracle/graal/issues/10133). Looks good. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22295#pullrequestreview-2452356239

From jsjolen at openjdk.org Thu Nov 21 19:12:21 2024
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 21 Nov 2024 19:12:21 GMT
Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 21 Nov 2024 14:49:42 GMT, theoweidmannoracle wrote:

>> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered.
>>
>> Concerns were raised by @rwestrel in the previous PR:
>>
>>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens?
>>
>> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline().
>>
>> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining:
>>
>>
>> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant
>> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call
>>
>>
>> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred.
>
> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix more style issues

Hi Theo,

I've got some style comments, but more importantly I think that there are some bugs in the allocation scheme.

The `InlinePrinter` object is stored inside of `Compile`, so it could dictate the lifetime of its objects by having its own `Arena`, it doesn't have to share `_comp_arena`. That's up to you, I'm just stating that that is a design possibility.

Let's look at the real bug, starting in `IPInlineAttempt`.

```c++
struct IPInlineAttempt : public ArenaObj {
  IPInlineAttempt(InliningResult result);

  const InliningResult result;
  stringStream msg;
};
```

It's a bug that this compiles. `stringStream` is an object with a non-trivial destructor which heap allocates. Arena allocated objects never have their destructor run, so each `IPInlineAttempt` that is allocated may potentially leak memory (depends on whether the string grows larger than the small buffer pre-allocated for it or not). I should file a ticket regarding this.

The fix for that is to separate out the `stringStream`s and `CHeap` allocate them.
Here's an idea:

```c++
class InlinePrinter {
  GrowableArray _streams;
  using StreamIndex = int;

  struct IPInlineAttempt : public ArenaObj {
    const InliningResult result;
    const StreamIndex stream;
  };

  stringStream& stream_of(IPInlineAttempt& a) {
    return _streams.at(a.stream);
  }
};
```

This is slightly different, since a resizing of the backing GA will invalidate the pointer to the stream. If you don't want that, then you need a different container that we don't have. That'd be a nice addition, actually. I looked at your usages, seems like it's fine that we don't have address stable streams. Am I wrong?

With that change `IPInlineAttempt`s are just values and so you can simplify `IPInlineSite`:

```c++
class IPInlineSite : public ArenaObj {
  GrowableArray _attempts;
  GrowableArray _children;
};
```

That's nice. No more suspicions of where that `IPInlineAttempt` resides or whether it can be changed.

Anyway, there are probably more and maybe more elegant ways of solving the memory leak problem. I've asked another developer to have a look at this and to confirm whether I'm right or not on this matter.

Cheers!

src/hotspot/share/opto/printinlining.cpp line 34:

> 32: }
> 33:
> 34: InlinePrinter::IPInlineAttempt::IPInlineAttempt(InliningResult result) : result(result) {

Hm, here the `msg` isn't explicitly initialized. Does that leave it uninitialized?

src/hotspot/share/opto/printinlining.cpp line 41:

> 39: return &_nullStream;
> 40: }
> 41: auto attempt = locate_call(state, method)->add(result);

Style: Expand the `auto` into actual types. We typically only use `auto` for lambda functions.

src/hotspot/share/opto/printinlining.cpp line 45:

> 43: attempt->msg.print("%s", msg);
> 44: }
> 45: return &attempt->msg; // IPInlineAttempts are heap allocated so this address is safe

Surely arena allocated?
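The index-instead-of-pointer idea from the review can be tried out in isolation. The following is a minimal standalone sketch, assuming `std::vector<std::string>` as a stand-in for a growable array of streams; `MessageStore`, `Attempt`, and `add_attempt` are hypothetical names, not HotSpot APIs. It shows that a stored index keeps working after the backing array grows, and that the per-attempt record becomes trivially destructible, so it no longer matters that its destructor never runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the printer: it owns every message buffer, so the
// buffers are destroyed normally when the store itself is destroyed.
class MessageStore {
 public:
  using StreamIndex = std::size_t;

  // Plain value record: holds an index, not a pointer or an owning object.
  struct Attempt {
    int result;
    StreamIndex stream;
  };

  // Creates a new empty buffer and returns a record that refers to it.
  Attempt add_attempt(int result) {
    _streams.emplace_back();
    return Attempt{result, _streams.size() - 1};
  }

  // Look the buffer up on every use. The returned reference must not be kept
  // across add_attempt(), because growth may move the buffers.
  std::string& stream_of(const Attempt& a) {
    assert(a.stream < _streams.size());
    return _streams[a.stream];
  }

 private:
  std::vector<std::string> _streams;
};

// The record can live in memory whose destructors are never run.
static_assert(std::is_trivially_destructible<MessageStore::Attempt>::value,
              "Attempt must not own resources");
```

The trade-off matches the one stated in the review: references obtained from `stream_of` are not address-stable, so callers re-resolve the index each time instead of holding on to a pointer.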
src/hotspot/share/opto/printinlining.cpp line 61:

> 59:
> 60: return locate_call(state->caller(), nullptr)->at_bci(state->bci(), create_for);
> 61: }

Can we be on the safe side and convert this into an iterative process instead, so that we don't have to worry about stack usage?

src/hotspot/share/opto/printinlining.hpp line 51:

> 49: * @returns An output stream which stores the message associated with this attempt. The buffer stays valid until InlinePrinter is deallocated.
> 50: * You can print arbitrary information to this stream but do not add line breaks, as this will break formatting.
> 51: */

Style: We typically don't use `@param`, `@returns` in Hotspot. Consider this a nit, I'm not familiar enough with C2 codebase to know whether this adheres to C2 style.

src/hotspot/share/opto/printinlining.hpp line 57:

> 55: * Prints all collected inlining information to the given output stream.
> 56: */
> 57: void dump(outputStream* tty);

Style: Typically called `print_on` in Hotspot

src/hotspot/share/opto/printinlining.hpp line 95:

> 93: ciMethod* const _method;
> 94: GrowableArray _attempts;
> 95: GrowableArray _children;

Style: Private members are placed at the start in HotSpot.
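One way the recursive lookup commented on above could be turned into a loop is sketched below with toy types. `Frame` stands in for a caller-linked state and `Site` for a call-site tree node; both are hypothetical, and the assumption that the root sits above the outermost frame is mine, since the base case of the reviewed function is not shown in this thread.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for a caller-linked compilation state.
struct Frame {
  const Frame* caller;  // nullptr for the outermost frame
  int bci;              // bytecode index of the call site
};

// Hypothetical stand-in for a call-site node; children are keyed by bci and
// created on first access.
struct Site {
  std::map<int, std::unique_ptr<Site>> children;

  Site& at_bci(int bci) {
    assert(bci >= 0);
    std::unique_ptr<Site>& slot = children[bci];
    if (!slot) {
      slot = std::make_unique<Site>();
    }
    return *slot;
  }
};

// Iterative lookup: record the bci chain from the innermost frame outwards,
// then replay it from the root downwards. The native stack depth stays
// constant regardless of how deep the caller chain is.
inline Site& locate(Site& root, const Frame* state) {
  std::vector<int> path;
  for (const Frame* f = state; f != nullptr; f = f->caller) {
    path.push_back(f->bci);
  }
  Site* site = &root;
  for (auto it = path.rbegin(); it != path.rend(); ++it) {
    site = &site->at_bci(*it);
  }
  return *site;
}
```

The explicit `path` vector trades a small heap allocation for bounded stack usage; the traversal order (outermost frame first) is the same as in the recursive form.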
-------------

PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2452174604
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852602107
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852582978
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852584341
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852679781
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852569011
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852567039
PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1852567843

From dhanalla at openjdk.org Thu Nov 21 21:24:37 2024
From: dhanalla at openjdk.org (Dhamoder Nalla)
Date: Thu, 21 Nov 2024 21:24:37 GMT
Subject: RFR: 8341293: Split field loads through Nested Phis [v2]
In-Reply-To:
References:
Message-ID:

> As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation.
>
>
> Here is the sequence of Ideal graph transformations for nested phis:
>
>
> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569)
>
> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81)
>
> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961)
>
> JMH results:
> with disabled RAM
>
> Benchmark  Mode  Cnt  Score  Error  Units
> NestedPhiAndRematerialize.NopRAM.testBailOut_runner  avgt  15  13.969 ± 0.248  ms/op
> NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner  avgt  15  80.300 ± 4.306  ms/op
> NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner  avgt  15  72.182 ± 1.781  ms/op
> NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner  avgt  15  2.983 ± 0.001  ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner  avgt  15  18.342 ± 0.731  ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner  avgt  15  14.315 ± 0.443  ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner  avgt  15  18.511 ± 1.212  ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner  avgt  15  66.277 ± 1.478  ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner  avgt  15  17.968 ± 0.306  ms/op
> NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner  avgt  15  14.186 ± 0.247  ms/op
> NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner  avgt  15  88.435 ± 4.869  ms/op
> NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner  avgt  15  29560.130 ± 48.797  ms/op
> NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner  avgt  15  49.150 ± 2.307  ms/op
> NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner  avgt  15  18.236 ± 0.308  ms/op
>
> with enabled RAM
> Benchmark  Mode  Cnt  Score  Error  Units
> NestedPhiAndRematerialize.YesRAM.testBailOut_runner  avgt  15  3.257 ± 0.423  ms/op
> NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner  avgt  15  79.916 ± 3.477  ms/op
> NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner  avgt  15  72.053 ± 1.916  ms/op
> NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner  avgt  15  2.984 ± 0.001  ms/op
> NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner  avgt  15  18.309 ± 0.706  ms/op
> NestedPhiAndRematerialize.YesRAM.testNestedPhiProces...
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision:

  CR feedback

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21270/files
  - new: https://git.openjdk.org/jdk/pull/21270/files/a2098004..811232d4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=00-01

  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/21270.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21270/head:pull/21270

PR: https://git.openjdk.org/jdk/pull/21270

From dlong at openjdk.org Fri Nov 22 00:35:18 2024
From: dlong at openjdk.org (Dean Long)
Date: Fri, 22 Nov 2024 00:35:18 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache
In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
Message-ID:

On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote:

> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache.
>
> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density.
>
> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>
> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark):
> - nmethod_count:134000, total_compilation_time: 510460ms
> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>
> Functional testing: jtreg on arm/aarch/x86.
> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
>
> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.

It would be nice to make relocations immutable, but one roadblock is the use of relocInfo::change_reloc_info_for_address() by C1 patching. We would need to separate mutable and immutable relocations, or replace C1 patching with deoptimization, as aarch64 does under DEOPTIMIZE_WHEN_PATCHING.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2492630626

From dlong at openjdk.org Fri Nov 22 02:23:29 2024
From: dlong at openjdk.org (Dean Long)
Date: Fri, 22 Nov 2024 02:23:29 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache
In-Reply-To:
References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
Message-ID:

On Sat, 9 Nov 2024 11:35:23 GMT, Boris Ulasevich wrote:

>> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5273:
>>
>>> 5271: } else {
>>> 5272: address dummy = address(uintptr_t(pc()) & -wordSize); // A nearby aligned address
>>> 5273: mov(dst, Address(dummy, rspec));
>>
>> Why is this needed?
>
> - it is not a load from a Constant Pool, so calling ldr_constant here seems incorrect
> - the ldr_constant function utilizes either ldr (with a range limit of ±1MB) or, when -XX:-NearCpool is enabled, adrp (range limit of ±2GB) followed by ldr, both of which may fall short when mutable data is allocated on the C heap.

This change looks wrong, for a number of reasons.
First, the dummy address would no longer be needed, and we could just use the same mov as the supports_instruction_patching() case. However, if supports_instruction_patching() is false, I think we are not allowed to generate a multi-instruction movz/movk sequence. We really need something like ldr_constant for this case, so that we load from memory. However, as you point out, this is tied to NearCpool. For a far metadata slot access, ADR+LDR is the right answer. After this change, will there be any metadata left that could still benefit from NearCpool? If not, then it might make sense to turn it off permanently. Instead of choosing between PC-relative "ldr literal" and far ADR+LDR based on NearCpool, we could decide based on the distance to the metadata table. I believe "ldr literal" only has a 1MB range. CC @theRealAph ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1853197113 From dlong at openjdk.org Fri Nov 22 02:46:18 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Nov 2024 02:46:18 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: <5qBcX1j2O16hvCKyLjknxQqH50qdfwhlQf2P1FEUqEU=.451f7d8b-2bf2-4f28-8c8c-79e7a7b8d613@github.com> On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. 
> > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1?2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** on the nmehtod. Mutable data constitutes **~8%** of nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmentod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. src/hotspot/share/code/codeBlob.cpp line 103: > 101: // The mutable_data_size is either calculated by the nmethod constructor to account > 102: // for reloc_info and additional data, or it is set here to accommodate only the relocation data. > 103: _mutable_data_size = (mutable_data_size == 0) ? cb->total_relocation_size() : mutable_data_size; This seems strange to treat relocations as special. Wouldn't it be better to have the caller always pass in the correct value? src/hotspot/share/code/codeBlob.hpp line 108: > 106: > 107: int _size; // total size of CodeBlob in bytes > 108: int _relocation_size; // size of relocation (could be bigger than 64Kb) For offsets into the external mutable/immutable data, we could reduce codecache footprint further by moving these into a a header section of the external data block. That also allows those blocks to be self-describing, which could help with error reporting or debugging. 
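The "self-describing block" suggestion can be pictured with a small standalone sketch. Everything here is hypothetical: the header fields, the magic tag, and the function names are invented for illustration and say nothing about how HotSpot actually lays out nmethod data. The point is only that section sizes stored at the front of the external block let the section boundaries be recomputed from the block alone.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical header placed at the start of an externally allocated block.
struct MutableDataHeader {
  uint32_t magic;            // arbitrary tag so a stray pointer can be recognized
  uint32_t relocation_size;  // bytes of the first section
  uint32_t oops_size;        // bytes of the second section
  uint32_t metadata_size;    // bytes of the third section
};

constexpr uint32_t kMutableDataMagic = 0x4D444154u;  // arbitrary value

// Allocates header + sections in one zeroed C-heap block.
// Returns nullptr if the allocation fails.
inline uint8_t* allocate_mutable_data(uint32_t reloc, uint32_t oops, uint32_t meta) {
  const size_t total = sizeof(MutableDataHeader) + reloc + oops + meta;
  uint8_t* block = static_cast<uint8_t*>(std::calloc(1, total));
  if (block == nullptr) {
    return nullptr;
  }
  const MutableDataHeader h{kMutableDataMagic, reloc, oops, meta};
  std::memcpy(block, &h, sizeof(h));
  return block;
}

// Copies the header out of the block.
inline MutableDataHeader read_header(const uint8_t* block) {
  assert(block != nullptr);
  MutableDataHeader h;
  std::memcpy(&h, block, sizeof(h));
  return h;
}

// Section starts are derived from the header rather than from offsets kept
// in the owning object.
inline uint8_t* relocation_begin(uint8_t* block) {
  return block + sizeof(MutableDataHeader);
}
inline uint8_t* oops_begin(uint8_t* block) {
  return relocation_begin(block) + read_header(block).relocation_size;
}
inline uint8_t* metadata_begin(uint8_t* block) {
  return oops_begin(block) + read_header(block).oops_size;
}

// Cheap sanity check of the kind an error reporter or debugger could use.
inline bool looks_like_mutable_data(const uint8_t* block) {
  return block != nullptr && read_header(block).magic == kMutableDataMagic;
}
```

With a layout along these lines, the object in the code cache would only need the block pointer; whether that is worth the extra indirection on each access is a separate question that this sketch does not answer.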
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1853209930 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1853211488 From dlong at openjdk.org Fri Nov 22 02:53:24 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Nov 2024 02:53:24 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: <5qBcX1j2O16hvCKyLjknxQqH50qdfwhlQf2P1FEUqEU=.451f7d8b-2bf2-4f28-8c8c-79e7a7b8d613@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <5qBcX1j2O16hvCKyLjknxQqH50qdfwhlQf2P1FEUqEU=.451f7d8b-2bf2-4f28-8c8c-79e7a7b8d613@github.com> Message-ID: On Fri, 22 Nov 2024 02:40:38 GMT, Dean Long wrote: >> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. >> >> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1?2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** on the nmehtod. Mutable data constitutes **~8%** of nmethod. Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmentod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. 
>> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > src/hotspot/share/code/codeBlob.cpp line 103: > >> 101: // The mutable_data_size is either calculated by the nmethod constructor to account >> 102: // for reloc_info and additional data, or it is set here to accommodate only the relocation data. >> 103: _mutable_data_size = (mutable_data_size == 0) ? cb->total_relocation_size() : mutable_data_size; > > This seems strange to treat relocations as special. Wouldn't it be better to have the caller always pass in the correct value? Or compute using something like required_mutable_data_space()? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1853215761 From dlong at openjdk.org Fri Nov 22 02:53:25 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Nov 2024 02:53:25 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1?2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** on the nmehtod. Mutable data constitutes **~8%** of nmethod. 
Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. src/hotspot/share/code/codeBlob.hpp line 135: > 133: CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size, > 134: int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments, > 135: int mutable_data_size = 0); If we want to allow the default for mutable data size to be the relocations size, then instead of using `= 0` here, you could do this: CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size, int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments, int mutable_data_size); CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size, int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments) : CodeBlob(name, kind, cb, size, header_size, frame_complete_offset, frame_size, oop_maps, caller_must_gc_arguments, cb->total_relocation_size()) { } but I would prefer not to treat relocations as special, and have the caller always pass the correct value.
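The delegating-constructor idea in the comment above can be sketched in isolation. This is a minimal illustration, not HotSpot code: `CodeBuffer` and `CodeBlob` here are hypothetical, heavily reduced stand-ins that keep only the members needed to show the pattern, and the parameter lists are shortened on purpose.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's CodeBuffer; only the accessor used below.
struct CodeBuffer {
  int reloc_size;
  int total_relocation_size() const { return reloc_size; }
};

// Hypothetical, reduced CodeBlob: the real constructor takes many more arguments.
class CodeBlob {
 public:
  // Primary constructor: the caller states the mutable data size explicitly,
  // so 0 really means "no mutable data" rather than "use the default".
  CodeBlob(const char* name, CodeBuffer* cb, int mutable_data_size)
      : _name(name), _mutable_data_size(mutable_data_size) {
    (void)cb;  // unused in this sketch
  }

  // Delegating constructor: omitting the size defaults it to the relocation size.
  CodeBlob(const char* name, CodeBuffer* cb)
      : CodeBlob(name, cb, cb->total_relocation_size()) {}

  const char* name() const { return _name; }
  int mutable_data_size() const { return _mutable_data_size; }

 private:
  const char* _name;
  int _mutable_data_size;
};
```

Compared with a `= 0` default plus a `== 0` check in the constructor body, this keeps an explicit size of zero distinguishable from "not specified". Whether the default should exist at all is the open question in the review.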
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1853214675 From dlong at openjdk.org Fri Nov 22 02:57:14 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Nov 2024 02:57:14 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Tue, 1 Oct 2024 02:10:37 GMT, Boris Ulasevich wrote: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
src/hotspot/share/code/nmethod.cpp line 2152: > 2150: delete[] _compiled_ic_data; > 2151: > 2152: if (_immutable_data != blob_end()) { Is this just a name change, or a semantic change? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1853217615 From dlong at openjdk.org Fri Nov 22 04:31:16 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Nov 2024 04:31:16 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v3] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 02:03:56 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add test > > I don't see that print_inlining_append_late() with InliningResult::FAILURE is being tested. > @dean-long I've refactored how print inlining works as the old code was unmaintainable. This should also address the concerns you had about the patch previously. Wow, I wasn't expecting you to address that in this PR, so you've gone above and beyond! Let me see if this also fixes the assert problem I was running into with the old implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2492852249 From amitkumar at openjdk.org Fri Nov 22 05:12:55 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 22 Nov 2024 05:12:55 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v5] In-Reply-To: References: Message-ID: > This PR converts the datatype from `jint` to `juint` for the constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info.
Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: - reduce diff size - arm changes - aarch64 changes - s390x changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/1823bfc1..da65683f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=03-04 Stats: 31 lines in 3 files changed: 2 ins; 4 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From amitkumar at openjdk.org Fri Nov 22 05:15:16 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 22 Nov 2024 05:15:16 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v4] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 23:29:25 GMT, Martin Doerr wrote: > I have looked further into the PPC64 code and figured out that `load_nonconstant()` loads all values which are not simm16 into a register: https://github.com/openjdk/jdk/blob/b9bf447209db5d7f6bb16a0310421dbe4170500c/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp#L487C11-L487C29 > > https://github.com/openjdk/jdk/blob/b9bf447209db5d7f6bb16a0310421dbe4170500c/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp#L59 > > only accepts simm16 values. > So, `strength_reduce_multiply` will never get a value which overflows on PPC64. That's the reason why PPC64 is not affected by the UB bug. > In your comment above, I guess you missed that `is_power_of_2((juint)c + 1)` returns true for c = MAX_INT. So, the optimization can be used for multiplications with MAX_INT if the UB is fixed. > > I also think that having a more similar solution for all platforms would be nice. For PPC64 and some other platforms, this may only be a cleanup, not a bug fix. 
> > In addition, the title is no longer up to date. You're changing more than s390 code. If you prefer to fix only UB on the affected platforms, this will also be fine with me. Thanks @TheRealMDoerr for the suggestions. Yes, I missed the `INT_MAX+1` case. My thinking was that `INT_MAX+1` goes beyond the integer range, so there was no need to worry about it. I have updated the title and PR description and made similar changes for aarch64 and arm as well. Please have a look again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2492889539 From duke at openjdk.org Fri Nov 22 07:10:16 2024 From: duke at openjdk.org (duke) Date: Fri, 22 Nov 2024 07:10:16 GMT Subject: RFR: 8344533: CTW: Add option to remove clinits before loading In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 10:50:48 GMT, Evgeny Nikitin wrote: > This PR adds an option-controlled (off by default) removal of methods before loading them with CTW ClassLoader. > The main purpose is to prevent `static { ... }` blocks execution (along with static fields initialization). > Testing: manual CTW runs. @lepestock Your change (at version d9a3d136b9216efcc9315cebaf3204fb523d9941) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22235#issuecomment-2493018572 From duke at openjdk.org Fri Nov 22 08:21:23 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 22 Nov 2024 08:21:23 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v11] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 17:48:34 GMT, Johan Sjölen wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix more style issues > > src/hotspot/share/opto/printinlining.cpp line 34: > >> 32: } >> 33: >> 34: InlinePrinter::IPInlineAttempt::IPInlineAttempt(InliningResult result) : result(result) { > > Hm, here the `msg` isn't explicitly initialized.
Does that leave it uninitialized? That's fine. Its default constructor will be called automatically. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1853461111 From duke at openjdk.org Fri Nov 22 08:44:26 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 22 Nov 2024 08:44:26 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v11] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 17:24:43 GMT, Johan Sjölen wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix more style issues > > src/hotspot/share/opto/printinlining.hpp line 51: > >> 49: * @returns An output stream which stores the message associated with this attempt. The buffer stays valid until InlinePrinter is deallocated. >> 50: * You can print arbitrary information to this stream but do not add line breaks, as this will break formatting. >> 51: */ > > Style: We typically don't use `@param`, `@returns` in Hotspot. > > Consider this a nit, I'm not familiar enough with C2 codebase to know whether this adheres to C2 style. I will change this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1853503849 From duke at openjdk.org Fri Nov 22 08:50:04 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 22 Nov 2024 08:50:04 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v12] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site.
Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Style changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/4fc57670..27251926 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=10-11 Stats: 82 lines in 3 files changed: 27 ins; 39 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From thartmann at openjdk.org Fri Nov 22 10:35:58 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 22 Nov 2024 10:35:58 GMT Subject: RFR: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 Message-ID: [JDK-8341553](https://bugs.openjdk.org/browse/JDK-8341553) disabled CDS for `-XX:+UseCompactObjectHeaders`. It's a known issue that the ciReplay tests don't work well if CDS is disabled, see [JDK-8316526](https://bugs.openjdk.org/browse/JDK-8316526), so I'll exclude the tests from running when CDS is disabled for now. 
Thanks, Tobias ------------- Commit messages: - 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 Changes: https://git.openjdk.org/jdk/pull/22317/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22317&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344844 Stats: 23 lines in 3 files changed: 19 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22317.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22317/head:pull/22317 PR: https://git.openjdk.org/jdk/pull/22317 From jbhateja at openjdk.org Fri Nov 22 10:36:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 22 Nov 2024 10:36:10 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. 
Since Float16 uses short as its storage type, raw FP16 values are always loaded into general purpose registers, but FP16 ISA instructions generally operate on floating point registers; therefore the compiler injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Testpoints for new value transforms + code cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21490/files - new: https://git.openjdk.org/jdk/pull/21490/files/132878ba..5f58eea6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=00-01 Stats: 279 lines in 20 files changed: 140 ins; 64 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/21490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490 PR: https://git.openjdk.org/jdk/pull/21490 From aph at openjdk.org Fri Nov 22 10:37:31 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 22 Nov 2024 10:37:31 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v5] In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 05:12:55 GMT, Amit Kumar wrote: >> This PR converts the datatype from `jint` to `juint` for the constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info.
> > Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: > > - reduce diff size > - arm changes > - aarch64 changes > - s390x changes src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 282: > 280: > 281: if (is_power_of_2((juint)c - 1)) { > 282: __ shift_left(left, exact_log2((juint)c - 1), tmp); Please convert `c` to unsigned at the beginning. Then do not use `c` any more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1853702699 From epeter at openjdk.org Fri Nov 22 11:04:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Nov 2024 11:04:18 GMT Subject: RFR: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:30:38 GMT, Tobias Hartmann wrote: > [JDK-8341553](https://bugs.openjdk.org/browse/JDK-8341553) disabled CDS for `-XX:+UseCompactObjectHeaders`. It's a known issue that the ciReplay tests don't work well if CDS is disabled, see [JDK-8316526](https://bugs.openjdk.org/browse/JDK-8316526), so I'll exclude the tests from running when CDS is disabled for now. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22317#pullrequestreview-2454149516 From thartmann at openjdk.org Fri Nov 22 11:10:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 22 Nov 2024 11:10:18 GMT Subject: RFR: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:30:38 GMT, Tobias Hartmann wrote: > [JDK-8341553](https://bugs.openjdk.org/browse/JDK-8341553) disabled CDS for `-XX:+UseCompactObjectHeaders`. 
It's a known issue that the ciReplay tests don't work well if CDS is disabled, see [JDK-8316526](https://bugs.openjdk.org/browse/JDK-8316526), so I'll exclude the tests from running when CDS is disabled for now. > > Thanks, > Tobias Thanks for the review Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22317#issuecomment-2493501317 From enikitin at openjdk.org Fri Nov 22 11:16:25 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 22 Nov 2024 11:16:25 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional Message-ID: For CTW, zero classes in provided jar is now a failure. This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. This PR makes this behaviour controllable. Default reaction is a failure, like before. ------------- Commit messages: - 8344833: CTW: Make failing on zero classes optional Changes: https://git.openjdk.org/jdk/pull/22320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22320&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344833 Stats: 8 lines in 1 file changed: 7 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22320/head:pull/22320 PR: https://git.openjdk.org/jdk/pull/22320 From epeter at openjdk.org Fri Nov 22 11:23:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Nov 2024 11:23:20 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v2] In-Reply-To: References: Message-ID: <6hDh2EszZ8wGi-KDaBVxm-Z4XBfIJNjBnyfbDUBfixM=.bf6488bf-313a-4081-a348-f3f67f209dce@github.com> On Wed, 20 Nov 2024 11:51:32 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. 
>> >> There are some places where the verification code is >> - missing >> - called twice in a row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. >> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. >> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action should be performed on the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. >> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Tobias Hartmann src/hotspot/share/opto/node.hpp line 2139: > 2137: > 2138: public: > 2139: explicit DataNodeBFS(BFSActions& bfs_action) : _bfs_actions(bfs_action) {} Is this restricted to data nodes? If so, you should verify that the `start_node` is a data node. But we could also generalize this to any BFS, and then check in `should_visit` if it is a data node or CFG. You should also say that it traverses inputs/defs, not outputs/uses.
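To make the shape of the `DataNodeBFS/BFSActions` design and the review points above concrete, here is a rough, self-contained sketch. It is not the patch's code: `Node` is a hypothetical three-field stand-in for C2's node class, only `should_visit` is a name taken from the thread, and `is_target_node`, `target_node_action`, `run` and `CollectTargets` are assumed names chosen for illustration.

```cpp
#include <cassert>
#include <deque>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a C2 node: an id, a CFG flag, and its inputs (defs).
struct Node {
  int id;
  bool is_cfg;
  std::vector<Node*> inputs;
};

// Callbacks steering the traversal (method names other than should_visit are assumed).
class BFSActions {
 public:
  virtual ~BFSActions() = default;
  virtual bool should_visit(const Node* n) const = 0;    // keep walking through n?
  virtual bool is_target_node(const Node* n) const = 0;  // is n what we look for?
  virtual void target_node_action(Node* n) = 0;          // what to do with a target
};

// Breadth-first search over inputs/defs (not outputs/uses), starting at a data node.
class DataNodeBFS {
  BFSActions& _bfs_actions;

 public:
  explicit DataNodeBFS(BFSActions& bfs_actions) : _bfs_actions(bfs_actions) {}

  void run(Node* start_node) {
    assert(!start_node->is_cfg && "must start at a data node");
    std::deque<Node*> worklist{start_node};
    std::unordered_set<Node*> seen{start_node};
    while (!worklist.empty()) {
      Node* n = worklist.front();
      worklist.pop_front();
      for (Node* in : n->inputs) {
        if (in == nullptr || !seen.insert(in).second) continue;  // skip null/visited
        if (_bfs_actions.is_target_node(in)) {
          _bfs_actions.target_node_action(in);
        } else if (_bfs_actions.should_visit(in)) {
          worklist.push_back(in);
        }
      }
    }
  }
};

// Example actions: collect ids >= 100 as targets and never walk through CFG nodes.
struct CollectTargets : BFSActions {
  std::vector<int> found;
  bool should_visit(const Node* n) const override { return !n->is_cfg; }
  bool is_target_node(const Node* n) const override { return n->id >= 100; }
  void target_node_action(Node* n) override { found.push_back(n->id); }
};
```

In this sketch the data-node restriction lives in two places: an assert on the start node and the `should_visit` callback. That matches the two alternatives raised in the comment, checking the start node versus generalizing and filtering in `should_visit`.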
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1853759705 From rcastanedalo at openjdk.org Fri Nov 22 11:29:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 22 Nov 2024 11:29:15 GMT Subject: RFR: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 In-Reply-To: References: Message-ID: <2rVhAJ6tuZKna2NGyuowxcbV-pJLUsR5hxHGmrUacxM=.44ba0bdb-d765-4471-925d-532dff945935@github.com> On Fri, 22 Nov 2024 10:30:38 GMT, Tobias Hartmann wrote: > [JDK-8341553](https://bugs.openjdk.org/browse/JDK-8341553) disabled CDS for `-XX:+UseCompactObjectHeaders`. It's a known issue that the ciReplay tests don't work well if CDS is disabled, see [JDK-8316526](https://bugs.openjdk.org/browse/JDK-8316526), so I'll exclude the tests from running when CDS is disabled for now. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22317#pullrequestreview-2454195571 From epeter at openjdk.org Fri Nov 22 11:31:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 22 Nov 2024 11:31:19 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v2] In-Reply-To: References: Message-ID: <3f4IZTmMDrAsKA88KnyIXg8HtjoDghu72ZrYwDNds9g=.d8564094-eedd-43e7-bdd9-4716e3fef65b@github.com> On Wed, 20 Nov 2024 11:51:32 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. >> >> There are some places where the verification code is >> - missing >> - called twice in row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. 
>> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. >> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action that should be performed with the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. >> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Tobias Hartmann Looks reasonable :) src/hotspot/share/opto/predicates.cpp line 911: > 909: IfTrueNode* initialized_predicate = initialized_assertion_predicate.create_from_template(template_head,_new_control, > 910: _init, _stride); > 911: DEBUG_ONLY(InitializedAssertionPredicate::verify(initialized_predicate);) Suggestion: InitializedAssertionPredicateCreator initialized_assertion_predicate_creator(_phase); IfTrueNode* initialized_predicate = initialized_assertion_predicate_creator.create_from_template(template_head,_new_control, _init, _stride); DEBUG_ONLY(InitializedAssertionPredicate::verify(initialized_predicate);) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22136#pullrequestreview-2454198147 PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1853766435 From thartmann at openjdk.org Fri Nov 22 11:40:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 22 Nov 2024 11:40:22 GMT Subject: RFR: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:30:38 GMT, Tobias Hartmann wrote: > [JDK-8341553](https://bugs.openjdk.org/browse/JDK-8341553) disabled CDS for `-XX:+UseCompactObjectHeaders`. It's a known issue that the ciReplay tests don't work well if CDS is disabled, see [JDK-8316526](https://bugs.openjdk.org/browse/JDK-8316526), so I'll exclude the tests from running when CDS is disabled for now. > > Thanks, > Tobias Thanks Roberto. I'll push this to get the CI back to a clean state. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22317#issuecomment-2493550662 From thartmann at openjdk.org Fri Nov 22 11:40:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 22 Nov 2024 11:40:23 GMT Subject: Integrated: 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:30:38 GMT, Tobias Hartmann wrote: > [JDK-8341553](https://bugs.openjdk.org/browse/JDK-8341553) disabled CDS for `-XX:+UseCompactObjectHeaders`. It's a known issue that the ciReplay tests don't work well if CDS is disabled, see [JDK-8316526](https://bugs.openjdk.org/browse/JDK-8316526), so I'll exclude the tests from running when CDS is disabled for now. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 847f65c1 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/847f65c14a8fea3d5e2ee9d920c458b8923da3b4 Stats: 23 lines in 3 files changed: 19 ins; 0 del; 4 mod 8344844: ciReplay tests fail with -XX:+UseCompactObjectHeaders because CDS is disabled since JDK-8341553 Reviewed-by: epeter, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22317 From mdoerr at openjdk.org Fri Nov 22 11:57:16 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 22 Nov 2024 11:57:16 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v5] In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 05:12:55 GMT, Amit Kumar wrote: >> This PR converts datatype from `jint` to `juint` for contstant `c` check in c1_LIRGenerator_.cpp. Please look JBS for more info. > > Amit Kumar has updated the pull request incrementally with four additional commits since the last revision: > > - reduce diff size > - arm changes > - aarch64 changes > - s390x changes Looks correct. Doing the cast only once may be better. I have seen a C2 test which covers such cases: test/hotspot/jtreg/compiler/c2/TestSerialAdditions.java Are we testing these cases with C1, too? src/hotspot/cpu/arm/c1_LIRGenerator_arm.cpp line 341: > 339: __ add(left, LIR_OprFact::address(addr), result); // add with shifted register > 340: return true; > 341: } else if(c == -1) { Missing whitespace before "(". src/hotspot/cpu/s390/c1_LIRGenerator_s390.cpp line 244: > 242: } > 243: > 244: if(c == -1) { Missing whitespace before "(". ------------- Marked as reviewed by mdoerr (Reviewer). 
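As an illustration of the "cast once" suggestion in this review thread, the sketch below converts the constant to unsigned at the start and uses only the unsigned value afterwards. It is a standalone approximation, not the C1 code: `jint`/`juint` are local 32-bit aliases, and `is_pow2`, `shift_for_pow2_minus_1` and `mul_by_shift_sub` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Local stand-ins for HotSpot's jint/juint typedefs (assumed to be 32-bit).
using jint = int32_t;
using juint = uint32_t;

// Simplified power-of-two test; not HotSpot's is_power_of_2.
static bool is_pow2(juint x) { return x != 0 && (x & (x - 1)) == 0; }

// Returns s if c == 2^s - 1 (so x * c == (x << s) - x), otherwise -1.
// The constant is converted to unsigned once; c itself is not used again.
static int shift_for_pow2_minus_1(jint c) {
  const juint u = static_cast<juint>(c);
  const juint u_plus_1 = u + 1;  // wraps modulo 2^32; signed c + 1 would be UB for INT_MAX
  if (!is_pow2(u_plus_1)) return -1;
  int s = 0;
  while ((u_plus_1 >> s) != 1) s++;
  return s;
}

// Strength-reduced multiply in wrapping 32-bit arithmetic: (x << s) - x.
static juint mul_by_shift_sub(jint x, int s) {
  const juint ux = static_cast<juint>(x);
  return (ux << s) - ux;
}
```

With the unsigned conversion, `INT_MAX` is recognized as `2^31 - 1`, so the shift-and-subtract form applies to it. `-1` maps to `0xFFFFFFFF`, whose successor wraps to 0 and is rejected by the power-of-two test.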
PR Review: https://git.openjdk.org/jdk/pull/22144#pullrequestreview-2454176032 PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1853754521 PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1853754012 From duke at openjdk.org Fri Nov 22 12:11:43 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 22 Nov 2024 12:11:43 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v13] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix BCI -1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/27251926..f00aa85e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=11-12 Stats: 25 lines in 2 files changed: 4 ins; 3 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From dnsimon at openjdk.org Fri Nov 22 14:01:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 22 Nov 2024 14:01:46 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails Message-ID: This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. 
------------- Commit messages: - mitigate against interleaved output Changes: https://git.openjdk.org/jdk/pull/22323/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22323&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344628 Stats: 19 lines in 1 file changed: 13 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22323.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22323/head:pull/22323 PR: https://git.openjdk.org/jdk/pull/22323 From dnsimon at openjdk.org Fri Nov 22 14:01:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 22 Nov 2024 14:01:46 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: <0rpKoHwjO0GUzHk-dX_CzZvlW6JklHIuaf3d2NruAIU=.90b39a12-2d84-4e85-ab13-1ac1578003bf@github.com> On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. > It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. @sendaoYan can you please test this fix in your setup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22323#issuecomment-2493832221 From duke at openjdk.org Fri Nov 22 14:34:43 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 22 Nov 2024 14:34:43 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: Message-ID: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
> > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix TestDuplicatedLateInliningOutput ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/f00aa85e..99c5cbc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=12-13 Stats: 16 lines in 4 files changed: 10 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From syan at openjdk.org Fri Nov 22 15:28:14 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 22 Nov 2024 15:28:14 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: <0rpKoHwjO0GUzHk-dX_CzZvlW6JklHIuaf3d2NruAIU=.90b39a12-2d84-4e85-ab13-1ac1578003bf@github.com> References: <0rpKoHwjO0GUzHk-dX_CzZvlW6JklHIuaf3d2NruAIU=.90b39a12-2d84-4e85-ab13-1ac1578003bf@github.com> Message-ID: On Fri, 22 Nov 2024 13:58:54 GMT, Doug Simon wrote: > @sendaoYan can you please test this fix in your setup. Okay, wait a moment ------------- PR Comment: https://git.openjdk.org/jdk/pull/22323#issuecomment-2494018107 From amitkumar at openjdk.org Fri Nov 22 16:50:39 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 22 Nov 2024 16:50:39 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v6] In-Reply-To: References: Message-ID: > This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from Andrew & Martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/da65683f..38428857 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=04-05 Stats: 26 lines in 4 files changed: 3 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From amitkumar at openjdk.org Fri Nov 22 16:53:34 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 22 Nov 2024 16:53:34 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: Message-ID: > This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. 
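The kind of overflow this change guards against can be sketched roughly as follows. This is a minimal illustration and not the HotSpot code: `jint` and `juint` are re-declared here as plain typedefs, the helpers `is_power_of_2_u32`, `plus_one_is_pow2` and `minus_one_is_pow2` are hypothetical, and it is an assumption that the affected checks compute `c + 1` or `c - 1` on a signed 32-bit constant. Signed overflow is undefined behavior in C++ (which is what UBSan flags), while unsigned arithmetic wraps.

```cpp
#include <cstdint>

// Stand-ins for HotSpot's jint/juint typedefs (assumption for this sketch).
using jint  = std::int32_t;
using juint = std::uint32_t;

// True if exactly one bit is set in x.
bool is_power_of_2_u32(juint x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Is (c + 1) a power of two? Evaluating c + 1 as jint would overflow for
// c == INT32_MAX; the unsigned addition wraps to 0x80000000 instead.
bool plus_one_is_pow2(jint c) {
  juint u = static_cast<juint>(c);  // cast once, then reuse the unsigned value
  return is_power_of_2_u32(u + 1u);
}

// Is (c - 1) a power of two? Evaluating c - 1 as jint would overflow for
// c == INT32_MIN; the unsigned subtraction wraps to 0x7fffffff instead.
bool minus_one_is_pow2(jint c) {
  juint u = static_cast<juint>(c);
  return is_power_of_2_u32(u - 1u);
}
```

A comparison such as `c == -1` involves no arithmetic, so it can stay on the signed value, which matches the "revert to c for -1 check" follow-up commit.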
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: revert to c for -1 check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/38428857..cad8a9bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=05-06 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From szaldana at openjdk.org Fri Nov 22 19:31:35 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 22 Nov 2024 19:31:35 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose Message-ID: Hi folks, This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). However, in the destructor, if we return early, we don't pop that tag, leaving the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early.
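The failure mode described above can be reproduced in isolation with a small sketch. This is a simplified illustration and not the HotSpot code: `TagStack`, `ScopedPhase` and the `skip_timing` flag are hypothetical stand-ins for `xmlStream`, `TracePhase` and whatever condition triggers the early return. The point is only that a scope object which pushes a tag in its constructor has to pop it on every path through its destructor.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the xmlStream tag stack.
class TagStack {
  std::vector<std::string> _tags;
 public:
  void push_tag(const std::string& tag) { _tags.push_back(tag); }

  // Returns false where the real code would assert "bad tag in log".
  bool pop_tag(const std::string& expected) {
    if (_tags.empty() || _tags.back() != expected) return false;
    _tags.pop_back();
    return true;
  }

  std::size_t depth() const { return _tags.size(); }
};

// Hypothetical stand-in for TracePhase: pushes "phase" on entry.
class ScopedPhase {
  TagStack& _log;
  bool _skip_timing;
 public:
  ScopedPhase(TagStack& log, bool skip_timing)
      : _log(log), _skip_timing(skip_timing) {
    _log.push_tag("phase");
  }

  ~ScopedPhase() {
    if (_skip_timing) {
      _log.pop_tag("phase");  // the fix: pop before returning early
      return;
    }
    // ... timing bookkeeping would go here ...
    _log.pop_tag("phase");
  }
};

// Runs one "task" containing one phase; true if the tags balance.
bool run_task(bool skip_timing) {
  TagStack log;
  log.push_tag("task");
  {
    ScopedPhase phase(log, skip_timing);
  }
  return log.pop_tag("task") && log.depth() == 0;
}
```

Removing the `pop_tag` call from the early-return branch makes `run_task(true)` return false, because the outer pop then finds `phase` on top of the stack where it expects `task`.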
Cheers, Sonia ------------- Commit messages: - 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose Changes: https://git.openjdk.org/jdk/pull/22331/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22331&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344013 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22331/head:pull/22331 PR: https://git.openjdk.org/jdk/pull/22331 From dlong at openjdk.org Fri Nov 22 20:56:14 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 22 Nov 2024 20:56:14 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. > It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. test/hotspot/jtreg/compiler/jvmci/TestEnableJVMCIProduct.java line 106: > 104: } > 105: } else if (flag.equals("-XX:+UseGraalJIT")) { > 106: output.shouldContain("jvmci.Compiler=graal"); Does changing shouldContain() to stdoutShouldContain() also solve the problem? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22323#discussion_r1854647261 From syan at openjdk.org Fri Nov 22 23:45:15 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 22 Nov 2024 23:45:15 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. > It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. 
After applying this patch, I ran the test with virtual threads 50k times and all runs passed. ------------- Marked as reviewed by syan (Committer). PR Review: https://git.openjdk.org/jdk/pull/22323#pullrequestreview-2456019044 From dlong at openjdk.org Sat Nov 23 00:16:20 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 23 Nov 2024 00:16:20 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput src/hotspot/share/opto/compile.cpp line 675: > 673: _oom(false), > 674: _replay_inline_data(nullptr), > 675: _inline_printer(comp_arena(), directive->PrintInliningOption || PrintOptoInlining), How do we support print_intrinsics()? src/hotspot/share/opto/library_call.cpp line 122: > 120: : "(intrinsic)"; > 121: CompileTask::print_inlining_ul(callee, jvms->depth() - 1, bci, InliningResult::SUCCESS, inline_msg); > 122: C->inline_printer()->record(callee, jvms, InliningResult::SUCCESS, inline_msg); How does this still support print_intrinsics()? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1854927736 PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1854929710 From dlong at openjdk.org Sat Nov 23 00:22:18 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 23 Nov 2024 00:22:18 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput src/hotspot/share/opto/callGenerator.cpp line 422: > 420: > 421: if (cg != nullptr) { > 422: if (!allow_inline) { Is `!allow_inline` really the correct check here and in LateInlineVirtualCallGenerator::do_late_inline_check()? This is where I ran into trouble with the asserts in previous implementation. Note that even if allow_inline is passed as true to the CallGenerator factory, the factory can force it to false for a variety of reasons. Maybe we should be looking at what kind of CallGenerator we have in `cg`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1854932906 From dlong at openjdk.org Sat Nov 23 01:02:29 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 23 Nov 2024 01:02:29 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: <78zTMV-WOGuk5zZiw_tJJLQdPjyDoW87RSOXrvRB_Bk=.00fe05f4-3c36-43ad-aed3-a14bf384ff83@github.com> On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput src/hotspot/share/compiler/compileTask.hpp line 222: > 220: > 221: /** > 222: * @deprecated Please rely on Compile::inline_printer. Do not directly write inlining information to tty. Can we get rid of these instead of making them deprecated? src/hotspot/share/opto/printinlining.cpp line 78: > 76: return child; > 77: } > 78: auto child = new (_arena) IPInlineSite(callee, _arena, bci); This code is nice and compact, but I'm worried about the memory footprint. In the worst case, we get an array element for every bytecode parsed, right? It might be better to use a hash table. 
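The footprint concern raised here can be illustrated with a small sketch. This is not the code from the PR: `InlineSite`, `DenseChildren` and `SparseChildren` are hypothetical, bcis are assumed to be non-negative, and it is an assumption that the children are kept in an array indexed by bci that grows to cover the largest bci seen. A map keyed by bci only pays for the call sites that are actually recorded.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical per-call-site record.
struct InlineSite {
  int bci;
};

// Dense layout: one slot per bci up to the largest bci recorded so far.
class DenseChildren {
  std::vector<std::unique_ptr<InlineSite>> _slots;
 public:
  InlineSite* at_bci(int bci) {
    std::size_t idx = static_cast<std::size_t>(bci);  // assumes bci >= 0
    if (idx >= _slots.size()) _slots.resize(idx + 1);
    if (!_slots[idx]) _slots[idx] = std::make_unique<InlineSite>(InlineSite{bci});
    return _slots[idx].get();
  }
  std::size_t slots() const { return _slots.size(); }
};

// Sparse layout: one entry per bci that was actually looked up.
class SparseChildren {
  std::unordered_map<int, std::unique_ptr<InlineSite>> _map;
 public:
  InlineSite* at_bci(int bci) {
    auto& slot = _map[bci];  // inserts an empty entry on first use
    if (!slot) slot = std::make_unique<InlineSite>(InlineSite{bci});
    return slot.get();
  }
  std::size_t slots() const { return _map.size(); }
};
```

With call sites at bci 3 and bci 500, the dense variant holds 501 slots while the sparse one holds 2 entries. HotSpot code would use arena-allocated containers rather than the standard library; the standard containers are used here only to keep the sketch self-contained.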
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1854955825 PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1854955353 From dlong at openjdk.org Sat Nov 23 01:11:15 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 23 Nov 2024 01:11:15 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput This is a huge improvement. Nice work! I left a few inline comments. src/hotspot/share/opto/printinlining.cpp line 60: > 58: return locate(state->caller(), nullptr)->at_bci(state->bci(), callee); > 59: } > 60: It looks like you are building a tree, just like InlineTree. I wonder if it would make sense to unify them somehow in the future. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2495174365 PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1854958669 From dlong at openjdk.org Sat Nov 23 02:52:09 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 23 Nov 2024 02:52:09 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput I pulled your latest changes, and I am seeing missing newlines in the output, just by running `java -XX:+PrintInlining`. With -XX:+PrintIntrinsics, there is no additional output, so I'm wondering how -XX:+PrintIntrinsics tests are passing. Maybe we are missing test coverage for that flag. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2495242897 From dlong at openjdk.org Sat Nov 23 03:06:24 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 23 Nov 2024 03:06:24 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: <8o2N5Xrb2g4G5IvBmBTfQhkbVi061dbitCyJQpO5bZE=.1cf776a5-9f6d-4502-acdb-484f51b49107@github.com> On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput I'm also seeing missing method names: @ 10 java.lang.StringBuilder:: @ 7 jdk.internal.classfile.impl.SplitConstantPool::utf8Entry (45 bytes) failed to inline: callee is too large and weird indentation: @ 1 java.lang.Object:: (1 bytes) inline @ 1 sun.invoke.util.Wrapper::basicTypeChar (18 bytes) inline ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2495256050 From enikitin at openjdk.org Sat Nov 23 03:58:18 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Sat, 23 Nov 2024 03:58:18 GMT Subject: Integrated: 8344533: CTW: Add option to remove clinits before loading In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 10:50:48 GMT, Evgeny Nikitin wrote: > This PR adds an option-controlled (off by default) removal of methods before loading them with CTW ClassLoader. > The main purpose is to prevent `static { ... }` blocks execution (along with static fields initialization). > Testing: manual CTW runs. This pull request has now been integrated. 
Changeset: effee122 Author: Evgeny Nikitin Committer: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/effee122dd74241db4ec2b6bfd99f1450741b804 Stats: 20 lines in 1 file changed: 18 ins; 0 del; 2 mod 8344533: CTW: Add option to remove clinits before loading Reviewed-by: thartmann, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/22235 From jsjolen at openjdk.org Sat Nov 23 10:03:42 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sat, 23 Nov 2024 10:03:42 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). 
>> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput If there's a key-value relationship from bci -> whatever, then we have a balanced binary tree class `Treap` that can be used. There's no `TreapArena` right now, but this PR can add it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2495404130 From duke at openjdk.org Sat Nov 23 13:44:30 2024 From: duke at openjdk.org (Piotr Tarsa) Date: Sat, 23 Nov 2024 13:44:30 GMT Subject: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5] In-Reply-To: References: Message-ID: On Wed, 11 Oct 2023 20:58:23 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the follow-up comments to the SIMD accelerated sort PR (#14227) which implemented AVX512 intrinsics for Arrays.sort() methods. >> The proposed changes are: >> >> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. 
A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR. >> 2) Addressing the build failure due to a bug in GCC 12 (which was fixed in version 12.3.1). The details of the bug are at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593 >> 3) Minor changes in Javadoc strings > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Revert @ForceInline annotations for small array sort methods The answer for slow performance of AVX512 version of x86-simd-sort on Zen 4 is most probably explained in AMD manuals which could be found at: https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides [Software Optimization Guide for the AMD Zen4 Microarchitecture](https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/57647.zip) has following remark in "2.11.2 Code recommendations" chapter: > Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops. [Software Optimization Guide for the AMD Zen5 Microarchitecture](https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/58455.zip) doesn't have any remark about COMPRESS instructions. Could you add some code that disables the AVX512 version on Zen4, but keeps it enabled on Zen5 and future Zen architectures? 
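For illustration only, the kind of gate being asked for could look like the sketch below. It is not HotSpot code: the class, the method `useAvx512Sort` and its parameters are hypothetical, and it assumes that Zen 4 reports CPUID family 0x19 and Zen 5 reports 0x1A, which should be checked against AMD's documentation. It also relies on the observation that AMD parts before Zen 4 have no AVX512, so on an AVX512-capable AMD CPU a family check alone separates Zen 4 from later cores.

```java
public class Avx512SortGateSketch {
    // Assumption (not taken from this thread): Zen 4 reports CPUID family 0x19,
    // while Zen 5 reports family 0x1A.
    static final int AMD_FAMILY_ZEN4 = 0x19;

    // Hypothetical decision helper: should the AVX512 sort stubs be used?
    // vendor    - CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    // family    - CPUID display family
    // hasAvx512 - whether the required AVX512 features are present
    static boolean useAvx512Sort(String vendor, int family, boolean hasAvx512) {
        if (!hasAvx512) {
            return false;
        }
        switch (vendor) {
            case "GenuineIntel":
                return true;
            case "AuthenticAMD":
                // Keep Zen 4 disabled, enable Zen 5 and later families.
                return family > AMD_FAMILY_ZEN4;
            default:
                // Unknown vendors stay on the existing non-AVX512 path.
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Zen 4: " + useAvx512Sort("AuthenticAMD", 0x19, true));
        System.out.println("Zen 5: " + useAvx512Sort("AuthenticAMD", 0x1A, true));
    }
}
```

A real change would live in the VM's CPU feature detection and would probably key on the microcoded COMPRESS behaviour rather than on a bare family number.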
------------- PR Comment: https://git.openjdk.org/jdk/pull/16124#issuecomment-2495483841 From duke at openjdk.org Sat Nov 23 13:44:44 2024 From: duke at openjdk.org (Piotr Tarsa) Date: Sat, 23 Nov 2024 13:44:44 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42] In-Reply-To: References: Message-ID: On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 | >> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 | >> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 | >> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 | >> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 | >> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 | >> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** | >> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** | >> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** | >> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 | >> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 | >> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 | >> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 | >> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 | >> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 | >> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** | >> 
| ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** | >> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** | >> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 | >> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 | >> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 | >> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 | >> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 | >> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 | >> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** | >> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** | >> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** | >> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 | >> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 | >> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 | >> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 | >> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 | >> | ArraysSort.longSort | 1000 | 10.449 | 6.... > > Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits: > > - fix code style and formatting > - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort > - Update CompileThresholdScaling only for the sort and partition intrinsics; update build script to remove nested if > - change variable names of indexPivot* to pivotIndex* > - Update DualPivotQuicksort.java > - Rename arraySort and arrayPartition Java methods to sort and partition. Cleanup some comments > - Remove the unnecessary exception in single pivot partitioning fallback method > - Move functional interfaces close to the associated methods > - Refactor the sort and partition intrinsics to accept method references for fallback functions > - Refactor stub handling to use a generic function for all types > - ... 
and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86 The answer for slow performance of AVX512 version of x86-simd-sort on Zen 4 is most probably explained in AMD manuals which could be found at: https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides [Software Optimization Guide for the AMD Zen4 Microarchitecture](https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/57647.zip) has following remark in "2.11.2 Code recommendations" chapter: > Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops. [Software Optimization Guide for the AMD Zen5 Microarchitecture](https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/58455.zip) doesn't have any remark about COMPRESS instructions. Could you add some code that disables the AVX512 version on Zen4, but keeps it enabled on Zen5 and future Zen architectures? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-2495483834 From dnsimon at openjdk.org Sat Nov 23 15:50:16 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 23 Nov 2024 15:50:16 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 20:53:55 GMT, Dean Long wrote: >> This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. >> It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. 
> > test/hotspot/jtreg/compiler/jvmci/TestEnableJVMCIProduct.java line 106: > >> 104: } >> 105: } else if (flag.equals("-XX:+UseGraalJIT")) { >> 106: output.shouldContain("jvmci.Compiler=graal"); > > Does changing shouldContain() to stdoutShouldContain() also solve the problem? I don't think so as the output is sent to HotSpot's log stream (i.e. `tty`) which I believe goes stdout by default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22323#discussion_r1855215437 From amitkumar at openjdk.org Sun Nov 24 09:18:52 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 24 Nov 2024 09:18:52 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v2] In-Reply-To: References: Message-ID: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21782/files - new: https://git.openjdk.org/jdk/pull/21782/files/6ef29d21..a3a90b23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=00-01 Stats: 126 lines in 3 files changed: 65 ins; 60 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From amitkumar at openjdk.org Sun Nov 24 09:18:52 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 24 Nov 2024 09:18:52 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: Message-ID: <38dO-GRLyhqTT1CU-FDwHGZ6vLjvYkvb8kdlWwI2-hc=.93be40c6-754d-43f4-8c53-cfdc88e27b9d@github.com> On Mon, 18 Nov 2024 20:16:02 GMT, Vladimir Ivanov wrote: > > Is there some arena-specific check that exists, which could be used here ? 
> > `_type_arena != &_Compile_types` on current `Compile` should reliably detect when shared type dictionary is being populated. I can't access `_type_arena` in the initialize method directly. So will this change be fine: diff --git a/src/hotspot/share/opto/compile.hpp b/src/hotspot/share/opto/compile.hpp index 05e24bf3f6e..6dbed935218 100644 --- a/src/hotspot/share/opto/compile.hpp +++ b/src/hotspot/share/opto/compile.hpp @@ -964,6 +964,7 @@ class Compile : public Phase { Dict* type_dict() { return _type_dict; } size_t type_last_size() { return _type_last_size; } int num_alias_types() { return _num_alias_types; } + Arena* compile_type_arena() { return &_Compile_types; } void init_type_arena() { _type_arena = &_Compile_types; } void set_type_arena(Arena* a) { _type_arena = a; } diff --git a/src/hotspot/share/opto/runtime.cpp b/src/hotspot/share/opto/runtime.cpp index b91844d383f..95cab3fd117 100644 --- a/src/hotspot/share/opto/runtime.cpp +++ b/src/hotspot/share/opto/runtime.cpp @@ -2044,7 +2044,8 @@ NamedCounter* OptoRuntime::new_named_counter(JVMState* youngest_jvms, NamedCount return c; } -void OptoRuntime::initialize_types() { +void OptoRuntime::initialize_types(Compile* current) { + assert(current->type_arena() != current->compile_type_arena(), "should be shared arena"); new_instance_Type_init(); new_array_Type_init(); multianewarray2_Type_init(); diff --git a/src/hotspot/share/opto/runtime.hpp b/src/hotspot/share/opto/runtime.hpp index 14dc03fef25..b2c32366157 100644 --- a/src/hotspot/share/opto/runtime.hpp +++ b/src/hotspot/share/opto/runtime.hpp @@ -770,7 +770,7 @@ class OptoRuntime : public AllStatic { // dumps all the named counters static void print_named_counters(); - static void initialize_types(); + static void initialize_types(Compile* current); }; #endif // SHARE_OPTO_RUNTIME_HPP diff --git a/src/hotspot/share/opto/type.cpp b/src/hotspot/share/opto/type.cpp index 40d688085b5..86949786284 100644 --- a/src/hotspot/share/opto/type.cpp +++ 
b/src/hotspot/share/opto/type.cpp @@ -713,7 +713,7 @@ void Type::Initialize_shared(Compile* current) { mreg2type[Op_VecZ] = TypeVect::VECTZ; LockNode::lock_type_init(); - OptoRuntime::initialize_types(); + OptoRuntime::initialize_types(current); // Restore working type arena. current->set_type_arena(save); ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2489319551 From amitkumar at openjdk.org Sun Nov 24 09:18:52 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 24 Nov 2024 09:18:52 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v2] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 20:00:48 GMT, Vladimir Ivanov wrote: >> "first" call is made from here because of shared space. Otherwise the object-allocation will deleted and VM will crash. That's what I observed. And again that was the reason why the initialization call is made from `Type::Initialize_shared`. > > My suggestion is about refactoring the code, so initialization is performed in `OptoRuntime` code (e.g., in `OptoRuntime::initialize_types()`). Then you call it from here. @iwanowww can you take another look. I have pushed the requested changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1850804697 From haosun at openjdk.org Mon Nov 25 01:10:21 2024 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Nov 2024 01:10:21 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 02:41:47 GMT, Hao Sun wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. 
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operations are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. >> 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general purpose registers, but FP16 ISA instructions generally operate over floating point registers, so the compiler injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback.
>> >> Best Regards, >> Jatin > > Hi, > > Better to update the copyright year to 2024 for the following modified files: > > > src/hotspot/share/adlc/output_h.cpp > src/hotspot/share/opto/connode.cpp > src/hotspot/share/opto/connode.hpp > src/hotspot/share/opto/constantTable.cpp > src/hotspot/share/opto/divnode.cpp > src/hotspot/share/opto/divnode.hpp > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > > > I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. > If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. > > > One or more @IR rules failed: > > Failed IR Rules (9) of Methods (9) ---------------------------------- > 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" > }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... > Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? Hi @Bhavana-Kilambi , I would suggest making this patch as a clean one, i.e. 
better to move the AArch64-related code to a separate PR, mainly because it may still take some time to review/merge your patch and we'd better **not** merge this PR with a known jtreg failure. I noticed @jatin-bhateja has uploaded the cleanup commit and I will check the jtreg test on AArch64+SVE side. Will report the result back when the test finishes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2496479249 From fyang at openjdk.org Mon Nov 25 02:27:12 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 25 Nov 2024 02:27:12 GMT Subject: RFR: 8344916: RISC-V: Misaligned access in array fill stub Message-ID: Hi, Please review this small change. In `generate_fill`, we fill the remaining elements with a single 8-byte store when the remaining count is less than 8 bytes in size after `fill_words`. This may overwrite some elements and create a misaligned access. While it's not an issue for modern CPUs with fast misaligned access, this does affect performance on CPUs where misaligned access is emulated by a trap handler and thus is very slow. async-profiler shows 2.8% for `jshort_fill` in the flame graph when sampling Specjbb2005 on these platforms. In this particular case, the copy address `to` is 8-byte aligned after `fill_words`. So if `AvoidUnalignedAccesses` is true, one choice would be directing control to `L_fill_elements`, which avoids the alignment issue while filling the remaining elements. Test on linux-riscv64 platform: - [x] tier1-3 (release) - [x] 2.5% Specjbb2005 performance benefit on both HiFive Unmatched and Premier P550 SBCs. - [x] No obvious performance impact witnessed on other platforms like BFI-F3 or Pioneer box.
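To make the two tail strategies concrete, here is a small Java model. It is a sketch, not the stub itself: the class and method names are hypothetical, a `ByteBuffer` stands in for memory, and offset 0 is assumed to be 8-byte aligned, so "offset % 8" plays the role of address alignment. Variant A mirrors the single overlapping 8-byte store, variant B mirrors falling back to per-element stores.

```java
import java.nio.ByteBuffer;

public class FillTailSketch {
    // Replicate a 16-bit element into all four lanes of a 64-bit word.
    static long replicate(short value) {
        long p = value & 0xFFFFL;
        p |= p << 16;
        p |= p << 32;
        return p;
    }

    // Bulk phase: 8-byte stores at offsets 0, 8, 16, ...
    // Returns the offset where the tail (fewer than 8 bytes) starts.
    static int fillWords(ByteBuffer buf, int lengthBytes, short value) {
        long pattern = replicate(value);
        int off = 0;
        for (; lengthBytes - off >= 8; off += 8) {
            buf.putLong(off, pattern);
        }
        return off;
    }

    // Tail variant A: one 8-byte store that ends exactly at the end of the
    // region. It rewrites some already-filled elements and its start offset
    // is generally not a multiple of 8. Requires lengthBytes >= 8.
    static int tailOverlappingStore(ByteBuffer buf, int lengthBytes, short value) {
        int storeOff = lengthBytes - 8;
        buf.putLong(storeOff, replicate(value));
        return storeOff;
    }

    // Tail variant B: per-element 2-byte stores. The tail starts 8-byte
    // aligned, so every store here is naturally aligned for its size.
    static void tailElementStores(ByteBuffer buf, int tailStart, int lengthBytes, short value) {
        for (int off = tailStart; off < lengthBytes; off += 2) {
            buf.putShort(off, value);
        }
    }

    public static void main(String[] args) {
        int lengthBytes = 22; // 11 jshort elements
        short value = 0x1234;
        ByteBuffer a = ByteBuffer.allocate(lengthBytes);
        ByteBuffer b = ByteBuffer.allocate(lengthBytes);

        int tailStart = fillWords(a, lengthBytes, value);
        int storeOff = tailOverlappingStore(a, lengthBytes, value);
        tailElementStores(b, fillWords(b, lengthBytes, value), lengthBytes, value);

        System.out.println("tail starts at offset " + tailStart);
        System.out.println("overlapping store offset % 8 = " + (storeOff % 8));
        System.out.println("same contents: " + a.equals(b));
    }
}
```

Both variants leave the same bytes in memory. The difference is only in which store addresses are issued, which matters on hardware where a misaligned store traps.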
------------- Commit messages: - 8344916: RISC-V: Misaligned access in array fill stub Changes: https://git.openjdk.org/jdk/pull/22347/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22347&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344916 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22347/head:pull/22347 PR: https://git.openjdk.org/jdk/pull/22347 From haosun at openjdk.org Mon Nov 25 06:24:24 2024 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Nov 2024 06:24:24 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 01:07:43 GMT, Hao Sun wrote: > > Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? > > Hi @Bhavana-Kilambi , I would suggest making this patch as a clean one, i.e. better to move AArch64 related code to as one separate PR mainly because it may still take some time to review/merge your patch and we'd better **not** merge this PR with known jtreg failure. I noticed @jatin-bhateja has uploaded the cleanup commit and I will check the jtreg test on AArch64+SVE side. Will report the result back when the test finishes. Previous test failure in file `TestFloat16VectorOperations.java` is gone now. tier1~3 passed on AArch64+SVE side. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2496961595 From thartmann at openjdk.org Mon Nov 25 06:42:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 25 Nov 2024 06:42:19 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 11:11:55 GMT, Evgeny Nikitin wrote: > For CTW, zero classes in provided jar is now a failure. > This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. > > This PR makes this behaviour controllable. Default reaction is a failure, like before. Changes requested by thartmann (Reviewer). test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java line 179: > 177: boolean allowZeroClassCount = Boolean.getBoolean("sun.hotspot.tools.ctw.allow_zero_class_count"); > 178: if (allowZeroClassCount && classCount == 0L) { > 179: System.out.println("WARN: " + target + "(at " + targetPath + ") have not classes. Ignoring."); Suggestion: System.out.println("WARN: " + target + "(at " + targetPath + ") has no classes. Ignoring."); ------------- PR Review: https://git.openjdk.org/jdk/pull/22320#pullrequestreview-2457336184 PR Review Comment: https://git.openjdk.org/jdk/pull/22320#discussion_r1855913391 From chagedorn at openjdk.org Mon Nov 25 07:28:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 07:28:14 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 18:53:28 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). > > Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. 
When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. > > In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version` , `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. > > I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). > > Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). > > However, in the destructor, if we return early, we don't pop that tag, leaving the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early. > > Cheers, > Sonia Looks good to me. Can you also add a regression test for it? Since this already triggers with `--version`, you can just create a hello-world-like test and run with the mentioned flags. Just a side note: > Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327).
You can directly insert the permalink (in the non-blame view) such that the code is inlined in the PR for easier reading :-) https://github.com/openjdk/jdk/blob/6f622da7fbae67d8c1cd9e795127adac58a246a9/src/hotspot/share/opto/compile.cpp#L4327 ------------- PR Review: https://git.openjdk.org/jdk/pull/22331#pullrequestreview-2457422083 From chagedorn at openjdk.org Mon Nov 25 07:33:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 07:33:18 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers [v2] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 11:26:59 GMT, Emanuel Peter wrote: >> This is a followup to: >> https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) >> >> @rkennke fixed the vectorization tests in a very rudimentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**. >> >> **The problem is this:** >> Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. >> >> -------------------------------- >> >> First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. >> >> ------------------- >> >> Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing.
>> >> I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. >> >> To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. >> >> In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > make failOf comment explicit for Roberto Still good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22199#pullrequestreview-2457438122 From chagedorn at openjdk.org Mon Nov 25 07:45:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 07:45:21 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
>> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
> > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput src/hotspot/share/opto/printinlining.hpp line 56: > 54: /** > 55: * Method may be null iff this is the root of the tree. > 56: */ Just a drive-by comment: We usually use "`//` style" method and class comments in the code base. We should probably stick to it for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1855996428 From rehn at openjdk.org Mon Nov 25 08:05:14 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 25 Nov 2024 08:05:14 GMT Subject: RFR: 8344916: RISC-V: Misaligned access in array fill stub In-Reply-To: References: Message-ID: On Sun, 24 Nov 2024 07:27:08 GMT, Fei Yang wrote: > Hi, Please review this small change. > > In `generate_fill`, we fill the remaining elements by a single 8-byte store when the remaining count is less than 8 bytes in size after `fill_words`. This may overwrite some elements and create misaligned access. While it's not an issue for modern CPUs with fast misaligned access, this does affect performance on CPUs where misaligned access is emulated by a trap handler and thus is very slow. async-profiler shows `jshort_fill` at 2.8% in the flame graph when sampling Specjbb2005 on these platforms. > > In this particular case, the copy address `to` is 8-byte aligned after `fill_words`. So if `AvoidUnalignedAccesses` is true, one choice would be directing control to `L_fill_elements`, which avoids the alignment issue while filling the remaining elements. > > Test on linux-riscv64 platform: > - [x] tier1-3 (release) > - [x] 2.5% Specjbb2005 performance benefit on both HiFive Unmatched and Premier P550 SBCs. > - [x] No obvious performance impact witnessed on other platforms like BFI-F3 or Pioneer box (-XX:+AvoidUnalignedAccesses). Thanks! ------------- Marked as reviewed by rehn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22347#pullrequestreview-2457513276 From epeter at openjdk.org Mon Nov 25 08:05:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:05:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. 
Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Testpoints for new value transforms + code cleanups Wow, thanks for tackling this! Ok, lots of style comments. But again: I would have loved to see this split up into these parts: - Scalar - Scalar optimizations (value, ideal, identity) - Vector This will again take many, many weeks to get reviewed because it is a 3k+ change with lots of details. Do you have any tests for the scalar constant folding optimizations? I did not find them. src/hotspot/cpu/x86/x86.ad line 10910: > 10908: %} > 10909: > 10910: instruct convF2HFAndS2HF(regF dst, regF src) I'm starting to see that you sometimes use `H` and sometimes `HF`. That needs to be consistent - unless they are 2 different things? src/hotspot/cpu/x86/x86.ad line 10930: > 10928: %} > 10929: > 10930: instruct scalar_sqrt_fp16_reg(regF dst, regF src) Hmm, and then you also use `fp16`... so now we have `H`, `HF` and `fp16`... src/hotspot/share/opto/addnode.cpp line 713: > 711: //------------------------------add_of_identity-------------------------------- > 712: // Check for addition of the identity > 713: const Type *AddHFNode::add_of_identity(const Type *t1, const Type *t2) const { I would generally drop these comments, unless they actually have something useful to say that the name does not say. You could add a comment on why you are returning `nullptr`, i.e. doing nothing. And for style: the `*` belongs with the type ;) Suggestion: const Type* AddHFNode::add_of_identity(const Type* t1, const Type* t2) const { src/hotspot/share/opto/addnode.cpp line 721: > 719: // This also type-checks the inputs for sanity. Guaranteed never to > 720: // be passed a TOP or BOTTOM type, these are filtered out by pre-check.
> 721: const Type *AddHFNode::add_ring(const Type *t0, const Type *t1) const { Suggestion: // Supplied function returns the sum of the inputs. // This also type-checks the inputs for sanity. Guaranteed never to // be passed a TOP or BOTTOM type, these are filtered out by pre-check. const Type* AddHFNode::add_ring(const Type* t0, const Type* t1) const { Here the comments are great :) src/hotspot/share/opto/addnode.cpp line 1625: > 1623: > 1624: // handle min of 0.0, -0.0 case. > 1625: return (jint_cast(f0) < jint_cast(f1)) ? r0 : r1; Can you please add some comments for this here? Why is there an int-cast on floats? Why not just do the ternary comparison on line 1621: `return f0 < f1 ? r0 : r1;`? src/hotspot/share/opto/addnode.hpp line 179: > 177: virtual Node* Identity(PhaseGVN* phase) { return this; } > 178: virtual uint ideal_reg() const { return Op_RegF; } > 179: }; Please put the `*` with the type everywhere. src/hotspot/share/opto/connode.cpp line 49: > 47: switch( t->basic_type() ) { > 48: case T_INT: return new ConINode( t->is_int() ); > 49: case T_SHORT: return new ConHNode( t->is_half_float_constant() ); That will be quite confusing.... don't you think? src/hotspot/share/opto/connode.hpp line 122: > 120: class ConHNode : public ConNode { > 121: public: > 122: ConHNode( const TypeH *t ) : ConNode(t) {} Suggestion: ConHNode(const TypeH* t) : ConNode(t) {} src/hotspot/share/opto/connode.hpp line 129: > 127: return new ConHNode( TypeH::make(con) ); > 128: } > 129: Suggestion: src/hotspot/share/opto/convertnode.cpp line 256: > 254: //------------------------------Ideal------------------------------------------ > 255: Node* ConvF2HFNode::Ideal(PhaseGVN* phase, bool can_reshape) { > 256: // Optimize pattern - ConvHF2F (FP32BinOp) ConvF2HF ==> ReinterpretS2HF (FP16BinOp) ReinterpretHF2S. This is a little dense and I don't understand your notation. So we are pattern matching: `ConvF2HF( FP32BinOp(ConvHF2F(x), ConvHF2F(y)) )` <- I think that would be more readable.
You could also create local variables for `x` and `y`, just so it is more readable. And then instead we generate: `ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))` Ok, so you are saying why lift to FP32, if we cast down to FP16 anyway... would be nice to have such a comment at the top to motivate the optimization! What confuses me a little here: why do we even have to cast from and to `short` here? Maybe a quick comment about that would also help. src/hotspot/share/opto/convertnode.cpp line 948: > 946: } > 947: > 948: bool Float16NodeFactory::is_binary_oper(int opc) { Suggestion: bool Float16NodeFactory::is_float32_binary_oper(int opc) { Just so it is explicit, since you have the parallel `get_float16_binary_oper` below. src/hotspot/share/opto/convertnode.hpp line 234: > 232: class ReinterpretHF2SNode : public Node { > 233: public: > 234: ReinterpretHF2SNode( Node *in1 ) : Node(0,in1) {} Suggestion: ReinterpretHF2SNode(Node* in1) : Node(0, in1) {} src/hotspot/share/opto/divnode.cpp line 759: > 757: const Type* t2 = phase->type(in(2)); > 758: if(t1 == Type::TOP) return Type::TOP; > 759: if(t2 == Type::TOP) return Type::TOP; Suggestion: if(t1 == Type::TOP) { return Type::TOP; } if(t2 == Type::TOP) { return Type::TOP; } Please use the brackets consistently. src/hotspot/share/opto/divnode.cpp line 765: > 763: if((t1 == bot) || (t2 == bot) || > 764: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) > 765: return bot; Suggestion: if((t1 == bot) || (t2 == bot) || (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) { return bot; } Again: please always use brackets. 
src/hotspot/share/opto/divnode.cpp line 776: > 774: > 775: if(t2 == TypeH::ONE) > 776: return t1; brackets src/hotspot/share/opto/divnode.cpp line 782: > 780: t2->base() == Type::HalfFloatCon && > 781: t2->getf() != 0.0) // could be negative zero > 782: return TypeH::make(t1->getf()/t2->getf()); Suggestion: // If divisor is a constant and not zero, divide the numbers if(t1->base() == Type::HalfFloatCon && t2->base() == Type::HalfFloatCon && t2->getf() != 0.0) { // could be negative zero return TypeH::make(t1->getf() / t2->getf()); } src/hotspot/share/opto/divnode.cpp line 789: > 787: > 788: if(t1 == TypeH::ZERO && !g_isnan(t2->getf()) && t2->getf() != 0.0) > 789: return TypeH::ZERO; brackets for if Ok, why not also do it for negative zero then? src/hotspot/share/opto/divnode.cpp line 797: > 795: //------------------------------isA_Copy--------------------------------------- > 796: // Dividing by self is 1. > 797: // If the divisor is 1, we are an identity on the dividend. Suggestion: // If the divisor is 1, we are an identity on the dividend. `Dividing by self is 1.` That does not seem to apply here. Maybe you meant `dividing by 1 is self`? src/hotspot/share/opto/divnode.cpp line 804: > 802: > 803: //------------------------------Idealize--------------------------------------- > 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { Suggestion: Node* DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { src/hotspot/share/opto/divnode.cpp line 805: > 803: //------------------------------Idealize--------------------------------------- > 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { > 805: if (in(0) && remove_dead_region(phase, can_reshape)) return this; Suggestion: if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) { return this; } brackets for if and no implicit null checks please! 
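The two style points requested in the comments above (braces on every `if`, and explicit `nullptr` comparisons instead of pointers used as booleans) can be sketched on a toy type. This is only an illustration: `ToyNode` and `is_dead_controlled` are hypothetical stand-ins, not HotSpot classes or functions from the patch.

```cpp
// Hypothetical stand-in for a C2 node; not the real HotSpot Node class.
struct ToyNode {
  const ToyNode* control;  // plays the role of in(0); may be nullptr
  bool dead;               // plays the role of "this region is dead"
};

// Discouraged form, shown only as a comment:
//   if (n.control && n.control->dead) return true;

// Requested form: explicit nullptr comparison and a braced body.
bool is_dead_controlled(const ToyNode& n) {
  if (n.control != nullptr && n.control->dead) {
    return true;
  }
  return false;
}
```

The behavior of both forms is the same; the explicit form just makes the pointer test visible at the call site.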
src/hotspot/share/opto/divnode.cpp line 814: > 812: > 813: const TypeH* tf = t2->isa_half_float_constant(); > 814: if(!tf) return nullptr; no implicit booleans! src/hotspot/share/opto/divnode.cpp line 836: > 834: > 835: // return multiplication by the reciprocal > 836: return (new MulHFNode(in(1), phase->makecon(TypeH::make(reciprocal)))); Do we have good tests for this optimization? src/hotspot/share/opto/mulnode.cpp line 559: > 557: > 558: // Compute the product type of two half float ranges into this node. > 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const { Suggestion: const Type* MulHFNode::mul_ring(const Type* t0, const Type* t1) const { src/hotspot/share/opto/mulnode.cpp line 561: > 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const { > 560: if( t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT ) return Type::HALF_FLOAT; > 561: return TypeH::make( t0->getf() * t1->getf() ); I hope that `TypeH::make` handles the overflow cases well... does it? And do we have tests for this? src/hotspot/share/opto/mulnode.cpp line 1945: > 1943: return TypeH::make(fma(f1, f2, f3)); > 1944: #endif > 1945: } I need: - brackets for ifs - all `*` on the left with the type - An explanation what the `ifdef __STDC_IEC_559__` does. 
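As background for the `__STDC_IEC_559__` question above, here is a minimal standalone sketch, not code from the patch. It assumes the usual meaning of the macro (a C implementation defines it when it claims conformance to the IEC 60559 / IEEE 754 floating-point annex), and it uses `double` rather than Float16 so that it compiles anywhere. The helper names `fused`, `unfused`, `example_a` and `example_b` are hypothetical. The point it illustrates is that a fused multiply-add rounds once, so constant folding with a host `fma` is presumably only safe when that host `fma` really has single-rounding semantics.

```cpp
#include <cmath>

// Hypothetical helpers, for illustration only.

// Single rounding: a * b + c is computed exactly, then rounded once.
double fused(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Two roundings: the product is rounded to double before the add.
// The volatile store keeps the compiler from contracting this into an fma.
double unfused(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Inputs chosen so that the exact product is 1 - 2^-60, which is not
// representable as a double and rounds to 1.0.
double example_a() { return 1.0 + std::ldexp(1.0, -30); }
double example_b() { return 1.0 - std::ldexp(1.0, -30); }
```

Calling both helpers with `example_a()`, `example_b()` and `c = -1.0` gives 0 for `unfused` and the `-2^-60` residue for `fused`, which is the kind of difference a guard on the host's floating-point conformance is meant to rule out.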
src/hotspot/share/opto/mulnode.hpp line 155: > 153: virtual const Type *mul_ring( const Type *, const Type * ) const; > 154: const Type *mul_id() const { return TypeH::ONE; } > 155: const Type *add_id() const { return TypeH::ZERO; } Suggestion: const Type* mul_id() const { return TypeH::ONE; } const Type* add_id() const { return TypeH::ZERO; } src/hotspot/share/opto/mulnode.hpp line 160: > 158: int max_opcode() const { return Op_MaxHF; } > 159: int min_opcode() const { return Op_MinHF; } > 160: const Type *bottom_type() const { return Type::HALF_FLOAT; } Suggestion: const Type* bottom_type() const { return Type::HALF_FLOAT; } src/hotspot/share/opto/subnode.cpp line 1975: > 1973: if( f < 0.0f ) return Type::HALF_FLOAT; > 1974: return TypeH::make( (float)sqrt( (double)f ) ); > 1975: } if brackets and asterisks with types please src/hotspot/share/opto/subnode.hpp line 143: > 141: const Type *bottom_type() const { return Type::HALF_FLOAT; } > 142: virtual uint ideal_reg() const { return Op_RegF; } > 143: }; Suggestion: //------------------------------SubHFNode-------------------------------------- // Subtract 2 half floats class SubHFNode : public SubFPNode { public: SubHFNode(Node* in1, Node* in2) : SubFPNode(in1, in2) {} virtual int Opcode() const; virtual const Type* sub(const Type *, const Type *) const; const Type* add_id() const { return TypeH::ZERO; } const Type* bottom_type() const { return Type::HALF_FLOAT; } virtual uint ideal_reg() const { return Op_RegF; } }; src/hotspot/share/opto/subnode.hpp line 552: > 550: } > 551: virtual int Opcode() const; > 552: const Type *bottom_type() const { return Type::HALF_FLOAT; } Suggestion: const Type* bottom_type() const { return Type::HALF_FLOAT; } src/hotspot/share/opto/type.cpp line 1487: > 1485: typerr(t); > 1486: > 1487: case HalfFloatCon: // Float-constant vs Float-constant? Suggestion: case HalfFloatCon: // Float-constant vs Float-constant? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21490#pullrequestreview-2457382009 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855943470 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855944584 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855948500 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855950333 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855954166 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855955074 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855958333 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855958773 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855959025 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855977560 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855981273 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855982405 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855984366 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855985484 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855988545 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855989752 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855992127 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855994876 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855995436 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855996454 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856000589 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856002336 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856007382 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21490#discussion_r1856006524 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856009749 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856010212 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856010391 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856013278 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856013945 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856014893 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856016525 From epeter at openjdk.org Mon Nov 25 08:05:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:05:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 07:17:33 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Testpoints for new value transforms + code cleanups > > src/hotspot/share/opto/connode.cpp line 49: > >> 47: switch( t->basic_type() ) { >> 48: case T_INT: return new ConINode( t->is_int() ); >> 49: case T_SHORT: return new ConHNode( t->is_half_float_constant() ); > > That will be quite confusing.... don't you think? I mean do we need this? We already have `ConHNode::make` below...? > src/hotspot/share/opto/divnode.cpp line 765: > >> 763: if((t1 == bot) || (t2 == bot) || >> 764: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) >> 765: return bot; > > Suggestion: > > if((t1 == bot) || (t2 == bot) || > (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) { > return bot; > } > > Again: please always use brackets. Apply the same below. 
> src/hotspot/share/opto/divnode.cpp line 804: > >> 802: >> 803: //------------------------------Idealize--------------------------------------- >> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { > > Suggestion: > > Node* DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { Ok, and please add brackets for all the ifs below! > src/hotspot/share/opto/divnode.cpp line 805: > >> 803: //------------------------------Idealize--------------------------------------- >> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 805: if (in(0) && remove_dead_region(phase, can_reshape)) return this; > > Suggestion: > > if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) { return this; } > > brackets for if and no implicit null checks please! https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855959810 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855985811 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855995743 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855999519 From epeter at openjdk.org Mon Nov 25 08:05:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:05:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 11:45:34 GMT, Bhavana Kilambi wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Testpoints for new value transforms + code cleanups > > src/hotspot/share/opto/node.cpp line 1600: > >> 1598: >> 1599: // Get a half float constant from a ConstNode. 
>> 1600: // Returns the constant if it is a float ConstNode > > half float ConstNode? Suggestion: // Returns the constant if it is a half float ConstNode ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856011460 From duke at openjdk.org Mon Nov 25 08:17:19 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 08:17:19 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Fri, 22 Nov 2024 14:34:43 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestDuplicatedLateInliningOutput > I pulled your latest changes, and I am seeing missing newlines in the output, just by running `java -XX:+PrintInlining`. With -XX:+PrintIntrinsics, there is no additional output, so I'm wondering how -XX:+PrintIntrinsics tests are passing. Maybe we are missing test coverage for that flag. > I'm also seeing missing method names: > > ``` > @ 10 java.lang.StringBuilder:: @ 7 jdk.internal.classfile.impl.SplitConstantPool::utf8Entry (45 bytes) failed to inline: callee is too large > ``` > > and weird indentation: > > ``` > @ 1 java.lang.Object:: (1 bytes) inline > @ 1 sun.invoke.util.Wrapper::basicTypeChar (18 bytes) inline > ``` @dean-long The reason for this is that multiple compile threads are trying to print at the same time. The odd formatting goes away with `-Xbatch`, preventing concurrent compilation. I didn't remove any explicit locking or synchronizing mechanism during refactoring. 
I think there was never an explicit mechanism to make this work without -Xbatch; it worked only because the entire PrintInlining output for one method was first written into a stringStream, which was then dumped onto tty in one go. With my refactoring, though, InlinePrinter::IPInlineSite::dump prints individual segments of the output directly to tty, which opens the door to bad interleavings when multiple compile threads are running. Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2497189139 From dfenacci at openjdk.org Mon Nov 25 08:19:15 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 25 Nov 2024 08:19:15 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 18:53:28 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). > > Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. > > In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` expects to pop the tag `task` but finds `phase` instead. > > I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). > > Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327).
> > However, in the destructor, if we return early, we don't pop that tag, leaving the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early. > > Cheers, > Sonia Looks good to me (C1 seems to already have this check). Thanks for fixing this. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/22331#pullrequestreview-2457572649 From jbhateja at openjdk.org Mon Nov 25 08:20:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 08:20:22 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 08:02:55 GMT, Emanuel Peter wrote: > Wow, thanks for tackling this! > > Ok, lots of style comments. > > But again: I would have loved to see this split up into these parts: > > * Scalar > * Scalar optimizations (value, ideal, identity) > * Vector > > This will again take many, many weeks to get reviewed because it is a 3k+ change with lots of details. > > Do you have any tests for the scalar constant folding optimizations? I did not find them. Hey @eme64, The patch includes IR framework-based scalar constant folding test points. https://github.com/openjdk/jdk/blob/5f58eea62a0f4d2cd731242a0fb264316ff5000d/test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java#L170 Regarding vector operation inferencing, we are taking the standard route by adding new Vector IR and associated VectorNode::Opcode / making routine changes without changing the auto-vectorization core. Each new vector operation is backed by IR framework-based tests. https://github.com/openjdk/jdk/pull/21490/files#diff-30af2f4d6a92733f58967b0feab21ddbc58a8f1ac5d3d5660c0f60220f6fab0dR40 Our target is to get this integrated before JDK24-RDP1; your help and reviews will be highly appreciated.
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2497192437 From duke at openjdk.org Mon Nov 25 08:28:18 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 08:28:18 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: <6qNbFcpA-OQneDpFlrCyhlqbxVW6HmYuKgqmR6iRTaM=.0f4331b5-27aa-4d03-b772-7e0a67c193ab@github.com> On Sat, 23 Nov 2024 00:12:02 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestDuplicatedLateInliningOutput > > src/hotspot/share/opto/compile.cpp line 675: > >> 673: _oom(false), >> 674: _replay_inline_data(nullptr), >> 675: _inline_printer(comp_arena(), directive->PrintInliningOption || PrintOptoInlining), > > How do we support print_intrinsics()? That's a good point. Inline printer should be enabled if C->print_intrinsics() or C->print_inlining() is true so this change does seem to change the behavior. I'll fix this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1856092003 From duke at openjdk.org Mon Nov 25 08:53:07 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 08:53:07 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v15] In-Reply-To: References: Message-ID: <_9xnkK0m49iSa3U3TFAylN1qc6-DhTMMEbB1IKIC18E=.6b423ef8-0901-4215-8bbd-9dde172e1cc7@github.com> > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. 
> > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Change is_enabled to old pattern ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/99c5cbc2..a0d4e66f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=13-14 Stats: 26 lines in 4 files changed: 12 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From epeter at openjdk.org Mon Nov 25 08:54:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:54:20 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Testpoints for new value transforms + code cleanups test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java line 37: > 35: * @modules jdk.incubator.vector > 36: * @library /test/lib / > 37: * @requires vm.compiler2.enabled Is this necessary, to restrict to C2? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856158768 From epeter at openjdk.org Mon Nov 25 08:59:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:59:26 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. 
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Testpoints for new value transforms + code cleanups I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. > The patch includes IR framework-based scalar constant folding test points. 
You mention this IR test: https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 Here I only see the use of very trivial values. I think we need more complicated cases. What about these: - Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. - Same where it would overflow the FP16 range. - Negative zero tests. - Division by powers of 2. It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2497315686 From epeter at openjdk.org Mon Nov 25 09:08:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 09:08:19 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 09:45:37 GMT, theoweidmannoracle wrote: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly Nice improvements, some comments/ questions below about testing. src/hotspot/share/opto/divnode.cpp line 518: > 516: > 517: if (is_power_of_2(l)) { > 518: return make_urshift(div->in(1), phase->intcon(log2i_graceful(l))); Are we testing optimizations like these with random constants somewhere, and comparing it to the interpreter results? 
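For readers following the `x / 2^k` and `x % 2^k` rewrites discussed above, here is a minimal plain-Java sketch of the arithmetic identities involved. It is not the IR test from the PR and does not compare interpreter against C2; the class and method names (`UnsignedPow2Check`, `countMismatches`) are made up for illustration. It only checks, on random inputs, that unsigned division by a power of two matches a logical right shift and that the unsigned remainder matches a mask.

```java
import java.util.Random;

// Hypothetical helper, not part of the JDK test suite.
final class UnsignedPow2Check {
    // Returns how many random samples violate either identity.
    static int countMismatches(long seed, int samples) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int n = 0; n < samples; n++) {
            int x = random.nextInt();
            int k = random.nextInt(32);   // shift amount 0..31
            int d = 1 << k;               // 2^k; k == 31 yields 0x80000000
            // Unsigned division by 2^k is a logical right shift by k.
            if (Integer.divideUnsigned(x, d) != (x >>> k)) {
                mismatches++;
            }
            // Unsigned remainder by 2^k keeps the low k bits.
            if (Integer.remainderUnsigned(x, d) != (x & (d - 1))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(42L, 1_000_000));
    }
}
```

A real regression test would additionally need the divisor to be a compile-time constant in the compiled method, which this sketch does not attempt.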
test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 2: > 1: /* > 2: * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Check the others too. I don't think a new file requires old dates ;) test/hotspot/jtreg/compiler/c2/irTests/ModLNodeIdealizationTests.java line 2: > 1: /* > 2: * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2457735736 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856180636 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856174160 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856174977 From epeter at openjdk.org Mon Nov 25 09:08:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 09:08:19 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: <0dLoyfFTRcDX-HeZe3_XWGbR9FBKULzpG9lIwvCbWdY=.0073bc10-63b8-4012-a4cb-0aef4788d04f@github.com> On Mon, 25 Nov 2024 09:00:54 GMT, Emanuel Peter wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
>> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > test/hotspot/jtreg/compiler/c2/irTests/ModLNodeIdealizationTests.java line 2: > >> 1: /* >> 2: * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved. > > Suggestion: > > * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Same with the others ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856175504 From duke at openjdk.org Mon Nov 25 09:16:17 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 09:16:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 09:04:09 GMT, Emanuel Peter wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > src/hotspot/share/opto/divnode.cpp line 518: > >> 516: >> 517: if (is_power_of_2(l)) { >> 518: return make_urshift(div->in(1), phase->intcon(log2i_graceful(l))); > > Are we testing optimizations like these with random constants somewhere, and comparing it to the interpreter results? https://github.com/openjdk/jdk/pull/22061/files#diff-48b0b8da547a3fe6aae9ea3ef20b4d708e47f2332ff6884478336f39d9eb9459R82 and https://github.com/openjdk/jdk/pull/22061/files#diff-24679e6505fe23e8a3ba73decaaf97896899c0a10956c437b8721fca33706ee2R82 should cover this I think. The containing method is marked with @DontCompile. 
> test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 2: > >> 1: /* >> 2: * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved. > > Suggestion: > > * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. > > Check the others too. I don't think a new file requires old dates ;) The basic structure is based on the existing tests (I think mostly DivINodeIdealizationTests), so I was thinking this is a derivative work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856195539 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856191718 From epeter at openjdk.org Mon Nov 25 09:32:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 09:32:15 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 09:13:31 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/divnode.cpp line 518: >> >>> 516: >>> 517: if (is_power_of_2(l)) { >>> 518: return make_urshift(div->in(1), phase->intcon(log2i_graceful(l))); >> >> Are we testing optimizations like these with random constants somewhere, and comparing it to the interpreter results? > > https://github.com/openjdk/jdk/pull/22061/files#diff-48b0b8da547a3fe6aae9ea3ef20b4d708e47f2332ff6884478336f39d9eb9459R82 and https://github.com/openjdk/jdk/pull/22061/files#diff-24679e6505fe23e8a3ba73decaaf97896899c0a10956c437b8721fca33706ee2R82 should cover this I think. The containing method is marked with @DontCompile.
You could use a similar trick with the constant method handles, as here: https://github.com/openjdk/jdk/pull/21521/files#diff-d69ed849846cce04a18fe13fb35cd975ad533f0ef76d923745d97bdb27db7073 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856223837 From duke at openjdk.org Mon Nov 25 09:58:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 09:58:03 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v16] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Change comment style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/a0d4e66f..86bd2476 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=14-15 Stats: 24 lines in 1 file changed: 0 ins; 12 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From chagedorn at openjdk.org Mon Nov 25 10:02:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 10:02:37 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v3] In-Reply-To: References: Message-ID: > This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdealLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in a row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` classes.
> > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. > - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. > - Whether a node is a target node for this BFS. > - What action should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. > - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Generalize BFS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22136/files - new: https://git.openjdk.org/jdk/pull/22136/files/5ae3a4fa..fbde5918 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=01-02 Stats: 113 lines in 3 files changed: 85 ins; 15 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22136/head:pull/22136 PR: https://git.openjdk.org/jdk/pull/22136 From chagedorn at openjdk.org Mon Nov 25 10:02:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 10:02:37 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v2] In-Reply-To:
<6hDh2EszZ8wGi-KDaBVxm-Z4XBfIJNjBnyfbDUBfixM=.bf6488bf-313a-4081-a348-f3f67f209dce@github.com> References: <6hDh2EszZ8wGi-KDaBVxm-Z4XBfIJNjBnyfbDUBfixM=.bf6488bf-313a-4081-a348-f3f67f209dce@github.com> Message-ID: On Fri, 22 Nov 2024 11:20:49 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/node.hpp line 2139: > >> 2137: >> 2138: public: >> 2139: explicit DataNodeBFS(BFSActions& bfs_action) : _bfs_actions(bfs_action) {} > > Is this restricted to data-nodes? If so, you should verify that the `start_node` is a data node. But we could also generalize this to any BFS, and then check in `should_visit` if it is a data node or CFG. > > You should also say that it traverses inputs/def, not outputs/uses. Yes, it should only visit data nodes. Good point about generalizing it to any BFS, including CFG nodes. I gave it a shot (see new commit) but limited the patch to an input-only BFS (I guess it could be extended to an output-including or input/output-selecting BFS as well at some point but we should have a use case first). For now, I'm only defining a `DataNodeInputsBFS` but provided a simple way to add a `CFGNodeInputsBFS` or an "any node" `NodeInputsBFS` later (again, I'm leaving these additional implementations out of this patch since we should first have a use case - but they should be simple to add). Let me know what you think.
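As an illustration of the input-only BFS with pluggable actions described above, here is a rough sketch in Java rather than HotSpot C++. All names (`InputsBfsSketch`, `Node`, `BfsActions`, `collectOpaqueLoopInputs`) are hypothetical, and the choice not to continue the traversal through a target node is an assumption of this sketch, not something confirmed by the patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an inputs-only BFS; not the HotSpot implementation.
final class InputsBfsSketch {
    static final class Node {
        final String name;
        final boolean isData;      // false models a CFG node
        final List<Node> inputs;   // "def" edges, traversed by the BFS

        Node(String name, boolean isData, Node... inputs) {
            this.name = name;
            this.isData = isData;
            this.inputs = List.of(inputs);
        }
    }

    // The three decisions listed in the PR description.
    interface BfsActions {
        boolean shouldVisit(Node node);   // follow this input at all?
        boolean isTarget(Node node);      // is this a node we are looking for?
        void targetAction(Node node);     // what to do with a target
    }

    static void run(Node start, BfsActions actions) {
        ArrayDeque<Node> queue = new ArrayDeque<>();
        Set<Node> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Node input : node.inputs) {
                if (!actions.shouldVisit(input) || !seen.add(input)) {
                    continue;   // filtered out or already handled
                }
                if (actions.isTarget(input)) {
                    actions.targetAction(input);   // assumption: stop at targets
                } else {
                    queue.add(input);
                }
            }
        }
    }

    // Collects the names of reachable data inputs called "OpaqueLoop*".
    static List<String> collectOpaqueLoopInputs(Node start) {
        List<String> found = new ArrayList<>();
        run(start, new BfsActions() {
            public boolean shouldVisit(Node n) { return n.isData; }
            public boolean isTarget(Node n) { return n.name.startsWith("OpaqueLoop"); }
            public void targetAction(Node n) { found.add(n.name); }
        });
        return found;
    }

    // Small example graph: Bool -> Cmp -> (Add -> (OpaqueLoopInit, ctrl), Con).
    static List<String> demo() {
        Node init = new Node("OpaqueLoopInit", true);
        Node ctrl = new Node("IfTrue", false);
        Node add = new Node("AddI", true, init, ctrl);
        Node con = new Node("ConI", true);
        Node cmp = new Node("CmpI", true, add, con);
        Node bool = new Node("Bool", true, cmp);
        return collectOpaqueLoopInputs(bool);
    }
}
```

A CFG-only or "any node" variant would differ only in `shouldVisit`, which is roughly the generalization discussed in this thread.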
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22136#discussion_r1856278138 From duke at openjdk.org Mon Nov 25 10:10:23 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 10:10:23 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Sat, 23 Nov 2024 00:19:13 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestDuplicatedLateInliningOutput > > src/hotspot/share/opto/callGenerator.cpp line 422: > >> 420: >> 421: if (cg != nullptr) { >> 422: if (!allow_inline) { > > Is `!allow_inline` really the correct check here and in LateInlineVirtualCallGenerator::do_late_inline_check()? This is where I ran into trouble with the asserts in previous implementation. Note that even if allow_inline is passed as true to the CallGenerator factory, the factory can force it to false for a variety of reasons. Maybe we should be looking at what kind of CallGenerator we have in `cg`? With the new implementation it is always safe to call C->inline_printer()->record. In case inline printing is disabled, it is just a no-op. Also allow_inline seems to be coming from C->inlining_incrementally(). Is your concern that we might fail to print something? Or that we print something extra that is not true?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1856293304 From duke at openjdk.org Mon Nov 25 10:33:20 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 10:33:20 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Sat, 23 Nov 2024 08:48:42 GMT, Johan Sjölen wrote: > If there's a key-value relationship from bci -> whatever, then we have a balanced binary tree class `Treap` that can be used. There's no `TreapArena` right now, but this PR can add it. @jdksjolen Do you think a binary tree (Treap) will be more suitable than a hash map? I saw that there are also several hash map implementations available in hotspot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2497591789 From duke at openjdk.org Mon Nov 25 10:33:22 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 10:33:22 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: <78zTMV-WOGuk5zZiw_tJJLQdPjyDoW87RSOXrvRB_Bk=.00fe05f4-3c36-43ad-aed3-a14bf384ff83@github.com> References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> <78zTMV-WOGuk5zZiw_tJJLQdPjyDoW87RSOXrvRB_Bk=.00fe05f4-3c36-43ad-aed3-a14bf384ff83@github.com> Message-ID: <1OaqyG5aOLMJkJtSmizLdR97OqO3GYbMATJK1GnUGGo=.68063fff-466f-4691-b034-2600bf99c07f@github.com> On Sat, 23 Nov 2024 01:00:24 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestDuplicatedLateInliningOutput > > src/hotspot/share/compiler/compileTask.hpp line 222: > >> 220: >> 221: /** >> 222: * @deprecated Please rely on Compile::inline_printer. Do not directly write inlining information to tty.
> > Can we get rid of these instead of making them deprecated? I would definitely like to get rid of this one but it's used in print_trace_type_profile to also print to unified logging and I do not want to touch this as part of this refactoring to keep the changes manageable. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1856329624 From epeter at openjdk.org Mon Nov 25 10:42:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 10:42:24 GMT Subject: RFR: 8340010: Fix vectorization tests with compact headers [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 07:31:03 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> make failOf comment explicit for Roberto > > Still good! @chhagedorn @rkennke @robcasloz @Hamlin-Li thanks for looking at this! I ran another offline merge and test - looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22199#issuecomment-2497630175 From epeter at openjdk.org Mon Nov 25 10:42:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 10:42:25 GMT Subject: Integrated: 8340010: Fix vectorization tests with compact headers In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:32:44 GMT, Emanuel Peter wrote: > This is a followup to: > https://github.com/openjdk/jdk/pull/20677 / [JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895) Implement JEP 450: Compact Object Headers (Experimental) > > @rkennke fixed the vectorization tests in a very rudimentary way. I now took some time to see what exactly is affected. At the time I reviewed the JEP, I thought only very minor cases were affected, like hand-unrolling etc. But it turns out that there are some more important cases that are affected, like **mixed-type loops**, such as **conversion** between types. Another class of affected loops is **hand-unrolled loops**.
> > **The problem is this:** > Since the offset from array-base to payload has changed (16 -> 12), some vector loads/stores can now no longer be aligned. This means vectorization under `+AlignVector` is not possible. > > -------------------------------- > > First: only platforms that require strict-alignment are affected (i.e. `+AlignVector` or `Matcher::misaligned_vectors_ok=false`). I filed [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424) for this, so the impact can be discussed. **The affected platforms seem to be exceptions**. > > ------------------- > > Now on to fixing the tests, which we need to do now anyway. Some actually were currently failing. > > I once ran over `tier1,tier2,tier3,tier4` plus our internal stress testing with `+-AlignVector` and `+-UseCompactObjectHeaders`, collected all failing tests. I also looked at all tests that were already guarding IR rules on `UseCompactObjectHeaders`. > > To almost all tests I added runs with `+-AlignVector` and `+-UseCompactObjectHeaders`. We could leave this also to global runs with these flag combinations. But it is rare that we ever run this, so I thought I want to directly run the "interesting" tests with all combinations. This requires extra test runtime, but I think it is warranted. > > In a few cases I also added stronger IR rules, in tests that were already affected - just to make sure we have the behavior we want. Some cases would not vectorize for other cases, and I put comments there for future reference. This pull request has now been integrated. 
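To make the alignment argument in the description above concrete, here is a small arithmetic sketch in Java. It is a simplified model, not the vectorizer's actual analysis: it assumes the array object itself is 8-byte aligned, that both arrays in a mixed-type loop share the same base offset and loop index, and that vector accesses must be 8-byte aligned. The class and method names are invented for illustration.

```java
// Hypothetical model of the +AlignVector constraint; not HotSpot code.
final class AlignVectorSketch {
    // Is the element at `index` aligned, given the array base offset?
    static boolean isAligned(long baseOffset, long index, int elemSize, int alignment) {
        return (baseOffset + index * elemSize) % alignment == 0;
    }

    // Is there a loop index at which accesses to both arrays are aligned?
    // The residues repeat with period `alignment`, so one period suffices.
    static boolean existsCommonAlignedIndex(int baseOffset, int elemSizeA,
                                            int elemSizeB, int alignment) {
        for (int i = 0; i < alignment; i++) {
            if (isAligned(baseOffset, i, elemSizeA, alignment)
                    && isAligned(baseOffset, i, elemSizeB, alignment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // byte[] and int[] accessed with the same index, 8-byte alignment:
        System.out.println(existsCommonAlignedIndex(16, 1, 4, 8)); // old offset
        System.out.println(existsCommonAlignedIndex(12, 1, 4, 8)); // compact headers
    }
}
```

Under these assumptions, a base offset of 16 admits a common aligned index (for example index 0), while a base offset of 12 does not: the byte access needs an index congruent to 4 modulo 8, and the int access needs an odd index. A loop over a single int array can still be aligned with offset 12 by starting at an odd index, which matches the observation that mixed-type loops are the ones mainly affected.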
Changeset: 811d08c0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/811d08c0a4e0da55f306686423aec40d29fabf00 Stats: 938 lines in 15 files changed: 798 ins; 23 del; 117 mod 8340010: Fix vectorization tests with compact headers Reviewed-by: chagedorn, rkennke, mli ------------- PR: https://git.openjdk.org/jdk/pull/22199 From duke at openjdk.org Mon Nov 25 11:35:19 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 11:35:19 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v17] In-Reply-To: References: Message-ID: <8kIFfR08uDMeFio72rq42mmPPOfYEb-ccW2PcG8Rw-U=.e9067604-622a-4c8d-b748-c7c9eb0ed736@github.com> > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. 
LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: - Merge branch 'master' into 8319850 - Change comment style - Change is_enabled to old pattern - Fix TestDuplicatedLateInliningOutput - Fix BCI -1 - Style changes - Fix more style issues - Undo accidental style changes - Add another missing header - Add precompiled header - ... 
and 12 more: https://git.openjdk.org/jdk/compare/5cbe63f3...5114d189 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/86bd2476..5114d189 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=15-16 Stats: 345845 lines in 4863 files changed: 182635 ins; 129472 del; 33738 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Mon Nov 25 11:56:18 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 11:56:18 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v2] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Resolve review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/117d1f41..996baf96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=00-01 Stats: 12 lines in 7 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From amitkumar at openjdk.org Mon Nov 25 12:20:51 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 25 Nov 2024 12:20:51 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand Message-ID: This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. 
Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 ------------- Commit messages: - Adds TEMP effect Changes: https://git.openjdk.org/jdk/pull/22354/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22354&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336356 Stats: 214 lines in 3 files changed: 155 ins; 6 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/22354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22354/head:pull/22354 PR: https://git.openjdk.org/jdk/pull/22354 From luhenry at openjdk.org Mon Nov 25 13:07:16 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 25 Nov 2024 13:07:16 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: On Thu, 14 Nov 2024 14:05:42 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. >> >> Thanks >> >> ## Performance >> Tests on bananapi, for other platform, please check jbs issue for test data. 
>> >> ### Before >> data >> >> Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op >> >> >> >> >> ### After >> data >> >> Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test typo Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22102#pullrequestreview-2458383738 From duke at openjdk.org Mon Nov 25 13:18:26 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 13:18:26 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Improve tests, remove edge case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/996baf96..750c241c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=01-02 Stats: 47 lines in 3 files changed: 32 ins; 6 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From qamai at openjdk.org Mon Nov 25 13:18:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 25 Nov 2024 13:18:27 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 11:56:18 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Resolve review comments src/hotspot/share/opto/divnode.cpp line 507: > 505: return nullptr; > 506: } > 507: Signed l = tl->get_con(); // Get divisor You should get the unsigned type instead. 
src/hotspot/share/opto/divnode.cpp line 513: > 511: } > 512: > 513: if (l == min_jint) { Why excluding this case? You can move the check of division by 1 down here I think. src/hotspot/share/opto/type.hpp line 2173: > 2171: > 2172: template <> > 2173: inline const TypeInt* Type::is() const { I think `cast` would be a better name ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856592007 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856591175 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856588364 From duke at openjdk.org Mon Nov 25 13:18:27 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 13:18:27 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 13:10:52 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Resolve review comments > > src/hotspot/share/opto/divnode.cpp line 507: > >> 505: return nullptr; >> 506: } >> 507: Signed l = tl->get_con(); // Get divisor > > You should get the unsigned type instead. Also fixed in my latest commit. > src/hotspot/share/opto/divnode.cpp line 513: > >> 511: } >> 512: >> 513: if (l == min_jint) { > > Why excluding this case? You can move the check of division by 1 down here I think. You're right. That's an error. I was working on this just now, see my latest commit. > src/hotspot/share/opto/type.hpp line 2173: > >> 2171: >> 2172: template <> >> 2173: inline const TypeInt* Type::is() const { > > I think `cast` would be a better name I named it `is` for consistency with all the other `is_*` methods but, of course, `cast` would be much less confusing. I'm not sure if we should be consistent or give this a more understandable name. 
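
For readers following the `min_jint` exchange above, here is a minimal Java sketch, not taken from the patch, of why a divisor with the bit pattern of `min_jint` need not be special in the unsigned case. Read as unsigned, `0x80000000` is 2^31, so the usual power-of-two rewrites (a logical shift for division, a mask for remainder) apply to it as to any other 2^k. The class and method names are made up for illustration; only `Integer.divideUnsigned` and `Integer.remainderUnsigned` are real JDK methods, used here as the reference result.

```java
// Illustrative only: class and method names are hypothetical, not from the PR.
final class UnsignedPow2Sketch {
    // Unsigned x / 2^k as a logical right shift, for 0 <= k <= 31.
    static int udivPow2(int x, int k) {
        return x >>> k;
    }

    // Unsigned x % 2^k as a mask with 2^k - 1, for 0 <= k <= 31.
    // For k == 31, (1 << 31) - 1 wraps around to 0x7fffffff, which is the mask we want.
    static int umodPow2(int x, int k) {
        return x & ((1 << k) - 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 42, -1, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int k = 0; k <= 31; k++) {
            // k == 31 gives Integer.MIN_VALUE, i.e. 2^31 when read as unsigned.
            int divisor = 1 << k;
            for (int x : samples) {
                if (Integer.divideUnsigned(x, divisor) != udivPow2(x, k)
                        || Integer.remainderUnsigned(x, divisor) != umodPow2(x, k)) {
                    throw new AssertionError("mismatch for x=" + x + ", k=" + k);
                }
            }
        }
        System.out.println("all power-of-two identities hold");
    }
}
```

This only checks the arithmetic at the Java level; how the C2 code should extract the constant as an unsigned type is a separate question that the sketch does not address.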
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856595486 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856593811 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856597066 From qamai at openjdk.org Mon Nov 25 13:21:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 25 Nov 2024 13:21:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 13:14:21 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/type.hpp line 2173: >> >>> 2171: >>> 2172: template <> >>> 2173: inline const TypeInt* Type::is() const { >> >> I think `cast` would be a better name > > I named it `is` for consistency with all the other `is_*` methods but, of course, `cast` would be much less confusing. I'm not sure if we should be consistent or give this a more understandable name. Using `cast` would be more consistent with other kinds of casting such as `static_cast` or `dynamic_cast`, so I think `t.cast()` fits into the scene more nicely. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856603227 From qamai at openjdk.org Mon Nov 25 13:33:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 25 Nov 2024 13:33:29 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 13:18:26 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
>> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Improve tests, remove edge case src/hotspot/share/opto/divnode.cpp line 488: > 486: > 487: const Type* t = phase->type(div->in(2)); > 488: if (t == TypeClass::ONE) { // Identity? You can move this into `l == 0 || l == 1` below. src/hotspot/share/opto/divnode.cpp line 1184: > 1182: > 1183: if (con == 1) { > 1184: return ConNode::make(TypeClass::ZERO); This should be in `Value` instead. src/hotspot/share/opto/divnode.cpp line 1213: > 1211: } > 1212: // X MOD X is 0 > 1213: if (mod->in(1) == mod->in(2)) { `mod->in(1)->eqv_uncast(mod->in(2))` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856618565 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856619372 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856621633 From aph at openjdk.org Mon Nov 25 13:39:32 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Nov 2024 13:39:32 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: Message-ID: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> On Fri, 22 Nov 2024 16:53:34 GMT, Amit Kumar wrote: >> This PR converts the data type from `jint` to `juint` for the constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > revert to c for -1 check src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 280: > 278: > 279: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { > 280: unsigned int u_value = (juint)c; Keep the type names consistent.
Suggestion: juint u_value = (juint)c; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1856624821 From aph at openjdk.org Mon Nov 25 13:39:32 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 25 Nov 2024 13:39:32 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> Message-ID: On Mon, 25 Nov 2024 13:32:57 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to c for -1 check > > src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 280: > >> 278: >> 279: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { >> 280: unsigned int u_value = (juint)c; > > Keep the type names consistent. > Suggestion: > > juint u_value = (juint)c; I think we're done now with this change. What are your plans to test the Arm 32-bit version? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1856630127 From amitkumar at openjdk.org Mon Nov 25 13:44:03 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 25 Nov 2024 13:44:03 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: Message-ID: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> > This PR converts the data type from `jint` to `juint` for the constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: unsigned int -> juint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/cad8a9bc..ba9e3867 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=06-07 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From amitkumar at openjdk.org Mon Nov 25 13:44:04 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 25 Nov 2024 13:44:04 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> Message-ID: <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> On Mon, 25 Nov 2024 13:36:32 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 280: >> >>> 278: >>> 279: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { >>> 280: unsigned int u_value = (juint)c; >> >> Keep the type names consistent. >> Suggestion: >> >> juint u_value = (juint)c; > > I think we're done now with this change. What are your plans to test the Arm 32-bit version? 
I don't have hardware for arm32 :( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1856636261 From duke at openjdk.org Mon Nov 25 13:44:15 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 13:44:15 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v4] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix bug in unsigned_mod_ideal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/750c241c..6d518bd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=02-03 Stats: 22 lines in 3 files changed: 16 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From mdoerr at openjdk.org Mon Nov 25 13:47:16 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 25 Nov 2024 13:47:16 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> 
<29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> Message-ID: On Mon, 25 Nov 2024 13:40:46 GMT, Amit Kumar wrote: >> I think we're done now with this change. What are your plans to test the Arm 32-bit version? > > I don't have hardware for arm32 :( We can ask @bulasevich (also see https://wiki.openjdk.org/display/HotSpot/Ports). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1856641219 From chagedorn at openjdk.org Mon Nov 25 13:55:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 13:55:53 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v4] In-Reply-To: References: Message-ID: > This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. > > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. > - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. > - Whether a node is a target node for this BFS. > - What action that should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. 
> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Fix local variable name - Revert "Generalize BFS" This reverts commit fbde591803ada158cacc11bc553e1b5061e59ae7. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22136/files - new: https://git.openjdk.org/jdk/pull/22136/files/fbde5918..1c8af282 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22136&range=02-03 Stats: 109 lines in 3 files changed: 15 ins; 84 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22136.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22136/head:pull/22136 PR: https://git.openjdk.org/jdk/pull/22136 From mli at openjdk.org Mon Nov 25 13:56:17 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 25 Nov 2024 13:56:17 GMT Subject: RFR: 8344916: RISC-V: Misaligned access in array fill stub In-Reply-To: References: Message-ID: <4aeQviWhx7WmIrvwMpAiB7VB5yijoOSr2fL40H5EfOY=.c7326025-6854-46e6-a441-56a04808850d@github.com> On Sun, 24 Nov 2024 07:27:08 GMT, Fei Yang wrote: > Hi, Please review this small change. > > In `generate_fill`, we fill the remaining elements with a single 8-byte store when the remaining count is less than 8 bytes in size after `fill_words`. This may overwrite some elements and create misaligned access. While it's not an issue for modern CPUs with fast misaligned access, this does affect performance on CPUs where misaligned access is emulated by a trap handler and thus is very slow.
async-profiler shows `jshort_fill` at 2.8% in the flame graph when sampling Specjbb2005 on these platforms. > > In this particular case, the copy address `to` is 8-byte aligned after `fill_words`. So if `AvoidUnalignedAccesses` is true, one option is to direct control to `L_fill_elements`, which avoids the alignment issue while filling the remaining elements. > > Test on linux-riscv64 platform: > - [x] tier1-3 (release) > - [x] 2.5% Specjbb2005 performance benefit on both HiFive Unmatched and Premier P550 SBCs. > - [x] No obvious performance impact witnessed on other platforms like BFI-F3 or Pioneer box (-XX:+AvoidUnalignedAccesses). Nice catch and fix! Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22347#pullrequestreview-2458502714 From mli at openjdk.org Mon Nov 25 13:57:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 25 Nov 2024 13:57:20 GMT Subject: RFR: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) [v2] In-Reply-To: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> References: <7vzHYiAWmcl6JzmBwJTiltRZ-yb_3i4yruZ7WOr07ac=.3ce701cd-373e-4266-b64f-1461b7a02820@github.com> Message-ID: <9Xfq3isUwnlM-0ujJ0gT1Ql8DltMtZcCeSNQ_U1hQgw=.33ca8227-438c-402e-bf8f-5fcd7fd2d195@github.com> On Thu, 14 Nov 2024 14:05:42 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch? >> It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. >> >> Thanks >> >> ## Performance >> Tests on bananapi, for other platform, please check jbs issue for test data.
>> >> ### Before >> data >> >> Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op >> >> >> >> >> ### After >> data >> >> Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- >> o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op >> o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op >> o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op >> o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test typo Thanks for your reviewing and discussion! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22102#issuecomment-2498084597 From mli at openjdk.org Mon Nov 25 13:57:21 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 25 Nov 2024 13:57:21 GMT Subject: Integrated: 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 11:45:13 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > It removes the previous added intrinsic for Long/Integer.expand/compress, as on several real hardware, I observe obvious performance regression. > > Thanks > > ## Performance > Tests on bananapi, for other platform, please check jbs issue for test data. 
> > ### Before > data > > Benchmark - keep intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 11710.439 | 17.936 | ns/op > o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 14878.742 | 23.472 | ns/op > o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 24555.06 | 2.632 | ns/op > o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 35827.714 | 25.022 | ns/op > > > > > ### After > data > > Benchmark - remove intrinsic | (maxNumbits) | (size) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- | -- > o.o.b.j.lang.Integers.compress | N/A | 500 | avgt | 10 | 9294.835 | 1.459 | ns/op > o.o.b.j.lang.Integers.expand | N/A | 500 | avgt | 10 | 5749.835 | 0.945 | ns/op > o.o.b.j.lang.Longs.compress | N/A | 500 | avgt | 10 | 4735.15 | 1.082 | ns/op > o.o.b.j.lang.Longs.expand | N/A | 500 | avgt | 10 | 5668.552 | 2.168 | ns/op > > This pull request has now been integrated. Changeset: 13341917 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/133419177d8ddcfafe0b2bd25ee918bdb3b16d3f Stats: 262 lines in 5 files changed: 0 ins; 261 del; 1 mod 8334474: RISC-V: verify perf of ExpandBits/CompressBits (rvv) Reviewed-by: fyang, rehn, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/22102 From epeter at openjdk.org Mon Nov 25 13:58:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 13:58:18 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v4] In-Reply-To: References: Message-ID: <_FSC260I9kUqy0Ql5m-KNhuTiT6lnoSs0Sa1keu8Lo8=.7ef38f2b-f125-49c7-b570-9bcb76464bac@github.com> On Mon, 25 Nov 2024 13:55:53 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. 
>> >> There are some places where the verification code is >> - missing >> - called twice in row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. >> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. >> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action that should be performed with the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. >> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Fix local variable name > - Revert "Generalize BFS" > > This reverts commit fbde591803ada158cacc11bc553e1b5061e59ae7. Looks good now :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22136#pullrequestreview-2458508039 From duke at openjdk.org Mon Nov 25 14:14:17 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 14:14:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 13:29:22 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve tests, remove edge case > > src/hotspot/share/opto/divnode.cpp line 1184: > >> 1182: >> 1183: if (con == 1) { >> 1184: return ConNode::make(TypeClass::ZERO); > > This should be in `Value` instead. This is analogous to code for ModI/LNode::Ideal. I'll file an RFE that this should be changed in all locations > src/hotspot/share/opto/divnode.cpp line 1213: > >> 1211: } >> 1212: // X MOD X is 0 >> 1213: if (mod->in(1) == mod->in(2)) { > > `mod->in(1)->eqv_uncast(mod->in(2))` This is analogous to code for ModI/LNode::Ideal. 
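
As a small aside on the two folds quoted above (`con == 1` producing zero, and `X MOD X is 0`), the following Java sketch checks the corresponding source-level identities against the JDK's unsigned helpers. It is an illustration under stated assumptions, not the patch: the class and method names are hypothetical, and it says nothing about whether a fold belongs in `Ideal` or `Value`, or about looking through `Cast` nodes. The self-division cases skip `x == 0` because `Integer.divideUnsigned` and `Integer.remainderUnsigned` throw `ArithmeticException` for a zero divisor.

```java
// Illustrative only: class and method names are hypothetical, not from the PR.
final class UnsignedTrivialFoldsSketch {
    // Checks x /u 1 == x, x %u 1 == 0 and, for non-zero x, x /u x == 1, x %u x == 0.
    static boolean holdsFor(int x) {
        boolean byOne = Integer.divideUnsigned(x, 1) == x
                && Integer.remainderUnsigned(x, 1) == 0;
        if (x == 0) {
            // Dividing by x would throw ArithmeticException, so only the "by one" part applies.
            return byOne;
        }
        boolean bySelf = Integer.divideUnsigned(x, x) == 1
                && Integer.remainderUnsigned(x, x) == 0;
        return byOne && bySelf;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, -1, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (!holdsFor(x)) {
                throw new AssertionError("identity failed for x=" + x);
            }
        }
        System.out.println("all trivial identities hold");
    }
}
```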
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856682022 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856684659 From duke at openjdk.org Mon Nov 25 14:19:17 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 14:19:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: <93d_HYKwUltf138FuPWPLOVKo6jOyj7z3x2akf6nV-8=.f5b8d800-9d95-4bb5-9730-b851e9e7e154@github.com> On Mon, 25 Nov 2024 13:28:50 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve tests, remove edge case > > src/hotspot/share/opto/divnode.cpp line 488: > >> 486: >> 487: const Type* t = phase->type(div->in(2)); >> 488: if (t == TypeClass::ONE) { // Identity? > > You can move this into `l == 0 || l == 1` below. This is also the same for ModI/LNode::Ideal. I think all of this code should be reviewed as part of an RFE and then changed together ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856693101 From epeter at openjdk.org Mon Nov 25 14:19:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 14:19:18 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 14:11:45 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/divnode.cpp line 1213: >> >>> 1211: } >>> 1212: // X MOD X is 0 >>> 1213: if (mod->in(1) == mod->in(2)) { >> >> `mod->in(1)->eqv_uncast(mod->in(2))` > > This is analogous to code for ModI/LNode::Ideal. That may be true. But `uncast` would be still a good idea, so that we can see through `Cast` nodes. I think it could then also be added to `ModI/L`. 
@merykitty do you have an easy way to unit test this, i.e. to have a test with a Cast? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856693190 From duke at openjdk.org Mon Nov 25 14:23:17 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 14:23:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 14:16:50 GMT, Emanuel Peter wrote: >> This is analogous to code for ModI/LNode::Ideal. > > That may be true. But `uncast` would be still a good idea, so that we can see through `Cast` nodes. I think it could then also be added to `ModI/L`. > > @merykitty do you have an easy way to unit test this, i.e. to have a test with a Cast? @eme64 I didn't mean to imply it's not useful. But I think it would be better to do these changes to the existing code for ModI/L and DivI/L in a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856698963 From duke at openjdk.org Mon Nov 25 14:31:19 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 14:31:19 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v5] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Rename to cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/6d518bd1..a249d81b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=03-04 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From qamai at openjdk.org Mon Nov 25 14:31:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 25 Nov 2024 14:31:20 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 14:20:15 GMT, theoweidmannoracle wrote: >> That may be true. But `uncast` would be still a good idea, so that we can see through `Cast` nodes. I think it could then also be added to `ModI/L`. >> >> @merykitty do you have an easy way to unit test this, i.e. to have a test with a Cast? > > @eme64 I didn't mean to imply it's not useful. But I think it would be better to do these changes to the existing code for ModI/L and DivI/L in a separate RFE. 
I don't see the logic in "Doing it the same as `ModI/LNode` then changing all of them together" instead of "Doing the new thing in the better way then changing the old thing to match it" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856712025 From epeter at openjdk.org Mon Nov 25 14:39:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 14:39:22 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 14:27:44 GMT, Quan Anh Mai wrote: >> @eme64 I didn't mean to imply it's not useful. But I think it would be better to do these changes to the existing code for ModI/L and DivI/L in a separate RFE. > > I don't see the logic in "Doing it the same as `ModI/LNode` then changing all of them together" instead of "Doing the new thing in the better way then changing the old thing to match it" Yeah, it is a trade-off. I think this here is small enough to just change it now in the same RFE. But if you prefer to do it in a separate RFE, that's fine too for me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856728255 From chagedorn at openjdk.org Mon Nov 25 14:47:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 14:47:21 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v4] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 13:55:53 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. 
>> >> There are some places where the verification code is >> - missing >> - called twice in row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. >> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. >> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action that should be performed with the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. >> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Fix local variable name > - Revert "Generalize BFS" > > This reverts commit fbde591803ada158cacc11bc553e1b5061e59ae7. Thanks Emanuel for your review and also for the offline discussion. I reverted my latest patch again in favor of a more fundamental design for doing BFS traversal on nodes. We'd like to explore that with a [separate RFE](https://bugs.openjdk.org/browse/JDK-8344957) and go with what I've had before the latest update (reverted last commit again). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22136#issuecomment-2498216456 From duke at openjdk.org Mon Nov 25 14:52:20 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 25 Nov 2024 14:52:20 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 14:36:52 GMT, Emanuel Peter wrote: >> I don't see the logic in "Doing it the same as `ModI/LNode` then changing all of them together" instead of "Doing the new thing in the better way then changing the old thing to match it" > > Yeah, it is a trade-off. I think this here is small enough to just change it now in the same RFE. But if you prefer to do it in a separate RFE, that's fine too for me. If I make these changes here now, I need to come up with tests for these uncasts, which will probably take me a while (I have no experience with this yet and I can't tell how long it might take me) and the PR will stall in the meantime. If I open a follow-up RFE, I can take a look at this independently and this PR, whose main intention was to take over the main applicable optimizations from the signed to unsigned can be merged already. So as @eme64 says, it's a trade-off between creating the perfect PR now or getting some good changes in and then making refinements in the same area in a later, follow-up PR. 
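
For readers following the 8332268 thread: the unsigned identities listed in the PR description (x / 1, x / x, x / 2^k and x % 1, x % x, x % 2^k) can be checked at the Java level with a small sketch. This is only an illustration of the arithmetic, not the C2 implementation; the class and method names (`UnsignedDivModIdentities`, `udivPow2`, `umodPow2`) are hypothetical and do not appear in the PR.

```java
public class UnsignedDivModIdentities {
    // Hypothetical helper: unsigned x / 2^k is a logical right shift by k.
    static int udivPow2(int x, int k) {
        return x >>> k;
    }

    // Hypothetical helper: unsigned x % 2^k keeps only the low k bits.
    static int umodPow2(int x, int k) {
        return x & ((1 << k) - 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, 8, 1000, Integer.MAX_VALUE, Integer.MIN_VALUE, -1};
        for (int x : samples) {
            // k = 0 covers the "x / 1" and "x % 1" cases; k = 31 covers the divisor 2^31.
            for (int k = 0; k <= 31; k++) {
                int d = 1 << k;
                if (Integer.divideUnsigned(x, d) != udivPow2(x, k)) {
                    throw new AssertionError("div mismatch for x=" + x + " k=" + k);
                }
                if (Integer.remainderUnsigned(x, d) != umodPow2(x, k)) {
                    throw new AssertionError("mod mismatch for x=" + x + " k=" + k);
                }
            }
            if (x != 0) {
                // x / x == 1 and x % x == 0 for any non-zero x.
                if (Integer.divideUnsigned(x, x) != 1 || Integer.remainderUnsigned(x, x) != 0) {
                    throw new AssertionError("self div/mod mismatch for x=" + x);
                }
            }
        }
        System.out.println("all identities hold");
    }
}
```

Note that the shift and mask forms only apply to the unsigned operations; signed division by a power of two needs an extra rounding correction for negative dividends.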
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856749607 From thartmann at openjdk.org Mon Nov 25 14:59:17 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 25 Nov 2024 14:59:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: Message-ID: <8f0PsjaW3DKlC4Y7TIPxYhaXICGK4Zq2vslCXQ4GZo0=.ef4f782b-7c51-4db5-b770-0b7aaecf9889@github.com> On Mon, 25 Nov 2024 14:49:14 GMT, theoweidmannoracle wrote: >> Yeah, it is a trade-off. I think this here is small enough to just change it now in the same RFE. But if you prefer to do it in a separate RFE, that's fine too for me. > > If I make these changes here now, I need to come up with tests for these uncasts, which will probably take me a while (I have no experience with this yet and I can't tell how long it might take me) and the PR will stall in the meantime. If I open a follow-up RFE, I can take a look at this independently and this PR, whose main intention was to take over the main applicable optimizations from the signed to unsigned can be merged already. So as @eme64 says, it's a trade-off between creating the perfect PR now or getting some good changes in and then making refinements in the same area in a later, follow-up PR. Drive-by comment: Such uncast optimizations are definitely non-trivial changes as they tend to trigger other issues by allowing subgraphs to be folded that would otherwise not be folded. So let's make sure we have proper tests for this. In my opinion, putting some of this work into a separate RFE is perfectly fine. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856762430 From qamai at openjdk.org Mon Nov 25 15:03:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 25 Nov 2024 15:03:18 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: <8f0PsjaW3DKlC4Y7TIPxYhaXICGK4Zq2vslCXQ4GZo0=.ef4f782b-7c51-4db5-b770-0b7aaecf9889@github.com> References: <8f0PsjaW3DKlC4Y7TIPxYhaXICGK4Zq2vslCXQ4GZo0=.ef4f782b-7c51-4db5-b770-0b7aaecf9889@github.com> Message-ID: On Mon, 25 Nov 2024 14:57:02 GMT, Tobias Hartmann wrote: >> If I make these changes here now, I need to come up with tests for these uncasts, which will probably take me a while (I have no experience with this yet and I can't tell how long it might take me) and the PR will stall in the meantime. If I open a follow-up RFE, I can take a look at this independently and this PR, whose main intention was to take over the main applicable optimizations from the signed to unsigned can be merged already. So as @eme64 says, it's a trade-off between creating the perfect PR now or getting some good changes in and then making refinements in the same area in a later, follow-up PR. > > Drive-by comment: Such uncast optimizations are definitely non-trivial changes as they tend to trigger other issues by allowing subgraphs to be folded that would otherwise not be folded. So let's make sure we have proper tests for this. In my opinion, putting some of this work into a separate RFE is perfectly fine. This is fair, but this point does not apply to my other suggestions, though. Btw `eqv_uncast` is used in `XorNode::Value`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1856768472 From mli at openjdk.org Mon Nov 25 15:08:50 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 25 Nov 2024 15:08:50 GMT Subject: RFR: 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector Message-ID: Hi, Can you help to review this patch? Some background: COH changes the header size of an object (from 12 to 8) when compressed klass headers are on. This invalidates some of the alignment checks in SLP, so the fix is to disable the IR checks when either UseCompactObjectHeaders or AlignVector is on. It's a follow-up of JDK-8343827 and JDK-8340010 on riscv. Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/22363/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22363&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344960 Stats: 28 lines in 1 file changed: 23 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22363.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22363/head:pull/22363 PR: https://git.openjdk.org/jdk/pull/22363 From thartmann at openjdk.org Mon Nov 25 15:16:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 25 Nov 2024 15:16:19 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v4] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 13:55:53 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. >> >> There are some places where the verification code is >> - missing >> - called twice in row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class.
>> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. >> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action that should be performed with the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. >> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Fix local variable name > - Revert "Generalize BFS" > > This reverts commit fbde591803ada158cacc11bc553e1b5061e59ae7. Still good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22136#pullrequestreview-2458730889 From szaldana at openjdk.org Mon Nov 25 15:36:00 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 25 Nov 2024 15:36:00 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose [v2] In-Reply-To: References: Message-ID: <87KdDsQ75edFWc_fK1PgwayZZKwBY_LUilCUin0BzRc=.ba744f6a-224a-4a24-b11b-f540f03959f7@github.com> > Hi folks, > > This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). 
> > Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. > > In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. > > I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). > > Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). > > However, in the destructor, if we return early, we don't pop that tag, leaving the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early.
> > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Adding regression test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22331/files - new: https://git.openjdk.org/jdk/pull/22331/files/2725d240..66778c89 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22331&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22331&range=00-01 Stats: 40 lines in 1 file changed: 40 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22331/head:pull/22331 PR: https://git.openjdk.org/jdk/pull/22331 From szaldana at openjdk.org Mon Nov 25 15:36:00 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 25 Nov 2024 15:36:00 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 07:25:29 GMT, Christian Hagedorn wrote: >> Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding regression test > > Looks good to me. Can you also add a regression test for it? Since this already triggers with `--version`, you can just create a hello world like test and run with the mentioned flags. > > Just a side note: > >> Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). > > You can directly insert the permalink (in the non-blame view) such that the code is inlined in the PR for easier reading :-) > > https://github.com/openjdk/jdk/blob/6f622da7fbae67d8c1cd9e795127adac58a246a9/src/hotspot/share/opto/compile.cpp#L4327 Thanks for the reviews @chhagedorn @dafedafe! I added a regression test. 
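
The failure mode described in the 8344013 PR description (a tag pushed on entry but not popped on an early-return path) can be illustrated with a minimal Java sketch. This is not the HotSpot code: the actual fix is in the C++ `TracePhase` destructor, and every name below (`TagStackSketch`, `pushTag`, `popTag`, `phaseBuggy`, `phaseFixed`, `runTask`) is hypothetical. Here `try`/`finally` plays the role that a destructor which always pops plays in C++.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TagStackSketch {
    // Hypothetical stand-in for the xmlStream tag stack.
    static final Deque<String> tags = new ArrayDeque<>();

    static void pushTag(String tag) {
        tags.push(tag);
    }

    // Pops the top tag and complains if it is not the one the caller expects.
    static void popTag(String expected) {
        String top = tags.pop();
        if (!top.equals(expected)) {
            throw new IllegalStateException(
                "bad tag in log: expected " + expected + " but found " + top);
        }
    }

    // Buggy shape: the early return skips the matching pop.
    static void phaseBuggy(boolean earlyReturn) {
        pushTag("phase");
        if (earlyReturn) {
            return; // "phase" is left on the stack
        }
        popTag("phase");
    }

    // Fixed shape: the pop runs on every exit path.
    static void phaseFixed(boolean earlyReturn) {
        pushTag("phase");
        try {
            if (earlyReturn) {
                return;
            }
            // ... phase work would go here ...
        } finally {
            popTag("phase");
        }
    }

    // Returns true if the enclosing "task" tag could be popped cleanly.
    static boolean runTask(boolean fixed) {
        tags.clear();
        pushTag("task");
        if (fixed) {
            phaseFixed(true);
        } else {
            phaseBuggy(true);
        }
        try {
            popTag("task");
            return true;
        } catch (IllegalStateException e) {
            return false; // found "phase" where "task" was expected
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy variant balanced: " + runTask(false));
        System.out.println("fixed variant balanced: " + runTask(true));
    }
}
```

In the buggy variant the outer pop finds `phase` where it expects `task`, which mirrors the mismatch reported by the assert in the description.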
------------- PR Comment: https://git.openjdk.org/jdk/pull/22331#issuecomment-2498343923 From chagedorn at openjdk.org Mon Nov 25 16:49:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 16:49:28 GMT Subject: RFR: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates [v4] In-Reply-To: References: Message-ID: <1AloHhOGradz0PG2Czsyh7-t57EdFz1ar8c4M4NOaQ8=.dc17bc68-ad29-4232-b868-8c576f004a45@github.com> On Mon, 25 Nov 2024 13:55:53 GMT, Christian Hagedorn wrote: >> This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. >> >> There are some places where the verification code is >> - missing >> - called twice in row with different methods >> - unnecessarily called >> >> This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. >> >> #### Details of this Patch >> - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. >> - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. >> - One can implement the new `BFSActions` interface to define >> - Whether a node's input should be further visited. >> - Whether a node is a target node for this BFS. >> - What action that should be performed with the target node. >> - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. 
>> - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: >> - Verify Template Assertion Predicates: >> - For init value: Only `OpaqueLoopInit` >> - For last value: Both `OpaqueLoop*Nodes` >> - Verify Initialized Assertion Predicates: >> - No `OpaqueLoop*Nodes` >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Fix local variable name > - Revert "Generalize BFS" > > This reverts commit fbde591803ada158cacc11bc553e1b5061e59ae7. Thanks Tobias for your re-review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22136#issuecomment-2498526660 From chagedorn at openjdk.org Mon Nov 25 16:49:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 16:49:29 GMT Subject: Integrated: 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 08:17:22 GMT, Christian Hagedorn wrote: > This patch cleans up the `OpaqueLoop*Node` verification code that is called with `PhaseIdeaLoop::assertion_predicate_has_loop_opaque_node()`. > > There are some places where the verification code is > - missing > - called twice in row with different methods > - unnecessarily called > > This patch cleans this up and moves the verification code inside the `TemplateAssertionPredicate` and the `InitializedAssertionPredicate` class. > > #### Details of this Patch > - Doing a simpler BFS similar to what `ReplaceOpaqueStrideInput::replace()` is doing. > - Noticed that the new code looks very similar, so I decided to create a dedicated `DataNodeBFS` class which could be reused again in the future to perform a BFS on data nodes. > - One can implement the new `BFSActions` interface to define > - Whether a node's input should be further visited. 
> - Whether a node is a target node for this BFS. > - What action that should be performed with the target node. > - Updated `ReplaceOpaqueStrideInput` to use the new `DataNodeBFS/BFSActions` classes. > - Implemented a new `OpaqueLoopNodesVerifier` class using `DataNodeBFS/BFSActions` which does the `OpaqueLoop*Node` verification previously done with `assertion_predicate_has_loop_opaque_node()`: > - Verify Template Assertion Predicates: > - For init value: Only `OpaqueLoopInit` > - For last value: Both `OpaqueLoop*Nodes` > - Verify Initialized Assertion Predicates: > - No `OpaqueLoop*Nodes` > > Thanks, > Christian This pull request has now been integrated. Changeset: 08dfc4a4 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/08dfc4a42e58a13a51fb7be2ebfa1c15daea28a9 Stats: 276 lines in 7 files changed: 151 ins; 92 del; 33 mod 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/22136 From yzheng at openjdk.org Mon Nov 25 16:49:54 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 25 Nov 2024 16:49:54 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v2] In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. 
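
As a rough illustration of the `isArray() || !isAbstract()` idiom mentioned in the 8343693 description, the sketch below applies the same predicate to `java.lang.Class` via core reflection. This is only an analogy under stated assumptions: it uses `java.lang.reflect.Modifier` rather than the JVMCI `ResolvedJavaType` and `ModifiersProvider` interfaces, and the class name `ConcreteTypeSketch` and its `isConcrete` helper are hypothetical.

```java
import java.lang.reflect.Modifier;
import java.util.AbstractList;

public class ConcreteTypeSketch {
    // Hypothetical helper mirroring the idiom: arrays count as concrete,
    // everything else is concrete only if it is not abstract.
    static boolean isConcrete(Class<?> type) {
        return type.isArray() || !Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("String:         " + isConcrete(String.class));
        System.out.println("AbstractList:   " + isConcrete(AbstractList.class));
        System.out.println("Runnable:       " + isConcrete(Runnable.class));
        System.out.println("AbstractList[]: " + isConcrete(AbstractList[].class));
    }
}
```

The explicit `isArray()` check matters because an array type can always be instantiated, independent of whether its element type is abstract.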
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22111/files - new: https://git.openjdk.org/jdk/pull/22111/files/7a56b644..7f23a823 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=00-01 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From chagedorn at openjdk.org Mon Nov 25 16:55:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 16:55:41 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v2] In-Reply-To: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/22136 which is not fully reviewed, yet, but I'd like to already send this PR out for review since I'm away for the rest of the week) > > This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". > > #### Current State: Mostly "reverse-order" for Assertion Predicates > We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). 
This means that we do the following: > > old target loop entry > | > x Cloned Template Assertion > | Predicate 2 > Template Assertion | > Predicate 1 Initialized Assertion > | ==> Predicate 2 > Template Assertion | > Predicate 2 Cloned Template Assertion > | Predicate 1 > source loop | > Initialized Assertion > Predicate 1 > | > target loop > > I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing: > > old target loop entry > | > x Cloned Template Assertion > | Predicate 1 > Template Assertion | > Predicate 1 Initialized Assertion > | ==> Predicate 1 > Template Assertion | > Predicate 2 Cloned Template Assertion > | Predicate 2 > source loop | > Initialized Assertion > Predicate 2 > | > target loop > > This will also align all cloni... Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22275/files - new: https://git.openjdk.org/jdk/pull/22275/files/76c609b1..76c609b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22275/head:pull/22275 PR: https://git.openjdk.org/jdk/pull/22275 From yzheng at openjdk.org Mon Nov 25 16:56:14 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 25 Nov 2024 16:56:14 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v3] In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge master - address comment. 
- Override ModifiersProvider.isConcrete in ResolvedJavaType ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22111/files - new: https://git.openjdk.org/jdk/pull/22111/files/7f23a823..15dd865f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=01-02 Stats: 201908 lines in 4047 files changed: 70839 ins; 116328 del; 14741 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From dnsimon at openjdk.org Mon Nov 25 17:06:22 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 25 Nov 2024 17:06:22 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v2] In-Reply-To: References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: On Mon, 25 Nov 2024 16:49:54 GMT, Yudi Zheng wrote: >> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comment. 
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ModifiersProvider.java line 140: > 138: > 139: /** > 140: * Returns true if a method is with a real implementation, or if a type can "if this element is a method with a concrete implementation, or a type that can be instantiated" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22111#discussion_r1856967686 From kvn at openjdk.org Mon Nov 25 17:55:14 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 25 Nov 2024 17:55:14 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544). The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) I agree with this fix. We can improve this later, based on your discussion with Dean (please, file RFE). ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22038#pullrequestreview-2459234889 From dlong at openjdk.org Mon Nov 25 19:36:22 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 25 Nov 2024 19:36:22 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 08:14:18 GMT, theoweidmannoracle wrote: > Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2498871342 From jbhateja at openjdk.org Mon Nov 25 20:04:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 20:04:09 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. 
New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. > 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general purpose registers, but FP16 ISA instructions generally operate over floating point registers; therefore the compiler injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21490/files - new: https://git.openjdk.org/jdk/pull/21490/files/5f58eea6..746c970e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=01-02 Stats: 129 lines in 14 files changed: 37 ins; 4 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/21490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490 PR: https://git.openjdk.org/jdk/pull/21490 From jbhateja at openjdk.org Mon Nov 25 20:04:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 20:04:12 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> References: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> Message-ID: On Mon, 25 Nov 2024
08:56:31 GMT, Emanuel Peter wrote: > I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. > > > The patch includes IR framework-based scalar constant folding test points. > > You mention this IR test: > > https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 > > Here I only see the use of very trivial values. I think we need more complicated cases. > > What about these: > > * Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. > * Same where it would overflow the FP16 range. > * Negative zero tests. > * Division by powers of 2. > > It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. [ScalarFloat16OperationsTest.java](https://github.com/openjdk/jdk/pull/21490/files#diff-6afb7e66ce0fcdac61df60af0231010b20cf16489ec7e4d5b0b41852db8796a0) Adds has a specialized data provider that generates test vectors with special values, our functional validation is covering the entire Float16 value range. > src/hotspot/share/opto/divnode.cpp line 789: > >> 787: >> 788: if(t1 == TypeH::ZERO && !g_isnan(t2->getf()) && t2->getf() != 0.0) >> 789: return TypeH::ZERO; > > brackets for if > > Ok, why not also do it for negative zero then? Same as above, IEEE 754 spec treats both +ve and -ve zeros equally during comparison operations. 
jshell> 0.0f != 0.0f $1 ==> false jshell> 0.0f != -0.0f $2 ==> false jshell> -0.0f != -0.0f $3 ==> false jshell> -0.0f != 0.0f $4 ==> false > src/hotspot/share/opto/divnode.cpp line 797: > >> 795: //------------------------------isA_Copy--------------------------------------- >> 796: // Dividing by self is 1. >> 797: // If the divisor is 1, we are an identity on the dividend. > > Suggestion: > > // If the divisor is 1, we are an identity on the dividend. > > `Dividing by self is 1.` That does not seem to apply here. Maybe you meant `dividing by 1 is self`? The comment mentions the divisor being 1. Looks fine. > src/hotspot/share/opto/divnode.cpp line 836: > >> 834: >> 835: // return multiplication by the reciprocal >> 836: return (new MulHFNode(in(1), phase->makecon(TypeH::make(reciprocal)))); > > Do we have good tests for this optimization? I have added a test point https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R203 I also added detailed comments to explain this better. > src/hotspot/share/opto/mulnode.cpp line 561: > >> 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const { >> 560: if( t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT ) return Type::HALF_FLOAT; >> 561: return TypeH::make( t0->getf() * t1->getf() ); > > I hope that `TypeH::make` handles the overflow cases well... does it? > And do we have tests for this? Please refer to following lines of code. https://github.com/openjdk/jdk/pull/21490/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1446 There are two versions of TypeH::make, one with short and the other accepting floating point parameter, in the latter version we explicitly invoke a runtime help to convert float to float16 value, this shall take care of overflow scenario where we return an infinite Float16 value. 
There is no underflow in the case of a floating point number, for graceful degradation we enter into a sub-normal range and eventually return a zero value. On the other end of the spectrum i.e -ve values range we return a NEGATIVE_INFINITE, existing runtime helper is fully equipped to handle these cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2498908764 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857267174 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857266958 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857266304 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857266117 From jbhateja at openjdk.org Mon Nov 25 20:04:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 20:04:13 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 07:18:41 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/connode.cpp line 49: >> >>> 47: switch( t->basic_type() ) { >>> 48: case T_INT: return new ConINode( t->is_int() ); >>> 49: case T_SHORT: return new ConHNode( t->is_half_float_constant() ); >> >> That will be quite confusing.... don't you think? > > I mean do we need this? We already have `ConHNode::make` below...? JVM treats, byte and short as constrained integer type, which is why we create ConI and not ConB or ConS. In addition, transform routines of PhaseGVN and PhaseIterGVN use ConNode::make interface to create a constant IR node, it will not be appropriate to add a specialization over there. I have modified the check to remove unnecessary ambiguity while still maintaining the constant creation interface. 
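The float-to-Float16 conversion behaviour described in the reply above (overflow to an infinite Float16 value, gradual underflow through the subnormal range down to zero) can be observed at the Java level with a small sketch. It assumes JDK 20 or later, where `Float.floatToFloat16` and `Float.float16ToFloat` are available. The class name `Float16ConversionDemo` is made up for this illustration, and the code is not the HotSpot runtime helper that `TypeH::make(float)` calls; it only shows the IEEE 754 binary16 rounding results one would expect such a helper to match.

```java
// Illustrative sketch only; requires JDK 20+ for Float.floatToFloat16 / Float.float16ToFloat.
final class Float16ConversionDemo {
    // Narrow a float to binary16 and widen it back, to observe the rounding result.
    static float roundTrip(float f) {
        short bits = Float.floatToFloat16(f);
        return Float.float16ToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(65504.0f));  // largest finite binary16 value
        System.out.println(roundTrip(70000.0f));  // above the finite binary16 range
        System.out.println(roundTrip(-70000.0f)); // below the finite binary16 range
        System.out.println(roundTrip(0x1p-24f));  // smallest positive binary16 subnormal
        System.out.println(roundTrip(1.0e-10f));  // far below the subnormal range
    }
}
```

A constant-folding test could feed such boundary values as compile-time constants and compare the folded result with the value computed this way.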
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857268078 From chagedorn at openjdk.org Mon Nov 25 20:25:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 20:25:16 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose [v2] In-Reply-To: <87KdDsQ75edFWc_fK1PgwayZZKwBY_LUilCUin0BzRc=.ba744f6a-224a-4a24-b11b-f540f03959f7@github.com> References: <87KdDsQ75edFWc_fK1PgwayZZKwBY_LUilCUin0BzRc=.ba744f6a-224a-4a24-b11b-f540f03959f7@github.com> Message-ID: <1N7ilg9jHYktMejooWiU99Fa5YK6_O0FdhT4Tq4ZScY=.c346a316-5a0e-4e19-8526-cec3faf13498@github.com> On Mon, 25 Nov 2024 15:36:00 GMT, Sonia Zaldana Calles wrote: >> Hi folks, >> >> This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). >> >> Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. >> >> In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. >> >> I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). >> >> Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). >> >> However, in the destructor, if we return early, we don't pop that tag, leading to the xmlStream tag stack to end up in a bad state. With this patch, I made sure we pop the tag even if we return early.
>> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding regression test Thanks for adding the test! Two minor comments, otherwise, looks good. test/hotspot/jtreg/compiler/debug/TestLogStackAssert.java line 31: > 29: * @requires vm.debug == true & vm.compiler2.enabled > 30: * @summary Verify the xmlStream log stack is not left in a bad state > 31: * @library /test/lib / Not required and can be removed. Suggestion: test/hotspot/jtreg/compiler/debug/TestLogStackAssert.java line 32: > 30: * @summary Verify the xmlStream log stack is not left in a bad state > 31: * @library /test/lib / > 32: * @run main/othervm -XX:+LogCompilation -XX:CompileCommand=log,*.* -XX:+CITimeVerbose -Xcomp -Xbatch compiler.debug.TestLogStackAssert `-Xbatch` is implied by `-Xcomp`. Suggestion: * @run main/othervm -XX:+LogCompilation -XX:CompileCommand=log,*.* -XX:+CITimeVerbose -Xcomp compiler.debug.TestLogStackAssert ------------- Marked as reviewed by chagedorn (Reviewer). 
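The failure mode discussed in this thread (a tag pushed on scope entry, but not popped when the cleanup path returns early) can be sketched in Java with a scope object. This is only an analogy under stated assumptions: `TagLog`, `PhaseScope` and `TagLogDemo` are hypothetical names invented for the sketch, not HotSpot's `xmlStream` or `TracePhase`, and popping before the early return is just one way to keep the stack balanced.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a log with a tag stack; not HotSpot's xmlStream.
final class TagLog {
    private final Deque<String> tags = new ArrayDeque<>();

    void pushTag(String name) {
        tags.push(name);
    }

    void popTag(String expected) {
        String top = tags.pop();
        if (!top.equals(expected)) {
            throw new IllegalStateException(
                "bad tag in log: expected " + expected + " but found " + top);
        }
    }

    int depth() {
        return tags.size();
    }
}

// Hypothetical scope object: pushes a "phase" tag on entry and pops it on close.
final class PhaseScope implements AutoCloseable {
    private final TagLog log;
    private final boolean timingEnabled;

    PhaseScope(TagLog log, boolean timingEnabled) {
        this.log = log;
        this.timingEnabled = timingEnabled;
        log.pushTag("phase");
    }

    @Override
    public void close() {
        // Pop first, so the early return below cannot leave the tag behind.
        log.popTag("phase");
        if (!timingEnabled) {
            return; // early exit: nothing else to report
        }
        // Timing output would be reported here.
    }
}

final class TagLogDemo {
    // Returns the tag stack depth after a balanced task/phase sequence.
    static int run(boolean timingEnabled) {
        TagLog log = new TagLog();
        log.pushTag("task");
        try (PhaseScope scope = new PhaseScope(log, timingEnabled)) {
            // Phase work would happen here.
        }
        log.popTag("task"); // would throw if "phase" were still on the stack
        return log.depth();
    }
}
```

If `close()` returned before calling `popTag("phase")`, the later `popTag("task")` would find `phase` on top of the stack, which mirrors the reported assertion message.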
PR Review: https://git.openjdk.org/jdk/pull/22331#pullrequestreview-2459549758 PR Review Comment: https://git.openjdk.org/jdk/pull/22331#discussion_r1857296710 PR Review Comment: https://git.openjdk.org/jdk/pull/22331#discussion_r1857297827 From chagedorn at openjdk.org Mon Nov 25 20:29:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 25 Nov 2024 20:29:32 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v3] In-Reply-To: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: > (Note: This is a dependent PR on https://github.com/openjdk/jdk/pull/22136 which is not fully reviewed, yet, but I'd like to already send this PR out for review since I'm away for the rest of the week) > > This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". > > #### Current State: Mostly "reverse-order" for Assertion Predicates > We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following: > > old target loop entry > | > x Cloned Template Assertion > | Predicate 2 > Template Assertion | > Predicate 1 Initialized Assertion > | ==> Predicate 2 > Template Assertion | > Predicate 2 Cloned Template Assertion > | Predicate 1 > source loop | > Initialized Assertion > Predicate 1 > | > target loop > > I don't think this is wrong but still kinda unexpected when trying to reason about a graph. 
But now with the recent refactorings, I think it's easy to change this to an in-order processing: > > old target loop entry > | > x Cloned Template Assertion > | Predicate 1 > Template Assertion | > Predicate 1 Initialized Assertion > | ==> Predicate 1 > Template Assertion | > Predicate 2 Cloned Template Assertion > | Predicate 2 > source loop | > Initialized Assertion > Predicate 2 > | > target loop > > This will also align all cloni... Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8344171 - Update comment - 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order - Apply suggestions from code review Co-authored-by: Tobias Hartmann - 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates ------------- Changes: https://git.openjdk.org/jdk/pull/22275/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=02 Stats: 87 lines in 4 files changed: 41 ins; 33 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22275/head:pull/22275 PR: https://git.openjdk.org/jdk/pull/22275 From szaldana at openjdk.org Mon Nov 25 20:43:36 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 25 Nov 2024 20:43:36 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose [v3] In-Reply-To: References: Message-ID: <4v95roGlt4jgnrQQ4kDK1AP0uCyV7j6lWEydZHvyKlo=.0fafd6ea-8071-49f7-8aad-3d6cbefe9ab3@github.com> > Hi folks, > > This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). > > Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. 
> > In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. > > I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). > > Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). > > However, in the destructor, if we return early, we don't pop that tag, leading to the xmlStream tag stack to end up in a bad state. With this patch, I made sure we pop the tag even if we return early. > > Cheers, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Changes based on feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22331/files - new: https://git.openjdk.org/jdk/pull/22331/files/66778c89..f55e9ea7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22331&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22331&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22331/head:pull/22331 PR: https://git.openjdk.org/jdk/pull/22331 From dhanalla at openjdk.org Mon Nov 25 21:07:28 2024 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Mon, 25 Nov 2024 21:07:28 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 07:02:37 GMT, Christian Hagedorn wrote: >>> > > Hi @dhanalla, this is not the right way to handle this assertion failure. The assertion is here to catch real issues when creating too many nodes due to a bug in the code.
For example, in [JDK-8256934](https://bugs.openjdk.org/browse/JDK-8256934), we hit this assert due to an inefficient cloning algorithm in Partial Peeling. We should not remove the assert. >>> > > For such bugs, you first need to investigate why we hit the node limit with your reproducer. Once you find the problem, it can usually be put into one of the following categories: >>> > > >>> > > 1. We have a real bug and by fixing it, we no longer create this many nodes. >>> > > 2. It is a false-positive and it is expected to create this many nodes (note that the node limit of 80000 is quite large, so it needs to be explained well why it is a false-positive - more often than not, there is still a bug somewhere that is first missed). >>> > > 3. We have a real bug but the fix is too hard, risky, or just not worth the complexity, especially for a real edge-case (also needs to be explained and justified well). >>> > > >>> > > Note that for category 2 and 3, when we cannot easily fix the problem of creating too many nodes, we should implement a bail out fix from the current optimization and not the entire compilation to reduce the performance impact. This was, for example, done in JDK-8256934, where a fix was too risky at that point during the release and a proper fix was delayed. The fix was to bail out from Partial Peeling when hitting a critically high amount of live nodes (an estimate to ensure we never hit the node limit). >>> > > You should then describe your analysis in the PR then explain your proposed solution. You should also add the reproducer as test case to your patch. >>> > >>> > >>> > Thanks @chhagedorn for reviewing this PR. This scenario corresponds to Case 2 mentioned above, where more than 80,000 nodes are expected to be created. 
As an alternative solution, could we consider limiting the JVM option `EliminateAllocationArraySizeLimit` (in `c2_globals.hpp`) to a range between 0 and 1024, instead of the current range of 0 to `max_jint`, as the upper limit of `max_jint` may not be practical? >>> >>> Hi @dhanalla, can you elaborate more why it is expected and not an actual bug where we unnecessarily create too many nodes? >> >> The test case (ReductionPerf.java) involves multiple arrays, each with a size of 8k. Using the JVM option -XX:EliminateAllocationArraySizeLimit=10240 (which is la... Thanks @chhagedorn, I have addressed your feedback. Could you please review the latest changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2499029907 From jsjolen at openjdk.org Mon Nov 25 21:10:16 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 25 Nov 2024 21:10:16 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 19:33:58 GMT, Dean Long wrote: > > Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads?
> > Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. Don't use `ttyLock`, we really want to get rid of that mechanism. The best would be to port the output to UL, but if that's not possible use a `stringStream` as Dean said. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2499035914 From dlong at openjdk.org Mon Nov 25 22:20:41 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 25 Nov 2024 22:20:41 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. > It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22323#pullrequestreview-2459743745 From dlong at openjdk.org Mon Nov 25 22:35:59 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 25 Nov 2024 22:35:59 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 10:07:30 GMT, theoweidmannoracle wrote: > With the new implementation it is always safe to call C->inline_printer()->record. In case inline printing is disabled, it is just a no-op. Also allow_inline seems to be coming from C->inlining_incrementally(). Is your concern that we might miss to print something? Yes, that we might miss printing something when allow_inline was true. 
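The "print everything to a `stringStream` first" idea from the PrintInlining discussion above can be illustrated with a Java analogy: each thread assembles its whole multi-line report in a private buffer and publishes it in a single step, so lines from different threads cannot interleave inside a report. This is a sketch under assumptions, not HotSpot code. `BufferedReportDemo` and its methods are hypothetical names, and the shared `StringBuilder` merely stands in for a shared output stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "buffer first, publish once"; not HotSpot's stringStream.
final class BufferedReportDemo {
    // Build the complete multi-line report in a thread-private buffer.
    static String buildReport(int threadId, int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("thread ").append(threadId).append(" line ").append(i).append('\n');
        }
        return sb.toString();
    }

    // Run several threads that each publish one finished report to a shared sink.
    static String runThreads(int threads, int lines) {
        StringBuilder shared = new StringBuilder();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                String report = buildReport(id, lines);
                synchronized (shared) {
                    shared.append(report); // one short critical section per report
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.toString();
    }

    // Check that every report's lines appear together and in order.
    static boolean reportsAreContiguous(String output, int lines) {
        if (output.isEmpty()) {
            return true;
        }
        String[] all = output.split("\n");
        for (int i = 0; i < all.length; i += lines) {
            String prefix = all[i].substring(0, all[i].indexOf(" line "));
            for (int j = 0; j < lines; j++) {
                if (!all[i + j].equals(prefix + " line " + j)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The design point is that formatting happens outside any shared lock; only the final hand-off of the finished text is serialized.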
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1857427666 From fyang at openjdk.org Tue Nov 26 01:03:44 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Nov 2024 01:03:44 GMT Subject: Integrated: 8344916: RISC-V: Misaligned access in array fill stub In-Reply-To: References: Message-ID: On Sun, 24 Nov 2024 07:27:08 GMT, Fei Yang wrote: > Hi, Please review this small change. > > In `generate_fill`, we fill the remaining elements with a single 8-byte store when the remaining count is less than 8 bytes in size after `fill_words`. This may overwrite some elements and create a misaligned access. While that is not an issue for modern CPUs with fast misaligned access, it does hurt performance on CPUs where misaligned access is emulated by a trap handler and is therefore very slow. async-profiler attributes 2.8% of the flame graph to `jshort_fill` when sampling Specjbb2005 on these platforms. > > In this particular case, the copy address `to` is 8-byte aligned after `fill_words`. So if `AvoidUnalignedAccesses` is true, one option is to direct control to `L_fill_elements`, which avoids the alignment issue while filling the remaining elements. > > Test on linux-riscv64 platform: > - [x] tier1-3 (release) > - [x] 2.5% Specjbb2005 performance benefit on both HiFive Unmatched and Premier P550 SBCs. > - [x] No obvious performance impact witnessed on other platforms like BFI-F3 or Pioneer box (-XX:+AvoidUnalignedAccesses). This pull request has now been integrated.
Changeset: 5e0d42b6 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/5e0d42b6a633d58d7303257569a7b45483f2db53 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8344916: RISC-V: Misaligned access in array fill stub Reviewed-by: rehn, mli ------------- PR: https://git.openjdk.org/jdk/pull/22347 From fyang at openjdk.org Tue Nov 26 01:03:43 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Nov 2024 01:03:43 GMT Subject: RFR: 8344916: RISC-V: Misaligned access in array fill stub In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 08:02:49 GMT, Robbin Ehn wrote: >> Hi, Please review this small change. >> >> In `generate_fill`, we fill the remaining elements with a single 8-byte store when the remaining count is less than 8 bytes in size after `fill_words`. This may overwrite some elements and create a misaligned access. While that is not an issue for modern CPUs with fast misaligned access, it does hurt performance on CPUs where misaligned access is emulated by a trap handler and is therefore very slow. async-profiler attributes 2.8% of the flame graph to `jshort_fill` when sampling Specjbb2005 on these platforms. >> >> In this particular case, the copy address `to` is 8-byte aligned after `fill_words`. So if `AvoidUnalignedAccesses` is true, one option is to direct control to `L_fill_elements`, which avoids the alignment issue while filling the remaining elements. >> >> Test on linux-riscv64 platform: >> - [x] tier1-3 (release) >> - [x] 2.5% Specjbb2005 performance benefit on both HiFive Unmatched and Premier P550 SBCs. >> - [x] No obvious performance impact witnessed on other platforms like BFI-F3 or Pioneer box (-XX:+AvoidUnalignedAccesses). > > Thanks! @robehn @Hamlin-Li : Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22347#issuecomment-2499340720 From fyang at openjdk.org Tue Nov 26 07:17:39 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 26 Nov 2024 07:17:39 GMT Subject: RFR: 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 15:03:23 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Some background: COH changes the header size of an object (from 12 to 8 bytes) when compressed class pointers are on. This invalidates some of the alignment checks in SLP, so the fix is to disable the IR checks when either UseCompactObjectHeaders or AlignVector is on. > It's a follow-up of JDK-8343827 and JDK-8340010 on riscv. > > Thanks Seems fine to me. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22363#pullrequestreview-2460587790 From epeter at openjdk.org Tue Nov 26 07:36:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 26 Nov 2024 07:36:46 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> Message-ID: On Mon, 25 Nov 2024 19:55:27 GMT, Jatin Bhateja wrote: >> I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. >> >>> The patch includes IR framework-based scalar constant folding test points. >> You mention this IR test: >> https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 >> >> Here I only see the use of very trivial values. I think we need more complicated cases. >> >> What about these: >> - Add/Sub/Mul/Div/Min/Max ... with NaN and infinity.
>> - Same where it would overflow the FP16 range. >> - Negative zero tests. >> - Division by powers of 2. >> >> It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. > >> I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. >> >> > The patch includes IR framework-based scalar constant folding test points. >> > You mention this IR test: >> > https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 >> >> Here I only see the use of very trivial values. I think we need more complicated cases. >> >> What about these: >> >> * Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. >> * Same where it would overflow the FP16 range. >> * Negative zero tests. >> * Division by powers of 2. >> >> It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. > > [ScalarFloat16OperationsTest.java](https://github.com/openjdk/jdk/pull/21490/files#diff-6afb7e66ce0fcdac61df60af0231010b20cf16489ec7e4d5b0b41852db8796a0) > Adds has a specialized data provider that generates test vectors with special values, our functional validation is covering the entire Float16 value range. 
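The exhaustive-iteration idea in the quoted review (enumerate the inputs and compare against a trusted reference) can be sketched in plain Java for a `min` folding rule over raw binary16 bit patterns. Everything here is an assumption made for illustration: `Float16MinCheck`, `halfToFloat`, `foldMin` and `countMismatches` are hypothetical names, the widening helper is hand-written, and the reference is `Math.min` on the widened values. It is not the IR framework test from the PR, and it does not compare interpreter results with compiled code.

```java
// Hypothetical sketch: exhaustive check of a min rule over binary16 bit patterns.
final class Float16MinCheck {
    // Widen raw binary16 bits to float; every binary16 value is exactly representable.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = (bits & 0x8000) << 16;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        if (exp == 0x1F) { // infinity or NaN
            return Float.intBitsToFloat(sign | 0x7F800000 | (mant << 13));
        }
        if (exp == 0) { // zero or subnormal: value is mant * 2^-24
            float v = mant * 0x1p-24f;
            return sign != 0 ? -v : v;
        }
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (mant << 13));
    }

    static boolean isNaN(short h) {
        return (h & 0x7C00) == 0x7C00 && (h & 0x03FF) != 0;
    }

    // Candidate folding rule: NaN wins, otherwise the smaller value, with -0.0 below +0.0.
    static short foldMin(short a, short b) {
        if (isNaN(a)) return a;
        if (isNaN(b)) return b;
        float fa = halfToFloat(a);
        float fb = halfToFloat(b);
        if (fa != 0.0f || fb != 0.0f) {
            return fa < fb ? a : b;
        }
        // Both operands are zeros: compare raw bits so that -0.0 orders below +0.0.
        return Float.floatToRawIntBits(fa) < Float.floatToRawIntBits(fb) ? a : b;
    }

    // Check every 'a' bit pattern against each given 'b'; returns the mismatch count.
    static long countMismatches(short[] bs) {
        long mismatches = 0;
        for (int ai = 0; ai <= 0xFFFF; ai++) {
            short a = (short) ai;
            for (short b : bs) {
                float expected = Math.min(halfToFloat(a), halfToFloat(b));
                float actual = halfToFloat(foldMin(a, b));
                boolean same = Float.isNaN(expected)
                        ? Float.isNaN(actual)
                        : Float.floatToRawIntBits(expected) == Float.floatToRawIntBits(actual);
                if (!same) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }
}
```

Passing an array that holds all 65536 bit patterns as `bs` turns this into the full two-input sweep; a shorter array of special values (zeros, infinities, NaN, subnormals) keeps the run small.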
@jatin-bhateja > [ScalarFloat16OperationsTest.java](https://github.com/openjdk/jdk/pull/21490/files#diff-6afb7e66ce0fcdac61df60af0231010b20cf16489ec7e4d5b0b41852db8796a0) Adds has a specialized data provider that generates test vectors with special values, our functional validation is covering the entire Float16 value range. Maybe I'm not making myself clear here. The test vectors will never constant fold - the values you read from an array load will always be the full range of their type, and not a constant. And you added constant folding IGVN optimizations. So we should test both: - Compile-time variables: for this you can use array element loads. You have to generate the values randomly beforehand, spanning the whole Float16 value range. This I think is covered somewhat adequately. - Compile-time constants: for this you cannot use array element loads - they will not be constants. You have to use literals, or you can set `static final int val = RANDOM.nextInt();`, which will constant fold during compilation, or you can use `MethodHandles.constant(int.class, 1);` to get compile-time constants, that you can change and trigger recompilation with the new "constant". It starts with something as simple as your constant folding of addition: // Supplied function returns the sum of the inputs. // This also type-checks the inputs for sanity. Guaranteed never to // be passed a TOP or BOTTOM type, these are filtered out by pre-check. const Type* AddHFNode::add_ring(const Type* t0, const Type* t1) const { if (!t0->isa_half_float_constant() || !t1->isa_half_float_constant()) { return bottom_type(); } return TypeH::make(t0->getf() + t1->getf()); } Which uses this code: const TypeH *TypeH::make(float f) { assert( StubRoutines::f2hf_adr() != nullptr, ""); short hf = StubRoutines::f2hf(f); return (TypeH*)(new TypeH(hf))->hashcons(); } You are doing the addition in `float`, and then casting back to `half_float`. Probably correct. But does it do the rounding correctly? 
Does it deal with `infty` and `NaN` correctly? Probably, but I would like to see tests for that. This is the simple stuff. Then there are more complex cases. const Type* MinHFNode::add_ring(const Type* t0, const Type* t1) const { const TypeH* r0 = t0->isa_half_float_constant(); const TypeH* r1 = t1->isa_half_float_constant(); if (r0 == nullptr || r1 == nullptr) { return bottom_type(); } if (r0->is_nan()) { return r0; } if (r1->is_nan()) { return r1; } float f0 = r0->getf(); float f1 = r1->getf(); if (f0 != 0.0f || f1 != 0.0f) { return f0 < f1 ? r0 : r1; } // As per IEEE 754 specification, floating point comparison consider +ve and -ve // zeros as equals. Thus, performing signed integral comparison for max value // detection. return (jint_cast(f0) < jint_cast(f1)) ? r0 : r1; } Is this adequately tested over the whole range of inputs? Of course the inputs have to be **constant**, otherwise if you only do array loads, the values are obviously variable, i.e. they would fail at the `isa_half_float_constant` check. You do have some constant folding tests like this: @Test @IR(counts = {IRNode.MIN_HF, " 0 ", IRNode.REINTERPRET_S2HF, " 0 ", IRNode.REINTERPRET_HF2S, " 0 "}, applyIfCPUFeature = {"avx512_fp16", "true"}) public void testMinConstantFolding() { assertResult(min(valueOf(1.0f), valueOf(2.0f)).floatValue(), 1.0f, "testMinConstantFolding"); assertResult(min(valueOf(0.0f), valueOf(-0.0f)).floatValue(), -0.0f, "testMinConstantFolding"); } But this is **only 2 examples for min**. It does not cover all cases by a long shot. It covers 2 "nice" cases. I do not think that is sufficient. Often the bugs are hiding in special cases. Testing is really important to me. I've made the experience myself where I did not test optimizations well and later it can turn into a bug. Comments like these do not give me much confidence: > functional validation is covering the entire Float16 value range. Then I review the tests, and see: not all cases are covered. 
Now what am I supposed to do as a reviewer? It does not make me trust what you say in the future. Maybe this is all a misunderstanding - if so I hope my lengthy explanation clarifies what I mean. What do you think @Bhavana-Kilambi @PaulSandoz ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499876085 From epeter at openjdk.org Tue Nov 26 07:44:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 26 Nov 2024 07:44:45 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. 
New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Another example where I asked if we have good tests: ![image](https://github.com/user-attachments/assets/8fafd51e-9fed-453f-aedb-7dc6d6d17cc1) And the test you point to is this: ![image](https://github.com/user-attachments/assets/0bfda1d7-7bc0-4e5b-8ea7-171a02a805ff) It only covers a single constant `divisor = 8`. But what about divisors that are out of the allowed range, or not powers of 2? How do we know that you chose the bounds correctly, and are not off-by-1? And what about negative divisors? ![image](https://github.com/user-attachments/assets/8f2260e5-0075-4d34-9d30-2cec817c72f2) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499889305 From qxing at openjdk.org Tue Nov 26 08:21:49 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 26 Nov 2024 08:21:49 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` Message-ID: Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. ------------- Commit messages: - Remove unused variable `after_transition` in `generate_native_wrapper`. - Remove unused variable `tmp_vmreg` and `temploc` in `generate_native_wrapper`. - Remove unused variable `in_elem_bt` in `generate_native_wrapper`. 
Changes: https://git.openjdk.org/jdk/pull/22384/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22384&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345040 Stats: 29 lines in 6 files changed: 0 ins; 29 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22384.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22384/head:pull/22384 PR: https://git.openjdk.org/jdk/pull/22384 From jbhateja at openjdk.org Tue Nov 26 08:28:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 26 Nov 2024 08:28:44 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution > Another example where I asked if we have good tests: ![image](https://private-user-images.githubusercontent.com/32593061/389841818-8fafd51e-9fed-453f-aedb-7dc6d6d17cc1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI2MDg3MDMsIm5iZiI6MTczMjYwODQwMywicGF0aCI6Ii8zMjU5MzA2MS8zODk4NDE4MTgtOGZhZmQ1MWUtOWZlZC00NTNmLWFlZGItN2RjNmQ2ZDE3Y2MxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTI2VDA4MDY0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTMwZTBhOTVjOGRmNzViY2ZjYWU0M2E3ZmE1ZWEzYmYzY2E1YmQxN2JiZDkwOGJiYjZhNTcxZTFmZDc3MGU2ZjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.-qd93PHlVMGcEbMblqKRIgdGc6tj-M7sq4oglGpgtSA) > > And the test you point to is this: 
![image](https://private-user-images.githubusercontent.com/32593061/389841921-0bfda1d7-7bc0-4e5b-8ea7-171a02a805ff.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI2MDg3MDMsIm5iZiI6MTczMjYwODQwMywicGF0aCI6Ii8zMjU5MzA2MS8zODk4NDE5MjEtMGJmZGExZDctN2JjMC00ZTViLThlYTctMTcxYTAyYTgwNWZmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTI2VDA4MDY0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJiMWIzYWUzYjY0NDE0NWUzMzYwMTAxMDk3MzM2YmU1MzdhNjlhZjk0ODdjN2U4OTZjMmI5YWVlMTZmMDkwZjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.bpkhyUSEqf80pl8reM1Wa7OCvPX6Z3muzqlWOVMCnjs) > > It only covers a single constant `divisor = 8`. But what about divisors that are out of the allowed range, or not powers of 2? How do we know that you chose the bounds correctly, and are not off-by-1? And what about negative divisors? ![image](https://private-user-images.githubusercontent.com/32593061/389842530-8f2260e5-0075-4d34-9d30-2cec817c72f2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI2MDg3MDMsIm5iZiI6MTczMjYwODQwMywicGF0aCI6Ii8zMjU5MzA2MS8zODk4NDI1MzAtOGYyMjYwZTUtMDA3NS00ZDM0LTlkMzAtMmNlYzgxN2M3MmYyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTI2VDA4MDY0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ1YjNiNmY0NzQ2ZjEzMjk5ZTM1N2ZkZjk4MGRlYjYzNGRiYjg1NTQxZGViMTNhMTI1MDEyN2YxMjViYWNiNjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.7ThWV8y58sDmCuTztg62HlvKu93Is1R6OiomwmSM8u8) Please refer to my detailed comments on the divide-by-power-of-two transformation; the test point specifically tests the division-to-multiplication transformation when the divisor is a power of two (POT).
https://github.com/openjdk/jdk/pull/21490/files#diff-ff6734d21eacbbdeae65d3b11f5261cbb6158752a9ccf5fb13eb0d2e5eb3f414R829 https://github.com/openjdk/jdk/pull/21490/files#diff-ff6734d21eacbbdeae65d3b11f5261cbb6158752a9ccf5fb13eb0d2e5eb3f414R839 Hi @eme64 I can feel the reviewer's pain, I think adding one gtest makes sense here to test various newly added Type primitives like geth, is_nan etc and idioms being folded in newly added value transformation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499970345 From epeter at openjdk.org Tue Nov 26 08:31:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 26 Nov 2024 08:31:47 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 08:25:46 GMT, Jatin Bhateja wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > >> Another example where I asked if we have good tests: ![image](https://private-user-images.githubusercontent.com/32593061/389841818-8fafd51e-9fed-453f-aedb-7dc6d6d17cc1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI2MDg3MDMsIm5iZiI6MTczMjYwODQwMywicGF0aCI6Ii8zMjU5MzA2MS8zODk4NDE4MTgtOGZhZmQ1MWUtOWZlZC00NTNmLWFlZGItN2RjNmQ2ZDE3Y2MxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTI2VDA4MDY0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTMwZTBhOTVjOGRmNzViY2ZjYWU0M2E3ZmE1ZWEzYmYzY2E1YmQxN2JiZDkwOGJiYjZhNTcxZTFmZDc3MGU2ZjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.-qd93PHlVMGcEbMblqKRIgdGc6tj-M7sq4oglGpgtSA) >> >> And the test you point to is this: 
![image](https://private-user-images.githubusercontent.com/32593061/389841921-0bfda1d7-7bc0-4e5b-8ea7-171a02a805ff.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI2MDg3MDMsIm5iZiI6MTczMjYwODQwMywicGF0aCI6Ii8zMjU5MzA2MS8zODk4NDE5MjEtMGJmZGExZDctN2JjMC00ZTViLThlYTctMTcxYTAyYTgwNWZmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTI2VDA4MDY0M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWJiMWIzYWUzYjY0NDE0NWUzMzYwMTAxMDk3MzM2YmU1MzdhNjlhZjk0ODdjN2U4OTZjMmI5YWVlMTZmMDkwZjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.bpkhyUSEqf80pl8reM1Wa7OCvPX6Z3muzqlWOVMCnjs) >> >> It only covers a single constant `divisor = 8`. But what about divisors that are out of the allowed range, or not powers of 2? How do we know that you chose the bounds correctly, and are not off-by-1? And what about negative divisors? ![image](https://private-user-images.githubusercontent.com/32593061/389842530-8f2260e5-0075-4d34-9d30-2cec817c72f2.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI2MDg3MDMsIm5iZiI6MTczMjYwODQwMywicGF0aCI6Ii8zMjU5MzA2MS8zODk4NDI1MzAtOGYyMjYwZTUtMDA3NS00ZDM0LTlkMzAtMmNlYzgxN2M3MmYyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTI2VDA4MDY0M1omWC1BbXotRXh... @jatin-bhateja > I can feel the reviewer's pain Then please do something about it! Your comments are helpful. But they do not answer my request for better test coverage. Yes, `gtest` would be helpful. But also Java end-to-end tests are required. 
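The request above for end-to-end coverage of constant inputs could look roughly like the following. This is only a minimal sketch, assuming plain `float` and `Math.min` as stand-ins for the `Float16` API under review; the class `MinConstantCases` and all of its method names are hypothetical and do not come from the PR. Each method uses only literals or `static final` values, so the compiler sees compile-time constants rather than array loads, and the expected values follow the documented `Math.min` semantics (NaN propagates, and `-0.0f` is considered smaller than `0.0f`).

```java
// Hypothetical sketch: special-case inputs supplied as compile-time constants.
// float/Math.min stand in for Float16/min; none of these names come from the PR.
final class MinConstantCases {
    // A static final field initialized from a literal is also a compile-time constant.
    static final float NEG_ZERO = -0.0f;

    static float minPosNegZero() { return Math.min(0.0f, NEG_ZERO); }
    static float minNegPosZero() { return Math.min(NEG_ZERO, 0.0f); }
    static float minNaNLeft()    { return Math.min(Float.NaN, 1.0f); }
    static float minNaNRight()   { return Math.min(1.0f, Float.NaN); }
    static float minInfinities() {
        return Math.min(Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY);
    }

    // Compare bit patterns so that -0.0f differs from 0.0f and NaN equals NaN
    // (floatToIntBits collapses all NaN encodings to one canonical value).
    static boolean sameBits(float actual, float expected) {
        return Float.floatToIntBits(actual) == Float.floatToIntBits(expected);
    }

    static void check(String name, float actual, float expected) {
        if (!sameBits(actual, expected)) {
            throw new AssertionError(name + ": got " + actual + ", expected " + expected);
        }
    }

    public static void main(String[] args) {
        // Repeat the checks many times so the methods have a chance to be JIT-compiled.
        for (int i = 0; i < 20_000; i++) {
            check("min(+0, -0)", minPosNegZero(), -0.0f);
            check("min(-0, +0)", minNegPosZero(), -0.0f);
            check("min(NaN, 1)", minNaNLeft(), Float.NaN);
            check("min(1, NaN)", minNaNRight(), Float.NaN);
            check("min(-inf, +inf)", minInfinities(), Float.NEGATIVE_INFINITY);
        }
    }
}
```

In an actual jtreg test the same cases would presumably be written against `Float16` with IR-framework annotations, as in the `testMinConstantFolding` example quoted earlier in the thread; the sketch only shows which special values one might want to cover with constant inputs.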
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499977879 From rcastanedalo at openjdk.org Tue Nov 26 08:54:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 26 Nov 2024 08:54:57 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: <4x786-yQhv56X2UOLUv7cmYHnO2lqTWAj-K5g2dZ4jY=.ab5ee8d4-6edc-4284-875e-908e9b7d32f1@github.com> On Mon, 25 Nov 2024 17:52:22 GMT, Vladimir Kozlov wrote: > I agree with this fix. Thanks for reviewing, Vladimir! > We can improve this later, based on your discussion with Dean (please, file RFE). Will do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2500021303 From rcastanedalo at openjdk.org Tue Nov 26 08:54:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 26 Nov 2024 08:54:59 GMT Subject: Integrated: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 12:37:21 GMT, Roberto Castañeda Lozano wrote: > This changeset takes into account the presence of `BoxLock` nodes in a basic block when determining whether the block is empty and [can be removed](https://github.com/openjdk/jdk/blob/5729227651969f542f040e5d0bfbf9b0b99b5698/src/hotspot/share/opto/compile.cpp#L2997). Special treatment of `BoxLock` nodes is required because these are not Mach nodes, yet they [are preserved in C2's back-end](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/share/opto/matcher.cpp#L2278) and result in [actual machine code being generated](https://github.com/openjdk/jdk/blob/f0b251d76078e8d5b47e967b0449c4cbdcb5a005/src/hotspot/cpu/x86/x86_64.ad#L1544).
The proposed change avoids wrongly removing basic blocks consisting only of `BoxLock` and other non-Mach nodes, and crashing when the register that should have been defined by the wrongly removed `BoxLock` node is used (see complete failure analysis in the [JBS description](https://bugs.openjdk.org/browse/JDK-8337660)). > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode) This pull request has now been integrated. Changeset: 01052035 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/0105203575182e24a56a38a12da7c1af58ea0a78 Stats: 93 lines in 2 files changed: 88 ins; 0 del; 5 mod 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty Co-authored-by: Emanuel Peter Reviewed-by: qamai, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22038 From rcastanedalo at openjdk.org Tue Nov 26 09:07:56 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 26 Nov 2024 09:07:56 GMT Subject: RFR: 8337660: C2: basic blocks with only BoxLock nodes are wrongly treated as empty In-Reply-To: <4x786-yQhv56X2UOLUv7cmYHnO2lqTWAj-K5g2dZ4jY=.ab5ee8d4-6edc-4284-875e-908e9b7d32f1@github.com> References: <4x786-yQhv56X2UOLUv7cmYHnO2lqTWAj-K5g2dZ4jY=.ab5ee8d4-6edc-4284-875e-908e9b7d32f1@github.com> Message-ID: On Tue, 26 Nov 2024 08:50:05 GMT, Roberto Castañeda Lozano wrote: > We can improve this later, based on your discussion with Dean (please, file RFE). Reported here: [JDK-8345042](https://bugs.openjdk.org/browse/JDK-8345042).
------------- PR Comment: https://git.openjdk.org/jdk/pull/22038#issuecomment-2500052778 From mli at openjdk.org Tue Nov 26 09:29:41 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Nov 2024 09:29:41 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of the variables and code are related to the critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. Thanks for catching and fixing this. The riscv part looks good to me. Other platforms seem to have a similar issue, so it would be good for others to take a look as well. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22384#pullrequestreview-2460890657 From mli at openjdk.org Tue Nov 26 09:44:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Nov 2024 09:44:45 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. > It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22323#pullrequestreview-2460931431 From dnsimon at openjdk.org Tue Nov 26 09:53:44 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 26 Nov 2024 09:53:44 GMT Subject: Integrated: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java.
> It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. This pull request has now been integrated. Changeset: 3a625f38 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/3a625f38aa4ab611fe5c7dffe420abce826d0d7e Stats: 19 lines in 1 file changed: 13 ins; 0 del; 6 mod 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails Reviewed-by: syan, dlong, mli ------------- PR: https://git.openjdk.org/jdk/pull/22323 From dnsimon at openjdk.org Tue Nov 26 09:53:43 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 26 Nov 2024 09:53:43 GMT Subject: RFR: 8344628: Test TestEnableJVMCIProduct.java run with virtual thread intermittent fails In-Reply-To: References: Message-ID: <5UkUmmHOtpPKn6yXxLhwCe1R-a8WCju6lGNyTmWNVlE=.e747f8c0-8dc9-40c4-81a0-ff4bd7dfa3aa@github.com> On Fri, 22 Nov 2024 13:56:51 GMT, Doug Simon wrote: > This PR prevents a rare, intermittent failure of TestEnableJVMCIProduct.java. > It does this by writing the expected test output to a file instead of stdout to avoid issues with VM error logging interleaving with the test output. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22323#issuecomment-2500151428 From luhenry at openjdk.org Tue Nov 26 10:26:43 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 26 Nov 2024 10:26:43 GMT Subject: RFR: 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 15:03:23 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Some background: COH change the header size of an object (from 12 to 8) when compress klass headers is on, this invalidate some of the alignment check in SLP, so the fix is to disable the IR checks when either UseCompactObjectHeaders or AlignVector is on. > It's a follow-up of JDK-8343827 and JDK-8340010 on riscv. 
> > Thanks Marked as reviewed by luhenry (Committer). test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java line 53: > 51: > 52: public static void main(String args[]) { > 53: TestFramework framework = new TestFramework(TestFloatConversionsVectorNaN.class); Is it more usual to take this approach or to call `TestFramework.runWithFlags` multiple times? ------------- PR Review: https://git.openjdk.org/jdk/pull/22363#pullrequestreview-2461036952 PR Review Comment: https://git.openjdk.org/jdk/pull/22363#discussion_r1858212083 From kbarrett at openjdk.org Tue Nov 26 11:04:47 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 26 Nov 2024 11:04:47 GMT Subject: RFR: 8345050: Fix -Wzero-as-null-pointer warning in MemPointer ctor Message-ID: Please review this trivial change to use nullptr instead of a literal 0 in a call to Node::dump_bfs by the MemPointer ctor. Testing: mach5 tier1 ------------- Commit messages: - fix backsliding Changes: https://git.openjdk.org/jdk/pull/22388/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22388&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345050 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22388.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22388/head:pull/22388 PR: https://git.openjdk.org/jdk/pull/22388 From mli at openjdk.org Tue Nov 26 11:05:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Nov 2024 11:05:39 GMT Subject: RFR: 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector In-Reply-To: References: Message-ID: <74oM0ZXeo6vqlPeiFI7arlCwybgk3_nGAxKZwaf2O3s=.4de3d26e-7383-4b72-a359-efef6d44ae39@github.com> On Tue, 26 Nov 2024 10:24:11 GMT, Ludovic Henry wrote: >> Hi, >> Can you help to review this patch? 
>> Some background: COH change the header size of an object (from 12 to 8) when compress klass headers is on, this invalidate some of the alignment check in SLP, so the fix is to disable the IR checks when either UseCompactObjectHeaders or AlignVector is on. >> It's a follow-up of JDK-8343827 and JDK-8340010 on riscv. >> >> Thanks > > test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java line 53: > >> 51: >> 52: public static void main(String args[]) { >> 53: TestFramework framework = new TestFramework(TestFloatConversionsVectorNaN.class); > > Is it more usual to take this approach or to call `TestFramework.runWithFlags` multiple times? Not quite sure, but seems the similar tests changes are following the same pattern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22363#discussion_r1858271624 From chagedorn at openjdk.org Tue Nov 26 11:12:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 26 Nov 2024 11:12:44 GMT Subject: RFR: 8345050: Fix -Wzero-as-null-pointer warning in MemPointer ctor In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 11:00:56 GMT, Kim Barrett wrote: > Please review this trivial change to use nullptr instead of a literal 0 in a > call to Node::dump_bfs by the MemPointer ctor. > > Testing: mach5 tier1 Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22388#pullrequestreview-2461145711 From chagedorn at openjdk.org Tue Nov 26 11:13:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 26 Nov 2024 11:13:45 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose [v3] In-Reply-To: <4v95roGlt4jgnrQQ4kDK1AP0uCyV7j6lWEydZHvyKlo=.0fafd6ea-8071-49f7-8aad-3d6cbefe9ab3@github.com> References: <4v95roGlt4jgnrQQ4kDK1AP0uCyV7j6lWEydZHvyKlo=.0fafd6ea-8071-49f7-8aad-3d6cbefe9ab3@github.com> Message-ID: On Mon, 25 Nov 2024 20:43:36 GMT, Sonia Zaldana Calles wrote: >> Hi folks, >> >> This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). >> >> Sometimes writes to xmlStream from several threads are interleaved, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. >> >> In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` expects to pop the tag `task` but finds `phase` instead. >> >> I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). >> >> Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). >> >> However, in the destructor, if we return early, we don't pop that tag, leaving the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early. >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Changes based on feedback Marked as reviewed by chagedorn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/22331#pullrequestreview-2461147427 From mli at openjdk.org Tue Nov 26 11:13:46 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Nov 2024 11:13:46 GMT Subject: RFR: 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 15:03:23 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Some background: COH change the header size of an object (from 12 to 8) when compress klass headers is on, this invalidate some of the alignment check in SLP, so the fix is to disable the IR checks when either UseCompactObjectHeaders or AlignVector is on. > It's a follow-up of JDK-8343827 and JDK-8340010 on riscv. > > Thanks Thanks for your reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22363#issuecomment-2500326603 From mli at openjdk.org Tue Nov 26 11:13:46 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 26 Nov 2024 11:13:46 GMT Subject: Integrated: 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 15:03:23 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Some background: COH change the header size of an object (from 12 to 8) when compress klass headers is on, this invalidate some of the alignment check in SLP, so the fix is to disable the IR checks when either UseCompactObjectHeaders or AlignVector is on. > It's a follow-up of JDK-8343827 and JDK-8340010 on riscv. > > Thanks This pull request has now been integrated. 
Changeset: 6da3ecd6 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/6da3ecd65ddeb94587933c69ca8b9c279c70ac24 Stats: 28 lines in 1 file changed: 23 ins; 0 del; 5 mod 8344960: RISC-V: fix TestFloatConversionsVectorNaN for COH and AlignVector Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/22363 From shade at openjdk.org Tue Nov 26 11:21:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 26 Nov 2024 11:21:38 GMT Subject: RFR: 8345050: Fix -Wzero-as-null-pointer warning in MemPointer ctor In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 11:00:56 GMT, Kim Barrett wrote: > Please review this trivial change to use nullptr instead of a literal 0 in a > call to Node::dump_bfs by the MemPointer ctor. > > Testing: mach5 tier1 Agree this is trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22388#pullrequestreview-2461164095 From duke at openjdk.org Tue Nov 26 14:13:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 26 Nov 2024 14:13:03 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v11] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 18:47:05 GMT, Johan Sj?len wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix more style issues > > src/hotspot/share/opto/printinlining.cpp line 61: > >> 59: >> 60: return locate_call(state->caller(), nullptr)->at_bci(state->bci(), create_for); >> 61: } > > Can we be on the safe side and convert this into an iterative process instead, so that we don't have to worry about stack usage? 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1858607835 From duke at openjdk.org Tue Nov 26 14:13:02 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 26 Nov 2024 14:13:02 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Sat, 23 Nov 2024 01:07:18 GMT, Dean Long wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestDuplicatedLateInliningOutput > > src/hotspot/share/opto/printinlining.cpp line 60: > >> 58: return locate(state->caller(), nullptr)->at_bci(state->bci(), callee); >> 59: } >> 60: > > It looks like you are building a tree, just like InlineTree. I wonder if it would make sense to unify them somehow in the future. I thought about this too. Should I open an RFE for this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1858608940 From duke at openjdk.org Tue Nov 26 14:13:02 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 26 Nov 2024 14:13:02 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 22:33:21 GMT, Dean Long wrote: >> With the new implementation it is always safe to call C->inline_printer()->record. In case inline printing is disabled, it is just a no-op. Also allow_inline seems to be coming from C->inlining_incrementally(). Is your concern that we might miss to print something? Or that we print something extra that is not true? > >> With the new implementation it is always safe to call C->inline_printer()->record. In case inline printing is disabled, it is just a no-op. 
Also allow_inline seems to be coming from C->inlining_incrementally(). Is your concern that we might fail to print something? > > Yes, that we might miss printing something when allow_inline was true. I don't think I changed the logic for when it prints compared to before my refactoring. Investigating whether this is really the correct check seems a bit beyond the scope of this PR. Should I file an RFE for this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1858604707 From duke at openjdk.org Tue Nov 26 14:13:02 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 26 Nov 2024 14:13:02 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v18] In-Reply-To: References: Message-ID: <07brJy6xgJmiLKatpzZJbjKsIQZpejKi9ovKiW5Ipxc=.be477b8a-524e-4b37-b4c9-e0ca0416f69c@github.com> > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline().
> > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Fix style - Derecursify locate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/5114d189..ff76d160 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=16-17 Stats: 13 lines in 1 file changed: 10 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From bkilambi at openjdk.org Tue Nov 26 15:10:49 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Nov 2024 15:10:49 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations 
added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 201: > 199: > 200: @Test > 201: @IR(counts = {IRNode.MUL_HF, " >0 ", IRNode.REINTERPRET_S2HF, " >0 ", IRNode.REINTERPRET_HF2S, " >0 "}, There's a bit of inconsistency in format for " >0 ". In some of the IR rules above, it's "> 0" and here it's " >0 ". Maybe follow a single format? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858720659 From bkilambi at openjdk.org Tue Nov 26 15:18:54 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Nov 2024 15:18:54 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 70: > 68: @Warmup(10000) > 69: @IR(counts = {IRNode.ADD_VHF, ">= 1"}, > 70: applyIfCPUFeatureOr = {"avx512_fp16", "true"}) this should be just `applyIfCPUFeature`. When I add the `sve` feature to this list, I will change it to `applyIfCPUFeatureOr`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858733245 From bkilambi at openjdk.org Tue Nov 26 15:24:54 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Nov 2024 15:24:54 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. 
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 43: > 41: > 42: @Test > 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) Would it probably be more readable if `applyIfCPUFeatureAnd` and `counts` are in separate lines? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858743792 From bkilambi at openjdk.org Tue Nov 26 15:42:52 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 26 Nov 2024 15:42:52 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID: <7n50_F8vrK70EijMOWNg_OPZZdrB4qp0LVi429w0McU=.0673ea33-7980-4f26-8a24-377753797276@github.com> On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. 
>> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution src/hotspot/share/opto/library_call.cpp line 8659: > 8657: return true; > 8658: } > 8659: This line can be removed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858776695 From dfenacci at openjdk.org Tue Nov 26 17:06:41 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 26 Nov 2024 17:06:41 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. Thanks for cleaning up. Looks good to me but could only test x86_64 and aarch64. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/22384#pullrequestreview-2462201733 From yzheng at openjdk.org Tue Nov 26 17:14:35 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 26 Nov 2024 17:14:35 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v4] In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. 
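The `isArray() || !isAbstract()` idiom described above can be sketched outside of JVMCI, using plain `java.lang.Class` reflection. This is only an illustration under stated assumptions: the class `ConcreteTypeSketch` and its `isConcrete` helper are hypothetical names, and the code is not the `ResolvedJavaType` implementation from the PR. The array check matters because an array class may itself report the `abstract` modifier, so testing `!isAbstract()` alone could misclassify arrays.

```java
import java.lang.reflect.Modifier;
import java.util.AbstractList;

// Hypothetical sketch, not the JVMCI code: mirrors the idiom on java.lang.Class.
class ConcreteTypeSketch {

    // A type counts as concrete if it is an array type or is not abstract.
    // The array check comes first because array classes may carry the
    // abstract modifier even though arrays can be instantiated.
    static boolean isConcrete(Class<?> type) {
        return type.isArray() || !Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("String:       " + isConcrete(String.class));
        System.out.println("AbstractList: " + isConcrete(AbstractList.class));
        System.out.println("int[]:        " + isConcrete(int[].class));
        System.out.println("Runnable:     " + isConcrete(Runnable.class));
    }
}
```

Keeping the check in one method, as the PR does with `isConcrete`, avoids repeating the two-part condition at every call site.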
Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge master - address comments. - Merge master - address comment. - Override ModifiersProvider.isConcrete in ResolvedJavaType ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22111/files - new: https://git.openjdk.org/jdk/pull/22111/files/15dd865f..3b4d58fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=02-03 Stats: 12558 lines in 235 files changed: 8304 ins; 2820 del; 1434 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From never at openjdk.org Tue Nov 26 17:14:35 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 26 Nov 2024 17:14:35 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v4] In-Reply-To: References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: On Tue, 26 Nov 2024 17:11:01 GMT, Yudi Zheng wrote: >> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. > > Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge master > - address comments. > - Merge master > - address comment. 
> - Override ModifiersProvider.isConcrete in ResolvedJavaType Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22111#pullrequestreview-2462220057 From dlong at openjdk.org Tue Nov 26 17:21:15 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Nov 2024 17:21:15 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v9] In-Reply-To: References: Message-ID: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - redo without bailout - Merge remote-tracking branch 'origin/master' into 8340141 - add missing bailout checks - C1 fix - remove blank line - Merge master - bail out on old methods - redo VM state - fix errors - make sure to be in VM state when checking is_old - ... 
and 2 more: https://git.openjdk.org/jdk/compare/4d4cef80...7a7bdb86 ------------- Changes: https://git.openjdk.org/jdk/pull/21148/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21148&range=08 Stats: 40 lines in 3 files changed: 21 ins; 18 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21148.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21148/head:pull/21148 PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Tue Nov 26 17:21:16 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Nov 2024 17:21:16 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v8] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 22:08:21 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request incrementally with two additional commits since the last revision: > > - add missing bailout checks > - C1 fix New version. I was going to add dependencies for any old methods found, but then I realized we check jvmti_state_changed() at the end, which will throw out the compilation if any methods are redefined during the compilation. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2499498959 From kvn at openjdk.org Tue Nov 26 17:23:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Nov 2024 17:23:42 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: <6QuXcF-ii5ZH26MYXRQ_QrUawdsw2UeylYYFkyYPk9w=.2dbf44ae-9aa4-4324-ada0-c0bbce6f54fe@github.com> On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22384#pullrequestreview-2462244909 From kvn at openjdk.org Tue Nov 26 17:34:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Nov 2024 17:34:44 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional In-Reply-To: References: Message-ID: <0afLSZbCNxy6H8PbPzuIDzz4zqKh-xDt1YFGt17bejw=.4362d5a8-f8b9-44e2-a96d-e2421f316c63@github.com> On Fri, 22 Nov 2024 11:11:55 GMT, Evgeny Nikitin wrote: > For CTW, zero classes in provided jar is now a failure. > This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. > > This PR makes this behaviour controllable. Default reaction is a failure, like before. What is the default value of `allow_zero_class_count`, and where is it set? Why does this even need to be controlled rather than being the default behavior? What is the benefit of an error over a warning for an empty `jar`? Should you check `totalClassCount` too, to catch an empty `jar`? As I see it, `classCount` could be 0 if `classStart` and `classStop` are specified as the same value, which could happen regardless of the number of classes in the `jar` file.
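The distinction raised in the comment above, an empty `jar` versus an empty selection window, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class `CtwClassCountSketch`, the enum `ZeroClassCause`, and the half-open `[classStart, classStop)` window are hypothetical and are not taken from the CTW sources. Only the names `totalClassCount`, `classCount`, `classStart` and `classStop` come from the review comment.

```java
// Hypothetical sketch, not the CTW implementation: separates the two
// ways in which the number of compiled classes can end up being zero.
class CtwClassCountSketch {

    // Illustrative classification; these names are not part of CTW.
    enum ZeroClassCause { NONE, EMPTY_JAR, EMPTY_RANGE }

    // Assumes the selection window is half-open: [classStart, classStop).
    static ZeroClassCause diagnose(long totalClassCount, long classStart, long classStop) {
        if (totalClassCount == 0) {
            // The archive itself contains no classes.
            return ZeroClassCause.EMPTY_JAR;
        }
        // Clamp the window to the classes that actually exist.
        long first = Math.max(classStart, 0);
        long last = Math.min(classStop, totalClassCount);
        long classCount = Math.max(last - first, 0);
        // A non-empty archive can still yield zero selected classes.
        return classCount == 0 ? ZeroClassCause.EMPTY_RANGE : ZeroClassCause.NONE;
    }

    public static void main(String[] args) {
        System.out.println(diagnose(0, 0, 10));
        System.out.println(diagnose(100, 5, 5));
        System.out.println(diagnose(100, 0, 10));
    }
}
```

Under these assumptions, checking `totalClassCount` separately lets a tool report an empty archive differently from a window that simply selects nothing.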
------------- PR Review: https://git.openjdk.org/jdk/pull/22320#pullrequestreview-2462269167 From kvn at openjdk.org Tue Nov 26 17:36:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Nov 2024 17:36:41 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark [v2] In-Reply-To: References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Message-ID: <2uZlBd9wrRHLiJS7ZuFuVc5EydPOI3yaXjQcEzlOhCE=.a07b3be2-4f38-466c-84b9-d45d38daaeec@github.com> On Wed, 20 Nov 2024 14:35:36 GMT, Emanuel Peter wrote: >> Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 >> >> It will be important for the efforts in: >> [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count >> >> I ran the plots for `byte, int, long`. >> We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. >> >> We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. 
>> >> --------------------------------------------------- >> **Results** >> >> red: normal -> saw-tooth >> green: randomized offsets -> more "smooth" >> >> linux_x64 >> ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) >> >> linux_aarch64 >> ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) >> >> macosx_x64 >> ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) >> >> macosx_aarch64 >> ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) >> >> windows_x64 >> ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - whitespace > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - JDK-8344118 Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22070#pullrequestreview-2462277544 From kvn at openjdk.org Tue Nov 26 17:36:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Nov 2024 17:36:42 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v3] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Mon, 25 Nov 2024 20:29:32 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". >> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 2 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 2 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 1 >> source loop | >> Initialized Assertion >> Predicate 1 >> | >> target loop >> >> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 1 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 1 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 2 >> source loop | >> Initialized Assertion >> Predicate 2 >> | >> target loop >> >> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... 
> > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8344171 > - Update comment > - 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order > - Apply suggestions from code review > > Co-authored-by: Tobias Hartmann > - 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates Seems fine to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22275#pullrequestreview-2462275300 From dlong at openjdk.org Tue Nov 26 17:45:52 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Nov 2024 17:45:52 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: <2HSq6ZKhkH0zYLic1j3sLQN0vuGfRWl59HFoFRC5fus=.55908db0-4664-490c-bae1-ddd1eec3e33a@github.com> On Tue, 26 Nov 2024 14:05:39 GMT, theoweidmannoracle wrote: >>> With the new implementation it is always safe to call C->inline_printer()->record. In case inline printing is disabled, it is just a no-op. Also allow_inline seems to be coming from C->inlining_incrementally(). Is your concern that we might miss to print something? >> >> Yes, that we might miss printing something when allow_inline was true. > > I think I didn't change the logic behind when it would print now as compared to before my refactoring. I think investigating whether this is really the correct check is a bit beyond the scope about this PR. Should I file a RFE for this? Right, it's an existing issue not related to your changes. Yes, please file an RFE. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1858986649 From dlong at openjdk.org Tue Nov 26 17:45:53 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 26 Nov 2024 17:45:53 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Tue, 26 Nov 2024 14:08:17 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/printinlining.cpp line 60: >> >>> 58: return locate(state->caller(), nullptr)->at_bci(state->bci(), callee); >>> 59: } >>> 60: >> >> It looks like you are building a tree, just like InlineTree. I wonder if it would make sense to unify them somehow in the future. > > I thought about this too. Should I open an RFE for this? Yes, please. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1858987366 From kvn at openjdk.org Tue Nov 26 18:02:46 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 26 Nov 2024 18:02:46 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> References: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> Message-ID: <8Md6jSNs0prly4S1-5OoHCw8t68pk57lYJQ4YCH8ndI=.cc384063-f661-48c4-af29-4a788f1c24e6@github.com> On Wed, 13 Nov 2024 14:09:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. >> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. 
I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation I have a few comments. src/hotspot/cpu/x86/x86.ad line 2771: > 2769: int offset = i * type2aelembytes(bt); > 2770: switch (bt) { > 2771: case T_BYTE: val->at(i) = con; break; I don't like that the switch is executed for each copied element. What is a typical `len` value? src/hotspot/share/opto/constantTable.cpp line 65: > 63: } > 64: > 65: int ConstantTable::alignment() const { Add a comment that it is used for nmethod's constant section size and alignment. ------------- PR Review: https://git.openjdk.org/jdk/pull/21596#pullrequestreview-2462310312 PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1859020040 PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1858997584 From qamai at openjdk.org Tue Nov 26 18:20:22 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:20:22 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v3] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093, mostly just the revert of the backout.
> > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - whitespace - Merge branch 'master' into shufflerefactor - [vectorapi] Refactor VectorShuffle implementation ------------- Changes: https://git.openjdk.org/jdk/pull/21042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=02 Stats: 4912 lines in 64 files changed: 2602 ins; 1066 del; 1244 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Tue Nov 26 18:20:23 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:20:23 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 23:17:35 GMT, Sandhya Viswanathan wrote: >> I have adapted the patch in accordance with https://github.com/openjdk/jdk/pull/20634, I moved the index wrapping into C2 instead of making it a separate step as I think it seems clearer. Also, I think in the future we can eliminate this step so putting it in C2 would make the progress easier. >> >> Please take a look, thanks a lot. > > @merykitty Could you please merge with the latest and resolve conflicts? 
@sviswa7 @PaulSandoz @eme64 @jatin-bhateja Thanks for taking a look. I have merged the PR with a more recent master and resolved the semantic difference with newly added intrinsics, too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2501633760 From qamai at openjdk.org Tue Nov 26 18:20:24 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:20:24 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Tue, 15 Oct 2024 16:25:04 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> [vectorapi] Refactor VectorShuffle implementation > > src/hotspot/share/opto/vectornode.hpp line 1618: > >> 1616: public: >> 1617: VectorLoadShuffleNode(Node* in, const TypeVect* vt) >> 1618: : VectorNode(in, vt) {} > > Can you add a comment above "class VectorLoadShuffleNode" to say what its semantics are? Done, I would refrain from renaming it right now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1859030030 From qamai at openjdk.org Tue Nov 26 18:20:25 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:20:25 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 20:27:20 GMT, Paul Sandoz wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> [vectorapi] Refactor VectorShuffle implementation > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 228: > >> 226: } >> 227: >> 228: AbstractVector iota = vspecies().asIntegral().iota(); > > I suspect the non-power of two code is more efficient. (Even better if the MUL could be transformed to a shift for power of two values.)
> > Separately, it makes me wonder if we should revisit the shuffle factories if it is now much more efficient to construct a shuffle from a vector. `shuffleFromOp` is a slow path op so I don't think it is. Additionally, our vector multiplication is against a scalar, too. So we can optimize it if `step` is a constant. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int256Vector.java line 870: > >> 868: @Override >> 869: public final Int256Shuffle rearrange(VectorShuffle shuffle) { >> 870: return (Int256Shuffle) toBitsVector().rearrange(((Int256Shuffle) shuffle) > > I think the cast is redundant for all vector kinds. Similarly the explicit cast is redundant for the integral vectors, perhaps in the template separate out the expressions to avoid it where not needed? > > We could also refer to `VSPECIES` directly rather than calling `vspecies()`, same applies in other methods in the concrete vector classes. The cast is added so that we have the concrete type of the shuffle, the result of `toShuffle` is only `VectorShuffle` > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int256Vector.java line 908: > >> 906: } >> 907: >> 908: private static boolean indicesInRange(int[] indices) { > > Since this method is only called from an assert statement in the constructor we could avoid the clever checking that assertions are enabled and the explicit throwing on an AssertionError by using a second expression that produces an error message when the assertion fails : e.g., > > assert indicesInRange(indices) : outOfBoundsAssertMessage(indices); Yes you are right, since this method is only called in `assert` I think we can just remove the `assert` trick inside instead. 
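For readers following the `assert` discussion above, here is a minimal standalone sketch of the two-expression `assert` form that was suggested. It is not the Vector API code: the class name `ShuffleIndexCheck`, the constant `VLENGTH`, the assumed legal index range and the bodies of `indicesInRange` and `outOfBoundsAssertMessage` are all hypothetical stand-ins chosen for illustration.

```java
import java.util.Arrays;

final class ShuffleIndexCheck {
    // Hypothetical lane count, e.g. 8 ints in a 256-bit vector.
    static final int VLENGTH = 8;

    // Assumed legal range for this sketch: [-VLENGTH, VLENGTH).
    static boolean indicesInRange(int[] indices) {
        for (int index : indices) {
            if (index < -VLENGTH || index >= VLENGTH) {
                return false;
            }
        }
        return true;
    }

    // Builds the detail message; only needed when the check fails.
    static String outOfBoundsAssertMessage(int[] indices) {
        return "indices out of range for length " + VLENGTH + ": "
                + Arrays.toString(indices);
    }

    static int[] checkedCopy(int[] indices) {
        // The expression after ':' is evaluated only if the assertion
        // fails, and the whole statement is skipped unless -ea is set.
        assert indicesInRange(indices) : outOfBoundsAssertMessage(indices);
        return indices.clone();
    }

    public static void main(String[] args) {
        System.out.println(indicesInRange(new int[] {0, 7, -8}));
        System.out.println(outOfBoundsAssertMessage(new int[] {0, 8}));
    }
}
```

With this form the helper no longer needs to detect whether assertions are enabled or throw `AssertionError` itself, which seems to be the simplification both sides agreed on.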
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java line 2473: > >> 2471: final >> 2472: VectorShuffle toShuffle(AbstractSpecies dsp, boolean wrap) { >> 2473: assert(dsp.elementSize() == vspecies().elementSize()); > > Even though we force inline I cannot quite decide if it is better to forego the assert since it unduly increases method size. Regardless it may be useful to place the partial wrapping logic in a separate method, given it is less likely to be used. You don't have to worry too much about inlining of Vector API methods since it is done during late inlining and we have a pretty huge budget there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1859037153 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1859033054 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1859033749 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1859032221 From qamai at openjdk.org Tue Nov 26 18:24:07 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:24:07 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v4] In-Reply-To: References: Message-ID: > Hi, > > This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. > > Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. > > This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - add comment to ConstantTable::alignment - Merge branch 'master' into constanttable - indentation - Merge branch 'master' into constanttable - Merge branch 'master' into constanttable - refactor array constant, fix codebuffer reallocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21596/files - new: https://git.openjdk.org/jdk/pull/21596/files/bd0628ea..b8a8d9a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=02-03 Stats: 194422 lines in 3958 files changed: 64861 ins; 115245 del; 14316 mod Patch: https://git.openjdk.org/jdk/pull/21596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21596/head:pull/21596 PR: https://git.openjdk.org/jdk/pull/21596 From qamai at openjdk.org Tue Nov 26 18:24:11 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:24:11 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: <8Md6jSNs0prly4S1-5OoHCw8t68pk57lYJQ4YCH8ndI=.cc384063-f661-48c4-af29-4a788f1c24e6@github.com> References: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> <8Md6jSNs0prly4S1-5OoHCw8t68pk57lYJQ4YCH8ndI=.cc384063-f661-48c4-af29-4a788f1c24e6@github.com> Message-ID: On Tue, 26 Nov 2024 17:50:12 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - indentation >> - Merge branch 'master' into constanttable >> - Merge branch 'master' into constanttable >> - refactor array constant, fix codebuffer reallocation > > src/hotspot/share/opto/constantTable.cpp line 65: > >> 63: } >> 64: >> 65: int ConstantTable::alignment() const { > > Add comment that it is used for nmethod's constant section size and alignment. I have added it to the header file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1859042113 From qamai at openjdk.org Tue Nov 26 18:27:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 18:27:47 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: <8Md6jSNs0prly4S1-5OoHCw8t68pk57lYJQ4YCH8ndI=.cc384063-f661-48c4-af29-4a788f1c24e6@github.com> References: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> <8Md6jSNs0prly4S1-5OoHCw8t68pk57lYJQ4YCH8ndI=.cc384063-f661-48c4-af29-4a788f1c24e6@github.com> Message-ID: On Tue, 26 Nov 2024 18:00:16 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - indentation >> - Merge branch 'master' into constanttable >> - Merge branch 'master' into constanttable >> - refactor array constant, fix codebuffer reallocation > > src/hotspot/cpu/x86/x86.ad line 2771: > >> 2769: int offset = i * type2aelembytes(bt); >> 2770: switch (bt) { >> 2771: case T_BYTE: val->at(i) = con; break; > > I don't like that switch is executed for each copied element. What is typical `len` value? `len` is at most 16 and is typically 1 (you only emit 1 element and the broadcast instruction will fill the whole register). 
Also, this function is only invoked a couple of times for each compilation and I think the compiler can do unswitching, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1859047709 From chagedorn at openjdk.org Tue Nov 26 18:34:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 26 Nov 2024 18:34:42 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v3] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Mon, 25 Nov 2024 20:29:32 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". >> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 2 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 2 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 1 >> source loop | >> Initialized Assertion >> Predicate 1 >> | >> target loop >> >> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. 
But now with the recent refactorings, I think it's easy to change this to an in-order processing: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 1 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 1 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 2 >> source loop | >> Initialized Assertion >> Predicate 2 >> | >> target loop >> >> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8344171 > - Update comment > - 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order > - Apply suggestions from code review > > Co-authored-by: Tobias Hartmann > - 8344213: Cleanup OpaqueLoop*Node verification code for Assertion Predicates Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22275#issuecomment-2501660123 From vlivanov at openjdk.org Tue Nov 26 19:05:44 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 26 Nov 2024 19:05:44 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v9] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 17:21:15 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. 
Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - redo without bailout > - Merge remote-tracking branch 'origin/master' into 8340141 > - add missing bailout checks > - C1 fix > - remove blank line > - Merge master > - bail out on old methods > - redo VM state > - fix errors > - make sure to be in VM state when checking is_old > - ... and 2 more: https://git.openjdk.org/jdk/compare/4d4cef80...7a7bdb86 Interesting observation! Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21148#pullrequestreview-2462460764 From duke at openjdk.org Tue Nov 26 19:06:57 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 26 Nov 2024 19:06:57 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:47:47 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: > > - num_8b_elems_in_vec --> nof_vec_elems > - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2501715725 From qamai at openjdk.org Tue Nov 26 19:33:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 26 Nov 2024 19:33:04 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v4] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. 
> > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21042/files - new: https://git.openjdk.org/jdk/pull/21042/files/4cee07eb..36ee750a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=02-03 Stats: 11 lines in 7 files changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From szaldana at openjdk.org Tue Nov 26 19:46:45 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 26 Nov 2024 19:46:45 GMT Subject: Integrated: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 18:53:28 GMT, Sonia Zaldana Calles wrote: > Hi folks, > > This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). > > Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. 
> > In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version`, `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. > > I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). > > Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). > > However, in the destructor, if we return early, we don't pop that tag, leaving the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early. > > Cheers, > Sonia This pull request has now been integrated. Changeset: 3689f390 Author: Sonia Zaldana Calles URL: https://git.openjdk.org/jdk/commit/3689f3909ee87e79b350a739878cd0a358810c99 Stats: 42 lines in 2 files changed: 42 ins; 0 del; 0 mod 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose Reviewed-by: chagedorn, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/22331 From yzheng at openjdk.org Tue Nov 26 20:53:46 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 26 Nov 2024 20:53:46 GMT Subject: Integrated: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: <8sKPrYPCb3iDvKsANye2c_APRaEMeg5L_y3R8iR4lg4=.6b200dde-f896-4338-b2b5-2ad95c99f4c5@github.com> On Thu, 14 Nov 2024 16:42:31 GMT, Yudi Zheng wrote: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated.
This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. This pull request has now been integrated. Changeset: 8da6435d Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/8da6435d4d2b94b72d2f3872f2fd2cc71a66499a Stats: 20 lines in 3 files changed: 17 ins; 0 del; 3 mod 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/22111 From yzheng at openjdk.org Tue Nov 26 20:53:45 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 26 Nov 2024 20:53:45 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v4] In-Reply-To: References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: On Tue, 26 Nov 2024 17:14:35 GMT, Yudi Zheng wrote: >> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. > > Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge master > - address comments. > - Merge master > - address comment. > - Override ModifiersProvider.isConcrete in ResolvedJavaType Thanks for the review! 
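As an aside for readers unfamiliar with the `isArray() || !isAbstract()` idiom quoted above, the following is a rough analogue written against plain `java.lang.Class` reflection. It is only a sketch: it does not use JVMCI's `ResolvedJavaType`, and the class name `ConcreteTypeCheck` is made up for illustration. The array check appears to matter because array types can carry the abstract modifier even though they can be instantiated.

```java
import java.lang.reflect.Modifier;

final class ConcreteTypeCheck {
    // Reflection analogue of the idiom; not the JVMCI implementation.
    static boolean isConcrete(Class<?> type) {
        return type.isArray() || !Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(isConcrete(String.class));
        System.out.println(isConcrete(int[].class));
        System.out.println(isConcrete(java.util.AbstractList.class));
        System.out.println(isConcrete(Runnable.class));
    }
}
```

Under these assumptions, ordinary classes and arrays are reported as concrete, while abstract classes and interfaces are not.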
------------- PR Comment: https://git.openjdk.org/jdk/pull/22111#issuecomment-2501909324 From amitkumar at openjdk.org Wed Nov 27 03:44:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 27 Nov 2024 03:44:40 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> Message-ID: On Mon, 25 Nov 2024 13:44:09 GMT, Martin Doerr wrote: >> I don't have hardware for arm32 :( > > We can ask @bulasevich (also see https://wiki.openjdk.org/display/HotSpot/Ports). Should I revert arm32 changes ? Maybe it can be done with another JBS issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1859791993 From qxing at openjdk.org Wed Nov 27 06:13:38 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 27 Nov 2024 06:13:38 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 09:27:18 GMT, Hamlin Li wrote: >> Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. > > Thanks for catching and fix. > riscv part looks good to me, seems other platforms have the similar issue, but good to have others have another look. @Hamlin-Li @dafedafe @vnkozlov Thanks for your review! Would anyone be willing to help review/test x86_32, ppc and s390? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22384#issuecomment-2502986398 From amitkumar at openjdk.org Wed Nov 27 06:43:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 27 Nov 2024 06:43:42 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: <-qkA-r13m-wg6I9W7mhtl0PJsTVrichUj5DP6hICRDk=.d67fa3e9-7474-4434-9051-0bef80508384@github.com> Message-ID: On Mon, 18 Nov 2024 19:15:39 GMT, Dean Long wrote: >>> For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". >> >> There's definitely some room for improvement here, but, frankly speaking, stringy descriptors don't look appealing to me. Why not simply introduce `TypeFunc` factory methods which explicitly accept argument/return `Type`s instead? Probably, variadic functions are a good fit here, but even if it's not the case, there are rather few arities used (single return value - void, 1 slot, or 2 slots, plus up to 8 arguments). And that would eliminate lots of boilerplate code as well. > >> > For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". >> >> There's definitely some room for improvement here, but, frankly speaking, stringy descriptors don't look appealing to me. Why not simply introduce `TypeFunc` factory methods which explicitly accept argument/return `Type`s instead? Probably, variadic functions are a good fit here, but even if it's not the case, there are rather few arities used (single return value - void, 1 slot, or 2 slots, plus up to 8 arguments). And that would eliminate lots of boilerplate code as well. > > Good idea. @dean-long @iwanowww any suggestion for this one ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2503027991 From kbarrett at openjdk.org Wed Nov 27 06:45:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 27 Nov 2024 06:45:42 GMT Subject: RFR: 8345050: Fix -Wzero-as-null-pointer warning in MemPointer ctor In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 11:09:46 GMT, Christian Hagedorn wrote: >> Please review this trivial change to use nullptr instead of a literal 0 in a >> call to Node::dump_bfs by the MemPointer ctor. >> >> Testing: mach5 tier1 > > Looks good and trivial. Thanks for reviews @chhagedorn and @shipilev . ------------- PR Comment: https://git.openjdk.org/jdk/pull/22388#issuecomment-2503029376 From kbarrett at openjdk.org Wed Nov 27 06:45:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 27 Nov 2024 06:45:43 GMT Subject: Integrated: 8345050: Fix -Wzero-as-null-pointer warning in MemPointer ctor In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 11:00:56 GMT, Kim Barrett wrote: > Please review this trivial change to use nullptr instead of a literal 0 in a > call to Node::dump_bfs by the MemPointer ctor. > > Testing: mach5 tier1 This pull request has now been integrated. 
Changeset: 1f6144ef Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/1f6144ef26096da46ca04f188afb483ea237bb0e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8345050: Fix -Wzero-as-null-pointer warning in MemPointer ctor Reviewed-by: chagedorn, shade ------------- PR: https://git.openjdk.org/jdk/pull/22388 From epeter at openjdk.org Wed Nov 27 06:46:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Nov 2024 06:46:25 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark [v3] In-Reply-To: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Message-ID: > Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 > > It will be important for the efforts in: > [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count > > I ran the plots for `byte, int, long`. > We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. > > We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. 
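To make the saw-tooth remark above a little more concrete, here is a deliberately simplified model, not taken from the benchmark itself. It assumes a vectorized main loop that consumes `lanes` elements per iteration and a scalar post loop that handles the remainder; it ignores the pre-loop, alignment and unrolling. The names `SawToothModel`, `vectorIterations` and `scalarIterations` are hypothetical.

```java
final class SawToothModel {
    // Iterations executed by the (assumed) vectorized main loop.
    static int vectorIterations(int size, int lanes) {
        return size / lanes;
    }

    // Iterations left over for the (assumed) scalar post loop.
    static int scalarIterations(int size, int lanes) {
        return size % lanes;
    }

    public static void main(String[] args) {
        int lanes = 16; // hypothetical: 16 ints in a 512-bit vector
        for (int size = 60; size <= 68; size++) {
            System.out.println("size=" + size
                    + " vector=" + vectorIterations(size, lanes)
                    + " scalar=" + scalarIterations(size, lanes));
        }
    }
}
```

In this model the scalar remainder grows from 0 up to `lanes - 1` as `size` increases and then drops back to 0 at the next multiple of `lanes`, which is one plausible source of the periodic dips in throughput.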
> > --------------------------------------------------- > **Results** > > red: normal -> saw-tooth > green: randomized offsets -> more "smooth" > > linux_x64 > ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) > > linux_aarch64 > ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) > > macosx_x64 > ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) > > macosx_aarch64 > ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) > > windows_x64 > ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - whitespace - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark - JDK-8344118 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22070/files - new: https://git.openjdk.org/jdk/pull/22070/files/c3930c4d..1316709d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22070&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22070&range=01-02 Stats: 42116 lines in 846 files changed: 21575 ins; 16619 del; 3922 mod Patch: https://git.openjdk.org/jdk/pull/22070.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22070/head:pull/22070 PR: 
https://git.openjdk.org/jdk/pull/22070 From jbhateja at openjdk.org Wed Nov 27 07:59:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 07:59:39 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark [v3] In-Reply-To: References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Message-ID: On Wed, 27 Nov 2024 06:46:25 GMT, Emanuel Peter wrote: >> Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 >> >> It will be important for the efforts in: >> [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count >> >> I ran the plots for `byte, int, long`. >> We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. >> >> We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. >> >> --------------------------------------------------- >> **Results** >> >> red: normal -> saw-tooth >> green: randomized offsets -> more "smooth" >> >> linux_x64 >> ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) >> >> linux_aarch64 >> ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) >> >> macosx_x64 >> ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) >> >> macosx_aarch64 >> ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) >> >> windows_x64 >> ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - whitespace > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - JDK-8344118 Good ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22070#pullrequestreview-2464173569 From chagedorn at openjdk.org Wed Nov 27 08:14:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 27 Nov 2024 08:14:39 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark [v3] In-Reply-To: References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Message-ID: On Wed, 27 Nov 2024 06:46:25 GMT, Emanuel Peter wrote: >> Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 >> >> It will be important for the efforts in: >> [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count >> >> I ran the plots for `byte, int, long`. >> We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. >> >> We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. 
>> >> --------------------------------------------------- >> **Results** >> >> red: normal -> saw-tooth >> green: randomized offsets -> more "smooth" >> >> linux_x64 >> ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) >> >> linux_aarch64 >> ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) >> >> macosx_x64 >> ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) >> >> macosx_aarch64 >> ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) >> >> windows_x64 >> ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - whitespace > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark > - JDK-8344118 Nice plots, looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22070#pullrequestreview-2464248536 From chagedorn at openjdk.org Wed Nov 27 08:43:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 27 Nov 2024 08:43:43 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 02:23:46 GMT, Dhamoder Nalla wrote: >> In the debug build, the assert is triggered during the parsing (before Code_Gen). In the Release build, however, the compilation bails out at `Compile::check_node_count()` during the code generation phase and completes execution without any issues. >> >> When I commented out the assert(C->live_nodes() <= C->max_node_limit()), both the debug and release builds exhibited the same behavior: the compilation bails out during code_gen after building the ideal graph with more than 80K nodes. >> >> The proposed fix will check the live node count and bail out during compilation while building the graph for scalarization of the elements in the array when the live node count crosses the limit of 80K, instead of unnecessarily building the entire graph and bailing out in code_gen. > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > CR comments test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 2: > 1: /* > 2: * Copyright (c) 2015, 2023, Oracle and/or its affiliates. All rights reserved. You should update the copyright year: Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 36: > 34: > 35: public class TestScalarizeBailout { > 36: static Object var1; You should indent Java code with 4 spaces. 
test/hotspot/jtreg/compiler/escapeAnalysis/TestScalarizeBailout.java line 42: > 40: try { > 41: // load the class to initialize the static object and trigger the EA > 42: Class Class37 = Class.forName("compiler.escapeAnalysis.TestScalarizeBailout"); I'm still unclear why we need this line. Is it possible to trigger the assert without it somehow? I tried to run your JTreg test but could not trigger the assert. On what platform/setup could you trigger this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1860213391 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1860212953 PR Review Comment: https://git.openjdk.org/jdk/pull/20504#discussion_r1860215585 From jbhateja at openjdk.org Wed Nov 27 08:53:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 08:53:15 GMT Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations Message-ID: This is a follow up PR to https://github.com/openjdk/jdk/pull/20507 It adds IR validation tests for newly added saturated vector add / sub operations. 
------------- Commit messages: - Use feature check - Update test tags - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342677 - 8342677: Add IR validation tests for newly added saturated add / sub vector operations Changes: https://git.openjdk.org/jdk/pull/21603/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21603&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342677 Stats: 514 lines in 2 files changed: 514 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21603/head:pull/21603 PR: https://git.openjdk.org/jdk/pull/21603 From enikitin at openjdk.org Wed Nov 27 08:57:29 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 27 Nov 2024 08:57:29 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v2] In-Reply-To: References: Message-ID: > For CTW, zero classes in provided jar is now a failure. > This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. > > This PR makes this behaviour controllable. Default reaction is a failure, like before. 
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/testlibrary/ctw/src/sun/hotspot/tools/ctw/CtwRunner.java Fix comment Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22320/files - new: https://git.openjdk.org/jdk/pull/22320/files/286c47e0..a091e15d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22320&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22320&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22320/head:pull/22320 PR: https://git.openjdk.org/jdk/pull/22320 From duke at openjdk.org Wed Nov 27 09:08:53 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 27 Nov 2024 09:08:53 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 19:33:58 GMT, Dean Long wrote: >>> I pulled your latest changes, and I am seeing missing newlines in the output, just by running `java -XX:+PrintInlining`. With -XX:+PrintIntrinsics, there is no additional output, so I'm wondering how -XX:+PrintIntrinsics tests are passing. Maybe we are missing test coverage for that flag. >> >> >> >>> I'm also seeing missing method names: >>> >>> ``` >>> @ 10 java.lang.StringBuilder:: @ 7 jdk.internal.classfile.impl.SplitConstantPool::utf8Entry (45 bytes) failed to inline: callee is too large >>> ``` >>> >>> and weird indentation: >>> >>> ``` >>> @ 1 java.lang.Object:: (1 bytes) inline >>> @ 1 sun.invoke.util.Wrapper::basicTypeChar (18 bytes) inline >>> ``` >> >> @dean-long The reason for this is that multiple compile threads are trying to print at the same time. 
The odd formatting goes away with `-Xbatch`, preventing concurrent compilation. I didn't remove any explicit locking or synchronizing mechanism during refactoring. I think there was never any explicit mechanism to make this work without -Xbatch but it rather worked because the entire printinlining for one method was first dumped into a stringStream, which was then dumped onto tty in one go. With my refactoring though, InlinePrinter::IPInlineSite::dump will directly print individual segments of the output to tty, opening the door widely for bad interleavings with multiple compile threads. >> >> Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? > >> Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? > > Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. @dean-long After further investigation, I discovered that also before my refactoring the formatting can be messed up severely. For example, running javac -J-XX:+PrintCompilation -J-XX:+TieredCompilation -J-XX:+PrintInlining test/jdk/com/sun/jdi/HelloWorld.java on a recent master debug build, will also produce messed up output. So `stringStream`s do not really help with the problem. Since it wasn't synchronized before, I propose that we do not change it as part of this PR. Should we file an RFE to look into this in the future? 
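The buffer-then-emit idea mentioned above can be sketched in plain Java. This is only an analogy under stated assumptions: `BufferedInlineReport`, `render`, `emit` and `OUT_LOCK` are hypothetical names, a `StringBuilder` stands in for HotSpot's `stringStream`, and a `PrintStream` stands in for `tty`. It does not reproduce HotSpot's actual printing code, and, as noted in this thread, buffering alone did not fully prevent interleaving there.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

public class BufferedInlineReport {
    // Hypothetical lock guarding the shared output stream.
    private static final Object OUT_LOCK = new Object();

    // Build the complete report for one compilation in memory first.
    static String render(String method, List<String> callSites) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append('\n');
        for (String site : callSites) {
            sb.append("    @ ").append(site).append('\n');
        }
        return sb.toString();
    }

    // Emit the finished block in a single step while holding the lock,
    // so segments from different threads cannot interleave.
    static void emit(PrintStream out, String block) {
        synchronized (OUT_LOCK) {
            out.print(block);
        }
    }

    // Let several threads report at once and return everything that was written.
    static String runConcurrently(int threads) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int id = i;
            Thread t = new Thread(() -> emit(out,
                    render("method" + id, List.of("callee" + id + "a", "callee" + id + "b"))));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        out.flush();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(runConcurrently(4));
    }
}
```

The order of the blocks still depends on thread scheduling; only the contents of each block stay contiguous.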
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2503310881 From epeter at openjdk.org Wed Nov 27 09:18:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Nov 2024 09:18:50 GMT Subject: RFR: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark [v3] In-Reply-To: <2uZlBd9wrRHLiJS7ZuFuVc5EydPOI3yaXjQcEzlOhCE=.a07b3be2-4f38-466c-84b9-d45d38daaeec@github.com> References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> <2uZlBd9wrRHLiJS7ZuFuVc5EydPOI3yaXjQcEzlOhCE=.a07b3be2-4f38-466c-84b9-d45d38daaeec@github.com> Message-ID: <4Mj0tLXOaftZXgo_I-7iNUzYmpeV2YkxO78eoe9KZrg=.6b07dc56-d3cd-4dc8-aa76-c25ec498f9f4@github.com> On Tue, 26 Nov 2024 17:34:30 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark >> - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark >> - whitespace >> - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark >> - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark >> - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark >> - Merge branch 'master' into JDK-8344118-VectorThroughputForIterationCount-benchmark >> - JDK-8344118 > > Good. @vnkozlov @chhagedorn @jatin-bhateja thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22070#issuecomment-2503332379 From epeter at openjdk.org Wed Nov 27 09:18:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 27 Nov 2024 09:18:51 GMT Subject: Integrated: 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark In-Reply-To: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> References: <286tKUBJ5Gxn3-iw2w1b0wf3cHCHIa7nP3vyeEaNL0k=.cad88247-2794-4c04-9ee1-7d74aa2ddb9e@github.com> Message-ID: On Wed, 13 Nov 2024 13:30:52 GMT, Emanuel Peter wrote: > Took idea of benchmark from here https://github.com/openjdk/jdk/pull/14581 > > It will be important for the efforts in: > [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count > > I ran the plots for `byte, int, long`. > We have aligned/unaligned scenarios, and compute-bound vs memory-bound scenarios. > > We can very clearly see the effect of vectorization, and that with increasing `size`, we get increasingly better performance. But we can also see the effect of pre/post loops: this creates the saw-tooth curve. > > --------------------------------------------------- > **Results** > > red: normal -> saw-tooth > green: randomized offsets -> more "smooth" > > linux_x64 > ![linux_x64](https://github.com/user-attachments/assets/1e63b47f-16a6-4766-985d-9da4cad25505) > > linux_aarch64 > ![linux_aarch64](https://github.com/user-attachments/assets/77e9a880-32eb-43f8-a84b-16f39c1c2a62) > > macosx_x64 > ![macosx_x64](https://github.com/user-attachments/assets/ab730367-d684-475c-b96d-e1093f56e776) > > macosx_aarch64 > ![macosx_aarch64](https://github.com/user-attachments/assets/551484f5-79c7-41ea-b54d-e038d8c7b048) > > windows_x64 > ![windows_x64](https://github.com/user-attachments/assets/3801577f-ac53-48ce-9b46-a2c9f0a2ddfe) This pull request has now been integrated. 
Changeset: b3986bdb Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b3986bdbdbafabde5beb15300444034363723449 Stats: 436 lines in 1 file changed: 436 ins; 0 del; 0 mod 8344118: C2 SuperWord: add VectorThroughputForIterationCount benchmark Reviewed-by: kvn, jbhateja, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22070 From dlong at openjdk.org Wed Nov 27 10:45:43 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 27 Nov 2024 10:45:43 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v18] In-Reply-To: <07brJy6xgJmiLKatpzZJbjKsIQZpejKi9ovKiW5Ipxc=.be477b8a-524e-4b37-b4c9-e0ca0416f69c@github.com> References: <07brJy6xgJmiLKatpzZJbjKsIQZpejKi9ovKiW5Ipxc=.be477b8a-524e-4b37-b4c9-e0ca0416f69c@github.com> Message-ID: On Tue, 26 Nov 2024 14:13:02 GMT, theoweidmannoracle wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). 
>> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: > > - Fix style > - Derecursify locate Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2464650594 From dlong at openjdk.org Wed Nov 27 10:45:44 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 27 Nov 2024 10:45:44 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Wed, 27 Nov 2024 09:06:24 GMT, theoweidmannoracle wrote: > Since it wasn't synchronized before, I propose that we do not change it as part of this PR. Should we file an RFE to look into this in the future? Yes, that seems best. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2503533936 From chagedorn at openjdk.org Wed Nov 27 11:37:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 27 Nov 2024 11:37:40 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v4] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 04:25:07 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements: >> >> Baseline Patch >> Benchmark Mode Cnt Score Error Units Score Error Units Improvement >> BigIntegers.testAdd avgt 15 25.096 ± 3.936 ns/op 19.214 ± 0.521 ns/op (+ 26.5%) >> PatternBench.charPatternCompile avgt 8 453.727 ± 117.265 ns/op 370.054 ± 26.106 ns/op (+ 20.3%) >> PatternBench.charPatternMatch avgt 8 917.604 ± 121.766 ns/op 810.560 ± 38.437 ns/op (+ 12.3%) >> PatternBench.charPatternMatchWithCompile avgt 8 1477.703 ± 255.783 ns/op 1224.460 ± 28.220 ns/op (+ 18.7%) >> PatternBench.longStringGraphemeMatches avgt 8 860.909 ± 124.661 ns/op 743.729 ± 22.877 ns/op (+ 14.6%) >> PatternBench.splitFlags avgt 8 420.506 ± 76.252 ns/op 321.911 ± 11.661 ns/op (+ 26.6%) >> >> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
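The two absorption identities named in the description can be checked at the Java level with a small sketch. This is not the C2 implementation and says nothing about how the Ideal graph is transformed; `MinMaxIdentities` and its methods are hypothetical names, and the code only confirms on sample `long` values that the rewritten forms compute the same result.

```java
public class MinMaxIdentities {
    // Max(A, Max(A, B)) == Max(A, B)
    static boolean nestedMaxCollapses(long a, long b) {
        return Math.max(a, Math.max(a, b)) == Math.max(a, b);
    }

    // Max(A, Min(A, B)) == A
    static boolean maxAbsorbsMin(long a, long b) {
        return Math.max(a, Math.min(a, b)) == a;
    }

    // Check both identities for every ordered pair drawn from the samples.
    static boolean holdForAllPairs(long[] samples) {
        for (long a : samples) {
            for (long b : samples) {
                if (!nestedMaxCollapses(a, b) || !maxAbsorbsMin(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -7L, -1L, 0L, 1L, 42L, Long.MAX_VALUE};
        System.out.println(holdForAllPairs(samples));
    }
}
```

Both identities hold for any totally ordered values, which is why integer types such as `int` and `long` are safe candidates; floating-point types with NaN would need separate consideration.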
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Make long tests check IR The IR rule updates look good. Maybe you want to wait with integrating this until after the fork next Thursday. So, this only goes into JDK 25. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21439#pullrequestreview-2464765125 From jbhateja at openjdk.org Wed Nov 27 13:49:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 13:49:40 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 15:19:25 GMT, Volodymyr Paprotski wrote: >> This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR >> >> Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.491 ± 0.356 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.899 ± 0.013 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.477 ± 1.006 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.921 ± 0.038 ops/s >> >> After: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.910 ± 1.991 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 426.414 ± 2.988 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.882 ± 2.446 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 425.402 ±
4.205 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix date src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java line 57: > 55: > 56: // chunkSize is a multiple of block size and used to divide up > 57: // input data to trigger the intrinsic. This comment looks incorrect, a method marked as an intrinsic is always inline expanded by C2 compile during parsing or during incremental inlining if -XX:+InlineIncrement is used. I guess what you intend here is triggering an OSR compilation of loop by C2 compiler which in turn trigger intrinsic since C1 never intrinsifies crypto APIs ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860693705 From jbhateja at openjdk.org Wed Nov 27 13:52:43 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 13:52:43 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: Message-ID: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> On Wed, 27 Nov 2024 13:46:51 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix date > > src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java line 57: > >> 55: >> 56: // chunkSize is a multiple of block size and used to divide up >> 57: // input data to trigger the intrinsic. > > This comment looks incorrect, a method marked as an intrinsic is always inline expanded by C2 compile during parsing or during incremental inlining if -XX:+InlineIncrement is used. > > I guess what you intend here is triggering an OSR compilation of loop by C2 compiler which in turn trigger intrinsic since C1 never intrinsifies crypto APIs For CRC32 digest computation we do support intrinsic at interpreter and c1 compiler level to overcome such warmup related penalties. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860698309 From duke at openjdk.org Wed Nov 27 14:34:06 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 27 Nov 2024 14:34:06 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v19] In-Reply-To: References: Message-ID: <3s62v1wxJtcoMvIsLMd5MEBt_AHkdHwEIw0VR4UnUtY=.4bb1a418-ab8a-4397-bcb5-39ff28575c39@github.com> > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update memory management and use treap ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/ff76d160..fb59ac90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=17-18 Stats: 139 lines in 4 files changed: 74 ins; 14 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From aph at openjdk.org Wed Nov 27 14:48:37 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 27 Nov 2024 14:48:37 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 13:49:51 GMT, Jatin Bhateja wrote: >> src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java line 57: >> >>> 55: >>> 56: // chunkSize is a multiple of block size and used to divide up >>> 57: // input data to trigger the intrinsic. 
>> >> This comment looks incorrect, a method marked as an intrinsic is always inline expanded by C2 compile during parsing or during incremental inlining if -XX:+InlineIncrement is used. >> >> I guess what you intend here is triggering an OSR compilation of loop by C2 compiler which in turn trigger intrinsic since C1 never intrinsifies crypto APIs > > For CRC32 digest computation we do support intrinsic at interpreter and c1 compiler level to overcome such warmup related penalties. This is not just a good idea to trigger OSR and therefore use the intrinsic, it's a good idea because very long data causes an extended time to safepoint. I'd support in all cases limiting the size to about a megabyte, which is what we have here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860794141 From duke at openjdk.org Wed Nov 27 15:01:24 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 27 Nov 2024 15:01:24 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v20] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? 
> > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add missing header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/fb59ac90..5364a488 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=18-19 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From vpaprotski at openjdk.org Wed Nov 27 15:05:38 2024 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 27 Nov 2024 15:05:38 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 14:45:36 GMT, Andrew Haley wrote: >> For CRC32 digest computation we do support intrinsic at interpreter and c1 compiler level to overcome such warmup related penalties. > > This is not just a good idea to trigger OSR and therefore use the intrinsic, it's a good idea because very long data causes an extended time to safepoint. I'd support in all cases limiting the size to about a megabyte, which is what we have here. As Andrew points out, giving an intrinsic lots of data, 'backdoors/breaks' a lot of existing algorithms.. from GC not happening because of no safepoint inside the intrinsic, to OSR.. .. and (what I believe to be issue for performance here) the call count (CompilationThreshold) to get the intrinsic to compile (well, the callee) in the first place. Though as I pointed in the original issue, I am not entirely convinced it was the call count that got the intrinsic back in; experimentally, chunking got the 'outer intrinsic' to compile. 
(There is an inner intrinsic that works on 16 byte chunks) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860822482 From jbhateja at openjdk.org Wed Nov 27 15:12:36 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 15:12:36 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 14:45:36 GMT, Andrew Haley wrote: >> For CRC32 digest computation we do support intrinsic at interpreter and c1 compiler level to overcome such warmup related penalties. > > This is not just a good idea to trigger OSR and therefore use the intrinsic, it's a good idea because very long data causes an extended time to safepoint. I'd support in all cases limiting the size to about a megabyte, which is what we have here. Agree with @theRealAph , loop induces safe point on back edges which gives opportunity to gc epochs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860833185 From jbhateja at openjdk.org Wed Nov 27 15:12:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 15:12:37 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 15:09:18 GMT, Jatin Bhateja wrote: >> This is not just a good idea to trigger OSR and therefore use the intrinsic, it's a good idea because very long data causes an extended time to safepoint. I'd support in all cases limiting the size to about a megabyte, which is what we have here. > > Agree with @theRealAph , loop induces safe point on back edges which gives opportunity to gc epochs. > As Andrew points out, giving an intrinsic lots of data, 'backdoors/breaks' a lot of existing algorithms.. 
from GC not happening because of no safepoint inside the intrinsic, to OSR.. > > .. and (what I believe to be issue for performance here) the call count (CompilationThreshold) to get the intrinsic to compile (well, the callee) in the first place. Though as I pointed in the original issue, I am not entirely convinced it was the call count that got the intrinsic back in; experimentally, chunking got the 'outer intrinsic' to compile. (There is an inner intrinsic that works on 16 byte chunks) Please update the comments in the code accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860834512 From vpaprotski at openjdk.org Wed Nov 27 15:16:43 2024 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 27 Nov 2024 15:16:43 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 15:10:09 GMT, Jatin Bhateja wrote: >> Agree with @theRealAph , loop induces safe point on back edges which gives opportunity to gc epochs. > >> As Andrew points out, giving an intrinsic lots of data, 'backdoors/breaks' a lot of existing algorithms.. from GC not happening because of no safepoint inside the intrinsic, to OSR.. >> >> .. and (what I believe to be issue for performance here) the call count (CompilationThreshold) to get the intrinsic to compile (well, the callee) in the first place. Though as I pointed in the original issue, I am not entirely convinced it was the call count that got the intrinsic back in; experimentally, chunking got the 'outer intrinsic' to compile. (There is an inner intrinsic that works on 16 byte chunks) > > Please update the comments in the code accordingly. Not sure what about the comment needs to be updated. Maybe provide a suggestion? 
Also, please have a look at the original issue, we had a similar discussion about this same comment and this was the result. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860841172 From jbhateja at openjdk.org Wed Nov 27 15:28:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 15:28:39 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 15:14:15 GMT, Volodymyr Paprotski wrote: >>> As Andrew points out, giving an intrinsic lots of data, 'backdoors/breaks' a lot of existing algorithms.. from GC not happening because of no safepoint inside the intrinsic, to OSR.. >>> >>> .. and (what I believe to be issue for performance here) the call count (CompilationThreshold) to get the intrinsic to compile (well, the callee) in the first place. Though as I pointed in the original issue, I am not entirely convinced it was the call count that got the intrinsic back in; experimentally, chunking got the 'outer intrinsic' to compile. (There is an inner intrinsic that works on 16 byte chunks) >> >> Please update the comments in the code accordingly. > > Not sure what about the comment needs to be updated. Maybe provide a suggestion? > > Also, please have a look at the original issue, we had a similar discussion about this same comment and this was the result. I dont see any harm in adding descriptive comments giving good justification. Here is my suggestion:- "Change facilitate eager intrinsification due to OSR compilation, in addition safe point induced at loop back edge reduce time to safepoint before GC epoch." 
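For readers following the thread, the chunking being discussed can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CounterMode.java code from the PR: the class name `ChunkedCrypt`, the `CHUNK_SIZE` constant and the XOR body of `implCrypt` are hypothetical stand-ins. Only the idea of feeding the intrinsic candidate pieces of about a megabyte from an ordinary Java loop (so that the loop back edge can serve as a safepoint poll and the loop can be OSR-compiled) is taken from the discussion.

```java
final class ChunkedCrypt {
    // Hypothetical chunk size of about one megabyte, as suggested in the thread.
    static final int CHUNK_SIZE = 1024 * 1024;

    // Stand-in for the method that would be intrinsified; here it only XORs
    // each byte with a constant so that the sketch is runnable.
    static int implCrypt(byte[] in, int inOff, int len, byte[] out, int outOff) {
        for (int i = 0; i < len; i++) {
            out[outOff + i] = (byte) (in[inOff + i] ^ 0x5A);
        }
        return len;
    }

    // Processes the input in pieces of at most CHUNK_SIZE bytes.
    static int crypt(byte[] in, int inOff, int len, byte[] out, int outOff) {
        int processed = 0;
        // Full chunks first; the strict comparison means the loop always
        // leaves between 1 and CHUNK_SIZE bytes for the final call when len > 0.
        while (len - processed > CHUNK_SIZE) {
            processed += implCrypt(in, inOff + processed, CHUNK_SIZE,
                                   out, outOff + processed);
        }
        // Final (possibly only) call handles whatever remains.
        processed += implCrypt(in, inOff + processed, len - processed,
                               out, outOff + processed);
        return processed;
    }
}
```

With an input of `2 * CHUNK_SIZE + 17` bytes this sketch makes three calls to `implCrypt`: two full chunks and one 17-byte remainder. Whether roughly a megabyte is the best chunk size is a tuning question the sketch does not try to answer.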
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860861603 From thartmann at openjdk.org Wed Nov 27 15:38:45 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 27 Nov 2024 15:38:45 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v12] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:56:31 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn Looks good to me. Nice cleanup. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2465379252 From duke at openjdk.org Wed Nov 27 15:38:46 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 27 Nov 2024 15:38:46 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v12] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:48:27 GMT, Vladimir Kozlov wrote: >>> Do we have other places (not new constant node) where we set Root as control? May be we can add `set_root_as_ctrl(n)` method in `loop node.hpp` in such case. 
>> >> There's only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. >> >> Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there's only about three places where this pattern occurs now? > >> > Do we have other places (not new constant node) where we set Root as control? May be we can add `set_root_as_ctrl(n)` method in `loop node.hpp` in such case. >> >> There's only three locations where control is set to the root in the loop files now (not counting the ones in the new methods I added). The main reason for this patch is bugs caused by people forgetting to set control for constants (e.g. https://bugs.openjdk.org/browse/JDK-8343137), which is now prevented if the new helper methods are used. >> >> Do you think there would be any benefit from introducing `set_root_as_ctrl(n)` given there's only about three places where this pattern occurs now? > > My suggesting is about additional cleaning code. I think 3 + 5 places are enough to justify to have a new function in header file. Also `set_root_as_ctrl(n)` could be copy of `set_ctrl(n, ctrl)` without 2 asserts which checks `ctrl`. It will be faster. @vnkozlov Could you take another look at the latest changes? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2504177215 From ascarpino at openjdk.org Wed Nov 27 15:42:38 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Wed, 27 Nov 2024 15:42:38 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: <5XsIqT0IvxEZyKy9Ym5NP4Q9i3MSHh91ErEKxFyeiII=.ec808b21-9c0c-4c9f-a67b-55f033ece46d@github.com> Message-ID: On Wed, 27 Nov 2024 15:25:02 GMT, Jatin Bhateja wrote: >> Not sure what about the comment needs to be updated. Maybe provide a suggestion? >> >> Also, please have a look at the original issue, we had a similar discussion about this same comment and this was the result. > > I dont see any harm in adding descriptive comments giving good justification. > > Here is my suggestion:- > > "Change facilitate eager intrinsification due to OSR compilation, in addition safe point induced at loop back edge reduce time to safepoint before GC epoch." I disagree with changing the comment. What @jatin-bhateja suggests, few will understand. The current comment describes what is being done in the simplest terms. Please leave the comment as is ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860885254 From ascarpino at openjdk.org Wed Nov 27 15:46:40 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Wed, 27 Nov 2024 15:46:40 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: Message-ID: <4jgB7fErggCSumNACIfinFeXTIcJC_MblxaZ-oAEvyQ=.7a87a0f8-7111-4b54-a706-5f766ebac8d6@github.com> On Tue, 26 Nov 2024 15:19:25 GMT, Volodymyr Paprotski wrote: >> This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR >> >> Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.491 ± 0.356 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.899 ±
0.013 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.477 ± 1.006 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.921 ± 0.038 ops/s >> >> After: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.910 ± 1.991 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 426.414 ± 2.988 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.882 ± 2.446 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 425.402 ± 4.205 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix date Marked as reviewed by ascarpino (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22300#pullrequestreview-2465397775 From jbhateja at openjdk.org Wed Nov 27 15:51:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 27 Nov 2024 15:51:37 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 15:19:25 GMT, Volodymyr Paprotski wrote: >> This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR >> >> Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.491 ± 0.356 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.899 ± 0.013 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.477 ± 1.006 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.921 ± 0.038 ops/s >> >> After: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.910 ±
1.991 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 426.414 ± 2.988 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.882 ± 2.446 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 425.402 ± 4.205 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix date Ok, we have already recorded detailed comments on the PR. This is not a compiler side change set so a simpler comment works. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22300#pullrequestreview-2465411735 From vpaprotski at openjdk.org Wed Nov 27 16:08:45 2024 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 27 Nov 2024 16:08:45 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 15:19:25 GMT, Volodymyr Paprotski wrote: >> This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR >> >> Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.491 ± 0.356 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.899 ± 0.013 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.477 ± 1.006 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.921 ± 0.038 ops/s >> >> After: >> >> Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.910 ± 1.991 ops/s >> AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 426.414 ± 2.988 ops/s >> AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.882 ± 2.446 ops/s >> AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 425.402 ±
4.205 ops/s > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > fix date Thanks for the approvals! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22300#issuecomment-2504250342 From vpaprotski at openjdk.org Wed Nov 27 16:08:46 2024 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 27 Nov 2024 16:08:46 GMT Subject: RFR: 8344766: AES/CTR slow at big payloads [v2] In-Reply-To: References: Message-ID: On Wed, 27 Nov 2024 10:59:06 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix date > > src/java.base/share/classes/com/sun/crypto/provider/CounterMode.java line 192: > >> 190: processed += implCrypt(in, inOff, chunkSize, out, outOff); >> 191: } >> 192: // note: above loop always leaves some data to process (more than zero, > > Suggestion: > > // Note: Above loop always leaves some data to process (more than zero, Since I have 2 approvals, I am going to integrate; I don't want to lose the approvals and launch another build to fix this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22300#discussion_r1860921149 From vpaprotski at openjdk.org Wed Nov 27 16:08:47 2024 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Wed, 27 Nov 2024 16:08:47 GMT Subject: Integrated: 8344766: AES/CTR slow at big payloads In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 18:03:44 GMT, Volodymyr Paprotski wrote: > This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR > > Before: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.491 ± 0.356 ops/s > AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.899 ± 0.013 ops/s > AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.477 ±
1.006 ops/s > AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 16.921 ± 0.038 ops/s > > After: > > Benchmark (algorithm) (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESBench.decrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.910 ± 1.991 ops/s > AESBench.decrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 426.414 ± 2.988 ops/s > AESBench.encrypt AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 218.882 ± 2.446 ops/s > AESBench.encrypt2 AES/CTR/NoPadding 30000000 128 SunJCE thrpt 3 425.402 ± 4.205 ops/s This pull request has now been integrated. Changeset: 75f3ec77 Author: Volodymyr Paprotski Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/75f3ec77e46831725ef927f0dda16a4dfd24b9a7 Stats: 17 lines in 1 file changed: 15 ins; 0 del; 2 mod 8344766: AES/CTR slow at big payloads Reviewed-by: ascarpino, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/22300 From tholenstein at openjdk.org Wed Nov 27 17:35:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 27 Nov 2024 17:35:58 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code Message-ID: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, ### LayoutGraph The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display.
It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. ### LayoutLayer The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. ### LayoutNode The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. ### LayoutEdge The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (from), the ending node (to), and the positions where the edge connects to these nodes. 
It also keeps track of whether the edge has been reversed, which is useful for handling edges that go against the main flow in hierarchical layouts (like loops or back edges). This class ensures that edges are drawn correctly between nodes, helping to create clear and understandable visualizations of the graph. `LayoutGraph` with `LayoutLayers` LayoutGraph `LayoutLayer` with `LayoutNodes` LayoutLayer ### Keeping edges straight until they leave the LayoutLayer `before` old `now` new ------------- Commit messages: - remove executability of igv.sh - update Figure height calculation for Slots - run IGV without asserts - batch add connectionLayer.addChildren(newWidgets); - remove dead code in LineWidget - cached - Fix crash: missing Figure after filter applied - Node labels in the CFG view left aligned - 8314512: IGV: clean up hierarchical layout code - Revert "AllSoFar" - ... and 2 more: https://git.openjdk.org/jdk/compare/b9c6ce90...ad4d0761 Changes: https://git.openjdk.org/jdk/pull/22402/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8314512 Stats: 4980 lines in 41 files changed: 1840 ins; 2186 del; 954 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From rcastanedalo at openjdk.org Wed Nov 27 17:35:58 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 27 Nov 2024 17:35:58 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Tue, 26 Nov 2024 23:17:15 GMT, Tobias Holenstein wrote: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, 
and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. 
This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... Hi Toby, thanks for doing this! I realize this changeset is not a pure clean up but actually affects the layouts. Could you summarize the proposed layout edges? (condense output edges, extra vertical margin for nodes with input/output ports, changes to the crossing reduction heuristic, etc). I found a couple of issues in the CFG view by playing around manually: 1. I get the following assertion error when opening the CFG view of the "Optimized finished" graph in [this file](https://github.com/user-attachments/files/17931456/foo.zip) and enabling the "Simplify graph" filter: ![Screenshot from 2024-11-27 09-33-33](https://github.com/user-attachments/assets/1bb83bfc-dd51-4282-96b2-491f05f4f323) 2. With this change, node labels in the CFG view are center-aligned (see right part of screenshot below), instead of left-aligned as expected (see left part of screenshot below): ![Screenshot from 2024-11-27 09-44-46](https://github.com/user-attachments/assets/4aca2179-342d-48ab-a71b-4c0b8d19ba2d) The changeset introduces a significant regression (a slowdown of almost 2x) in the time to compute and show a layout, see full results here: [results.ods](https://github.com/user-attachments/files/17933245/results.ods). 
I used the following ten medium-size graphs: [igv-suite.zip](https://github.com/user-attachments/files/17933256/igv-suite.zip) and instrumented a baseline IGV and the IGV from this PR with the following change: https://github.com/openjdk/jdk/commit/862ed698454d45102335208203144ee9ac32a079. Is the slowdown expected? Perhaps with a bit of profiling we could find whether there is some simple way to recover the original performance. ------------- PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2464306425 Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2464330565 PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2464701035 From tholenstein at openjdk.org Wed Nov 27 17:35:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 27 Nov 2024 17:35:58 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Tue, 26 Nov 2024 23:17:15 GMT, Tobias Holenstein wrote: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. 
It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... > I found a couple of issues in the CFG view by playing around manually: > > 1. 
I get the following assertion error when opening the CFG view of the "Optimized finished" graph in [this file](https://github.com/user-attachments/files/17931456/foo.zip) and enabling the "Simplify graph" filter: > > ![Screenshot from 2024-11-27 09-33-33](https://github.com/user-attachments/assets/1bb83bfc-dd51-4282-96b2-491f05f4f323) > > 2.
With this change, node labels in the CFG view are center-aligned (see right part of screenshot below), instead of left-aligned as expected (see left part of screenshot below): > > ![Screenshot from 2024-11-27 09-44-46](https://github.com/user-attachments/assets/4aca2179-342d-48ab-a71b-4c0b8d19ba2d) Thanks for pointing those two things out. They should be fixed now! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2503470163 From rcastanedalo at openjdk.org Wed Nov 27 17:35:58 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 27 Nov 2024 17:35:58 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: <6s816us9ylGk_kZUGBd8RWdIIIwjVGv2FcaQkg_lK08=.e94ed5f1-9889-486d-9a16-f11b8f5cd53d@github.com> On Wed, 27 Nov 2024 10:13:13 GMT, Tobias Holenstein wrote: > Thanks for pointing those two things out. They should be fixed now! Thanks for fixing these!
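As background for the LayoutGraph description in the PR summary, the insertion of temporary "dummy" nodes for edges that span several layers can be sketched as below. This is only an illustration under assumptions: `DummyNodeSketch`, `Node`, `Edge` and `splitLongEdge` are hypothetical names and do not correspond to the IGV classes or their APIs. The sketch only shows the general technique of replacing a long edge by a chain of edges that each cross a single layer.

```java
import java.util.ArrayList;
import java.util.List;

final class DummyNodeSketch {
    // Hypothetical minimal node: a name, a layer index and a dummy flag.
    record Node(String name, int layer, boolean dummy) {}

    // Hypothetical directed edge between two nodes.
    record Edge(Node from, Node to) {}

    // Splits an edge that spans more than one layer into a chain of
    // single-layer edges, inserting one dummy node per skipped layer.
    // Assumes from.layer() < to.layer().
    static List<Edge> splitLongEdge(Edge e) {
        List<Edge> chain = new ArrayList<>();
        Node prev = e.from();
        for (int layer = e.from().layer() + 1; layer < e.to().layer(); layer++) {
            Node dummy = new Node("dummy@" + layer, layer, true);
            chain.add(new Edge(prev, dummy));
            prev = dummy;
        }
        // Last segment reaches the real target node.
        chain.add(new Edge(prev, e.to()));
        return chain;
    }
}
```

An edge from layer 0 to layer 3 becomes three segments through two dummy nodes, while an edge between adjacent layers is returned unchanged as a single segment.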
------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2503565348 From enikitin at openjdk.org Wed Nov 27 18:29:18 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 27 Nov 2024 18:29:18 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: References: Message-ID: > For CTW, zero classes in provided jar is now a failure. > This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. > > This PR makes this behaviour controllable. Default reaction is a failure, like before. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Use totalClassCount instead of the classCount ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22320/files - new: https://git.openjdk.org/jdk/pull/22320/files/a091e15d..d1e57aa6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22320&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22320&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22320/head:pull/22320 PR: https://git.openjdk.org/jdk/pull/22320 From enikitin at openjdk.org Wed Nov 27 18:29:18 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 27 Nov 2024 18:29:18 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: <0afLSZbCNxy6H8PbPzuIDzz4zqKh-xDt1YFGt17bejw=.4362d5a8-f8b9-44e2-a96d-e2421f316c63@github.com> References: <0afLSZbCNxy6H8PbPzuIDzz4zqKh-xDt1YFGt17bejw=.4362d5a8-f8b9-44e2-a96d-e2421f316c63@github.com> Message-ID: On Tue, 26 Nov 2024 17:31:51 GMT, Vladimir Kozlov wrote: > What is default value of `allow_zero_class_count` and where it is set? It's False, as per [Boolean.getBoolean(...) 
specification](https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/lang/Boolean.html#getBoolean(java.lang.String)) > Why you even need this to be controlled and not default behavior? What is benefit of having error vs warning for empty `jar`? For mass-running of CTW against unchecked/random jars from various jar repositories, like Maven Central. Another solution would be filtering out such jars in advance, but that's a more difficult solution (read the jar file, check the class count, etc.). > Should you check `totalClassCount` too to catch empty `jar`? As I see `classCount` could be 0 if specified `classStart` and `classStop` as the same which could happened regardless number of classes in `jar` file. I missed that. You're right, relying on `totalClassCount` seems a better idea. Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22320#issuecomment-2504547999 From rcastanedalo at openjdk.org Wed Nov 27 19:39:39 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 27 Nov 2024 19:39:39 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Tue, 26 Nov 2024 23:17:15 GMT, Tobias Holenstein wrote: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings.
Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. 
> > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... Thanks for mitigating the performance regression, Toby! I benchmarked a version with your latest Java code changes ("layout-after-opts") and the version with the latest Java code changes and assertions disabled ("layout-after-opts-no-asserts"), see [results.ods](https://github.com/user-attachments/files/17939304/results.ods). In summary, the Java code changes mitigate the regression to some extent (from the original 1.7x mean slowdown down to 1.3x), and additionally disabling the assertions flips the sign of the result and brings a mean 1.5x **speedup** over the original layout time. That's great, I wasn't aware that enabling assertions on IGV was so expensive! I have also tested IGV, both manually and automatically by re-enabling assertions and laying out thousands of graphs in the sea of nodes, clustered sea of nodes, and control-flow graph views, with different combinations of filters enabled. I did not find any issue. I plan to review the actual code changes tomorrow. ------------- PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2465835515 From tholenstein at openjdk.org Wed Nov 27 22:06:39 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 27 Nov 2024 22:06:39 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Wed, 27 Nov 2024 11:05:07 GMT, Roberto Castañeda Lozano wrote: > Perhaps I think I removed the performance bottlenecks. (it was not the layouting). - I set setToolTipText(??
+ generateToolTipText(this.connections) + ??); only when needed for `LineWidget`
- I add all created `LineWidgets` as a batch with `connectionLayer.addChildren(newWidgets);`
- This left me with a last bottleneck where 90% of the runtime for a large graph was spent in `connectionLayer.addChildren` -> `Widget.addChild` -> `Widget.setChildConstraint`:

    public final void setChildConstraint(Widget child, Object constraint) {
        assert this.children.contains(child); // we spend 90% of runtime on this assert
        if (constraint == null) {
            if (this.constraints != null) {
                this.constraints.remove(child);
            }
        } else {
            if (this.constraints == null) {
                this.constraints = new HashMap();
            }
            this.constraints.put(child, constraint);
        }
    }

We end up spending almost all of the runtime for large graphs on this assert, which is part of the NetBeans library code. It has never failed. So the best thing is to run Java with asserts disabled, except when we debug. This can be done by running `mvn` with `-Dnetbeans.run.params="-J-da"`. I have modified `igv.sh` to use this option.

------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2504874316 From dlong at openjdk.org Wed Nov 27 23:20:43 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 27 Nov 2024 23:20:43 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation In-Reply-To: References: <-qkA-r13m-wg6I9W7mhtl0PJsTVrichUj5DP6hICRDk=.d67fa3e9-7474-4434-9051-0bef80508384@github.com> Message-ID: On Mon, 18 Nov 2024 19:15:39 GMT, Dean Long wrote: >>> For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". >> >> There's definitely some room for improvement here, but, frankly speaking, stringy descriptors don't look appealing to me. Why not simply introduce `TypeFunc` factory methods which explicitly accept argument/return `Type`s instead?
Probably, variadic functions are a good fit here, but even if it's not the case, there are rather few arities used (single return value - void, 1 slot, or 2 slots, plus up to 8 arguments). And that would eliminate lots of boilerplate code as well. > >> > For example, new_array_Type() when passes NOTNULL and INT and returns NOTNULL could be represented by something like "NIN". >> >> There's definitely some room for improvement here, but, frankly speaking, stringy descriptors don't look appealing to me. Why not simply introduce `TypeFunc` factory methods which explicitly accept argument/return `Type`s instead? Probably, variadic functions are a good fit here, but even if it's not the case, there are rather few arities used (single return value - void, 1 slot, or 2 slots, plus up to 8 arguments). And that would eliminate lots of boilerplate code as well. > > Good idea. > @dean-long @iwanowww any suggestion for this one ?

You have several choices for creating factories for TypeTuple::make_domain() and TypeFunc::make() that take a variable number of arguments:

1) varargs

    TypeFunc* TypeFunc::make(int nargs, Type* return_type, ...) { ... }

2) overloading

    TypeFunc* TypeFunc::make(Type* return_type, Type* arg1) { /* 1 arg */ }
    TypeFunc* TypeFunc::make(Type* return_type, Type* arg1, Type* arg2) { /* 2 args */ }

3) default value

    TypeFunc* TypeFunc::make(Type* return_type, Type* arg1, Type* arg2 = nullptr, Type* arg3 = nullptr) { ... }

4) arrays

    TypeFunc* TypeFunc::make(int nargs, Type* types[]) { ... }

Varargs is probably easiest, but I would be tempted to choose arrays if I thought it could get us all the way to read-only static constexpr data. Unfortunately, there are some details that get in the way, such as adding Type::HALF for LP64 and adding boiler-plate fields before the TypeFunc::Parms slot.
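
To make the trade-off a bit more concrete, here is a minimal sketch of a type-checked variadic factory written as a C++ template, which is one possible variant of the varargs option without C-style `...`. It is only an illustration under stated assumptions: `SlotKind`, `SketchType`, `SketchTypeFunc`, `kHalf`, `append_slots` and `make_type_func` are hypothetical stand-ins and not HotSpot classes or functions, and the rule that a long occupies two slots is a simplification of the Type::HALF detail mentioned above. The boiler-plate fields before the TypeFunc::Parms slot are not modelled at all.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; NOT the HotSpot classes.
enum class SlotKind { Void, Int, Long, NotNull, Half };

struct SketchType {
    SlotKind kind;
};

struct SketchTypeFunc {
    std::vector<const SketchType*> domain;  // argument slots
    std::vector<const SketchType*> range;   // return slots
};

// Shared second-half marker, loosely mimicking the Type::HALF detail (assumption).
static const SketchType kHalf{SlotKind::Half};

// Append the slots for one type: void adds nothing, a long adds two slots.
void append_slots(std::vector<const SketchType*>& slots, const SketchType* t) {
    assert(t != nullptr);
    if (t->kind == SlotKind::Void) {
        return;
    }
    slots.push_back(t);
    if (t->kind == SlotKind::Long) {
        slots.push_back(&kHalf);
    }
}

// Variadic factory: the argument count is deduced at compile time,
// so no explicit nargs parameter or nullptr sentinel is needed.
template <typename... Args>
SketchTypeFunc make_type_func(const SketchType* return_type, Args... args) {
    SketchTypeFunc tf;
    append_slots(tf.range, return_type);
    // Passing anything that is not convertible to const SketchType* fails to compile.
    const std::initializer_list<const SketchType*> arg_list = {args...};
    for (const SketchType* arg : arg_list) {
        append_slots(tf.domain, arg);
    }
    return tf;
}
```

With suitable `SketchType` constants, a call such as `make_type_func(&not_null, &not_null, &int_type)` would describe the shape quoted above for new_array_Type() (NOTNULL and INT in, NOTNULL out). Whether something along these lines fits the real `TypeFunc::make()` is an open question in this thread.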
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2504966794 From vlivanov at openjdk.org Thu Nov 28 01:25:48 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 28 Nov 2024 01:25:48 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v2] In-Reply-To: References: Message-ID: On Sun, 24 Nov 2024 09:18:52 GMT, Amit Kumar wrote: >> Lazy computation of TypeFunc. >> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > refactor FYI I played a bit with template functions as TypeFunc factories [1]. Overall, it looks promising. I'm perfectly fine to cover that in a follow-up PR, but it would be good to align this patch accordingly. Some observations follow: (1) There's a bunch of other TypeFunc usages you may want to cover. (2) It turns out there are 2 types of `TypeFunc`s declared on OptoRuntime: those related to OptoRuntime stubs (populated by `C2_STUBS_DO`) and ad hoc types for generated stubs and runtime calls. For the former, a dedicated init method colocated with the runtime entry looks preferable (`*_Type_Init()`) since both representations can be easily compared. For the latter, an inline construction in `OptoRuntime::initialize_types()` looks more appropriate. [1] https://github.com/openjdk/jdk/compare/pr/21782...iwanowww:jdk:pr/21782 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2505088404 From chagedorn at openjdk.org Thu Nov 28 07:36:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 07:36:47 GMT Subject: RFR: 8345154: IGV: Show Parse and Assertion Predicate type as extra label Message-ID: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> `ParsePredicate` and `If/RangeCheck` nodes for Assertion Predicates dump their respective type to the `dump_spec` string.
We can parse this info and show it in the "Show custom node info" filter as done for other nodes already: ![Screenshot from 2024-11-28 08-16-23](https://github.com/user-attachments/assets/0de53873-8aa2-47b9-9275-d90c05deecd3) Thanks, Christian ------------- Commit messages: - 8345154: IGV: Show Parse and Assertion Predicate type as extra label Changes: https://git.openjdk.org/jdk/pull/22428/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22428&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345154 Stats: 26 lines in 2 files changed: 24 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22428/head:pull/22428 PR: https://git.openjdk.org/jdk/pull/22428 From chagedorn at openjdk.org Thu Nov 28 07:43:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 07:43:39 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: <9oL8S-PC-j-Q0QPz-ex2_r31LX1c3akJZ_O1jrt_YhQ=.10a5784b-2a04-4fca-a9b8-a49c96faaf84@github.com> On Tue, 26 Nov 2024 23:17:15 GMT, Tobias Holenstein wrote: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. 
It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Drive-by comment: You could update the class comments for the four classes described above in the PR description with the actual PR descriptions which are more detailed than the current class comments found in the code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2505445274 From tholenstein at openjdk.org Thu Nov 28 08:33:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 28 Nov 2024 08:33:23 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v2] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. 
It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: fixed graph objects equality ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/ad4d0761..2f1157ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=00-01 Stats: 79 lines in 7 files changed: 63 ins; 7 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From chagedorn at openjdk.org Thu Nov 28 08:37:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 08:37:15 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v4] In-Reply-To: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: > This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". > > #### Current State: Mostly "reverse-order" for Assertion Predicates > We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following:
>
>                                     old target loop entry
>                                               |
>              x                    Cloned Template Assertion
>              |                           Predicate 2
>     Template Assertion                        |
>         Predicate 1                 Initialized Assertion
>              |            ==>            Predicate 2
>     Template Assertion                        |
>         Predicate 2               Cloned Template Assertion
>              |                           Predicate 1
>         source loop                           |
>                                     Initialized Assertion
>                                          Predicate 1
>                                               |
>                                          target loop
>
> I don't think this is wrong but still kinda unexpected when trying to reason about a graph.
But now with the recent refactorings, I think it's easy to change this to an in-order processing:
>
>                                     old target loop entry
>                                               |
>              x                    Cloned Template Assertion
>              |                           Predicate 1
>     Template Assertion                        |
>         Predicate 1                 Initialized Assertion
>              |            ==>            Predicate 1
>     Template Assertion                        |
>         Predicate 2               Cloned Template Assertion
>              |                           Predicate 2
>         source loop                           |
>                                     Initialized Assertion
>                                          Predicate 2
>                                               |
>                                          target loop
>
> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in-order cloning. > > #### Why Does Loop Unswitching Use In-Order? > The main reason wa... Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Add some visualization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22275/files - new: https://git.openjdk.org/jdk/pull/22275/files/dfcbf459..07fca8a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=02-03 Stats: 25 lines in 1 file changed: 24 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22275/head:pull/22275 PR: https://git.openjdk.org/jdk/pull/22275 From epeter at openjdk.org Thu Nov 28 08:37:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Nov 2024 08:37:15 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v4] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Thu, 28 Nov 2024 08:33:57 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order".
>> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 2 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 2 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 1 >> source loop | >> Initialized Assertion >> Predicate 1 >> | >> target loop >> >> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 1 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 1 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 2 >> source loop | >> Initialized Assertion >> Predicate 2 >> | >> target loop >> >> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add some visualization Nice refactoring, looks good! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22275#pullrequestreview-2467124879 From chagedorn at openjdk.org Thu Nov 28 08:44:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 08:44:41 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v4] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: <--PfO-Jikt7-6cbgwmG-sL6a_6gt_G-euosBeBE-IHg=.4a8ac154-f665-4330-8af2-2e864580e4e5@github.com> On Thu, 28 Nov 2024 08:37:15 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". >> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 2 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 2 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 1 >> source loop | >> Initialized Assertion >> Predicate 1 >> | >> target loop >> >> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. 
But now with the recent refactorings, I think it's easy to change this to an in-order processing: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 1 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 1 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 2 >> source loop | >> Initialized Assertion >> Predicate 2 >> | >> target loop >> >> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add some visualization Thanks Emanuel! I think I will wait until after the fork to integrate this since we are semantically changing the order which might have some unforeseeable side effects. I don't think this needs to go into JDK 24. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22275#issuecomment-2505551385 From chagedorn at openjdk.org Thu Nov 28 09:00:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 09:00:25 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v5] In-Reply-To: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: > This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". > > #### Current State: Mostly "reverse-order" for Assertion Predicates > We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). 
This means that we do the following: > > old target loop entry > | > x Cloned Template Assertion > | Predicate 2 > Template Assertion | > Predicate 1 Initialized Assertion > | ==> Predicate 2 > Template Assertion | > Predicate 2 Cloned Template Assertion > | Predicate 1 > source loop | > Initialized Assertion > Predicate 1 > | > target loop > > I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing: > > old target loop entry > | > x Cloned Template Assertion > | Predicate 1 > Template Assertion | > Predicate 1 Initialized Assertion > | ==> Predicate 1 > Template Assertion | > Predicate 2 Cloned Template Assertion > | Predicate 2 > source loop | > Initialized Assertion > Predicate 2 > | > target loop > > This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in-order cloning. > > #### Why Does Loop Unswitching Use In-Order? > The main reason wa... Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Revert "8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor" This reverts commit 550933659a8021131d9d1424fc6ff77b51745cbe. 
- 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22275/files - new: https://git.openjdk.org/jdk/pull/22275/files/07fca8a2..8ea09a10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22275&range=03-04 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22275.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22275/head:pull/22275 PR: https://git.openjdk.org/jdk/pull/22275 From epeter at openjdk.org Thu Nov 28 09:00:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Nov 2024 09:00:25 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v5] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Thu, 28 Nov 2024 08:56:53 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". >> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 2 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 2 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 1 >> source loop | >> Initialized Assertion >> Predicate 1 >> | >> target loop >> >> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. 
But now with the recent refactorings, I think it's easy to change this to an in-order processing: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 1 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 1 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 2 >> source loop | >> Initialized Assertion >> Predicate 2 >> | >> target loop >> >> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Revert "8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor" > > This reverts commit 550933659a8021131d9d1424fc6ff77b51745cbe. > - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22275#pullrequestreview-2467211464 From chagedorn at openjdk.org Thu Nov 28 09:00:26 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 09:00:26 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v4] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Thu, 28 Nov 2024 08:37:15 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". >> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). 
This means that we do the following: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 2 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 2 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 1 >> source loop | >> Initialized Assertion >> Predicate 1 >> | >> target loop >> >> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing: >> >> old target loop entry >> | >> x Cloned Template Assertion >> | Predicate 1 >> Template Assertion | >> Predicate 1 Initialized Assertion >> | ==> Predicate 1 >> Template Assertion | >> Predicate 2 Cloned Template Assertion >> | Predicate 2 >> source loop | >> Initialized Assertion >> Predicate 2 >> | >> target loop >> >> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add some visualization Accidentally pushed the next patch to the wrong branch. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22275#issuecomment-2505578921 From rcastanedalo at openjdk.org Thu Nov 28 09:19:39 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 09:19:39 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v2] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Thu, 28 Nov 2024 08:33:23 GMT, Tobias Holenstein wrote: >> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, >> >> ### LayoutGraph >> The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. >> >> ### LayoutLayer >> The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level.
This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. >> >> ### LayoutNode >> The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. >> >> ### LayoutEdge >> The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > fixed graph objects equality src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalLayoutManager.java line 75: > 73: > 74: static public void apply(LayoutGraph graph) { > 75: removeSelfEdges(graph); The proposed code removes self-edges unconditionally. This is OK for the sea-of-nodes layout, but for the CFG layout we do need to draw self-edges (think about single basic block loops). 
Here is an (artificially edited) example of how the new algorithm misses drawing a self-edge for B7 (left is current IGV, right is IGV with your proposed changes): ![Screenshot from 2024-11-28 10-09-09](https://github.com/user-attachments/assets/e91b55af-0fbf-4f28-b7e6-558dfcada42a) Here is the artificially edited graph file that illustrates the issue: [self-edges.zip](https://github.com/user-attachments/files/17945729/self-edges.zip) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22402#discussion_r1861795723 From tholenstein at openjdk.org Thu Nov 28 09:30:26 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 28 Nov 2024 09:30:26 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v3] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. 
This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: revert copyright changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/2f1157ef..c08d99e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From rcastanedalo at openjdk.org Thu Nov 28 09:41:40 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 09:41:40 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v3] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Thu, 28 Nov 2024 09:30:26 GMT, Tobias Holenstein wrote: >> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, >> >> ### LayoutGraph >> The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout.
The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. >> >> ### LayoutLayer >> The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. >> >> ### LayoutNode >> The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. >> >> ### LayoutEdge >> The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information ... 
> > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > revert copyright changes Before this change, when the layout quality in the "Stable sea of nodes" view had degraded too much, one could switch to the regular "Sea of nodes" view and back to "Stable sea of nodes", to start again from the regular layout. This is not possible anymore with these proposed changes, which severely limits the usability of the "Stable sea of nodes" view in my opinion. Is this something that can be easily fixed? ------------- PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2467376735 From rcastanedalo at openjdk.org Thu Nov 28 09:45:41 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 09:45:41 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v3] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Thu, 28 Nov 2024 09:30:26 GMT, Tobias Holenstein wrote: >> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, >> >> ### LayoutGraph >> The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout.
The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. >> >> ### LayoutLayer >> The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. >> >> ### LayoutNode >> The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. >> >> ### LayoutEdge >> The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > revert copyright changes In the "sea of nodes" view, clicking on an edge used to select its connected nodes, but not after this change. 
------------- PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2467392021 From duke at openjdk.org Thu Nov 28 10:02:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 10:02:03 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v6] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: - Merge branch 'master' into unsigned-div-opts - Rename to cast - Fix bug in unsigned_mod_ideal - Improve tests, remove edge case - Resolve review comments - Remove transform_unsigned_* and inline - Fix test comments - Minor fixes - Add 2^k-1 test - Fix code style - ... 
and 11 more: https://git.openjdk.org/jdk/compare/40ecd3fa...5bc40642 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/a249d81b..5bc40642 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=04-05 Stats: 244505 lines in 4737 files changed: 95553 ins; 129533 del; 19419 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Thu Nov 28 10:07:22 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 10:07:22 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v7] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Maker smaller adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/5bc40642..d4df20b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=05-06 Stats: 14 lines in 1 file changed: 5 ins; 8 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Thu Nov 28 10:07:23 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 10:07:23 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: <93d_HYKwUltf138FuPWPLOVKo6jOyj7z3x2akf6nV-8=.f5b8d800-9d95-4bb5-9730-b851e9e7e154@github.com> References: <93d_HYKwUltf138FuPWPLOVKo6jOyj7z3x2akf6nV-8=.f5b8d800-9d95-4bb5-9730-b851e9e7e154@github.com> Message-ID: <9olLl5nlZC3Sgm-PPpOzqIhLfQvtjBQSnT4KJCSXbzw=.2e9ce64a-dc29-4697-baaa-c35b37134287@github.com> On Mon, 25 Nov 2024 14:16:47 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/divnode.cpp line 488: >> >>> 486: >>> 487: const Type* t = phase->type(div->in(2)); >>> 488: if (t == TypeClass::ONE) { // Identity? >> >> You can move this into `l == 0 || l == 1` below. > > This is also the same for ModI/LNode::Ideal. I think all of this code should be reviewed as part of an RFE and then changed together Fixed >> src/hotspot/share/opto/divnode.cpp line 1184: >> >>> 1182: >>> 1183: if (con == 1) { >>> 1184: return ConNode::make(TypeClass::ZERO); >> >> This should be in `Value` instead. 
> > This is analogous to code for ModI/LNode::Ideal. I'll file an RFE that this should be changed in all locations Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1861867154 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1861867030 From duke at openjdk.org Thu Nov 28 10:11:39 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 10:11:39 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v3] In-Reply-To: References: <8f0PsjaW3DKlC4Y7TIPxYhaXICGK4Zq2vslCXQ4GZo0=.ef4f782b-7c51-4db5-b770-0b7aaecf9889@github.com> Message-ID: On Mon, 25 Nov 2024 15:00:40 GMT, Quan Anh Mai wrote: >> Drive-by comment: Such uncast optimizations are definitely non-trivial changes as they tend to trigger other issues by allowing subgraphs to be folded that would otherwise not be folded. So let's make sure we have proper tests for this. In my opinion, putting some of this work into a separate RFE is perfectly fine. > > This is fair, but this point does not apply to my other suggestions, though. Btw `eqv_uncast` is used in `XorNode::Value`. Other comments were addressed. RFE for this one: https://bugs.openjdk.org/browse/JDK-8345170 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1861875156 From rcastanedalo at openjdk.org Thu Nov 28 10:42:43 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 10:42:43 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v3] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Thu, 28 Nov 2024 09:42:45 GMT, Roberto Castañeda Lozano wrote: > In the "sea of nodes" view, clicking on an edge used to select its connected nodes, but not after this change.
Here is a patch on top of this PR that re-introduces the missing functionality: https://github.com/openjdk/jdk/commit/389ab05ed5505930fbcda7c864316567f3e0ff08. Feel free to incorporate it into this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2505799479 From shade at openjdk.org Thu Nov 28 10:58:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Nov 2024 10:58:17 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only Message-ID: Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/22432/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22432&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345172 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22432/head:pull/22432 PR: https://git.openjdk.org/jdk/pull/22432 From shade at openjdk.org Thu Nov 28 11:07:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Nov 2024 11:07:13 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v2] In-Reply-To: References: Message-ID: <3A8oM7u8ktLs3B52t9Ik1Le5Oc2TZkTQcmWrxmkzFnc=.d14a4920-1fa5-4063-bc07-80dbdd340899@github.com> > Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers.
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Tja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22432/files - new: https://git.openjdk.org/jdk/pull/22432/files/abdc1bd0..02db6386 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22432&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22432&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22432/head:pull/22432 PR: https://git.openjdk.org/jdk/pull/22432 From kbarrett at openjdk.org Thu Nov 28 12:10:45 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 28 Nov 2024 12:10:45 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub Message-ID: Please review this change to RISCV code to remove a -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub. It was calling MacroAssembler::movptr with the second (address) argument being a literal 0. Rather than changing it to use nullptr for that argument, I've instead changed it to call the movptr2 helper function, which takes the target address as a uint64_t. This eliminates the conversion of 0 to a pointer and then back to an integer 0. It seemed to me more natural to use that helper directly, as it was presumed that was what ended up being called anyway. But the riscv porters should weigh in on whether that's a good approach to dealing with this case. Testing: GHA sanity tests, which includes building for linux-riscv64. I don't have the capability to run tests for this platform, so hoping someone from the riscv porters can do more testing.
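
The pattern described in the message above can be illustrated with a small standalone sketch. This is not the HotSpot code: `load_ptr` and `load_imm` are hypothetical stand-ins for `movptr` and `movptr2`, and the assumption that the pointer overload simply forwards to the integer one is taken from the message's own wording ("it was presumed that was what ended up being called anyway").

```cpp
#include <cstdint>

// Hypothetical stand-in for movptr2: takes the target address as an integer.
static uint64_t load_imm(uint64_t target) {
  return target;
}

// Hypothetical stand-in for movptr: takes a pointer and forwards to the
// integer helper after converting the pointer to an integer.
static uint64_t load_ptr(const void* target) {
  return load_imm(static_cast<uint64_t>(reinterpret_cast<uintptr_t>(target)));
}

// Calling load_ptr(0) compiles, but the literal 0 is converted to a null
// pointer, which is what -Wzero-as-null-pointer-constant reports.
// Two warning-free alternatives:
static uint64_t placeholder_via_pointer() {
  return load_ptr(nullptr);  // explicit null pointer, then back to an integer
}

static uint64_t placeholder_direct() {
  return load_imm(0);        // no pointer conversion at all
}
```

Both alternatives silence the warning; the second one avoids the round trip from integer to pointer and back, which is the choice the message argues for.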
------------- Commit messages: - fix riscv Changes: https://git.openjdk.org/jdk/pull/22435/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22435&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345159 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22435.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22435/head:pull/22435 PR: https://git.openjdk.org/jdk/pull/22435 From tholenstein at openjdk.org Thu Nov 28 12:44:16 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 28 Nov 2024 12:44:16 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout Message-ID: This PR depends on https://github.com/openjdk/jdk/pull/22402 and also includes the code of the depending PR. Check out this PR locally: `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` `git checkout pull/22430` Enhance IGV with an interactive feature that allows the user to move nodes within the layout by dragging them to new positions. Manually adjusting the node layout will help the user better understand the graph structure. When users drag nodes, connections will dynamically adjust to maintain the graph structure, and nodes will stay in their new positions until the layout is reset.
------------- Depends on: https://git.openjdk.org/jdk/pull/22402 Commit messages: - Merge branch 'pr/22402' into JDK-8343705 - 8343705: IGV: Interactive Node Moving in Hierarchical Layout Changes: https://git.openjdk.org/jdk/pull/22430/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343705 Stats: 1300 lines in 9 files changed: 1287 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430 PR: https://git.openjdk.org/jdk/pull/22430 From thartmann at openjdk.org Thu Nov 28 12:50:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 28 Nov 2024 12:50:38 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 08:57:08 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 and also includes the code of the depending PR. > Check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > Enhance IGV with an interactive feature that allows the user to move nodes within the layout by dragging them to new positions. > > Manually adjusting the node layout will help the user better understand the graph structure. When users drag nodes, connections will dynamically adjust to maintain the graph structure, and nodes will stay in their new positions until the layout is reset. I thoroughly tested this with a few complex graphs and it works great. This is an awesome enhancement! ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22430#pullrequestreview-2468066200 From amitkumar at openjdk.org Thu Nov 28 13:15:59 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 28 Nov 2024 13:15:59 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v3] In-Reply-To: References: Message-ID: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: cover more TypeFunc objects ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21782/files - new: https://git.openjdk.org/jdk/pull/21782/files/a3a90b23..e9736788 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=01-02 Stats: 197 lines in 8 files changed: 136 ins; 55 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From amitkumar at openjdk.org Thu Nov 28 13:20:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 28 Nov 2024 13:20:42 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v2] In-Reply-To: References: Message-ID: <2jyj0PGAJAvyE3zmwQqaDdjaZsMtYkVjdI7bhzdz1FE=.ced468e6-5450-4c4f-8597-e882d7f65380@github.com> On Thu, 28 Nov 2024 01:23:29 GMT, Vladimir Ivanov wrote: > FYI I played a bit with template functions as TypeFunc factories [1]. Overall, it looks promising. I'm perfectly fine to cover that in a follow-up PR, but it would be good to align this patch accordingly. Sure, that would be fine by me as you have already covered part of it. > > Some observations follow: > > (1) There's a bunch of other TypeFunc usages you may want to cover. Okay I have increased the coverage. 
But I still left some of them alone because they are class-specific: they accept an argument and use it in the type creation. That would be perfectly fine with variadic templates, but I am not sure it can be done with the current pattern of changes.

```
static const TypeFunc* alloc_type(const Type* t) {
  const Type** fields = TypeTuple::fields(ParmLimit - TypeFunc::Parms);
  fields[AllocSize]       = TypeInt::POS;
  fields[KlassNode]       = TypeInstPtr::NOTNULL;
  fields[InitialTest]     = TypeInt::BOOL;
  fields[ALength]         = t;  // length (can be a bad length)
  fields[ValidLengthTest] = TypeInt::BOOL;

  const TypeTuple *domain = TypeTuple::make(ParmLimit, fields);

  // create result type (range)
  fields = TypeTuple::fields(1);
  fields[TypeFunc::Parms+0] = TypeRawPtr::NOTNULL; // Returned oop

  const TypeTuple *range = TypeTuple::make(TypeFunc::Parms+1, fields);

  return TypeFunc::make(domain, range);
}
```

> (2) It turns out there are 2 types of `TypeFunc`s declared on OptoRuntime: those related to OptoRuntime stubs (populated by `C2_STUBS_DO`) and adhoc types for generated stubs and runtime calls. For the former a dedicated init method colocated with runtime entry looks preferred (`*_Type_Init()`) since both representations can be easily compared. For the latter, an inline construction in `OptoRuntime::initialize_types()` looks more appropriate.

Most of them are now moved into the `OptoRuntime::initialize_types()` method, and their `*_init` methods are called from there. By the other ones, do you mean the class-specific ones? Or is there something else I need to update in this PR?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2506110873 From rcastanedalo at openjdk.org Thu Nov 28 13:29:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 13:29:37 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 08:57:08 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. > > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. This enhancement adds: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. > > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off.
(The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. > - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. > `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Also added a `writeBack` method to apply these changes. > > ... I tested and benchmarked this change using the same methodology as in [JDK-8314512](https://github.com/openjdk/jdk/pull/22402). Both testing and performance results are OK. The change seems to introduce a slight overhead of around 10% w.r.t. JDK-8314512 but still speeds up the baseline (current IGV) by around 1.4x thanks to disabling assertions in JDK-8314512. See full results here: [results.ods](https://github.com/user-attachments/files/17948626/results.ods). I will start to review the actual code changes. 
------------- PR Review: https://git.openjdk.org/jdk/pull/22430#pullrequestreview-2468184458 From amitkumar at openjdk.org Thu Nov 28 13:34:26 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 28 Nov 2024 13:34:26 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v4] In-Reply-To: References: Message-ID: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge branch 'master' into tf_v2 - fixing the merge conflict - cover more TypeFunc objects - refactor - extra space - inline accessor methods - Revert "mac build workaround" This reverts commit ac2fe31d1f134f8d29aea5e6816ec77f0719845a. - final change - mac build workaround - init change ------------- Changes: https://git.openjdk.org/jdk/pull/21782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=03 Stats: 1057 lines in 9 files changed: 734 ins; 73 del; 250 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From duke at openjdk.org Thu Nov 28 13:36:01 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 13:36:01 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: abs(MIN_INT) is not positive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/d4df20b7..b0d72683 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=06-07 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From rcastanedalo at openjdk.org Thu Nov 28 13:49:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 13:49:38 GMT Subject: RFR: 8345154: IGV: Show Parse and Assertion Predicate type as extra label In-Reply-To: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> References: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> Message-ID: <_bvZrVkClUnXvuQuGOomRQDAomiCTwqAO0q5PK9x0LQ=.1153e27b-d7ef-4b44-a08d-e157082b9fed@github.com> On Thu, 28 Nov 2024 07:31:55 GMT, Christian Hagedorn wrote: > `ParsePredicate` and `If/RangeCheck` nodes for Assertion Predicates dump their respective type to the `dump_spec` string. We can parse this info and show it in the "Show custom node info" filter as done for other nodes already: > > ![Screenshot from 2024-11-28 08-16-23](https://github.com/user-attachments/assets/0de53873-8aa2-47b9-9275-d90c05deecd3) > > Thanks, > Christian Looks good and trivial. Thanks for doing this, Christian! ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22428#pullrequestreview-2468229953 From roland at openjdk.org Thu Nov 28 13:56:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 28 Nov 2024 13:56:59 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v4] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. 
Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - more - merge - more - one more test - Merge branch 'master' into JDK-8342692 - more - more - Merge branch 'master' into JDK-8342692 - more - more - ... 
and 9 more: https://git.openjdk.org/jdk/compare/ac3bbf7d...32d8d630 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=03 Stats: 1310 lines in 23 files changed: 1247 ins; 14 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From chagedorn at openjdk.org Thu Nov 28 14:01:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 14:01:42 GMT Subject: RFR: 8345154: IGV: Show Parse and Assertion Predicate type as extra label In-Reply-To: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> References: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> Message-ID: On Thu, 28 Nov 2024 07:31:55 GMT, Christian Hagedorn wrote: > `ParsePredicate` and `If/RangeCheck` nodes for Assertion Predicates dump their respective type to the `dump_spec` string. We can parse this info and show it in the "Show custom node info" filter as done for other nodes already: > > ![Screenshot from 2024-11-28 08-16-23](https://github.com/user-attachments/assets/0de53873-8aa2-47b9-9275-d90c05deecd3) > > Thanks, > Christian Thanks Roberto for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22428#issuecomment-2506186882 From chagedorn at openjdk.org Thu Nov 28 14:01:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 14:01:43 GMT Subject: Integrated: 8345154: IGV: Show Parse and Assertion Predicate type as extra label In-Reply-To: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> References: <4WSS-rn66PtP8MNQYWiB0zneuMGmB3z6XjvAptQp80I=.1c5aa71a-9a57-412a-a231-9e7dae647535@github.com> Message-ID: On Thu, 28 Nov 2024 07:31:55 GMT, Christian Hagedorn wrote: > `ParsePredicate` and `If/RangeCheck` nodes for Assertion Predicates dump their respective type to the `dump_spec` string. We can parse this info and show it in the "Show custom node info" filter as done for other nodes already: > > ![Screenshot from 2024-11-28 08-16-23](https://github.com/user-attachments/assets/0de53873-8aa2-47b9-9275-d90c05deecd3) > > Thanks, > Christian This pull request has now been integrated. Changeset: 7dc00d39 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/7dc00d39b4e184a59cbcd644d22db61b1abe8a4b Stats: 26 lines in 2 files changed: 24 ins; 0 del; 2 mod 8345154: IGV: Show Parse and Assertion Predicate type as extra label Reviewed-by: rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22428 From tholenstein at openjdk.org Thu Nov 28 14:18:56 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 28 Nov 2024 14:18:56 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout Message-ID: This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 git checkout pull/22438 Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. 
In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. ------------- Depends on: https://git.openjdk.org/jdk/pull/22430 Commit messages: - JDK-8345041 IGV: Free Placement Mode in IGV Layout Changes: https://git.openjdk.org/jdk/pull/22438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345041 Stats: 612 lines in 11 files changed: 592 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From rcastanedalo at openjdk.org Thu Nov 28 14:29:40 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 28 Nov 2024 14:29:40 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 08:57:08 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs.
> > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. This enhancement adds: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. > > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. > - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. > `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. 
Also added a `writeBack` method to apply these changes. > > ... Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22430#pullrequestreview-2468317217 From qamai at openjdk.org Thu Nov 28 14:31:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 28 Nov 2024 14:31:41 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:36:01 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > abs(MIN_INT) is not positive src/hotspot/share/opto/divnode.cpp line 488: > 486: > 487: const Type* t = phase->type(div->in(2)); > 488: const TypeClass* tl = t->cast(); I believe `is_int()` will assert when `t` is not a `TypeInt`, what you want here is a `try_cast` that calls `isa_int()` instead. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862295237 From qamai at openjdk.org Thu Nov 28 14:37:42 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 28 Nov 2024 14:37:42 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:36:01 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > abs(MIN_INT) is not positive test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 88: > 86: > 87: @Test > 88: @IR(failOn = {IRNode.DIV}) All these should be `failOn = {IRNode.UDIV}` test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 129: > 127: IRNode.DIV_BY_ZERO_TRAP, "1" > 128: }) > 129: // Hotspot should keep the division because it may cause a division by zero trap This comment is actually incorrect, the division is kept because the transformation is wrong. 
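(Editorial aside, not part of the review.) The identities these IR tests exercise can be checked directly against the JDK's unsigned helpers. The following is a minimal sketch under stated assumptions: the class `UnsignedDivIdentities` and its two methods are hypothetical names introduced only for illustration, and the code shows the arithmetic identities, not how C2 implements the transformation.

```java
public class UnsignedDivIdentities {
    // Unsigned x / 2^k equals a logical (zero-filling) right shift by k.
    static int udivPow2(int x, int k) {
        return x >>> k;
    }

    // Unsigned x % 2^k equals masking the low k bits.
    static int umodPow2(int x, int k) {
        return x & ((1 << k) - 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, 1024, -1, -8, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            for (int k = 0; k < 32; k++) {
                // For k == 31 the divisor is Integer.MIN_VALUE, which is 2^31 when read as unsigned.
                int divisor = 1 << k;
                if (Integer.divideUnsigned(x, divisor) != udivPow2(x, k)) {
                    throw new AssertionError("udiv mismatch: x=" + x + " k=" + k);
                }
                if (Integer.remainderUnsigned(x, divisor) != umodPow2(x, k)) {
                    throw new AssertionError("umod mismatch: x=" + x + " k=" + k);
                }
            }
            // x / x is 1 only for non-zero x; x == 0 throws ArithmeticException instead.
            if (x != 0 && Integer.divideUnsigned(x, x) != 1) {
                throw new AssertionError("x / x mismatch: x=" + x);
            }
        }
        // The long counterpart of the largest power-of-two divisor.
        if (Long.MIN_VALUE != (1L << 63)) {
            throw new AssertionError("Long.MIN_VALUE is expected to equal 1L << 63");
        }
        System.out.println("all identities hold");
    }
}
```

Note that the `x / x` case needs the non-zero guard: a zero divisor throws `ArithmeticException` rather than producing 1.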
test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 146: > 144: @IR(counts = {IRNode.URSHIFT, "1"}) > 145: public int divByPow2Big(int x) { > 146: return Integer.divideUnsigned(x, -2147483648); // -2147483648 = Integer.parseUnsignedInt("2147483648") You should use `Integer.MIN_VALUE` test/hotspot/jtreg/compiler/c2/irTests/UDivLNodeIdealizationTests.java line 146: > 144: @IR(counts = {IRNode.URSHIFT, "1"}) > 145: public long divByPow2Big(long x) { > 146: return Long.divideUnsigned(x, -9223372036854775808L); // -9223372036837998592 = Long.parseUnsignedLong("9223372036854775808") Similarly, this should be `Long.MIN_VALUE` or `1 << 63` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862298357 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862300217 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862301584 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862304464 From roland at openjdk.org Thu Nov 28 14:42:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 28 Nov 2024 14:42:23 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v5] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. 
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into JDK-8342692 - whitespaces - more - merge - more - one more test - Merge branch 'master' into JDK-8342692 - more - more - Merge branch 'master' into JDK-8342692 - ... and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=04 Stats: 1310 lines in 23 files changed: 1247 ins; 14 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From mli at openjdk.org Thu Nov 28 14:50:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 28 Nov 2024 14:50:37 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: Message-ID: <5JsCO75AM4JE0lGzKRcWYWEe9cQIWE8nsh0XiRkH-Ug=.6278750b-5516-42d9-be60-cf96249effe0@github.com> On Thu, 28 Nov 2024 12:05:28 GMT, Kim Barrett wrote: > Please review this change to RISCV code to remove a > -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub. > > It was calling MacroAssembler::movptr with the second (address) argument being > a literal 0. Rather than changing it to use nullptr for that argument, I've > instead changed it to call the movptr2 helper function, which takes the target > address as a uint64_t. This eliminates the conversion of 0 to a pointer and > then back to an integer 0. It seemed to me more natural to use that helper > directly, as it was presumed that was what ended up being called anyway. 
But > the riscv porters should weigh in on whether that's a good approach to dealing > with this case. > > Testing: GHA sanity tests, which includes building for linux-riscv64. I don't > have the capability to run tests for this platform, so hoping someone from the > riscv porters can do more testing. Looks good to me. It might be good to have @robehn have a look too. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22435#pullrequestreview-2468369808 From epeter at openjdk.org Thu Nov 28 14:57:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Nov 2024 14:57:38 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:45:45 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. This feature is really amazing. It really would make me start using IGV. I scanned the code changes quickly, and they seem reasonable, at least to a non-IGV developer.
src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/FreeInteractiveLayoutManager.java line 219: > 217: > 218: double deltaX = posX - otherNode.getX(); > 219: double deltaY = posY - otherNode.getY(); What happens if this distance is zero? Does the division below behave ok? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2468377365 PR Review Comment: https://git.openjdk.org/jdk/pull/22438#discussion_r1862333177 From epeter at openjdk.org Thu Nov 28 14:57:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Nov 2024 14:57:38 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 14:51:23 GMT, Emanuel Peter wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: >> >> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 >> git checkout pull/22438 >> >> >> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. >> >> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. >> >> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
> > src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/FreeInteractiveLayoutManager.java line 219: > >> 217: >> 218: double deltaX = posX - otherNode.getX(); >> 219: double deltaY = posY - otherNode.getY(); > > What happens if this distance is zero? Does the division below behave ok? If we get issues here, we can always check for zero and add some random non-zero noise to force different position to get the two nodes to separate in a sane way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22438#discussion_r1862335424 From tholenstein at openjdk.org Thu Nov 28 15:20:12 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 28 Nov 2024 15:20:12 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML Message-ID: This PR depends on https://github.com/openjdk/jdk/pull/22402 [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. 
color ### Whats new - Now colors are saved with the XML as well - Colors are kept when changing to a different graph - The user can remove the color again: This uses the color from the filter or WHITE otherwise ------------- Depends on: https://git.openjdk.org/jdk/pull/22402 Commit messages: - JDK-8345039: IGV: save user-defined node colors to XML Changes: https://git.openjdk.org/jdk/pull/22440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345039 Stats: 60 lines in 5 files changed: 54 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22440/head:pull/22440 PR: https://git.openjdk.org/jdk/pull/22440 From epeter at openjdk.org Thu Nov 28 15:36:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 28 Nov 2024 15:36:41 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:15:06 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. > > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise Looks good, thanks Toby! A quick review from a non-IGV engineer ;) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22440#pullrequestreview-2468462635 From roland at openjdk.org Thu Nov 28 15:37:16 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 28 Nov 2024 15:37:16 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 Message-ID: Crash occurs when attempting to create a `Replicate` node that's input to a `VectorCast` node (for a `ConvL2I`) that's not supported by the platform (when run with `MaxVectorSize=8`). I think the pack for the `VectorCast` should be filtered out earlier as not implemented and I propose adding a test to `VectorCastNode::implemented()` for the type of its input to handle that corner case. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/22442/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22442&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343747 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22442/head:pull/22442 PR: https://git.openjdk.org/jdk/pull/22442 From rehn at openjdk.org Thu Nov 28 15:39:41 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 28 Nov 2024 15:39:41 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 12:05:28 GMT, Kim Barrett wrote: > Please review this change to RISCV code to remove a > -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub. > > It was calling MacroAssembler::movptr with the second (address) argument being > a literal 0. Rather than changing it to use nullptr for that argument, I've > instead changed it to call the movptr2 helper function, which takes the target > address as a uint64_t. This eliminates the conversion of 0 to a pointer and > then back to an integer 0.
It seemed to me more natural to use that helper > directly, as it was presumed that was what ended up being called anyway. But > the riscv porters should weigh in on whether that's a good approach to dealing > with this case. > > Testing: GHA sanity tests, which includes building for linux-riscv64. I don't > have the capability to run tests for this platform, so hoping someone from the > riscv porters can do more testing. Change is okay and I sanity tested, all ok. But I must object to keep the underlying type of address. For all our purposes it is an integer value. I.e. `address dest_end = dest->_total_start + dest->_total_size;` Furthermore address 0 is a valid _address_, changing that to nullptr make little sense. So I believe the correct fix is to change the type of address. Anyhow thanks for fixing the warning. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22435#pullrequestreview-2468467704 From roland at openjdk.org Thu Nov 28 15:43:39 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 28 Nov 2024 15:43:39 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v5] In-Reply-To: References: Message-ID: <0QhHkO9hp_uxL9EC5AYIhm95Gw2DeXXQZgVG2L0NDCw=.6076af9b-5922-48f7-ae41-66edcbe943ca@github.com> On Thu, 28 Nov 2024 14:42:23 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). 
>> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... 
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8342692 > - whitespaces > - more > - merge > - more > - one more test > - Merge branch 'master' into JDK-8342692 > - more > - more > - Merge branch 'master' into JDK-8342692 > - ... and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 I pushed an update that should fix all test failures except the one in `compiler/escapeAnalysis/TestMissingAntiDependency.java` (covered by JDK-8341976). A lot of them were caused by the following part of the change: > In the case of a long counted loop, the loop is transformed into a regular loop with a new limit and transformed range checks that's later turned into an int counted loop. The int counted loop doesn't need loop limit checks because of the way it's constructed. There's an assert that catches that we don't attempt to add one. I ran into test failures where, by the time the int counted loop is created, the fact that the number of iterations of the loop is small enough to not need a loop limit check gets lost. I added a cast to make sure the narrowed limit's type is not lost (I had to do something similar for loop nests). But then, I ran into the same issue again because the cast was pushed through a sub or add and the narrowed type was lost. I propose that pushing casts through sub/add be only done after loop opts are over (same as what's done for range check CastII). So I removed that part of the initial change and instead added some logic to pattern match the `CastLL` used by the loop nest for which the transformation of `(CastLL (AddL ...))` shouldn't be performed until the inner loop is turned into a counted loop.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2506391106 From duke at openjdk.org Thu Nov 28 16:02:04 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:02:04 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v9] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Improve tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/b0d72683..b32df25d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=07-08 Stats: 107 lines in 6 files changed: 93 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Thu Nov 28 16:02:04 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:02:04 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v9] In-Reply-To: References: Message-ID: <8JYRefwqi8ZEep9T2GCIjv81EnGGzZ9KdNEgC2m_mhg=.3695ae54-c908-49c8-99e8-d5a1670b38fd@github.com> On Mon, 25 Nov 2024 09:29:28 GMT, Emanuel Peter wrote: >> 
https://github.com/openjdk/jdk/pull/22061/files#diff-48b0b8da547a3fe6aae9ea3ef20b4d708e47f2332ff6884478336f39d9eb9459R82 and https://github.com/openjdk/jdk/pull/22061/files#diff-24679e6505fe23e8a3ba73decaaf97896899c0a10956c437b8721fca33706ee2R82 should cover this I think. The containing method is marked with @DontCompile. > > You could use a similar trick with the constant method handles, as here: > https://github.com/openjdk/jdk/pull/21521/files#diff-d69ed849846cce04a18fe13fb35cd975ad533f0ef76d923745d97bdb27db7073 Tried to address this in my latest push. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862429181 From duke at openjdk.org Thu Nov 28 16:05:39 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:05:39 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: <2_D0mvT9AgzJivCwCkHN_wLCGcbi9ELWeK6qAdf0348=.26cb888f-6d6a-436e-ba20-2c1e830c8e99@github.com> On Thu, 28 Nov 2024 14:28:50 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> abs(MIN_INT) is not positive > > src/hotspot/share/opto/divnode.cpp line 488: > >> 486: >> 487: const Type* t = phase->type(div->in(2)); >> 488: const TypeClass* tl = t->cast(); > > I believe `is_int()` will assert when `t` is not a `TypeInt`, what you want here is a `try_cast` that calls `isa_int()` instead. If I understand correctly, you are referring to the fact that `t` might be top, which is not correctly handled here, right? If that is what you mean, I think I will fix it by handling this case separately like in unsigned_mod_ideal blow because `t` should only ever be TypeClass (i.e. long or int) or Top. Or am I missing something? 
const Type* t = phase->type(mod->in(2)); if (t == Type::TOP) { return nullptr; } const TypeClass* ti = t->cast(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862433331 From duke at openjdk.org Thu Nov 28 16:15:57 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:15:57 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v10] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add UDIV ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/b32df25d..ceeba78d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=08-09 Stats: 21 lines in 3 files changed: 5 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Thu Nov 28 16:15:58 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:15:58 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 14:30:45 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated 
the pull request incrementally with one additional commit since the last revision: >> >> abs(MIN_INT) is not positive > > test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 88: > >> 86: >> 87: @Test >> 88: @IR(failOn = {IRNode.DIV}) > > All these should be `failOn = {IRNode.UDIV}` Good catch, thanks! Fixed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862447426 From duke at openjdk.org Thu Nov 28 16:21:17 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:21:17 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v11] In-Reply-To: References: Message-ID: <1NZchZ-VRJghwgJn06Ab7jtWyTkXUsU6k6oooMNbITM=.dd6b300d-3937-4533-83ac-9743731e1f15@github.com> > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Stylistic improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/ceeba78d..38c6cb4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=09-10 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From qamai at openjdk.org Thu Nov 28 16:21:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 28 Nov 2024 16:21:18 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: <2_D0mvT9AgzJivCwCkHN_wLCGcbi9ELWeK6qAdf0348=.26cb888f-6d6a-436e-ba20-2c1e830c8e99@github.com> References: <2_D0mvT9AgzJivCwCkHN_wLCGcbi9ELWeK6qAdf0348=.26cb888f-6d6a-436e-ba20-2c1e830c8e99@github.com> Message-ID: On Thu, 28 Nov 2024 16:01:52 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/divnode.cpp line 488: >> >>> 486: >>> 487: const Type* t = phase->type(div->in(2)); >>> 488: const TypeClass* tl = t->cast(); >> >> I believe `is_int()` will assert when `t` is not a `TypeInt`, what you want here is a `try_cast` that calls `isa_int()` instead. > > If I understand correctly, you are referring to the fact that `t` might be top, which is not correctly handled here, right? If that is what you mean, I think I will fix it by handling this case separately like in unsigned_mod_ideal blow because `t` should only ever be TypeClass (i.e. long or int) or Top. Or am I missing something? 
> > > const Type* t = phase->type(mod->in(2)); > if (t == Type::TOP) { > return nullptr; > } > const TypeClass* ti = t->cast(); Yes, I saw you checking for `nullptr` in the following line and thought you confused `is_int()` with `isa_int()`, the former cannot return a `nullptr`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862450302 From qamai at openjdk.org Thu Nov 28 16:21:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 28 Nov 2024 16:21:18 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v10] In-Reply-To: References: Message-ID: <3JHZLFop8N7LuJZ_8xTQQLmCUlkeeS7IeZSW14e9R7Y=.e02189b8-0e28-4874-94ef-3772fb6d9d51@github.com> On Thu, 28 Nov 2024 16:15:57 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Add UDIV test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 39: > 37: */ > 38: public class ModINodeIdealizationTests { > 39: public static final int RANDOM_POWER_OF_2 = 1 << (1 + new Random().nextInt(30)); We use `Utils.getRandomInstance()` so that tests can be replayed with the same seed. 
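The point about replaying with the same seed can be illustrated without the test library. The following is a minimal sketch of the idea only: the class `ReplayableRandomSketch` and the system property name `sketch.random.seed` are assumptions made up for this example, and the sketch does not reproduce what `Utils.getRandomInstance()` does internally.

```java
import java.util.Random;

// Hypothetical illustration of seed replay; not the jtreg test library code.
class ReplayableRandomSketch {

    // Made-up property name, used only in this sketch.
    static final String SEED_PROPERTY = "sketch.random.seed";

    // Use the seed given on the command line if present, else pick a fresh one.
    static long chooseSeed() {
        String value = System.getProperty(SEED_PROPERTY);
        return (value != null) ? Long.parseLong(value) : new Random().nextLong();
    }

    // Same expression as in the test under review: a power of two in [2^1, 2^30].
    static int randomPowerOf2(Random rng) {
        return 1 << (1 + rng.nextInt(30));
    }

    public static void main(String[] args) {
        long seed = chooseSeed();
        // Printing the seed is what makes a failing run reproducible later.
        System.out.println("seed: " + seed);
        System.out.println("power of 2: " + randomPowerOf2(new Random(seed)));
    }
}
```

Running it again with `-Dsketch.random.seed=<printed seed>` yields the same power of two, whereas a bare `new Random()` gives no way to reproduce the value that triggered a failure.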
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862451773 From qamai at openjdk.org Thu Nov 28 16:21:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 28 Nov 2024 16:21:18 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 14:32:41 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> abs(MIN_INT) is not positive > > test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 146: > >> 144: @IR(counts = {IRNode.URSHIFT, "1"}) >> 145: public int divByPow2Big(int x) { >> 146: return Integer.divideUnsigned(x, -2147483648); // -2147483648 = Integer.parseUnsignedInt("2147483648") > > You should use `Integer.MIN_VALUE` In general, I think `1 << someValue` would be better than the decimal representation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862453232 From duke at openjdk.org Thu Nov 28 16:26:49 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:26:49 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 14:31:54 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> abs(MIN_INT) is not positive > > test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 129: > >> 127: IRNode.DIV_BY_ZERO_TRAP, "1" >> 128: }) >> 129: // Hotspot should keep the division because it may cause a division by zero trap > > This comment is actually incorrect, the division is kept because the transformation is wrong. Which part of the transformation do you think is wrong? 
Do you think there might also be overflow issues and that's why it cannot be replaced with x? (I took over this comment and test from the pre-existing test for the signed version: test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862460329 From duke at openjdk.org Thu Nov 28 16:37:21 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:37:21 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v12] In-Reply-To: References: Message-ID: <33XxZXYssFqQiwWaddJaYWNO_Qeu-HnzRNmeeTtd0Cw=.1abbd1bf-ff90-42a2-8fd3-1ca24e9d3168@github.com> > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Use Utils.getRandomInstance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/38c6cb4b..c34ef039 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=10-11 Stats: 13 lines in 6 files changed: 6 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Thu Nov 28 16:37:22 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 28 Nov 2024 16:37:22 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v10] In-Reply-To: <3JHZLFop8N7LuJZ_8xTQQLmCUlkeeS7IeZSW14e9R7Y=.e02189b8-0e28-4874-94ef-3772fb6d9d51@github.com> References: <3JHZLFop8N7LuJZ_8xTQQLmCUlkeeS7IeZSW14e9R7Y=.e02189b8-0e28-4874-94ef-3772fb6d9d51@github.com> Message-ID: On Thu, 28 Nov 2024 16:16:42 GMT, Quan Anh Mai wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add UDIV > > test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 39: > >> 37: */ >> 38: public class ModINodeIdealizationTests { >> 39: public static final int RANDOM_POWER_OF_2 = 1 << (1 + new Random().nextInt(30)); > > We use `Utils.getRandomInstance()` so that tests can be replayed with the same seed. Thanks for the tip! I was already wondering if something like this existed. 
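For readers following the `x / 2^k` and `x % 2^k` cases listed in the PR description, the arithmetic identities behind them can be checked in plain Java. This is a sketch of the identities only, under the assumption that `0 <= k <= 31`; the class and method names are hypothetical, and it is not a description of how C2 implements the idealization.

```java
// Hypothetical illustration of the unsigned power-of-two identities.
class UnsignedPow2Sketch {

    // Unsigned x / 2^k equals a logical right shift by k (0 <= k <= 31).
    static int udivPow2(int x, int k) {
        return x >>> k;
    }

    // Unsigned x % 2^k equals masking with 2^k - 1 (0 <= k <= 31).
    // For k == 31, (1 << 31) - 1 wraps around to 0x7FFFFFFF, which is the right mask.
    static int umodPow2(int x, int k) {
        return x & ((1 << k) - 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 7, 1 << 30, Integer.MAX_VALUE, Integer.MIN_VALUE, -8, -1};
        for (int x : samples) {
            for (int k = 0; k <= 31; k++) {
                int divisor = 1 << k; // k == 31 gives Integer.MIN_VALUE, i.e. 2^31 unsigned
                if (udivPow2(x, k) != Integer.divideUnsigned(x, divisor)
                        || umodPow2(x, k) != Integer.remainderUnsigned(x, divisor)) {
                    throw new AssertionError("mismatch for x=" + x + ", k=" + k);
                }
            }
        }
        System.out.println("identities hold for all samples");
    }
}
```

The `k == 31` case corresponds to the `divByPow2Big` test discussed earlier in the thread, where the divisor `-2147483648` is `2^31` when read as an unsigned value, which is also why writing it as `1 << 31` or `Integer.MIN_VALUE` is easier to read than the decimal literal.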
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862470866 From qamai at openjdk.org Thu Nov 28 16:37:22 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 28 Nov 2024 16:37:22 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v8] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 16:23:59 GMT, theoweidmannoracle wrote: >> test/hotspot/jtreg/compiler/c2/irTests/UDivINodeIdealizationTests.java line 129: >> >>> 127: IRNode.DIV_BY_ZERO_TRAP, "1" >>> 128: }) >>> 129: // Hotspot should keep the division because it may cause a division by zero trap >> >> This comment is actually incorrect, the division is kept because the transformation is wrong. > > Which part of the transformation do you think is wrong? Do you think there might also be overflow issues and that's why it cannot be replaced with x? > > (I took over this comment and test from the pre-existing test for the signed version: test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java.) Yupp it can overflow, if it does not then we can totally do this transformation. I think you should remove this line from the newly added test, feel free to also remove it from the old one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1862472357 From chagedorn at openjdk.org Thu Nov 28 17:00:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 28 Nov 2024 17:00:40 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:15:06 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. 
Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. > > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise Nice! I quickly tried it out on Linux and found two things: - The color is not kept when going to another graph. - The "No Color" button looks like this (but it's working when clicking on "No..."): ![image](https://github.com/user-attachments/assets/f0ed0c5f-82e7-4017-a0da-965fab5a6a55) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22440#issuecomment-2506517205 From rcastanedalo at openjdk.org Thu Nov 28 18:06:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 28 Nov 2024 18:06:38 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:45:45 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
> > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Hi Toby, I get the following assertion failure when I open the regular sea-of-nodes view of the attached graph [free-placement-assertion-failure.zip](https://github.com/user-attachments/files/17951261/free-placement-assertion-failure.zip) and enable the "Simplify graph" and "Condense graph" filters: ![Screenshot from 2024-11-28 19-02-19](https://github.com/user-attachments/assets/30945ef8-f3eb-487f-af9b-ab8055f06ea5) ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2468674688 From rcastanedalo at openjdk.org Thu Nov 28 18:11:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 28 Nov 2024 18:11:38 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:45:45 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
> > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) > Hi Toby, I get the following assertion failure when I open the regular sea-of-nodes view of the attached graph [free-placement-assertion-failure.zip](https://github.com/user-attachments/files/17951261/free-placement-assertion-failure.zip) and enable the "Simplify graph" and "Condense graph" filters: > > ![Screenshot from 2024-11-28 19-02-19](https://private-user-images.githubusercontent.com/8792647/390877413-30945ef8-f3eb-487f-af9b-ab8055f06ea5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzI4MTczMjMsIm5iZiI6MTczMjgxNzAyMywicGF0aCI6Ii84NzkyNjQ3LzM5MDg3NzQxMy0zMDk0NWVmOC1mM2ViLTQ4N2YtYWY5Yi1hYjgwNTVmMDZlYTUucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MTEyOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDExMjhUMTgwMzQzWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OTY5MDQ5NDA4ZTFmYmI0M2Y5ZTdlOWYwOTk1ZTQyYzhlZWNkOGY5NTA1MzJjZDZhY2IzNTdlYmJlMTQ4ZmI2YyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.l-NEc_BKsD8V81uiCk7M-w2XFZzKLEuOO-rPh5cumU4) I just checked and turns out this is a regression in https://github.com/openjdk/jdk/pull/22402, please address it in that pull request. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22438#issuecomment-2506608646 From tholenstein at openjdk.org Thu Nov 28 18:24:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 28 Nov 2024 18:24:28 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v4] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: <6IgEW6leVGxRBYqn4-QpWkL9TEgf1h_zBrBWkP48GaM=.fb761a40-532b-4e5f-bd85-fa5792c47edf@github.com> > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. 
This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: undo adding equals functions to Slots ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/c08d99e9..12332f25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=02-03 Stats: 38 lines in 3 files changed: 0 ins; 38 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From shade at openjdk.org Thu Nov 28 18:46:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Nov 2024 18:46:13 GMT Subject: RFR: 8345219: C2: Avoid bailing to interpreter stubs for signalling NaNs on x86_64 Message-ID: Found this while cleaning up x86_32 code for removal. In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 Ostensibly, that block is for x86_32 handling of signalling NaNs -- the x87 FPU has a peculiarity with them. See other funky bugs we have seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). But the way the current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove the x86_32 port.
Meanwhile, we can make it right by x86_64, and make the eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. Additional testing: - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/22446/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22446&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345219 Stats: 208 lines in 3 files changed: 206 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22446/head:pull/22446 PR: https://git.openjdk.org/jdk/pull/22446 From shade at openjdk.org Thu Nov 28 18:46:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 28 Nov 2024 18:46:14 GMT Subject: RFR: 8345219: C2: Avoid bailing to interpreter stubs for signalling NaNs on x86_64 In-Reply-To: References: Message-ID: <3s1m_9LsI2x3X5dwBjrEb5z7LyOpuVc8nfokuZvqoMQ=.5473ae9f-1502-48f1-ab5e-715156a96edb@github.com> On Thu, 28 Nov 2024 18:22:24 GMT, Aleksey Shipilev wrote: > Found this while cleaning up x86_32 code for removal. > > In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 > > Ostensibly, that block is for x86_32 handling of signalling NaNs -- the x87 FPU has a peculiarity with them. See other funky bugs we have seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). > > But the way the current block is coded, it is enabled for X86 wholesale, which also means x86_64!
In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 > > This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove the x86_32 port. Meanwhile, we can make it right by x86_64, and make the eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all`

As expected, none of this matters when C2 intrinsics work:

Benchmark                                     Mode  Cnt   Score   Error  Units

# Baseline
DoubleBitConversion.doubleToLongBits_NaN      avgt    9   0.542 ± 0.001  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9   0.542 ± 0.001  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9   0.542 ± 0.001  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9   0.420 ± 0.041  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9   0.413 ± 0.012  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9   0.412 ± 0.020  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9   0.413 ± 0.007  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9   0.409 ± 0.007  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9   0.414 ± 0.012  ns/op
FloatBitConversion.floatToIntBits_NaN         avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9   0.410 ± 0.005  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9   0.412 ± 0.008  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9   0.413 ± 0.004  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9   0.412 ± 0.008  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9   0.413 ± 0.009  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9   0.421 ± 0.022  ns/op

# Patched
DoubleBitConversion.doubleToLongBits_NaN      avgt    9   0.542 ± 0.001  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9   0.542 ± 0.001  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9   0.542 ± 0.001  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9   0.425 ± 0.036  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9   0.418 ± 0.009  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9   0.416 ± 0.017  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9   0.412 ± 0.004  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9   0.412 ± 0.010  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9   0.414 ± 0.005  ns/op
FloatBitConversion.floatToIntBits_NaN         avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9   0.410 ± 0.005  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9   0.408 ± 0.007  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9   0.413 ± 0.015  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9   0.411 ± 0.008  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9   0.409 ± 0.008  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9   0.426 ± 0.011  ns/op

It does matter a lot when the choice is to go through interpreter native entry (slow) or via compiled native adapter (fast):

Benchmark                                     Mode  Cnt   Score   Error  Units

# Baseline, -XX:-InlineMathNatives
DoubleBitConversion.doubleToLongBits_NaN      avgt    9   0.604 ± 0.015  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9  97.382 ± 1.364  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9  97.636 ± 2.620  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9  96.162 ± 0.513  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9  98.678 ± 3.378  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9  97.374 ± 3.878  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9  96.753 ± 3.659  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9  97.173 ± 2.879  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9  96.375 ± 2.150  ns/op
FloatBitConversion.floatToIntBits_NaN         avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9  95.868 ± 2.192  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9  97.377 ± 2.346  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9  95.947 ± 2.211  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9  97.705 ± 3.467  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9  96.052 ± 2.359  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9  98.793 ± 1.997  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9  97.201 ± 2.327  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9  97.515 ± 1.939  ns/op

# Patched, -XX:-InlineMathNatives
DoubleBitConversion.doubleToLongBits_NaN      avgt    9   0.598 ± 0.025  ns/op
DoubleBitConversion.doubleToLongBits_one      avgt    9   4.508 ± 0.318  ns/op
DoubleBitConversion.doubleToLongBits_zero     avgt    9   4.370 ± 0.003  ns/op
DoubleBitConversion.doubleToRawLongBits_NaN   avgt    9   4.285 ± 0.295  ns/op
DoubleBitConversion.doubleToRawLongBits_one   avgt    9   4.281 ± 0.331  ns/op
DoubleBitConversion.doubleToRawLongBits_zero  avgt    9   4.155 ± 0.311  ns/op
DoubleBitConversion.longBitsToDouble_NaN      avgt    9   4.592 ± 0.362  ns/op
DoubleBitConversion.longBitsToDouble_one      avgt    9   4.815 ± 0.038  ns/op
DoubleBitConversion.longBitsToDouble_zero     avgt    9   4.800 ± 0.019  ns/op
FloatBitConversion.floatToIntBits_NaN         avgt    9   0.542 ± 0.001  ns/op
FloatBitConversion.floatToIntBits_one         avgt    9   4.510 ± 0.322  ns/op
FloatBitConversion.floatToIntBits_zero        avgt    9   4.501 ± 0.332  ns/op
FloatBitConversion.floatToRawIntBits_NaN      avgt    9   4.280 ± 0.336  ns/op
FloatBitConversion.floatToRawIntBits_one      avgt    9   4.278 ± 0.320  ns/op
FloatBitConversion.floatToRawIntBits_zero     avgt    9   4.144 ± 0.329  ns/op
FloatBitConversion.intBitsToFloat_NaN         avgt    9   4.551 ± 0.329  ns/op
FloatBitConversion.intBitsToFloat_one         avgt    9   4.549 ± 0.327  ns/op
FloatBitConversion.intBitsToFloat_zero        avgt    9   4.676 ± 0.328  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2506638455 From bulasevich at openjdk.org Thu Nov 28 21:06:56 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 28 Nov 2024 21:06:56 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v2] In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21276/files - new: https://git.openjdk.org/jdk/pull/21276/files/a358c6bc..f1a9d9a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From bulasevich at openjdk.org Thu Nov 28 21:22:40 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 28 Nov 2024 21:22:40 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v2] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Fri, 22 Nov 2024 02:21:01 GMT, Dean Long wrote: >> - it is not a load from a Constant Pool, so calling ldr_constant here seems incorrect >> - the ldr_constant function utilizes either ldr (with a range limit of ±1MB) or, when -XX:-NearCpool is enabled, adrp (range limit of ±2GB) followed by ldr, both of which may fall short when mutable data is allocated on the C heap. > > This change looks wrong, for a number of reasons. First, the dummy address would no longer be needed, and we could just use the same mov as the supports_instruction_patching() case. However, if supports_instruction_patching() is false, I think we are not allowed to generate a multi-instruction movz/movk sequence. We really need something like ldr_constant for this case, so that we load from memory. > However, as you point out, this is tied to NearCpool. For a far metadata slot access, ADR+LDR is the right answer.
After this change, will there be any metadata left that could still benefit from NearCpool? If not, then it might make sense to turn it off permanently. Instead of choosing between PC-relative "ldr literal" and far ADR+LDR based on NearCpool, we could decide based on the distance to the metadata table. I believe "ldr literal" only has a 1MB range. > CC @theRealAph That's right. Thanks for pointing that out to me. I have a fix for movoop issue on supports_instruction_patching=false case. Probably it should be considered as a separate change: https://github.com/openjdk/jdk/pull/22448 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1862694312 From amitkumar at openjdk.org Fri Nov 29 01:15:18 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 29 Nov 2024 01:15:18 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v5] In-Reply-To: References: Message-ID: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: test fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21782/files - new: https://git.openjdk.org/jdk/pull/21782/files/d3181d9f..95a5bfc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=03-04 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From epeter at openjdk.org Fri Nov 29 06:48:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Nov 2024 06:48:43 GMT Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 10:52:01 GMT, Jatin Bhateja wrote: > This is a follow up PR to 
https://github.com/openjdk/jdk/pull/20507
> It adds IR validation tests for newly added saturated vector add / sub operations.

Thanks for adding these tests! I agree, it would be nice to get them in for JDK24. If we miss RDP1 we could even consider backporting it.

test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 257:

> 255: }
> 256:
> 257: public static final String SADD_VB = VECTOR_PREFIX + "SADD_VB" + POSTFIX;

Suggestion:

    public static final String SATURATING_ADD_VB = VECTOR_PREFIX + "SATURATING_ADD_VB" + POSTFIX;

I would prefer if this was written out.

test/hotspot/jtreg/compiler/vectorapi/VectorSaturatedOperationsTest.java line 26:

> 24: /**
> 25: * @test
> 26: * @bug 8342677

Can you add the bug number of the original feature? Because that is really what we are testing for here.

test/hotspot/jtreg/compiler/vectorapi/VectorSaturatedOperationsTest.java line 97:

> 95: short_in2[i] = (short)-i;
> 96: byte_in1[i] = Byte.MIN_VALUE;
> 97: byte_in2[i] = (byte)-i;

Are these values sufficient? Example with `SADD_VL`:
if the ith elements are `max_value` and `min_value` -> no overflow -> `-1`
if the ith elements are `-i` and `i` -> no overflow -> `0`
So it seems we are actually not testing the saturation here, am I correct?

test/hotspot/jtreg/compiler/vectorapi/VectorSaturatedOperationsTest.java line 200:

> 198:
> 199: @Test
> 200: @IR(counts = {IRNode.SADD_VB, " >0 " , "unsigned_vector_node", " >0 "}, phase = {CompilePhase.BEFORE_MATCHING}, applyIfCPUFeature = {"avx", "true"})

Suggestion:

    @IR(counts = {IRNode.SADD_VB, " >0 " , "unsigned_vector_node", " >0 "},
        phase = {CompilePhase.BEFORE_MATCHING},
        applyIfCPUFeature = {"avx", "true"})

I find this generally more readable than a long line. What is the `unsigned_vector_node` from, i.e. what is the whole line it matches on?

------------- Changes requested by epeter (Reviewer).
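To make the concern about the `SADD_VL` inputs concrete, here is a minimal scalar sketch of a signed saturating add. It is only a model written for this note: `saturating_add`, `wrapping_add` and `check_saturation_examples` are hypothetical helper names, not part of the Vector API, the IR framework, or the test under review. The first two input pairs are the ones named in the review; the remaining pairs are assumed examples of inputs that would actually exercise saturation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scalar model of a signed saturating add: the mathematical sum is clamped
// to the representable range of T instead of wrapping around.
template <typename T>
constexpr T saturating_add(T a, T b) {
    constexpr T kMax = std::numeric_limits<T>::max();
    constexpr T kMin = std::numeric_limits<T>::min();
    if (b > 0 && a > kMax - b) return kMax;  // sum would exceed the maximum
    if (b < 0 && a < kMin - b) return kMin;  // sum would fall below the minimum
    return static_cast<T>(a + b);            // sum is in range
}

// Two's-complement wrapping add, modelling a plain (non-saturating) long add.
// The unsigned-to-signed conversion is modular on mainstream compilers.
inline std::int64_t wrapping_add(std::int64_t a, std::int64_t b) {
    return static_cast<std::int64_t>(static_cast<std::uint64_t>(a) +
                                     static_cast<std::uint64_t>(b));
}

inline void check_saturation_examples() {
    // Input pairs from the review: neither one overflows, so a saturating
    // add and a wrapping add cannot be told apart by their results.
    assert(saturating_add<std::int64_t>(INT64_MAX, INT64_MIN) == -1);
    assert(wrapping_add(INT64_MAX, INT64_MIN) == -1);
    assert(saturating_add<std::int64_t>(-5, 5) == 0);

    // Assumed inputs that do overflow: here the two operations differ.
    assert(saturating_add<std::int64_t>(INT64_MAX, 1) == INT64_MAX);
    assert(wrapping_add(INT64_MAX, 1) == INT64_MIN);
    assert(saturating_add<std::int8_t>(-128, -1) == -128);
}
```

Calling `check_saturation_examples()` from a test driver runs all of the checks. The point of the sketch is only that a test distinguishes saturating from wrapping arithmetic when at least some lanes overflow.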
PR Review: https://git.openjdk.org/jdk/pull/21603#pullrequestreview-2469177514 PR Review Comment: https://git.openjdk.org/jdk/pull/21603#discussion_r1863014176 PR Review Comment: https://git.openjdk.org/jdk/pull/21603#discussion_r1863014676 PR Review Comment: https://git.openjdk.org/jdk/pull/21603#discussion_r1863026590 PR Review Comment: https://git.openjdk.org/jdk/pull/21603#discussion_r1863022494 From epeter at openjdk.org Fri Nov 29 06:48:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 29 Nov 2024 06:48:44 GMT Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 06:28:24 GMT, Emanuel Peter wrote: >> This is a follow up PR to https://github.com/openjdk/jdk/pull/20507 >> It adds IR validation tests for newly added saturated vector add / sub operations. > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 257: > >> 255: } >> 256: >> 257: public static final String SADD_VB = VECTOR_PREFIX + "SADD_VB" + POSTFIX; > > Suggestion: > > public static final String SATURATING_ADD_VB = VECTOR_PREFIX + "SATURATING_ADD_VB" + POSTFIX; > > I would prefer if this was written out. Ah. Now I see that we also named it `VectorOperators.SADD`... hmm. Would have been nice to say what the `S` means here. I would always go with the longer, more explicit name (unless it is really very very long). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21603#discussion_r1863018535 From kbarrett at openjdk.org Fri Nov 29 07:15:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 07:15:38 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:37:01 GMT, Robbin Ehn wrote: > Change is okay and I sanity tested, all ok. Thanks. > But I must object to keep the underlying type of address. 
For all our purposes it is an integer value. I.e. `address dest_end = dest->_total_start + dest->_total_size;` Furthermore address 0 is a valid _address_, changing that to nullptr make little sense. 0 is only barely a valid address, and only if very careful. All supported platforms use 0 to represent all null pointers (including null pointer constants). The standards don't guarantee that (neither C nor C++.) (There exist or have existed platforms that used a different representation, but we don't support any of those.) There are certainly places in our code that assume that representation. Trying to avoid that is effectively impossible. It tends to uglify code. And we can't expect testing to uncover any of those places that exist now, or might accidentally be introduced in the future. And tools like ubsan and asan won't help, since on the platforms we support, those tools make the same assumption about the representation of null pointers being 0, and indeed complain about arithmetic involving pointers with a value of 0. > > So I believe the correct fix is to change the type of address. I'm not sure what you mean by this. Do you mean something like "change the type of the argument involved here"? Or do you mean something like "change the alias type `address` to be an integral type (say, `intptr_t`) instead of `char*`"? Neither of those seem like a good idea to me. But maybe you mean something else that I haven't thought of? 
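For readers following the `0` versus `nullptr` discussion, the following is a small sketch of the distinction being debated. It is an illustration under stated assumptions, not HotSpot code: `address` is modelled here as `char*` as in the message above (the real alias may be defined differently), and `end_of`, `no_address` and `as_bits` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the `address` alias discussed above (assumed to be `char*`).
typedef char* address;

// Pointer arithmetic of the kind quoted in the thread: start + size.
inline address end_of(address start, std::size_t size) {
    return start + size;
}

// "This pointer does not point to anything" is spelled nullptr.
// Writing `address p = 0;` also compiles, but GCC and Clang flag it
// when -Wzero-as-null-pointer-constant is enabled.
inline address no_address() {
    return nullptr;
}

// When the numeric value is what is meant, convert explicitly to an
// integer type. On the platforms discussed here a null pointer maps to 0;
// the language standards do not guarantee that in general.
inline std::uintptr_t as_bits(address p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

The sketch keeps the two meanings apart by type: pointer-valued code uses `nullptr`, while code that needs the number 0 works on `std::uintptr_t`. Whether that separation is practical throughout the code base is exactly the open question in this thread.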
------------- PR Comment: https://git.openjdk.org/jdk/pull/22435#issuecomment-2507214458 From duke at openjdk.org Fri Nov 29 07:59:28 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 29 Nov 2024 07:59:28 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v13] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Improve tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/c34ef039..85d08af1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=11-12 Stats: 16 lines in 7 files changed: 5 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From rcastanedalo at openjdk.org Fri Nov 29 08:00:45 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Nov 2024 08:00:45 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:45:45 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22430. 
To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Test and performance results of this changeset (applied on top of https://github.com/openjdk/jdk/pull/22402/commits/12332f25a551c21ec750857b94480e13f1851011) look good. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2469296180 From duke at openjdk.org Fri Nov 29 08:04:26 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 29 Nov 2024 08:04:26 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v14] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Correctly handle top in unsigned_div_ideal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/85d08af1..35a7f16b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=12-13 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Fri Nov 29 08:15:30 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 29 Nov 2024 08:15:30 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v21] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? 
> > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 27 additional commits since the last revision: - Merge branch 'master' into 8319850 - Add missing header - Update memory management and use treap - Fix style - Derecursify locate - Merge branch 'master' into 8319850 - Change comment style - Change is_enabled to old pattern - Fix TestDuplicatedLateInliningOutput - Fix BCI -1 - ... 
and 17 more: https://git.openjdk.org/jdk/compare/6168806a...44aabf62 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/5364a488..44aabf62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=19-20 Stats: 22548 lines in 530 files changed: 10874 ins; 8708 del; 2966 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From rcastanedalo at openjdk.org Fri Nov 29 08:35:37 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Nov 2024 08:35:37 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 16:58:27 GMT, Christian Hagedorn wrote: > * The color is not kept when going to another graph. Hi Christian, can you try [this change](https://github.com/openjdk/jdk/commit/62cc2351ef0cdeaf48bbc0ef5c82501219b9c7c9) and see if it works for you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22440#issuecomment-2507319292 From rehn at openjdk.org Fri Nov 29 08:51:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 29 Nov 2024 08:51:37 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 07:12:40 GMT, Kim Barrett wrote: > > Change is okay and I sanity tested, all ok. > > Thanks. > > > But I must object to keep the underlying type of address. For all our purposes it is an integer value. I.e. `address dest_end = dest->_total_start + dest->_total_size;` Furthermore address 0 is a valid _address_, changing that to nullptr make little sense. > > 0 is only barely a valid address, and only if very careful. 
In this case we use address 0 for an uninitialized immediate value; this is a perfectly valid instruction encoding, at least. > > All supported platforms use 0 to represent all null pointers (including null pointer constants). The standards don't guarantee that (neither C nor C++.) (There exist or have existed platforms that used a different representation, but we don't support any of those.) There are certainly places in our code that assume that representation. Trying to avoid that is effectively impossible. It tends to uglify code. And we can't expect testing to uncover any of those places that exist now, or might accidentally be introduced in the future. > > And tools like ubsan and asan won't help, since on the platforms we support, those tools make the same assumption about the representation of null pointers being 0, and indeed complain about arithmetic involving pointers with a value of 0. > It sounds like you are on my page: we should use 0 when we mean 0 :) In this particular case we really want to generate code which moves the address **0** into a register. Hence your fix is good, as you avoid nullptr by using movptr2. But my concern was that this may not always be possible. > > So I believe the correct fix is to change the type of address. > > I'm not sure what you mean by this. > > Do you mean something like "change the type of the argument involved here"? > > Or do you mean something like "change the alias type `address` to be an integral type (say, `intptr_t`) instead of `char*`"? > > Neither of those seem like a good idea to me. But maybe you mean something else that I haven't thought of? I haven't thought it through. I just like seeing 0 when I mean the numeric value 0, and seeing nullptr when I mean that this pointer doesn't point to anything. Maybe we should not be using the type address at all in some places, I'm not sure.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22435#issuecomment-2507343023 From chagedorn at openjdk.org Fri Nov 29 08:52:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Nov 2024 08:52:38 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:15:06 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. > > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise That works great! Thanks Roberto for fixing that. As discussed offline, we could add a "remove all colors" option in a separate RFE. So, the only minor problem left is the "No..." that does not look like a button and therefore is not suggesting to click it. Maybe you find a fix for that. If not, I guess it's okay to move forward with this patch and come back to it later again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22440#issuecomment-2507345590 From rcastanedalo at openjdk.org Fri Nov 29 08:59:39 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 29 Nov 2024 08:59:39 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: <83qLWUEfRrHQfHheZnZ_Jv2ADB6-V6lWvzL9E0ebkgY=.7c51f59e-7925-40e7-a31c-40cf33aa19b7@github.com> On Fri, 29 Nov 2024 08:50:26 GMT, Christian Hagedorn wrote: > So, the only minor problem left is the "No..."
that does not look like a button and therefore is not suggesting to click it. Maybe you find a fix for that. If not, I guess it's okay to move forward with this patch and come back to it later again. This small adjustment should address that part: https://github.com/openjdk/jdk/commit/a5f6562f86481316ae36fc629bb9d5efae7f3e2f. Could you please try it out? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22440#issuecomment-2507355768 From amitkumar at openjdk.org Fri Nov 29 09:09:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 29 Nov 2024 09:09:40 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> Message-ID: On Wed, 27 Nov 2024 03:42:15 GMT, Amit Kumar wrote: >> We can ask @bulasevich (also see https://wiki.openjdk.org/display/HotSpot/Ports). > > Should I revert arm32 changes ? Maybe it can be done with another JBS issue. @shipilev can you do build+test on arm32, please? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1863190848 From duke at openjdk.org Fri Nov 29 09:11:33 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 29 Nov 2024 09:11:33 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v22] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. 
Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
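For readers unfamiliar with the scenario, the kind of call site discussed above can be sketched in Java as follows. This is only an illustrative sketch, not a copy of `TestPostParseCallDevirtualization.java`: the class `DevirtualizationSketch`, the implementation class `A`, the `test` method, and the loop count are assumptions made for the example. Only the names `I`, `method`, and `callHelper` are taken from the log excerpt.

```java
class DevirtualizationSketch {
    // An abstract interface method has no bytecode, which matches the
    // "(0 bytes)" shown for I::method in the log excerpt above.
    interface I {
        int method();
    }

    // Hypothetical implementation, introduced only for this sketch.
    static final class A implements I {
        @Override
        public int method() {
            return 42;
        }
    }

    // In the bytecode this is an interface (virtual) call. Looking at
    // callHelper in isolation, the receiver type is unknown.
    static int callHelper(I receiver) {
        return receiver.method();
    }

    // Here the receiver type is exact, but a compiler can only exploit that
    // once callHelper has been inlined into this method. At that point the
    // virtual call may be turned into a direct (static) call.
    static int test() {
        return callHelper(new A());
    }

    public static void main(String[] args) {
        int sum = 0;
        // Assumed iteration count, chosen only to make JIT compilation likely.
        for (int i = 0; i < 20_000; i++) {
            sum += test();
        }
        System.out.println(sum);
    }
}
```

Running such a program with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` should show the inlining decisions for these call sites, although the exact messages depend on the JDK build and on which compiler tier handles each method.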
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/44aabf62..d0f02890 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=20-21 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From rcastanedalo at openjdk.org Fri Nov 29 09:31:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 29 Nov 2024 09:31:38 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:15:06 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. > > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise > > So, the only minor problem left is the "No..." that does not look like a button and therefore is not suggesting to click it. Maybe you find a fix for that. If not, I guess it's okay to move forward with this patch and come back to it later again.
> > This small adjustment should address that part: [a5f6562](https://github.com/openjdk/jdk/commit/a5f6562f86481316ae36fc629bb9d5efae7f3e2f). Could you please try it out? Here's a revised version of the adjustment that also makes the button look clickable (thanks to @chhagedorn for trying out and adjusting the button settings to prevent label truncation and ugly button focus shape): https://github.com/openjdk/jdk/commit/aba8d3e253cc2fd7c33d0d00ea9f014fdaa37e3d. @tobiasholenstein feel free to incorporate and edit to your liking. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22440#issuecomment-2507411406 From tholenstein at openjdk.org Fri Nov 29 09:59:13 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 09:59:13 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v2] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. 
> > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - patch2: fix botton on linux - patch1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22440/files - new: https://git.openjdk.org/jdk/pull/22440/files/f0de94b7..e58c6171 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=00-01 Stats: 20 lines in 3 files changed: 16 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22440/head:pull/22440 PR: https://git.openjdk.org/jdk/pull/22440 From chagedorn at openjdk.org Fri Nov 29 09:59:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Nov 2024 09:59:13 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v2] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:56:38 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 >> >> [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. >> The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. 
>> >> color >> >> ### Whats new >> - Now colors are saved with the XML as well >> - Colors are kept when changing to a different graph >> - The user can remove the color again: This uses the color from the filter or WHITE otherwise > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - patch2: fix botton on linux > - patch1 Thanks for the update, it works now as expected! :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22440#pullrequestreview-2469534514 From rcastanedalo at openjdk.org Fri Nov 29 10:05:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Nov 2024 10:05:38 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v2] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:59:13 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 >> >> [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. >> The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. >> >> color >> >> ### Whats new >> - Now colors are saved with the XML as well >> - Colors are kept when changing to a different graph >> - The user can remove the color again: This uses the color from the filter or WHITE otherwise > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - patch2: fix botton on linux > - patch1 Looks good, making custom colors persistent really improves the usability of this feature! ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22440#pullrequestreview-2469549831 From bulasevich at openjdk.org Fri Nov 29 10:16:39 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 29 Nov 2024 10:16:39 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> Message-ID: On Fri, 29 Nov 2024 09:06:43 GMT, Amit Kumar wrote: >> Should I revert arm32 changes ? Maybe it can be done with another JBS issue. > > @shipilev can you do build+test on arm32, please? Sorry for delay. I will check arm32. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1863287356 From tholenstein at openjdk.org Fri Nov 29 10:54:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 10:54:58 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v5] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. 
It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
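The dummy-node handling described above for LayoutGraph and LayoutNode can be illustrated with a small Java sketch. It is not the IGV implementation: the class `DummyNodeSketch`, the records `Node` and `Edge`, and the method `splitLongEdge` are hypothetical names chosen for this example, and the sketch assumes that every edge points from a lower layer index to a higher one.

```java
import java.util.ArrayList;
import java.util.List;

class DummyNodeSketch {
    /** A node placed on a layer; dummy nodes are placeholders along a long edge. */
    record Node(String name, int layer, boolean dummy) {}

    /** A directed edge from a node on an upper layer to a node on a lower layer. */
    record Edge(Node from, Node to) {}

    /**
     * Splits an edge that spans several layers into segments that each span
     * exactly one layer, inserting one dummy node per intermediate layer.
     */
    static List<Edge> splitLongEdge(Edge edge) {
        List<Edge> segments = new ArrayList<>();
        Node previous = edge.from();
        for (int layer = edge.from().layer() + 1; layer < edge.to().layer(); layer++) {
            Node dummy = new Node("dummy@" + layer, layer, true);
            segments.add(new Edge(previous, dummy));
            previous = dummy;
        }
        // Final segment reaches the real target node.
        segments.add(new Edge(previous, edge.to()));
        return segments;
    }

    public static void main(String[] args) {
        Node a = new Node("A", 0, false);
        Node d = new Node("D", 3, false);
        for (Edge e : splitLongEdge(new Edge(a, d))) {
            System.out.println(e.from().name() + " -> " + e.to().name());
        }
    }
}
```

With every edge reduced to one-layer segments, later layout steps such as crossing reduction and horizontal positioning only have to reason about neighbouring layers, which is the usual motivation for dummy nodes in layered graph drawing.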
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: add setLayoutSelfEdges() option and enable for CFG layout ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/12332f25..2ee3fa9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=03-04 Stats: 120 lines in 4 files changed: 98 ins; 19 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From tholenstein at openjdk.org Fri Nov 29 11:00:18 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 11:00:18 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v6] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. 
The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: re-add select edges nodes by clicking on edge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/2ee3fa9d..0023ec4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=04-05 Stats: 28 lines in 1 file changed: 28 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From tholenstein at openjdk.org Fri Nov 29 11:18:39 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 11:18:39 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v2] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Thu, 28 Nov 2024 09:17:11 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed graph objects equality > > src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/HierarchicalLayoutManager.java line 75: > >> 73: >> 74: static public void apply(LayoutGraph graph) { >> 75: removeSelfEdges(graph); > > The proposed code removes self-edges unconditionally. This is OK for the sea-of-nodes layout, but for the CFG layout we do need to draw self-edges (think about single basic block loops).
Here is an (artificially edited) example of how the new algorithm misses drawing a self-edge for B7 (left is current IGV, right is IGV with your proposed changes): > > ![Screenshot from 2024-11-28 10-09-09](https://github.com/user-attachments/assets/e91b55af-0fbf-4f28-b7e6-558dfcada42a) > > Here is the artificially edited graph file that illustrates the issue: [self-edges.zip](https://github.com/user-attachments/files/17945729/self-edges.zip) should be fixed now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22402#discussion_r1863369559 From rcastanedalo at openjdk.org Fri Nov 29 11:26:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 29 Nov 2024 11:26:38 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v6] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: <0WfYkbc3a4gpk-lZlvTQVvwfaRUQClQ6g1H_4tv9_b0=.0247bb39-606b-45ea-9bc1-7df8a27ae180@github.com> On Fri, 29 Nov 2024 11:00:18 GMT, Tobias Holenstein wrote: >> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, >> >> ### LayoutGraph >> The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout.
The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. >> >> ### LayoutLayer >> The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. >> >> ### LayoutNode >> The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. >> >> ### LayoutEdge >> The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > re-add select edges nodes by clicking on edge Thanks for addressing my comments Toby, looks good! I ran stress testing again and could not find any issue. 
I have **not** re-run performance testing on the latest changes but I don't think it should be necessary since they are restricted to the CFG view. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2469702834 From tholenstein at openjdk.org Fri Nov 29 11:40:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 11:40:53 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v2] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. > > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. > > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. 
(The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. > - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. > `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Also added a `writeBack` method to apply these changes. > > ... Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge branch 'pr/22402' into JDK-8343705 - Merge branch 'pr/22402' into JDK-8343705 - 8343705: IGV: Interactive Node Moving in Hierarchical Layout ------------- Changes: https://git.openjdk.org/jdk/pull/22430/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=01 Stats: 1300 lines in 9 files changed: 1287 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430 PR: https://git.openjdk.org/jdk/pull/22430 From rcastanedalo at openjdk.org Fri Nov 29 11:44:39 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 29 Nov 2024 11:44:39 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v2] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 11:40:53 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: >> `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` >> `git checkout pull/22430` >> >> This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. >> >> ## Overview >> >> Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: >> - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. >> - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure.
>> - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. >> - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. >> >> ## Limitations >> - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) >> - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. >> - To move long straight edges, it's best to drag the top part of the edges around. >> When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. >> >> ## Main Changes >> ### LayoutMover Interface >> Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. >> `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. >> >> ### Enhancements to HierarchicalLayoutManager >> Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Als... > > Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'pr/22402' into JDK-8343705 > - Merge branch 'pr/22402' into JDK-8343705 > - 8343705: IGV: Interactive Node Moving in Hierarchical Layout I re-tested this change after merging the latest changes from #22402 and did not find any issue. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22430#pullrequestreview-2469731586 From duke at openjdk.org Fri Nov 29 11:53:59 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 29 Nov 2024 11:53:59 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/35a7f16b..ee7f3b35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=13-14 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From rcastanedalo at openjdk.org Fri Nov 29 11:57:39 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Nov 2024 11:57:39 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: <7O9ExyQAgO4J7praZXQ5pW4tWH2bXdmjmYoypVPLYQU=.a82f0fb8-bfc4-47e3-b636-b05c0b7f7e38@github.com> On Thu, 28 Nov 2024 13:45:45 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22430. 
To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) I re-tested this change after merging the latest changes from https://github.com/openjdk/jdk/pull/22402 and did not find any issue. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2469752699 From tholenstein at openjdk.org Fri Nov 29 12:00:17 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 12:00:17 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v7] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. 
Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. 
It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: updated Class comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/0023ec4f..70724601 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=05-06 Stats: 25 lines in 4 files changed: 15 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From rcastanedalo at openjdk.org Fri Nov 29 12:00:17 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Nov 2024 12:00:17 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v7] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Fri, 29 Nov 2024 11:57:19 GMT, Tobias Holenstein wrote: >> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. 
Additionally, method names were updated to better reflect their functionality, >> >> ### LayoutGraph >> The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. >> >> ### LayoutLayer >> The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. >> >> ### LayoutNode >> The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. 
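The LayoutLayer operations described above (adding nodes, computing the maximum height, centering vertically, and assigning horizontal positions with spacing) can be illustrated with a small sketch. This is not the IGV implementation: `LayerSketch`, `Box`, and every method name below are hypothetical, and the real `LayoutLayer` and `LayoutNode` classes may differ in both shape and behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a single horizontal layer; not the IGV LayoutLayer class.
class LayerSketch {
    // Hypothetical stand-in for a node with a size and a computed position.
    static final class Box {
        final int width;
        final int height;
        int x;
        int y;

        Box(int width, int height) {
            this.width = width;
            this.height = height;
        }
    }

    private final List<Box> boxes = new ArrayList<>();

    void add(Box box) {
        boxes.add(box);
    }

    // Height the layer needs so that its tallest node fits.
    int maxHeight() {
        int max = 0;
        for (Box box : boxes) {
            max = Math.max(max, box.height);
        }
        return max;
    }

    // Centers every node vertically inside the band starting at 'top'.
    void centerVertically(int top) {
        int layerHeight = maxHeight();
        for (Box box : boxes) {
            box.y = top + (layerHeight - box.height) / 2;
        }
    }

    // Places nodes left to right, separated by 'spacing' pixels.
    void assignX(int spacing) {
        int x = 0;
        for (Box box : boxes) {
            box.x = x;
            x += box.width + spacing;
        }
    }
}
```

Keeping these computations inside the layer object, rather than in the layout manager, is the kind of encapsulation the description refers to.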
>> >> ### LayoutEdge >> The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > updated Class comments Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2469755149 From tholenstein at openjdk.org Fri Nov 29 12:00:17 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 12:00:17 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code In-Reply-To: <9oL8S-PC-j-Q0QPz-ex2_r31LX1c3akJZ_O1jrt_YhQ=.10a5784b-2a04-4fca-a9b8-a49c96faaf84@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> <9oL8S-PC-j-Q0QPz-ex2_r31LX1c3akJZ_O1jrt_YhQ=.10a5784b-2a04-4fca-a9b8-a49c96faaf84@github.com> Message-ID: On Thu, 28 Nov 2024 07:41:13 GMT, Christian Hagedorn wrote: > Drive-by comment: You could update the class comments for the four classes described above in the PR description with the actual PR descriptions which are more detailed than the current class comments found in the code. Good idea. done ------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2507663790 From rcastanedalo at openjdk.org Fri Nov 29 12:09:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 29 Nov 2024 12:09:38 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v2] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 09:59:13 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 >> >> [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. >> The colors are lost when the user goes to the next graph or when IGV is closed. 
Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. >> >> color >> >> ### Whats new >> - Now colors are saved with the XML as well >> - Colors are kept when changing to a different graph >> - The user can remove the color again: This uses the color from the filter or WHITE otherwise > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - patch2: fix botton on linux > - patch1 I re-tested this change after merging the latest changes from https://github.com/openjdk/jdk/pull/22402 and did not find any issue. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22440#pullrequestreview-2469772115 From amitkumar at openjdk.org Fri Nov 29 12:15:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 29 Nov 2024 12:15:09 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' Message-ID: fixes the issue reported by ubsan. 
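For context, the ubsan report concerns negating the smallest 32-bit integer, which has no positive counterpart in two's complement. The Java sketch below (class and method names are illustrative and not taken from the PR) shows the wraparound result that Java defines for this case and how `Math.negateExact` detects it. In C++ the same negation on a signed `int` is undefined behaviour, which is what ubsan flags.

```java
// Illustrative only: demonstrates Java's defined behaviour for -Integer.MIN_VALUE.
class NegateDemo {
    // Plain unary minus wraps around silently in Java.
    static int wrappingNegate(int c) {
        return -c;
    }

    // Reports whether negating 'c' would overflow a 32-bit int.
    static boolean negationOverflows(int c) {
        try {
            Math.negateExact(c);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrappingNegate(Integer.MIN_VALUE) == Integer.MIN_VALUE);
        System.out.println(negationOverflows(Integer.MIN_VALUE));
        System.out.println(negationOverflows(42));
    }
}
```

Only `Integer.MIN_VALUE` overflows on negation; every other `int` value negates exactly.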
------------- Commit messages: - updates instruction Changes: https://git.openjdk.org/jdk/pull/22456/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344304 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22456/head:pull/22456 PR: https://git.openjdk.org/jdk/pull/22456 From simonis at openjdk.org Fri Nov 29 12:33:41 2024 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 29 Nov 2024 12:33:41 GMT Subject: Integrated: 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control In-Reply-To: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> References: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> Message-ID: On Thu, 21 Nov 2024 16:34:12 GMT, Volker Simonis wrote: > Truffle compilations run in "hosted" mode, i.e. the Truffle runtime triggers compilations independently of HotSpot's [`CompileBroker`](https://github.com/openjdk/jdk/blob/8f22db23a50fe537d8ef369e92f0d5f9970d98f0/src/hotspot/share/compiler/compileBroker.hpp). But the results of Truffle compilations are still stored as ordinary nmethods in HotSpot's code cache (with the help of the JVMCI method `jdk.vm.ci.hotspot.HotSpotCodeCacheProvider::installCode()`). The regular JIT compilers are controlled by the `CompileBroker` which is aware of the code cache occupancy. If the code cache runs full, the `CompileBroker` temporarily pauses any subsequent JIT compilations until the code cache gets swept (if running with `-XX:+UseCodeCacheFlushing -XX:+MethodFlushing` which is the default) or completely shuts down the JIT compilers if running with `-XX:-UseCodeCacheFlushing`.
> > Truffle compiled methods can contribute significantly to the overall code cache occupancy and they can trigger JIT compilation stalls if they fill the code cache up. But the Truffle framework itself is neither aware of the current code cache occupancy, nor of the compilation activity of the `CompileBroker`. If Truffle tries to install a compiled method through JVMCI and the code cache is full, it will silently fail. Currently Truffle interprets such failures as transient errors and ignores them. Whenever the corresponding method gets hot again (usually immediately at the next invocation), Truffle will recompile it, only to fail again in the nmethod installation step if the code cache is still full. > > When the code cache is tight, this can lead to situations where Truffle unnecessarily and repeatedly compiles methods which can't be installed in the code cache but produce a significant CPU load. Instead, Truffle should poll HotSpot's `CompileBroker` compilation activity and pause compilations for as long as the `CompileBroker` is pausing JIT compilations (or shut down Truffle compilations completely if the `CompileBroker` shut down the JIT compilers). In order to make this possible, JVMCI should export the CompileBroker compilation activity mode (i.e. `stop_compilation`, `run_compilation` or `shutdown_compilation`). > > The corresponding Truffle change is tracked under [#10133: Implement Truffle compiler control based on HotSpot's CompileBroker compilation activity](https://github.com/oracle/graal/issues/10133). This pull request has now been integrated.
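The polling behaviour proposed in the description above could look roughly like the following sketch. Everything here is an assumption made for illustration: `CompilationGate`, the `Activity` enum, and the `Supplier` standing in for whatever accessor JVMCI ends up exporting are hypothetical and are not the API added by this change. Only the three mode names come from the description.

```java
import java.util.function.Supplier;

// Hypothetical client-side gate; not part of JVMCI or Truffle.
class CompilationGate {
    // Mirrors the three modes named in the PR description.
    enum Activity { STOP_COMPILATION, RUN_COMPILATION, SHUTDOWN_COMPILATION }

    // Stand-in for the exported CompileBroker activity mode (assumed accessor).
    private final Supplier<Activity> activitySource;

    // Once a shutdown has been observed, compilation stays disabled.
    private boolean shutDown = false;

    CompilationGate(Supplier<Activity> activitySource) {
        this.activitySource = activitySource;
    }

    // Returns true if a new compilation may be started right now.
    boolean mayCompile() {
        if (shutDown) {
            return false;
        }
        switch (activitySource.get()) {
            case RUN_COMPILATION:
                return true;
            case SHUTDOWN_COMPILATION:
                shutDown = true;   // permanent: never compile again
                return false;
            case STOP_COMPILATION:
            default:
                return false;      // temporary pause: ask again later
        }
    }

    boolean isShutDown() {
        return shutDown;
    }
}
```

A caller would consult `mayCompile()` before queuing each compilation, so that a full code cache leads to a pause rather than to repeated compile-and-fail cycles.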
Changeset: 6bea1b6c Author: Volker Simonis URL: https://git.openjdk.org/jdk/commit/6bea1b6cf1f64ce06c2028fe4dbc44f70778168f Stats: 19 lines in 3 files changed: 19 ins; 0 del; 0 mod 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/22295 From tholenstein at openjdk.org Fri Nov 29 12:54:26 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 12:54:26 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v8] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. 
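The dummy-node idea mentioned in the LayoutGraph paragraph above can be sketched as follows: an edge that spans several layers is replaced by a chain with one placeholder node per intermediate layer, so that every segment connects adjacent layers. The class `LongEdgeSplitter`, the nested `Node` type, and the naming scheme for dummy nodes are hypothetical; the actual IGV code is likely organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of long-edge splitting; not the IGV LayoutGraph code.
class LongEdgeSplitter {
    // A node assigned to a layer; dummy nodes exist only for layout purposes.
    static final class Node {
        final String name;
        final int layer;
        final boolean dummy;

        Node(String name, int layer, boolean dummy) {
            this.name = name;
            this.layer = layer;
            this.dummy = dummy;
        }
    }

    // Returns the path from 'from' to 'to' with one dummy node per skipped layer.
    static List<Node> splitEdge(Node from, Node to) {
        if (to.layer <= from.layer) {
            throw new IllegalArgumentException("edge must point to a lower layer");
        }
        List<Node> path = new ArrayList<>();
        path.add(from);
        for (int layer = from.layer + 1; layer < to.layer; layer++) {
            String dummyName = from.name + "->" + to.name + "@" + layer;
            path.add(new Node(dummyName, layer, true));
        }
        path.add(to);
        return path;
    }
}
```

Edges between adjacent layers pass through unchanged, since the loop body never runs for them.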
> > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/70724601..2433d7ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=06-07 Stats: 24 lines in 24 files changed: 0 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From tholenstein at openjdk.org Fri Nov 29 13:03:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 13:03:23 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v9] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: <611MG3PK-juXzaWCSO4Y26YUPRit8eK4KJgyVaARYPs=.30566b64-bc88-4977-ba26-f971d48f2d98@github.com> > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. 
The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remove TODO ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/2433d7ac..47aea83d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From tholenstein at openjdk.org Fri Nov 29 13:40:29 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 13:40:29 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v10] In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. 
This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: ClusterNode: implement missing default value for Point and missing equals and hashCode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22402/files - new: https://git.openjdk.org/jdk/pull/22402/files/47aea83d..77b6884d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22402&range=08-09 Stats: 38 lines in 3 files changed: 38 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22402/head:pull/22402 PR: https://git.openjdk.org/jdk/pull/22402 From chagedorn at openjdk.org Fri Nov 29 13:45:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Nov 2024 13:45:40 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v10] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Fri, 29 Nov 2024 13:40:29 GMT, Tobias Holenstein wrote: >> This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, >> >> ### LayoutGraph >> The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. 
The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. >> >> ### LayoutLayer >> The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. >> >> ### LayoutNode >> The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. >> >> ### LayoutEdge >> The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > ClusterNode: implement missing default value for Point and missing equals and hashCode Looks good now and is working as expected! 
Thanks for including the last updates. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22402#pullrequestreview-2469950640 From kbarrett at openjdk.org Fri Nov 29 13:52:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 13:52:42 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' In-Reply-To: References: Message-ID: <1PSt0vETd6HA2UFCDJbHadPyiNmHeqJ-t-FuHriAU4k=.73971a5d-f7ac-4df3-9be6-8e7e3e419939@github.com> On Fri, 29 Nov 2024 11:08:45 GMT, Amit Kumar wrote: > fixes the issue reported by ubsan. src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 1542: > 1540: __ z_slfi(lreg, c); > 1541: } > 1542: break; Would it be simpler to use `java_negate(c)` (from globalDefinitions.hpp)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1863554225 From tholenstein at openjdk.org Fri Nov 29 13:55:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 13:55:46 GMT Subject: RFR: 8314512: IGV: clean up hierarchical layout code [v3] In-Reply-To: References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: On Thu, 28 Nov 2024 10:39:58 GMT, Roberto Castañeda Lozano wrote: >> In the "sea of nodes" view, clicking on an edge used to select its connected nodes, but not after this change. > >> In the "sea of nodes" view, clicking on an edge used to select its connected nodes, but not after this change. > > Here is a patch on top of this PR that re-introduces the missing functionality: https://github.com/openjdk/jdk/commit/389ab05ed5505930fbcda7c864316567f3e0ff08. Feel free to incorporate it into this PR.
Thanks for the intensive testing and reviewing @robcasloz and @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/22402#issuecomment-2507863620 From tholenstein at openjdk.org Fri Nov 29 13:55:48 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 13:55:48 GMT Subject: Integrated: 8314512: IGV: clean up hierarchical layout code In-Reply-To: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> References: <36PTCT4kayhnHZPjFN9aymUFsu5-Ayzu3Kv5Or4dcYg=.5f99538b-f5bd-4823-abd4-33509e7e5376@github.com> Message-ID: <9OUq9TmDdu0AHFyEHlAPHaKt6YToY0mmZUqyPk30uqM=.0b73b086-011a-48aa-b13f-c80c4155d571@github.com> On Tue, 26 Nov 2024 23:17:15 GMT, Tobias Holenstein wrote: > This refactoring enhances layer and node management by encapsulating computations within LayoutGraph, LayoutLayer, and LayoutNode, improving modularity and delegation of responsibilities for handling dummy nodes and self-edges to dedicated utility classes. Code duplication was reduced by consolidating common logic, such as edge reversal, into reusable methods, and optimizing processes like updating node positions and crossings. Additionally, method names were updated to better reflect their functionality, > > ### LayoutGraph > The LayoutGraph class is responsible for organizing and arranging a graph's nodes and edges for visual display. It takes a collection of nodes (Vertex) and connections between them (Link) and structures them into layers, creating a hierarchical layout. The class handles complexities like edges that span multiple layers by inserting temporary "dummy" nodes to maintain a clear hierarchy. This organization helps ensure that when the graph is displayed, it is easy to understand and visually coherent, making the relationships between nodes clear and straightforward. > > ### LayoutLayer > The LayoutLayer class represents a single horizontal layer in a hierarchical graph layout. 
It holds a list of nodes (LayoutNode) that are all on the same vertical level. This class provides simple methods to manage these nodes: you can add nodes to the layer, calculate the maximum height needed to fit all nodes, center the nodes vertically within the layer, and set their horizontal positions with proper spacing. In essence, LayoutLayer helps organize nodes neatly in a graph, making it easier to display the graph clearly and understand the relationships between nodes. > > ### LayoutNode > The LayoutNode class represents a node in a hierarchical graph layout. It can be either an actual node from the original graph or a temporary "dummy" node added during the layout process to handle complex edge connections. This class stores important layout information like the node's position (x and y coordinates), size (width and height), layer level, and connections to other nodes through incoming and outgoing edges. It provides methods to calculate optimal positions, manage margins, and handle reversed edges, all aimed at arranging the nodes neatly in layers to create a clear and visually organized graph display. > > ### LayoutEdge > The LayoutEdge class represents a connection between two nodes (LayoutNode) in a hierarchical graph layout. It stores information about the starting node (f... This pull request has now been integrated. 
Changeset: 4da7c354 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/4da7c3548436ffffb009828891df0d13d47370e3 Stats: 5119 lines in 41 files changed: 1971 ins; 2135 del; 1013 mod 8314512: IGV: clean up hierarchical layout code Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22402 From amitkumar at openjdk.org Fri Nov 29 14:00:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 29 Nov 2024 14:00:42 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' In-Reply-To: <1PSt0vETd6HA2UFCDJbHadPyiNmHeqJ-t-FuHriAU4k=.73971a5d-f7ac-4df3-9be6-8e7e3e419939@github.com> References: <1PSt0vETd6HA2UFCDJbHadPyiNmHeqJ-t-FuHriAU4k=.73971a5d-f7ac-4df3-9be6-8e7e3e419939@github.com> Message-ID: On Fri, 29 Nov 2024 13:50:00 GMT, Kim Barrett wrote: >> fixes the issue reported by ubsan. > > src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 1542: > >> 1540: __ z_slfi(lreg, c); >> 1541: } >> 1542: break; > > Would it be simpler to use `java_negate(c)` (from globalDefinitions.hpp)? Not sure about that, actually. I didn't even know that such a helper method exists. Thanks for making me aware. I updated the current solution to match what the GCC compiler does. Z doesn't have a `shi` instruction that can handle 16-bit numbers, so GCC negates the number and adds it with the `ahi` instruction. For numbers up to 32 bits, an `slfi` instruction is emitted for the subtraction. @RealLucy do you have other thoughts on this?
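For readers following the ubsan report, a minimal C++ sketch of the underlying problem may help: negating `-2147483648` as a signed `int` overflows, which is undefined behavior, while doing the negation in unsigned arithmetic wraps and is well defined. The name `wrapping_negate` is hypothetical and chosen only for illustration; it is not claimed to be how `java_negate` in globalDefinitions.hpp is actually written.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration only; not HotSpot code.
// Negates a 32-bit value with wrap-around semantics instead of signed overflow.
constexpr int32_t wrapping_negate(int32_t v) {
  // Unsigned subtraction is defined modulo 2^32, so 0u - x never overflows.
  // The conversion back to int32_t is modular on two's-complement targets
  // (guaranteed since C++20, implementation-defined before that).
  return static_cast<int32_t>(0u - static_cast<uint32_t>(v));
}
```

With this shape, the minimum `int32_t` value maps to itself rather than triggering the ubsan diagnostic, and every other value is negated as usual.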
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1863564531 From tholenstein at openjdk.org Fri Nov 29 14:15:51 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 14:15:51 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v3] In-Reply-To: References: Message-ID: <0LDtkNMaV5f0uRF2MnhY9aZGh34hWR9FeShxJJJG01s=.bc002727-7f02-4d51-b813-7d5e357539e0@github.com> > This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. > > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. > > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. 
(The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. > - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. > `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Also added a `writeBack` method to apply these changes. > > ... Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge branch 'master' into JDK-8343705 - Merge branch 'pr/22402' into JDK-8343705 - re-add select edges nodes by clicking on edge - add setLayoutSelfEdges() option and enable for CFG layout - undo adding equals functions to Slots - Merge branch 'pr/22402' into JDK-8343705 - revert copyright changes - 8343705: IGV: Interactive Node Moving in Hierarchical Layout - fixed graph objects equality - remove executability of igv.sh - ... 
and 11 more: https://git.openjdk.org/jdk/compare/4da7c354...64cdbdf4 ------------- Changes: https://git.openjdk.org/jdk/pull/22430/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=02 Stats: 1360 lines in 9 files changed: 1336 ins; 10 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/22430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430 PR: https://git.openjdk.org/jdk/pull/22430 From tholenstein at openjdk.org Fri Nov 29 14:32:57 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 14:32:57 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v4] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. > > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. 
> > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. > - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. > `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Also added a `writeBack` method to apply these changes. > > ... 
Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - trailing whitespace - fix after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22430/files - new: https://git.openjdk.org/jdk/pull/22430/files/64cdbdf4..cd2b0493 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=02-03 Stats: 60 lines in 3 files changed: 7 ins; 50 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430 PR: https://git.openjdk.org/jdk/pull/22430 From tholenstein at openjdk.org Fri Nov 29 14:37:31 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 14:37:31 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v3] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. > > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 18 additional commits since the last revision: - Merge branch 'master' into JDK-8345039 - patch2: fix botton on linux - patch1 - JDK-8345039: IGV: save user-defined node colors to XML - revert copyright changes - fixed graph objects equality - remove executability of igv.sh - update Figure height calculation for Slots - run IGV without asserts - batch add connectionLayer.addChildren(newWidgets); - ... and 8 more: https://git.openjdk.org/jdk/compare/df08c533...001a073a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22440/files - new: https://git.openjdk.org/jdk/pull/22440/files/e58c6171..001a073a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=01-02 Stats: 62313 lines in 1516 files changed: 28722 ins; 26121 del; 7470 mod Patch: https://git.openjdk.org/jdk/pull/22440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22440/head:pull/22440 PR: https://git.openjdk.org/jdk/pull/22440 From tholenstein at openjdk.org Fri Nov 29 14:43:11 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 14:43:11 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v4] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. 
> > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Update Slot.java - Update OutputSlot.java - Update InputSlot.java - Merge branch 'master' into JDK-8345039 - Merge branch 'master' into JDK-8345039 - patch2: fix botton on linux - patch1 - JDK-8345039: IGV: save user-defined node colors to XML - revert copyright changes - fixed graph objects equality - ... and 12 more: https://git.openjdk.org/jdk/compare/4da7c354...d2134dbb ------------- Changes: https://git.openjdk.org/jdk/pull/22440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=03 Stats: 96 lines in 12 files changed: 84 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22440/head:pull/22440 PR: https://git.openjdk.org/jdk/pull/22440 From tholenstein at openjdk.org Fri Nov 29 14:50:16 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 14:50:16 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v5] In-Reply-To: References: Message-ID: <8vVoiCbi5-XQKtz4x6TqtMx-Wan4KBgu-vqHGluJ87c=.7141811d-db6b-4e87-84f7-e445c67291c7@github.com> > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. 
> > color > > ### Whats new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision: - fixes after merge - reverts - readd imports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22440/files - new: https://git.openjdk.org/jdk/pull/22440/files/d2134dbb..0bddea06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22440&range=03-04 Stats: 21 lines in 7 files changed: 4 ins; 15 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22440/head:pull/22440 PR: https://git.openjdk.org/jdk/pull/22440 From dfenacci at openjdk.org Fri Nov 29 14:59:38 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Nov 2024 14:59:38 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v2] In-Reply-To: <3A8oM7u8ktLs3B52t9Ik1Le5Oc2TZkTQcmWrxmkzFnc=.d14a4920-1fa5-4063-bc07-80dbdd340899@github.com> References: <3A8oM7u8ktLs3B52t9Ik1Le5Oc2TZkTQcmWrxmkzFnc=.d14a4920-1fa5-4063-bc07-80dbdd340899@github.com> Message-ID: <0wQgLXwbTvGJeafaJOKoZmRsx-vX1s9t2xdMa6F8A2A=.d3fffdaf-ce00-4246-a6c8-095cc00e6f3b@github.com> On Thu, 28 Nov 2024 11:07:13 GMT, Aleksey Shipilev wrote: >> Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Tja Thanks for cleaning up. 
I happened to notice that for SSE2 most asserts are `NOT_LP64` https://github.com/openjdk/jdk/blob/02db63862e5b3a85cee105538ee2c52f9e64d353/src/hotspot/cpu/x86/assembler_x86.cpp#L1538 but not all, e.g. https://github.com/openjdk/jdk/blob/02db63862e5b3a85cee105538ee2c52f9e64d353/src/hotspot/cpu/x86/assembler_x86.cpp#L4617 Do you know if there is a reason for that difference? (in the end it just results in a superfluous assert for x64 but maybe we should be consistent...) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22432#issuecomment-2507973623 From shade at openjdk.org Fri Nov 29 15:07:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 29 Nov 2024 15:07:37 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v2] In-Reply-To: <0wQgLXwbTvGJeafaJOKoZmRsx-vX1s9t2xdMa6F8A2A=.d3fffdaf-ce00-4246-a6c8-095cc00e6f3b@github.com> References: <3A8oM7u8ktLs3B52t9Ik1Le5Oc2TZkTQcmWrxmkzFnc=.d14a4920-1fa5-4063-bc07-80dbdd340899@github.com> <0wQgLXwbTvGJeafaJOKoZmRsx-vX1s9t2xdMa6F8A2A=.d3fffdaf-ce00-4246-a6c8-095cc00e6f3b@github.com> Message-ID: On Fri, 29 Nov 2024 14:56:40 GMT, Damon Fenacci wrote: > I happened to notice that for SSE2 most asserts are `NOT_LP64` Right, that is because our x86_64 is baselined to have at least SSE2: https://github.com/openjdk/jdk/blob/4da7c3548436ffffb009828891df0d13d47370e3/src/hotspot/cpu/x86/vm_version_x86.cpp#L896-L903 Therefore, checking for SSE <= 2 anywhere else in x86_64 is redundant, but harmless. I guess you need to remember this every time an assert is added, so some leakage happens every so often. I can wrap the currently exposed sse1/2 checks in `NOT_LP64`, if you want. 
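A rough C++ sketch of the pattern discussed above, for illustration only: a check wrapped in `NOT_LP64` is compiled only into 32-bit builds, so it fits features that are part of the 64-bit baseline (SSE2), while features above the baseline need an unconditional check. The macro definition below is a simplified stand-in, and `kUseSSE`, `supports_sse2`, `emit_sse2_op` and `emit_sse3_op` are hypothetical names, not HotSpot's actual code.

```cpp
#include <cassert>

// Simplified stand-in for the HotSpot macro: expands to nothing on LP64 builds.
#ifdef _LP64
#define NOT_LP64(code)
#else
#define NOT_LP64(code) code
#endif

namespace sketch {

// Hypothetical stand-in for the detected SSE level (assumption, not VM state).
constexpr int kUseSSE = 2;

inline bool supports_sse2() { return kUseSSE >= 2; }

// SSE2 is baseline on 64-bit, so this check only matters on 32-bit builds.
inline int emit_sse2_op() {
  NOT_LP64(assert(supports_sse2() && "SSE2 required"));
  return 2;
}

// Features above the baseline have to be checked on every build.
inline int emit_sse3_op(bool cpu_has_sse3) {
  assert(cpu_has_sse3 && "SSE3 required");
  (void)cpu_has_sse3;  // avoid an unused-parameter warning when NDEBUG is set
  return 3;
}

}  // namespace sketch
```

The design point is only where the check lives: wrapping a beyond-baseline check in `NOT_LP64` would silently drop it on 64-bit, which is the kind of outlier the PR is about.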
------------- PR Comment: https://git.openjdk.org/jdk/pull/22432#issuecomment-2507997403 From tholenstein at openjdk.org Fri Nov 29 15:13:58 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 15:13:58 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v5] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. > > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. > > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. 
> - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. > `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Also added a `writeBack` method to apply these changes. > > ... Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/LayoutGraph.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22430/files - new: https://git.openjdk.org/jdk/pull/22430/files/cd2b0493..23781f5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22430&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22430.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430 PR: https://git.openjdk.org/jdk/pull/22430 From chagedorn at openjdk.org Fri Nov 29 15:13:58 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Nov 2024 15:13:58 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v4] 
In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 14:32:57 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 . To check out this PR locally: >> `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` >> `git checkout pull/22430` >> >> This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. >> >> ## Overview >> >> Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: >> - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. >> - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. >> - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. >> - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. >> >> ## Limitations >> - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) >> - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. >> - To move long straight edges, it's best to drag the top part of the edges around. >> When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. 
To preserve rearrangements, support for a stable incremental layout algorithm would be needed. >> >> ## Main Changes >> ### LayoutMover Interface >> Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. >> `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. >> >> ### Enhancements to HierarchicalLayoutManager >> Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Als... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - trailing whitespace > - fix after merge Great work! This is a really nice feature. I've tested it on Linux and it works as expected. src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/LayoutGraph.java line 2: > 1: /* > 2: * Copyright (c) 2008, 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22430#pullrequestreview-2470099634 PR Review Comment: https://git.openjdk.org/jdk/pull/22430#discussion_r1863645065 From chagedorn at openjdk.org Fri Nov 29 15:13:58 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Nov 2024 15:13:58 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v5] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 15:11:15 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 . 
To check out this PR locally: >> `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` >> `git checkout pull/22430` >> >> This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. >> >> ## Overview >> >> Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. With this enhancement is new: >> - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. >> - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. >> - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. >> - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. >> >> ## Limitations >> - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) >> - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. >> - To move long straight edges, it's best to drag the top part of the edges around. >> When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. 
>> >> ## Main Changes >> ### LayoutMover Interface >> Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`. >> `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. >> >> ### Enhancements to HierarchicalLayoutManager >> Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Als... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/LayoutGraph.java > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22430#pullrequestreview-2470121770 From tholenstein at openjdk.org Fri Nov 29 15:14:00 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 15:14:00 GMT Subject: RFR: 8343705: IGV: Interactive Node Moving in Hierarchical Layout [v2] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 11:41:51 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge branch 'pr/22402' into JDK-8343705 >> - Merge branch 'pr/22402' into JDK-8343705 >> - 8343705: IGV: Interactive Node Moving in Hierarchical Layout > > I re-tested this change after merging the latest changes from #22402 and did not find any issue. Thanks @robcasloz , @TobiHartmann and @chhagedorn for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22430#issuecomment-2508000392 From qamai at openjdk.org Fri Nov 29 15:16:43 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 29 Nov 2024 15:16:43 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15] In-Reply-To: References: Message-ID: <0YHdBs1x-Ee10sQ3AaHeVKGXLvj-CnyRBWJsPbJyDGk=.ce4bad79-3012-4452-b078-44fa60c504a0@github.com> On Fri, 29 Nov 2024 11:53:59 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix test Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2470143939 From shade at openjdk.org Fri Nov 29 15:17:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 29 Nov 2024 15:17:00 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v3] In-Reply-To: References: Message-ID: > Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. 
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Wrap more SSE 1/2 asserts in NOT_LP64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22432/files - new: https://git.openjdk.org/jdk/pull/22432/files/02db6386..b27b9ef6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22432&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22432&range=01-02 Stats: 14 lines in 2 files changed: 0 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22432/head:pull/22432 PR: https://git.openjdk.org/jdk/pull/22432 From chagedorn at openjdk.org Fri Nov 29 15:18:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 29 Nov 2024 15:18:37 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML [v5] In-Reply-To: <8vVoiCbi5-XQKtz4x6TqtMx-Wan4KBgu-vqHGluJ87c=.7141811d-db6b-4e87-84f7-e445c67291c7@github.com> References: <8vVoiCbi5-XQKtz4x6TqtMx-Wan4KBgu-vqHGluJ87c=.7141811d-db6b-4e87-84f7-e445c67291c7@github.com> Message-ID: On Fri, 29 Nov 2024 14:50:16 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 >> >> [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. >> The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. 
>> >> color >> >> ### What's new >> - Now colors are saved with the XML as well >> - Colors are kept when changing to a different graph >> - The user can remove the color again: This uses the color from the filter or WHITE otherwise > > Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision: > > - fixes after merge > - reverts > - readd imports This is a nice enhancement! As discussed offline, we could follow up with RFEs (if wanted) to handle: - Save colors added in the diff view when having graph "X vs. X + 4" opened and going to the next graph, e.g. "X vs. X + 3". It currently drops the color. - When cloning a graph, make the coloring local rather than global per XML. Currently, when you color a node in one of the graphs, the color is applied to both the original and the cloned graph. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22440#pullrequestreview-2470146316 From tholenstein at openjdk.org Fri Nov 29 15:18:37 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 15:18:37 GMT Subject: RFR: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 08:50:26 GMT, Christian Hagedorn wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22402 >> >> [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. >> The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. >> >> color >> >> ### What's new >> - Now colors are saved with the XML as well >> - Colors are kept when changing to a different graph >> - The user can remove the color again: This uses the color from the filter or WHITE otherwise > > That works great! Thanks Roberto for fixing that.
As discussed offline, we could add a "remove all colors" options in a separate RFE. > > So, the only minor problem left is the "No..." that does not look like a button and therefore is not suggesting to click it. Maybe you find a fix for that. If not, I guess it's okay to move forward with this patch and come back to it later again. thanks @chhagedorn , @robcasloz and @eme64 for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22440#issuecomment-2508015798 From shade at openjdk.org Fri Nov 29 15:19:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 29 Nov 2024 15:19:39 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v2] In-Reply-To: References: <3A8oM7u8ktLs3B52t9Ik1Le5Oc2TZkTQcmWrxmkzFnc=.d14a4920-1fa5-4063-bc07-80dbdd340899@github.com> <0wQgLXwbTvGJeafaJOKoZmRsx-vX1s9t2xdMa6F8A2A=.d3fffdaf-ce00-4246-a6c8-095cc00e6f3b@github.com> Message-ID: On Fri, 29 Nov 2024 15:05:14 GMT, Aleksey Shipilev wrote: > I can wrap the currently exposed sse1/2 checks in `NOT_LP64`, if you want. On a second thought, this would allow me to eliminate these asserts when removing x86_32. I basically grepped around for various `support_sse` and made sure every "1","2" check is wrapped in `NOT_LP64`, and every other check is not wrapped in it. See new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22432#issuecomment-2508016283 From tholenstein at openjdk.org Fri Nov 29 15:19:47 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 15:19:47 GMT Subject: Integrated: 8343705: IGV: Interactive Node Moving in Hierarchical Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 08:57:08 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 . 
To check out this PR locally: > `git fetch https://git.openjdk.org/jdk.git pull/22430/head:pull/22430` > `git checkout pull/22430` > > This pull request enhances the Ideal Graph Visualizer (IGV) by introducing an interactive feature that allows users to move nodes within the hierarchical layout by dragging them to new positions. This manual adjustment helps users better understand and explore the graph structure by customizing the layout according to their needs. > > ## Overview > > Previously, the hierarchical layout in IGV was static, and users could not adjust node positions manually. This limitation made it challenging to reorganize the graph for improved readability or to focus on specific areas of interest. This enhancement adds: > - Interactive Node Movement: Users can now click and drag nodes to new positions within the graph. > - Dynamic Edge Adjustment: When nodes are moved, connected edges adjust dynamically to maintain the graph's structure. > - Layer Management: Nodes can be moved within the same layer or across different layers, with the layout updating accordingly. > - Persistent Positions: Moved nodes remain in their new positions until the layout is reset or the nodes are moved again. > > ## Limitations > - Interactive Node Moving only works in `Sea of nodes` view with `cut long edges` off. (The standard option for IGV) > - Currently, no empty layers are allowed in the layout, so users cannot introduce a horizontal gap between nodes that only contains edges. > - To move long straight edges, it's best to drag the top part of the edges around. > When the graph changes - for example, when nodes are removed or hidden, or layers are applied - the rearrangements are lost since the graph gets re-laid out. To preserve rearrangements, support for a stable incremental layout algorithm would be needed. > > ## Main Changes > ### LayoutMover Interface > Created a new interface `LayoutMover` with methods `moveVertex`, `moveVertices`, and `moveLink`.
> `HierarchicalLayoutManager` now implements `LayoutMover`, providing concrete implementations for these methods. > > ### Enhancements to HierarchicalLayoutManager > Improved the `HierarchicalLayoutManager` so it can handle moving nodes interactively. Added methods to move single nodes or multiple nodes, and to adjust links. Nodes can now be moved within the same layer or to different layers while keeping the graph consistent. Also added a `writeBack` method to apply these changes. > > ... This pull request has now been integrated. Changeset: 28b0f3ea Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/28b0f3eaa55a1718e8e725516e64c8e25734f97b Stats: 1301 lines in 9 files changed: 1286 ins; 3 del; 12 mod 8343705: IGV: Interactive Node Moving in Hierarchical Layout Reviewed-by: chagedorn, thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22430 From tholenstein at openjdk.org Fri Nov 29 15:24:44 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 15:24:44 GMT Subject: Integrated: 8345039: IGV: save user-defined node colors to XML In-Reply-To: References: Message-ID: <8dLI0SV937CFjYWrfWeKgtihJcdqVX2hZzHPu-gAx1Q=.962ba9e9-8ca0-45a7-bfc2-3bbb7b20a5b2@github.com> On Thu, 28 Nov 2024 15:15:06 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22402 > > [JDK-8343535](https://bugs.openjdk.org/browse/JDK-8343535) introduced the possibility to give user-defined colors to a node. > The colors are lost when the user goes to the next graph or when IGV is closed. Save the colors as a property of the graph to the XML to make it more permanent. This requires that the user can also remove the colors again. > > color > > ### What's new > - Now colors are saved with the XML as well > - Colors are kept when changing to a different graph > - The user can remove the color again: This uses the color from the filter or WHITE otherwise This pull request has now been integrated.
Changeset: a80ccf2c Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/a80ccf2cd2792c24b51f1143cb0e6c5b036c5b28 Stats: 75 lines in 6 files changed: 69 ins; 0 del; 6 mod 8345039: IGV: save user-defined node colors to XML Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Christian Hagedorn Reviewed-by: chagedorn, epeter, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22440 From tholenstein at openjdk.org Fri Nov 29 15:25:14 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 15:25:14 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v2] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 17 commits: - JDK-8345041 IGV: Free Placement Mode in IGV Layout fix - Merge branch 'pr/22402' into JDK-8343705 - revert copyright changes - 8343705: IGV: Interactive Node Moving in Hierarchical Layout - fixed graph objects equality - remove executability of igv.sh - update Figure height calculation for Slots - run IGV without asserts - batch add connectionLayer.addChildren(newWidgets); - remove dead code in LineWidget - ... and 7 more: https://git.openjdk.org/jdk/compare/b9c6ce90...4f7ca8ee ------------- Changes: https://git.openjdk.org/jdk/pull/22438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=01 Stats: 6880 lines in 47 files changed: 3738 ins; 2148 del; 994 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From dfenacci at openjdk.org Fri Nov 29 15:37:37 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 29 Nov 2024 15:37:37 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v3] In-Reply-To: References: Message-ID: <5Gs9VuRANHx8wGDXFaWEUKyFhvsJhuhJVmfvl4r3fDU=.355ca960-e8bc-49f7-bf0e-41791cd8a67a@github.com> On Fri, 29 Nov 2024 15:17:00 GMT, Aleksey Shipilev wrote: >> Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap more SSE 1/2 asserts in NOT_LP64 Marked as reviewed by dfenacci (Committer). Cool! Thank you! 
------------- PR Review: https://git.openjdk.org/jdk/pull/22432#pullrequestreview-2470185158 PR Comment: https://git.openjdk.org/jdk/pull/22432#issuecomment-2508040455 From tholenstein at openjdk.org Fri Nov 29 16:52:54 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 16:52:54 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v3] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: - Update DiagramScene.java - Update HierarchicalLayoutManager.java - Update Slot.java - Update OutputSlot.java - Update InputSlot.java - missing import after merge - Merge branch 'master' into JDK-8345041 - JDK-8345041 IGV: Free Placement Mode in IGV Layout fix - Merge branch 'pr/22402' into JDK-8343705 - revert copyright changes - ... 
and 14 more: https://git.openjdk.org/jdk/compare/a80ccf2c...43fb3612 ------------- Changes: https://git.openjdk.org/jdk/pull/22438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=02 Stats: 587 lines in 11 files changed: 573 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From tholenstein at openjdk.org Fri Nov 29 17:03:54 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 17:03:54 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v4] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
> > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - missing - finish merge of master ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22438/files - new: https://git.openjdk.org/jdk/pull/22438/files/43fb3612..5b6923d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=02-03 Stats: 16 lines in 1 file changed: 7 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From kbarrett at openjdk.org Fri Nov 29 17:08:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 29 Nov 2024 17:08:38 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 08:48:42 GMT, Robbin Ehn wrote: > > > Change is okay and I sanity tested, all ok. Thanks. > In this particular case we really want to generate code which moves the address **0** into a register. Hence your fix is good as you avoid nullptr by using movptr2. But my concerns was this may not always be possible. Actually, we want to generate code which moves the address **don't care** into a register, planning to patch the address with the correct value later. Looking at how patching is done, we could use any value at all here, as the patching overwrites whatever it finds. A different approach that I think also works is to change the literal 0 to `(address)uint(0)` so we're not casting a "literal 0". I think using movptr2 directly rather than sticking with movptr with one of the above adjustments is better. But this code belongs to you folks, so whatever you think is best. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22435#issuecomment-2508165768 From tholenstein at openjdk.org Fri Nov 29 17:18:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 17:18:23 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v5] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
> > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: prevent divison by zero ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22438/files - new: https://git.openjdk.org/jdk/pull/22438/files/5b6923d4..66ad81f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=03-04 Stats: 39 lines in 1 file changed: 28 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From tholenstein at openjdk.org Fri Nov 29 17:18:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 17:18:23 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v2] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 14:52:49 GMT, Emanuel Peter wrote: >> src/utils/IdealGraphVisualizer/HierarchicalLayout/src/main/java/com/sun/hotspot/igv/hierarchicallayout/FreeInteractiveLayoutManager.java line 219: >> >>> 217: >>> 218: double deltaX = posX - otherNode.getX(); >>> 219: double deltaY = posY - otherNode.getY(); >> >> What happens if this distance is zero? Does the division below behave ok? > > If we get issues here, we can always check for zero and add some random non-zero noise to force different position to get the two nodes to separate in a sane way. right! 
It should be fixed now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22438#discussion_r1863794505 From tholenstein at openjdk.org Fri Nov 29 17:23:38 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 17:23:38 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v5] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 17:18:23 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: >> >> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 >> git checkout pull/22438 >> >> >> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. >> >> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. >> >> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
>> >> ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > prevent divison by zero @chhagedorn reported the following bug: Click `dynamic layout` -> `show all nodes` -> Bug: showing only lines [broken.xml.zip](https://github.com/user-attachments/files/17961889/broken.xml.zip) Screenshot 2024-11-29 at 18 21 00 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22438#issuecomment-2508181402 From bulasevich at openjdk.org Fri Nov 29 22:53:39 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 29 Nov 2024 22:53:39 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> Message-ID: On Fri, 29 Nov 2024 10:14:22 GMT, Boris Ulasevich wrote: >> @shipilev can you do build+test on arm32, please? > > Sorry for delay. I will check arm32. All right. The ARM32 build is fine and jtreg testing shows no regressions. Thanks for fixing this for all ports! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1863979557 From tholenstein at openjdk.org Fri Nov 29 23:00:59 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 23:00:59 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v6] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. 
> > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: layoutNode.setVertex(vertex) and LayoutNode updateSize() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22438/files - new: https://git.openjdk.org/jdk/pull/22438/files/66ad81f3..2847b60f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=04-05 Stats: 20 lines in 2 files changed: 14 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From tholenstein at openjdk.org Fri Nov 29 23:10:27 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 29 Nov 2024 23:10:27 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v7] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. 
> > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - trailing whitespace - function comment added for setLinkControlPoints ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22438/files - new: https://git.openjdk.org/jdk/pull/22438/files/2847b60f..ce9720d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=05-06 Stats: 27 lines in 2 files changed: 9 ins; 5 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From tholenstein at openjdk.org Sat Nov 30 00:15:18 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Sat, 30 Nov 2024 00:15:18 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v8] In-Reply-To: References: Message-ID: <1Y-P6ytzuAaNLtzMlmrh7_b4o-Dr6wqpN0mxDkBiFgs=.38c8cb92-1ac6-4db3-b0ba-34b5f8f70cca@github.com> > This PR depends on https://github.com/openjdk/jdk/pull/22430. 
To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: make applyForceBasedAdjustment more numerical stable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22438/files - new: https://git.openjdk.org/jdk/pull/22438/files/ce9720d3..14d20181 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=06-07 Stats: 43 lines in 1 file changed: 28 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From tholenstein at openjdk.org Sat Nov 30 00:15:18 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Sat, 30 Nov 2024 00:15:18 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v7] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 23:10:27 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22430. 
To check out this PR locally: >> >> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 >> git checkout pull/22438 >> >> >> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. >> >> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. >> >> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. >> >> ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - trailing whitespace > - function comment added for setLinkControlPoints > @chhagedorn reported the following bug: Click `dynamic layout` -> `show all nodes` -> Bug: showing only lines > > [broken.xml.zip](https://github.com/user-attachments/files/17961889/broken.xml.zip) This should be fixed now.
The cause was numerical instability of `applyForceBasedAdjustment`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22438#issuecomment-2508745725 From tholenstein at openjdk.org Sat Nov 30 00:15:20 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Sat, 30 Nov 2024 00:15:20 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v2] In-Reply-To: References: Message-ID: <1vGfRMsqM9hlE9QLTy39Fb3pELFG_Idyi2au4Kmg7Bo=.9f360ee6-05bd-4519-b46f-9032e8b1fab6@github.com> On Thu, 28 Nov 2024 14:55:26 GMT, Emanuel Peter wrote: >> Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - JDK-8345041 IGV: Free Placement Mode in IGV Layout >> >> fix >> - Merge branch 'pr/22402' into JDK-8343705 >> - revert copyright changes >> - 8343705: IGV: Interactive Node Moving in Hierarchical Layout >> - fixed graph objects equality >> - remove executability of igv.sh >> - update Figure height calculation for Slots >> - run IGV without asserts >> - batch add connectionLayer.addChildren(newWidgets); >> - remove dead code in LineWidget >> - ... and 7 more: https://git.openjdk.org/jdk/compare/b9c6ce90...4f7ca8ee > > This is really amazing, the feature. Really would make me start using IGV. > > I scanned the code changes quickly, and it seems reasonable, at least to a non-IGV developer. Thanks @eme64, @robcasloz and @chhagedorn for testing and reviewing!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22438#issuecomment-2508746746 From amitkumar at openjdk.org Sat Nov 30 02:44:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 30 Nov 2024 02:44:42 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v7] In-Reply-To: References: <2Sb1YAo9ETANGIBrtFbwlX2QpmLzK5F-GikP9gcPRZg=.8f3c7ad0-a162-4b88-ab36-0e0ce4268f81@github.com> <29kM5R6hcczOhxUAxnkhFEKiZWKjB5_Ru9OIMfpElis=.e76467cc-9418-445f-82d5-d872ec65d2b7@github.com> Message-ID: On Fri, 29 Nov 2024 22:50:39 GMT, Boris Ulasevich wrote: >> Sorry for delay. I will check arm32. > > All right. The ARM32 build is fine and jtreg testing shows no regressions. Thanks for fixing this for all ports! I need one more approval to integrate this :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1864053108 From amitkumar at openjdk.org Sat Nov 30 03:10:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 30 Nov 2024 03:10:33 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v2] In-Reply-To: References: Message-ID: > fixes the issue reported by ubsan. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: cover lir_add as well ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22456/files - new: https://git.openjdk.org/jdk/pull/22456/files/659836a6..5bdf265a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=00-01 Stats: 11 lines in 1 file changed: 7 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22456/head:pull/22456 PR: https://git.openjdk.org/jdk/pull/22456 From chagedorn at openjdk.org Sat Nov 30 09:58:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Sat, 30 Nov 2024 09:58:38 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v8] In-Reply-To: <1Y-P6ytzuAaNLtzMlmrh7_b4o-Dr6wqpN0mxDkBiFgs=.38c8cb92-1ac6-4db3-b0ba-34b5f8f70cca@github.com> References: <1Y-P6ytzuAaNLtzMlmrh7_b4o-Dr6wqpN0mxDkBiFgs=.38c8cb92-1ac6-4db3-b0ba-34b5f8f70cca@github.com> Message-ID: On Sat, 30 Nov 2024 00:15:18 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: >> >> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 >> git checkout pull/22438 >> >> >> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. >> >> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. >> >> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. 
The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. >> >> ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > make applyForceBasedAdjustment more numerical stable Awesome enhancement! I've just tested it on Linux and it works very well. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2470857698
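For context on the "numerical instability" mentioned earlier in this thread: the actual change to `applyForceBasedAdjustment` is not quoted here, so the following is only a minimal sketch of one common way such instability can arise in a force-based layout step (dividing by a near-zero distance between two nodes) and how it can be guarded against. The class name `ForceSketch`, the method `repulsion`, and the constants `MIN_DISTANCE`, `MAX_FORCE` and `STRENGTH` are all hypothetical and not taken from the IGV sources.

```java
// Hypothetical sketch; NOT the IGV implementation of applyForceBasedAdjustment.
final class ForceSketch {
    // Assumed constants, chosen only for illustration.
    static final double MIN_DISTANCE = 1e-3;
    static final double MAX_FORCE = 100.0;
    static final double STRENGTH = 1000.0;

    /**
     * Repulsive force acting on node A due to node B, as {fx, fy}.
     * The result is always finite, even when both nodes coincide.
     */
    static double[] repulsion(double ax, double ay, double bx, double by) {
        double dx = ax - bx;
        double dy = ay - by;
        double dist = Math.hypot(dx, dy);
        if (dist < MIN_DISTANCE) {
            // Nodes (almost) coincide: use a fixed direction and a minimum
            // distance instead of dividing by a value close to zero.
            dx = MIN_DISTANCE;
            dy = 0.0;
            dist = MIN_DISTANCE;
        }
        // Inverse-square magnitude, clamped so one step cannot explode.
        double magnitude = Math.min(STRENGTH / (dist * dist), MAX_FORCE);
        return new double[] { magnitude * dx / dist, magnitude * dy / dist };
    }
}
```

Without the minimum-distance guard and the clamp, two nodes at the same position would yield a `NaN` or infinite force, and a single bad coordinate could then spread through the rest of the layout; whether that was the actual failure mode in the reported bug is not stated in the thread.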