From fyang at openjdk.org Mon May 1 03:41:24 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 1 May 2023 03:41:24 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v5] In-Reply-To: <3olpwS6aeI1iq5NQC5jbM62Hq2-LGg1ait21_9yHfas=.da841ea6-90ce-4fdb-8136-cf1da00b4e8c@github.com> References: <3olpwS6aeI1iq5NQC5jbM62Hq2-LGg1ait21_9yHfas=.da841ea6-90ce-4fdb-8136-cf1da00b4e8c@github.com> Message-ID: On Sun, 30 Apr 2023 16:13:53 GMT, Gui Cao wrote: >> Hi, >> >> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. >> >> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. >> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ >> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> >> >> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X >> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: >> >> ``` >> 1ba0 ld R28, [R23, #280] # ptr, #@loadP >> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm >> 1ba8 reinterpretResize V1, V5 >> 1bb0 vcvtBtoX V4, V1 >> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 >> ``` >> >> #### VectorRearrange >> >> When the original vector is converted to the target vector, if the actual number of elements 
of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. >> >> The compilation log for the `VectorRearrange` node: >> >> ``` >> 1f6 spill R7 -> [sp, #320] # spill size = 64 >> 1f8 spill [sp, #128] -> V1 # vector spill size = 256 >> 200 spill [sp, #160] -> V2 # vector spill size = 256 >> 208 rearrange V3, V1, V2 >> 210 spill V3 -> [sp, #96] # vector spill size = 256 >> 218 li R11, #4 # int, #@loadConI >> ``` >> >> #### VectorReinterpret >> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. >> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 >> The compilation log for the `VectorReinterpret` node: >> >> >> 1218 spill [sp, #32] -> V4 # vector spill size = 256 >> 1220 spill [sp, #176] -> V3 # vector spill size = 256 >> 1228 rearrange V2, V4, V3 >> 1230 spill [sp, #72] -> V0 # vmask spill size = 32 >> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend >> 1244 reinterpretResize V2, V1 >> 124c vcvtStoX_extend V5, V2 >> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> #### LShiftCntV/RShiftCntV >> >> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types >> >> The compilation log for the LShiftCntV/RShiftCntV node: >> >> >> 24c vasrB V3, V1, V2 >> 260 storeV [R19], V3 # vector (rvv) >> 268 lbu R19, [R29, #48] # byte, #@loadUB >> 26c andi R19, R19, #7 #@andI_reg_imm >> 270 loadV V1, [R25] # vector (rvv) >> 278 vshiftcnt V2, R19 >> 280 vasrB V3, V1, V2 >> 294 storeV [R26], V3 # vector (rvv) >> 29c lbu R19, [R29, #80] # byte, #@loadUB >> 2a0 andi R19, R19, #7 #@andI_reg_imm >> 2a4 loadV V1, [R22] # vector (rvv) >> 2ac vshiftcnt V2, R19 >> >> >> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc >> [2] 
https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java
>> Testing:
>> qemu with UseRVV:
>>
>> - [ ] Tier1 tests (release)
>> - [ ] Tier2 tests (release)
>> - [ ] Tier3 tests (release)
>> - [x] test/jdk/jdk/incubator/vector (fastdebug)
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
>   Small refactoring of rvv_vsetvli

Thanks for the update. A few more questions follow.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1787:

> 1785: VectorRegister src, BasicType src_bt) {
> 1786: assert(type2aelembytes(dst_bt) > type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 8 && type2aelembytes(src_bt) <= 4, "invalid element size");
> 1787: assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should be integer element");

Suggestion: s/"should be integer element"/"unsupported element type"/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1825:

> 1823: VectorRegister src, BasicType src_bt, VectorRegister tmp) {
> 1824: assert(type2aelembytes(dst_bt) < type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 4 && type2aelembytes(src_bt) <= 8, "invalid element size");
> 1825: assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should be integer element");

Suggestion: s/"should be integer element"/"unsupported element type"/

src/hotspot/cpu/riscv/riscv_v.ad line 2660:

> 2658: __ vector_integer_extend(as_VectorRegister($dst$$reg), bt == T_FLOAT ? T_INT : T_LONG,
> 2659: Matcher::vector_length(this), as_VectorRegister($src$$reg), T_BYTE);
> 2660: __ vfcvt_f_x_v(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg));

The single-width vector conversion instructions use the dynamic rounding mode in 'frm' when converting to floating-point values, so I think you should set 'frm' to the correct rounding mode first.
You might also want to check the exceptional conditions possibly set by those instructions.

src/hotspot/cpu/riscv/riscv_v.ad line 2718:

> 2716: %}
> 2717:
> 2718: instruct vcvtItoX(vReg dst, vReg src) %{

I think the 'TEMP_DEF dst' effect is only needed for the T_DOUBLE case. It still works if 'dst' and 'src' are allocated the same vector register for the T_FLOAT case. So you might want to break this down further into two separate ones, say 'vcvtItoF' and 'vcvtItoD'.

src/hotspot/cpu/riscv/riscv_v.ad line 2804:

> 2802: ins_encode %{
> 2803: __ rvv_vsetvli(T_FLOAT, Matcher::vector_length(this, $src));
> 2804: __ vfcvt_x_f_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg));

The language spec [1] specifies: "The round toward zero rounding policy applies to (i) conversion of a floating-point value to an integer value ([§5.1.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3)), and (ii) floating-point remainder ([§15.17.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.17.3))."

So it looks to me that we should use the 'rtz' variant (vfcvt.rtz.x.f.v) here to do the conversion to integer. Please also check the other places where we convert a floating-point value to an integer value.

[1] https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.4

-------------

Changes requested by fyang (Reviewer).
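
To make the JLS rule quoted above concrete, here is a minimal Java sketch of the results a float-to-int conversion has to produce, independent of any RISC-V code. The class and method names are illustrative only and do not come from the patch; `Math.rint` is included purely to show how a round-to-nearest mode would differ from the required truncation.

```java
public class FloatToIntRounding {
    // Narrowing primitive conversion float -> int (JLS 5.1.3):
    // rounds toward zero, maps NaN to 0, and saturates out-of-range values.
    static int toInt(float f) {
        return (int) f;
    }

    // Round-to-nearest-even, shown only for contrast with the cast above.
    static int nearestEven(float f) {
        return (int) Math.rint(f);
    }

    public static void main(String[] args) {
        System.out.println(toInt(2.7f));        // 2
        System.out.println(toInt(-2.7f));       // -2
        System.out.println(toInt(Float.NaN));   // 0
        System.out.println(toInt(1e20f));       // 2147483647
        System.out.println(nearestEven(2.7f));  // 3
        System.out.println(nearestEven(2.5f));  // 2
    }
}
```

A vectorized cast that follows the dynamic rounding mode would match `nearestEven` rather than `toInt` whenever that mode is round-to-nearest, which is the mismatch the review comment is concerned about.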
PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1407243306 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181365812 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181365844 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181364523 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181331244 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1181332597 From cslucas at openjdk.org Mon May 1 18:41:26 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 1 May 2023 18:41:26 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v11] In-Reply-To: <6I1KVkFSekhMTTDq6nXQNoKPE96bycERRtsPrTnZZvU=.c1933f7f-e659-4e22-93a3-e7fbbcdf53a1@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <6I1KVkFSekhMTTDq6nXQNoKPE96bycERRtsPrTnZZvU=.c1933f7f-e659-4e22-93a3-e7fbbcdf53a1@github.com> Message-ID: On Wed, 26 Apr 2023 17:28:53 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. 
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address part of PR review 4 & fix a bug setting only_candidate

I have an update to this PR that makes it possible to scalar replace allocations when the Phi is used in a CmpP (not for all cases). Is there any objection to me pushing these changes? I.e., will it complicate any ongoing review?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1530047937

From cslucas at openjdk.org Mon May 1 18:41:33 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 1 May 2023 18:41:33 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10]
In-Reply-To:
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

On Thu, 27 Apr 2023 23:36:06 GMT, Vladimir Ivanov wrote:

>> Can `ObjectCandidateValue` be a wrapper around an `ObjectAllocationValue`?
>>
>> It does make sense to separate `ObjectMergeValue` and `ObjectValue`.
>
> I need to study the code in more detail. Seems like I'm missing something important here.

@iwanowww - how can I make it easier for you to review? Thanks for your comments so far.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1181781323

From cslucas at openjdk.org Mon May 1 20:20:51 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 1 May 2023 20:20:51 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12]
In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

> Can I please get reviews for this PR?
> > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. 
>
> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failures. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.

Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:

 - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
 - Address part of PR review 4 & fix a bug setting only_candidate
 - Catching up with master Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
 - Fix tests. Remember previous reducible Phis.
 - Address PR review 3. Some comments and be able to abort compilation.
 - Merge with Master
 - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method.
 - Address PR feedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges.
 - Add support for SR'ing some inputs of merges used for field loads
 - Fix some typos and do some small refactorings.
 - ...
and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=11 Stats: 2250 lines in 26 files changed: 1990 ins; 108 del; 152 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From epeter at openjdk.org Tue May 2 06:15:21 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 May 2023 06:15:21 GMT Subject: RFR: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) [v4] In-Reply-To: References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> Message-ID: <_FZ3_cF5e_T-qfT4NrV0Jup6r9r0kgU8jH0sxM1X5go=.5e618d41-163c-4665-a1ea-d995e01bcd3b@github.com> On Fri, 28 Apr 2023 18:56:05 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> CCP worklist on local arena > > src/hotspot/share/opto/phaseX.cpp line 1961: > >> 1959: // Push root onto worklist >> 1960: worklist.push(C->root()); >> 1961: DEBUG_ONLY(Unique_Node_List worklist_verify;) > > Should you put `worklist_verify` to `local_arena` too? @vnkozlov I could do that, but it is not required. The CCP `worklist` gets passed downward much farther, including the graph walks in `push_child_nodes_to_worklist`. So there it is nice to be able to have `ResourceMarks`. But `worklist_verify` is only modified directly in `PhaseCCP::analyze` and one layer deeper in `PhaseCCP::verify_analyze`. So I would never expect a `ResourceMark` to mess with re-allocation. So at this point the only reason to add `worklist_verify` to `local_arena` is to ensure it is de-allocated afterward. We could also use a `ResourceMark` and leave it on `Thread::current()->resource_area()`. 
Anyway, I'll just move it to `local_arena`, after all it is `DEBUG_ONLY`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13611#discussion_r1182123382 From epeter at openjdk.org Tue May 2 06:22:17 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 May 2023 06:22:17 GMT Subject: RFR: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) [v5] In-Reply-To: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> Message-ID: > An other case of `uncast` not being type-propagated through. > > We have a case like this: > `Phi -> ShiftL -> CastII -> AndI` > > The Phi has an updated type, so we should re-run Value on the AndI. > > In PhaseCCP::push_and, we do update a similar pattern: > `X -> ShiftL -> AndI` > > I extended it to handle this pattern: > `parent -> LShift (use) -> ConstraintCast* -> And` > > For this, I implemented: > https://github.com/openjdk/jdk/blob/26f4adaae901822bea984b926c06d1a78f9c6b48/src/hotspot/share/opto/castnode.hpp#L73-L78 > > I could refactor code from a previous similar fix, for pattern: `ConstraintCast+ -> Sub/Phi` > > **Discussion** > > https://github.com/openjdk/jdk/blob/4d350f8f4eaabb18482c7656cb56a734e60187cf/src/hotspot/share/opto/castnode.hpp#L78-L79 > I would have liked to place a `ResourceMark` between these two lines, to ensure the `internals` data structure is de-allocated after the traversal. But if I add it there, then one cannot modify any outer data-structure, or else one risks re-allocation of the outer data-structure in the inner ResourceMark, and then this memory gets de-allocated once the ResourceMark is cleared, and the outer data-structure is broken. This would for example mean that I could not push to the IGVN worklist inside the callback. 
>
> Not having the ResourceMark means a memory leak until the compile phase is over. But my code is not the only place: there are many places where we create a resource-allocated data structure but do not use ResourceMarks.

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  added worklist_verify to local_arena

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13611/files
  - new: https://git.openjdk.org/jdk/pull/13611/files/4d2a4f9e..324db592

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13611&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13611&range=03-04

  Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13611.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13611/head:pull/13611

PR: https://git.openjdk.org/jdk/pull/13611

From aph at openjdk.org Tue May 2 08:19:24 2023
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 2 May 2023 08:19:24 GMT
Subject: RFR: JDK-8305782: Provide MacroAssembler::breakpoint on aarch64 [v2]
In-Reply-To:
References:
Message-ID: <-c7bT1Z3b7YMb_9yWn6EG40za4ukJy3LycjdPedSJyM=.b2e8e7f3-15fa-4cc9-af05-8c746620b9a7@github.com>

On Sun, 30 Apr 2023 11:54:37 GMT, Erik Österlund wrote:

>> Now, every time you run the program, a breakpoint will be set at `*poo`. And you can continue after the breakpoint.
>
> That doesn't work with code relocation, does it?

It becomes a bit more elaborate, true. On the other hand, you can simply continue after the breakpoint.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13401#issuecomment-1531064412

From stuefe at openjdk.org Tue May 2 08:54:28 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 2 May 2023 08:54:28 GMT
Subject: RFR: JDK-8305782: Provide MacroAssembler::breakpoint on aarch64 [v2]
In-Reply-To:
References:
Message-ID:

On Sun, 30 Apr 2023 11:54:37 GMT, Erik Österlund wrote:

> The best way is like this:
>
> ```
> address poo;
>
> void stuff() {
>   ... instructions ...
>   poo = pc();
>   ... instructions ...
> }
> ```
>
> then, in gdb:
>
> ```
> (gdb) b
> breakpoint 2 at 0xabcedf
> (gdb) comm 2
> Type commands for breakpoint(s) 2, one per line.
> End with a line saying just "end".
> >b *poo
> >c
> >end
> (gdb)
> ```
>
> Now, every time you run the program, a breakpoint will be set at `*poo`. And you can continue after the breakpoint.

Thanks, Andrew, that is helpful. I use breakpoint mostly in cases where I have a utility function that gets emitted multiple times, and I don't care which instance I debug, I'm fine with the first one that hits break.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13401#issuecomment-1531104977

From qamai at openjdk.org Tue May 2 14:02:26 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 2 May 2023 14:02:26 GMT
Subject: RFR: 8304948: [vectorapi] C2 crashes when expanding VectorBox [v3]
In-Reply-To: <1wE09YSA0YEQ5CdPJkBHiyO7HFOUF1dh9mbMOuT5W04=.22b77c9b-3b7a-4107-bd6d-44f9a3a6e5d5@github.com>
References: <1wE09YSA0YEQ5CdPJkBHiyO7HFOUF1dh9mbMOuT5W04=.22b77c9b-3b7a-4107-bd6d-44f9a3a6e5d5@github.com>
Message-ID:

On Tue, 25 Apr 2023 15:12:12 GMT, Eric Liu wrote:

>> This patch fixes C2 failure with SIGSEGV due to endless recursion.
>>
>> With the test case VectorBoxExpandTest.java in this patch, C2 generates an IR graph like the one below:
>>
>>                ------------
>>               /            \
>>       Region |  VectorBox   |
>>            \ | /            |
>>             Phi             |
>>              |              |
>>              |              |
>>       Region |  VectorBox   |
>>            \ | /            |
>>             Phi             |
>>              |              |
>>              |-------------/
>>              |
>>
>> This Phi will be optimized by merge_through_phi [1], which transforms `Phi (VectorBox VectorBox)` into `VectorBox (Phi Phi)` to create opportunities for combining VectorBox with VectorUnbox. In this process, both the preliminary type check [2] and the cloning of Phi nodes [3] handle the cyclic case, so they avoid falling into an endless loop.
>>
>> After merge_through_phi, each input Phi of the new VectorBox has the same shape as the original root Phi before merging (only VectorBox has been replaced). After several other optimizations, C2 would expand VectorBox [4] on a graph like the one below:
>>
>>                ------------
>>               /            \
>>       Region |  Proj        |
>>            \ | /            |
>>             Phi             |
>>              |              |
>>              |              |
>>       Region |  Proj        |
>>            \ | /            |
>>             Phi             |
>>              |              |
>>              |-------------/
>>              |
>>              |   Phi
>>              |  /
>>           VectorBox
>>
>> Here the cyclic case must be taken into consideration as well.
>>
>> [TEST]
>> The full jtreg suite passed with no new failures.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2554
>> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2571
>> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2531
>> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L311
>
> Eric Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   expand vector box in local
>
>   Change-Id: Ie7bddc049b479aad4f953ec920d83b91e7de2152

src/hotspot/share/opto/vector.cpp line 345:

> 343: box_type, vect_type, visited);
> 344: if (!new_box->is_Phi()) {
> 345: C->initial_gvn()->hash_delete(vbox);

May I ask why this is needed? Thanks.
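
The cycle in the graphs quoted above is why a naive recursive walk over Phi inputs never terminates. Below is a minimal Java sketch of the usual remedy, a visited map that is filled in before recursing. The `Node` class, the `"Box"` kind and the `expand` method are hypothetical stand-ins for illustration; they are not HotSpot's C++ node classes and not the actual change in vector.cpp.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PhiCycleWalk {
    // Toy IR node: a kind name plus input edges.
    static final class Node {
        final String kind;
        final List<Node> inputs = new ArrayList<>();
        Node(String kind) { this.kind = kind; }
    }

    // Copies the graph reachable from n, turning every "Proj" into a "Box".
    // The copy is recorded in visited BEFORE recursing into the inputs, so a
    // back edge to a node that is still being processed returns that copy
    // instead of recursing forever.
    static Node expand(Node n, Map<Node, Node> visited) {
        Node seen = visited.get(n);
        if (seen != null) {
            return seen;
        }
        Node copy = new Node(n.kind.equals("Proj") ? "Box" : n.kind);
        visited.put(n, copy);
        for (Node in : n.inputs) {
            copy.inputs.add(expand(in, visited));
        }
        return copy;
    }

    // Builds two Phis that feed each other, each also fed by a Proj.
    static Node buildCycle() {
        Node phi1 = new Node("Phi");
        Node phi2 = new Node("Phi");
        phi1.inputs.add(new Node("Proj"));
        phi1.inputs.add(phi2);
        phi2.inputs.add(new Node("Proj"));
        phi2.inputs.add(phi1);
        return phi1;
    }

    public static void main(String[] args) {
        Map<Node, Node> visited = new IdentityHashMap<>();
        Node root = expand(buildCycle(), visited);
        System.out.println(visited.size());                            // 4
        System.out.println(root.inputs.get(0).kind);                   // Box
        System.out.println(root.inputs.get(1).inputs.get(1) == root);  // true
    }
}
```

Without the `visited.put` call ahead of the loop, `expand` would bounce between the two Phis until the stack overflows, which is the same failure shape as the endless recursion described in the PR text.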
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13489#discussion_r1182590704 From kvn at openjdk.org Tue May 2 16:06:31 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 May 2023 16:06:31 GMT Subject: RFR: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) [v5] In-Reply-To: References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> Message-ID: On Tue, 2 May 2023 06:22:17 GMT, Emanuel Peter wrote: >> An other case of `uncast` not being type-propagated through. >> >> We have a case like this: >> `Phi -> ShiftL -> CastII -> AndI` >> >> The Phi has an updated type, so we should re-run Value on the AndI. >> >> In PhaseCCP::push_and, we do update a similar pattern: >> `X -> ShiftL -> AndI` >> >> I extended it to handle this pattern: >> `parent -> LShift (use) -> ConstraintCast* -> And` >> >> For this, I implemented: >> https://github.com/openjdk/jdk/blob/26f4adaae901822bea984b926c06d1a78f9c6b48/src/hotspot/share/opto/castnode.hpp#L73-L78 >> >> I could refactor code from a previous similar fix, for pattern: `ConstraintCast+ -> Sub/Phi` >> >> **Discussion** >> >> https://github.com/openjdk/jdk/blob/4d350f8f4eaabb18482c7656cb56a734e60187cf/src/hotspot/share/opto/castnode.hpp#L78-L79 >> I would have liked to place a `ResourceMark` between these two lines, to ensure the `internals` data structure is de-allocated after the traversal. But if I add it there, then one cannot modify any outer data-structure, or else one risks re-allocation of the outer data-structure in the inner ResourceMark, and then this memory gets de-allocated once the ResourceMark is cleared, and the outer data-structure is broken. This would for example mean that I could not push to the IGVN worklist inside the callback. >> >> Not having the ResourceMark means a memory leak, until the compile phase is over. 
But my code is not the only place, there are lots of places where we create a Resource allocated data-structure, but do not use ResourceMarks. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > added worklist_verify to local_arena Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13611#pullrequestreview-1409419056 From kvn at openjdk.org Tue May 2 16:06:34 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 May 2023 16:06:34 GMT Subject: RFR: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) [v4] In-Reply-To: <_FZ3_cF5e_T-qfT4NrV0Jup6r9r0kgU8jH0sxM1X5go=.5e618d41-163c-4665-a1ea-d995e01bcd3b@github.com> References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> <_FZ3_cF5e_T-qfT4NrV0Jup6r9r0kgU8jH0sxM1X5go=.5e618d41-163c-4665-a1ea-d995e01bcd3b@github.com> Message-ID: <4OapTb_9eqn6r6XuECvTX-qBwU2aWpUo7m6x-KZBYJw=.c48642f2-6386-4016-a9a1-51c5033477a6@github.com> On Tue, 2 May 2023 06:12:45 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/phaseX.cpp line 1961: >> >>> 1959: // Push root onto worklist >>> 1960: worklist.push(C->root()); >>> 1961: DEBUG_ONLY(Unique_Node_List worklist_verify;) >> >> Should you put `worklist_verify` to `local_arena` too? > > @vnkozlov I could do that, but it is not required. The CCP `worklist` gets passed downward much farther, including the graph walks in `push_child_nodes_to_worklist`. So there it is nice to be able to have `ResourceMarks`. But `worklist_verify` is only modified directly in `PhaseCCP::analyze` and one layer deeper in `PhaseCCP::verify_analyze`. So I would never expect a `ResourceMark` to mess with re-allocation. > > So at this point the only reason to add `worklist_verify` to `local_arena` is to ensure it is de-allocated afterward. 
We could also use a `ResourceMark` and leave it on `Thread::current()->resource_area()`. > > Anyway, I'll just move it to `local_arena`, after all it is `DEBUG_ONLY`. ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13611#discussion_r1182752605 From thartmann at openjdk.org Wed May 3 06:01:18 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 May 2023 06:01:18 GMT Subject: RFR: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) [v5] In-Reply-To: References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> Message-ID: On Tue, 2 May 2023 06:22:17 GMT, Emanuel Peter wrote: >> An other case of `uncast` not being type-propagated through. >> >> We have a case like this: >> `Phi -> ShiftL -> CastII -> AndI` >> >> The Phi has an updated type, so we should re-run Value on the AndI. >> >> In PhaseCCP::push_and, we do update a similar pattern: >> `X -> ShiftL -> AndI` >> >> I extended it to handle this pattern: >> `parent -> LShift (use) -> ConstraintCast* -> And` >> >> For this, I implemented: >> https://github.com/openjdk/jdk/blob/26f4adaae901822bea984b926c06d1a78f9c6b48/src/hotspot/share/opto/castnode.hpp#L73-L78 >> >> I could refactor code from a previous similar fix, for pattern: `ConstraintCast+ -> Sub/Phi` >> >> **Discussion** >> >> https://github.com/openjdk/jdk/blob/4d350f8f4eaabb18482c7656cb56a734e60187cf/src/hotspot/share/opto/castnode.hpp#L78-L79 >> I would have liked to place a `ResourceMark` between these two lines, to ensure the `internals` data structure is de-allocated after the traversal. But if I add it there, then one cannot modify any outer data-structure, or else one risks re-allocation of the outer data-structure in the inner ResourceMark, and then this memory gets de-allocated once the ResourceMark is cleared, and the outer data-structure is broken. 
This would for example mean that I could not push to the IGVN worklist inside the callback. >> >> Not having the ResourceMark means a memory leak, until the compile phase is over. But my code is not the only place, there are lots of places where we create a Resource allocated data-structure, but do not use ResourceMarks. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > added worklist_verify to local_arena Still good. Ship it! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13611#pullrequestreview-1410224814 From thartmann at openjdk.org Wed May 3 07:22:15 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 May 2023 07:22:15 GMT Subject: RFR: 8306933: C2: "assert(false) failed: infinite loop" failure In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 15:36:13 GMT, Roland Westrelin wrote: > The assert fires because an infinite loop appears in the graph after > loop opts are over. > > After loop opts, the `for(;;)` loop contains a null check and a range > check for `array[i]`. So it's not considered an infinite loop (it has > exits to uncommon traps). The null check and range check are redundant > with the one right before the loop: `int v = array2[k];` IGVN can > optimize it but it doesn't happen until after loop opts when a > `ConvI2L` for the `array[i]` access is processed as part of post loop > opts IGVN. The `for(;;)` loop is then emptied and only contains a > `Loop` and a `Safepoint` nodes. > > I propose removing the assert (at least for now) as I don't see a way > to guarantee no infinite loop can appear after loop opts. Looks good to me too (besides the missing `-XX:+UnlockDiagnosticVMOptions`). ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13672#pullrequestreview-1410308515 From roland at openjdk.org Wed May 3 08:32:27 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 May 2023 08:32:27 GMT Subject: RFR: 8306997: C2: "malformed control flow" assert due to missing safepoint on backedge with a switch In-Reply-To: References: Message-ID: <-lP3avrk4i8xeddTbp4cH4JjBiqgE3Qw8WVaeSUp40k=.9283b778-bb31-4039-903d-a56b2eb389cb@github.com> On Fri, 28 Apr 2023 06:29:56 GMT, Tobias Hartmann wrote: >> The assert fires because a self loop (a `Loop` whose second input is >> itself) is removed by loop opts. That loop comes from a switch where >> the default case is a loop head (a code shape I couldn't get javac to >> produce). That `Loop` should at the very least have a `Safepoint` but >> the logic at parse time only looks for backedges in the non default >> cases. With that fixed, the `Loop` is no longer considered dead code. > > Looks good to me. @TobiHartmann @vnkozlov thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13688#issuecomment-1532638132 From roland at openjdk.org Wed May 3 08:32:30 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 May 2023 08:32:30 GMT Subject: Integrated: 8306997: C2: "malformed control flow" assert due to missing safepoint on backedge with a switch In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 11:27:30 GMT, Roland Westrelin wrote: > The assert fires because a self loop (a `Loop` whose second input is > itself) is removed by loop opts. That loop comes from a switch where > the default case is a loop head (a code shape I couldn't get javac to > produce). That `Loop` should at the very least have a `Safepoint` but > the logic at parse time only looks for backedges in the non default > cases. With that fixed, the `Loop` is no longer considered dead code. This pull request has now been integrated. 
Changeset: e0774bed Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/e0774bed2d2fcd850f5ca6884dd7aeb45f0bdaef Stats: 106 lines in 3 files changed: 104 ins; 0 del; 2 mod 8306997: C2: "malformed control flow" assert due to missing safepoint on backedge with a switch Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13688 From roland at openjdk.org Wed May 3 08:42:06 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 May 2023 08:42:06 GMT Subject: RFR: 8306933: C2: "assert(false) failed: infinite loop" failure [v2] In-Reply-To: References: Message-ID: > The assert fires because an infinite loop appears in the graph after > loop opts are over. > > After loop opts, the `for(;;)` loop contains a null check and a range > check for `array[i]`. So it's not considered an infinite loop (it has > exits to uncommon traps). The null check and range check are redundant > with the one right before the loop: `int v = array2[k];` IGVN can > optimize it but it doesn't happen until after loop opts when a > `ConvI2L` for the `array[i]` access is processed as part of post loop > opts IGVN. The `for(;;)` loop is then emptied and only contains a > `Loop` and a `Safepoint` nodes. > > I propose removing the assert (at least for now) as I don't see a way > to guarantee no infinite loop can appear after loop opts. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: -XX:+UnlockDiagnosticVMOptions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13672/files - new: https://git.openjdk.org/jdk/pull/13672/files/f943db9d..2e466f74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13672&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13672&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13672/head:pull/13672 PR: https://git.openjdk.org/jdk/pull/13672 From roland at openjdk.org Wed May 3 08:42:09 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 May 2023 08:42:09 GMT Subject: RFR: 8306933: C2: "assert(false) failed: infinite loop" failure [v2] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 10:29:14 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> -XX:+UnlockDiagnosticVMOptions > > test/hotspot/jtreg/compiler/c2/TestInfiniteLoopCompilationFailure.java line 31: > >> 29: * -XX:+StressIGVN -XX:StressSeed=675320863 TestInfiniteLoopCompilationFailure >> 30: * @run main/othervm -Xcomp -XX:CompileOnly=TestInfiniteLoopCompilationFailure::test -XX:-UseLoopPredicate -XX:-UseProfiledLoopPredicate >> 31: * -XX:+StressIGVN TestInfiniteLoopCompilationFailure > > Since `StressIGVN` is diagnostic, you need to add `-XX:+UnlockDiagnosticVMOptions` here. Thanks for reviewing this. I made the change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13672#discussion_r1183394234 From epeter at openjdk.org Wed May 3 10:43:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 May 2023 10:43:14 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 17:46:24 GMT, Vladimir Kozlov wrote: >> `SuperWord:schedule`, and specifically `SuperWord::co_locate_pack` is broken. >> The problem is with the basic approach of it, as far as I know. >> Hence, I had to completely re-design the `schedule` algorithm, based on the `PacksetGraph` ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). >> >> **The current approach** >> >> The idea is to leave the non-vectorized memory ops in their place, and find the right place for the vectorized memops to be "sandwiched" into. The logic is very complex and has already had a few bugs fixed. >> >> **Why this does not work** >> >> However, in some rare cases, we have to reorder non-vectorized operations. See this example that I added as a regression test: >> >> https://github.com/openjdk/jdk/blob/a771a61005aea272cc51fa3f3e1637c217582fce/test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java#L82-L109 >> >> I found this issue during work on https://github.com/openjdk/jdk/pull/13078, where I had to restrict/disable some tests that are now passing. >> >> **Solution** >> >> Abandon the idea of "sandwiching" memops. Rewrite `SuperWord:schedule`: >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2567-L2576 >> >> We first schedule all memops into a linear order. 
>> We do this scheduling based on the `PacksetGraph`, which gives us a `DAG` based on the `packset` and the dependency-graph (which in turn respects the data use-defs, as well as the memory dependencies, unless we can prove that they do not reference the same memory). >> In other words: we have a linearization that respects all dependencies that must be respected. >> Further, we make sure that ops from the same pack are scheduled as a block (all adjacent to each other), and in order that the packset has internally. >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2489-L2493 >> >> Now that we have this order (and we have not aborted because we found a cycle in the `PacksetGraph`), we must apply this schedule to each memory slice, and reorder the memops in the slices accordingly. >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2617-L2619 >> >> This scheduling has the nice side-effect of simplifying `SuperWord::output` a little. >> We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). >> >> **Discussion** >> >> This seems to me to be a much more straight forward approach, and it uses the code I recently added for verification of cyclic dependencies in the packset ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). >> >> One potential improvement to my fix: >> We now sometimes re-order the non-vectorized memory slices, even though it may not be necessary. >> This is not wrong, but it makes updates to the graph that may be confusing when debugging. >> Further, the re-ordering may have performance impacts. 
>> I could use a priority-queue (min-heap, would have to implement it since it does not yet exist), and schedule the `PacksetGraph` whenever possible with the lower `bb_idx` first. This would make the new linear order the same/closer to the old one. However, I am not sure if this is worth the effort and overhead of a priority-queue. >> >> **Testing** >> Github-actions pass. tier1-6 + stress testing passes. >> Performance testing showed no significant performance change. > > Nice rewrite. @vnkozlov Thanks for the review! @fg1417 @jatin-bhateja Would one of you be willing to review this also? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13354#issuecomment-1532806136 From epeter at openjdk.org Wed May 3 10:48:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 May 2023 10:48:24 GMT Subject: RFR: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) [v5] In-Reply-To: References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> Message-ID: On Tue, 2 May 2023 16:03:13 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> added worklist_verify to local_arena > > Marked as reviewed by kvn (Reviewer). @vnkozlov @chhagedorn @TobiHartmann Thanks for the reviews and discussion! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13611#issuecomment-1532813244 From epeter at openjdk.org Wed May 3 10:48:26 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 May 2023 10:48:26 GMT Subject: Integrated: 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) In-Reply-To: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> References: <5LdntwU5zlwXPnwYeJxzNPZTwrOuki6VebrE9Leeb8g=.3dc26d60-2729-4d60-9a5c-14cbb57f2813@github.com> Message-ID: <4amQfO4REWuufHGb9jXiia7mlYjIhFQOy0atJX-tpyA=.2d90386b-8bee-4e4e-9c34-6a6b7134de77@github.com> On Mon, 24 Apr 2023 11:32:48 GMT, Emanuel Peter wrote: > An other case of `uncast` not being type-propagated through. > > We have a case like this: > `Phi -> ShiftL -> CastII -> AndI` > > The Phi has an updated type, so we should re-run Value on the AndI. > > In PhaseCCP::push_and, we do update a similar pattern: > `X -> ShiftL -> AndI` > > I extended it to handle this pattern: > `parent -> LShift (use) -> ConstraintCast* -> And` > > For this, I implemented: > https://github.com/openjdk/jdk/blob/26f4adaae901822bea984b926c06d1a78f9c6b48/src/hotspot/share/opto/castnode.hpp#L73-L78 > > I could refactor code from a previous similar fix, for pattern: `ConstraintCast+ -> Sub/Phi` > > **Discussion** > > https://github.com/openjdk/jdk/blob/4d350f8f4eaabb18482c7656cb56a734e60187cf/src/hotspot/share/opto/castnode.hpp#L78-L79 > I would have liked to place a `ResourceMark` between these two lines, to ensure the `internals` data structure is de-allocated after the traversal. But if I add it there, then one cannot modify any outer data-structure, or else one risks re-allocation of the outer data-structure in the inner ResourceMark, and then this memory gets de-allocated once the ResourceMark is cleared, and the outer data-structure is broken. 
This would for example mean that I could not push to the IGVN worklist inside the callback. > > Not having the ResourceMark means a memory leak, until the compile phase is over. But my code is not the only place, there are lots of places where we create a Resource allocated data-structure, but do not use ResourceMarks. This pull request has now been integrated. Changeset: e9807a4b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/e9807a4b0f3533512623fba96042472b69d4ac34 Stats: 128 lines in 3 files changed: 97 ins; 22 del; 9 mod 8306042: C2: failed: Missed optimization opportunity in PhaseCCP (adding LShift->Cast->Add notification) Reviewed-by: thartmann, chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13611 From chagedorn at openjdk.org Wed May 3 11:11:15 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 May 2023 11:11:15 GMT Subject: RFR: 8306933: C2: "assert(false) failed: infinite loop" failure [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 08:42:06 GMT, Roland Westrelin wrote: >> The assert fires because an infinite loop appears in the graph after >> loop opts are over. >> >> After loop opts, the `for(;;)` loop contains a null check and a range >> check for `array[i]`. So it's not considered an infinite loop (it has >> exits to uncommon traps). The null check and range check are redundant >> with the one right before the loop: `int v = array2[k];` IGVN can >> optimize it but it doesn't happen until after loop opts when a >> `ConvI2L` for the `array[i]` access is processed as part of post loop >> opts IGVN. The `for(;;)` loop is then emptied and only contains a >> `Loop` and a `Safepoint` nodes. >> >> I propose removing the assert (at least for now) as I don't see a way >> to guarantee no infinite loop can appear after loop opts. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > -XX:+UnlockDiagnosticVMOptions Update looks good! 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13672#pullrequestreview-1410668978 From roland at openjdk.org Wed May 3 11:17:29 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 May 2023 11:17:29 GMT Subject: RFR: 8306933: C2: "assert(false) failed: infinite loop" failure [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 07:19:34 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> -XX:+UnlockDiagnosticVMOptions > > Looks good to me too (besides the missing `-XX:+UnlockDiagnosticVMOptions`). @TobiHartmann @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/13672#issuecomment-1532850709 From roland at openjdk.org Wed May 3 11:17:31 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 May 2023 11:17:31 GMT Subject: Integrated: 8306933: C2: "assert(false) failed: infinite loop" failure In-Reply-To: References: Message-ID: <99hVNsof0aLe7jBLDn76tNzarT9RlEGT7LMW_X3ayRk=.0c5b091f-f732-4f75-9a72-00b8e262b7dd@github.com> On Wed, 26 Apr 2023 15:36:13 GMT, Roland Westrelin wrote: > The assert fires because an infinite loop appears in the graph after > loop opts are over. > > After loop opts, the `for(;;)` loop contains a null check and a range > check for `array[i]`. So it's not considered an infinite loop (it has > exits to uncommon traps). The null check and range check are redundant > with the one right before the loop: `int v = array2[k];` IGVN can > optimize it but it doesn't happen until after loop opts when a > `ConvI2L` for the `array[i]` access is processed as part of post loop > opts IGVN. The `for(;;)` loop is then emptied and only contains a > `Loop` and a `Safepoint` nodes. > > I propose removing the assert (at least for now) as I don't see a way > to guarantee no infinite loop can appear after loop opts. 
This pull request has now been integrated. Changeset: ccf91f88 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ccf91f881c06308f39740751161111946487abf1 Stats: 64 lines in 2 files changed: 61 ins; 3 del; 0 mod 8306933: C2: "assert(false) failed: infinite loop" failure Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13672 From gcao at openjdk.org Wed May 3 12:46:15 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 3 May 2023 12:46:15 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v6] In-Reply-To: References: Message-ID: > Hi, > > we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > > We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. > > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > > > #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X > There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: > > ``` > 1ba0 ld R28, [R23, #280] # ptr, #@loadP > 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm > 1ba8 reinterpretResize V1, V5 > 1bb0 vcvtBtoX V4, V1 > 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 > ``` > > #### 
VectorRearrange > > When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. > > The compilation log for the `VectorRearrange` node: > > ``` > 1f6 spill R7 -> [sp, #320] # spill size = 64 > 1f8 spill [sp, #128] -> V1 # vector spill size = 256 > 200 spill [sp, #160] -> V2 # vector spill size = 256 > 208 rearrange V3, V1, V2 > 210 spill V3 -> [sp, #96] # vector spill size = 256 > 218 li R11, #4 # int, #@loadConI > ``` > > #### VectorReinterpret > If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. > https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 > The compilation log for the `VectorReinterpret` node: > > > 1218 spill [sp, #32] -> V4 # vector spill size = 256 > 1220 spill [sp, #176] -> V3 # vector spill size = 256 > 1228 rearrange V2, V4, V3 > 1230 spill [sp, #72] -> V0 # vmask spill size = 32 > 123c vmerge_vvm V1, V1, V2, v0 #@vector blend > 1244 reinterpretResize V2, V1 > 124c vcvtStoX_extend V5, V2 > 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 > > > #### LShiftCntV/RShiftCntV > > We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types > > The compilation log for the LShiftCntV/RShiftCntV node: > > > 24c vasrB V3, V1, V2 > 260 storeV [R19], V3 # vector (rvv) > 268 lbu R19, [R29, #48] # byte, #@loadUB > 26c andi R19, R19, #7 #@andI_reg_imm > 270 loadV V1, [R25] # vector (rvv) > 278 vshiftcnt V2, R19 > 280 vasrB V3, V1, V2 > 294 storeV [R26], V3 # vector (rvv) > 29c lbu R19, [R29, #80] # byte, #@loadUB > 2a0 andi R19, R19, #7 #@andI_reg_imm > 2a4 loadV V1, [R22] # vector (rvv) > 2ac vshiftcnt V2, R19 > > > [1] 
https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > Testing: > qemu with UseRVV: > > - [ ] Tier1 tests (release) > - [ ] Tier2 tests (release) > - [ ] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Fix round mode and optimize widen/narrow vcast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13684/files - new: https://git.openjdk.org/jdk/pull/13684/files/586bce12..642b25a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=04-05 Stats: 155 lines in 3 files changed: 65 ins; 29 del; 61 mod Patch: https://git.openjdk.org/jdk/pull/13684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13684/head:pull/13684 PR: https://git.openjdk.org/jdk/pull/13684 From gcao at openjdk.org Wed May 3 12:46:29 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 3 May 2023 12:46:29 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v5] In-Reply-To: References: <3olpwS6aeI1iq5NQC5jbM62Hq2-LGg1ait21_9yHfas=.da841ea6-90ce-4fdb-8136-cf1da00b4e8c@github.com> Message-ID: <_eedwBAXmUjNRI6JDvPejLmhdTyRiX0jubGP22Uwkjk=.9bf4a5eb-a1ef-4dde-9c48-71963099196d@github.com> On Mon, 1 May 2023 03:06:03 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Small refactoring of rvv_vsetvli > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1787: > >> 1785: VectorRegister src, BasicType src_bt) { >> 1786: assert(type2aelembytes(dst_bt) > type2aelembytes(src_bt) && type2aelembytes(dst_bt) <= 8 && type2aelembytes(src_bt) <= 4, "invalid element size"); >> 1787: assert(dst_bt != T_FLOAT && dst_bt != T_DOUBLE && src_bt != T_FLOAT && src_bt != T_DOUBLE, "should 
be integer element");
>
> Suggestion: s/"should be integer element"/"unsupported element type"/

Fixed.

> src/hotspot/cpu/riscv/riscv_v.ad line 2660:
>
>> 2658: __ vector_integer_extend(as_VectorRegister($dst$$reg), bt == T_FLOAT ? T_INT : T_LONG,
>> 2659: Matcher::vector_length(this), as_VectorRegister($src$$reg), T_BYTE);
>> 2660: __ vfcvt_f_x_v(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg));
>
> The single-width vector conversion instructions, when converting to floating-point values, use the dynamic rounding mode in 'frm', so I think you should set 'frm' to the correct rounding mode first. You might also want to check the exceptional conditions possibly set by those instructions.

Fixed.

> src/hotspot/cpu/riscv/riscv_v.ad line 2718:
>
>> 2716: %}
>> 2717:
>> 2718: instruct vcvtItoX(vReg dst, vReg src) %{
>
> I think the 'TEMP_DEF dst' effect is only needed for the T_DOUBLE case. It still works if 'dst' and 'src' are allocated the same vector register for the T_FLOAT case. So you might want to further break this down into two separate ones, say 'vcvtItoF' and 'vcvtItoD'.

Fixed.

> src/hotspot/cpu/riscv/riscv_v.ad line 2804:
>
>> 2802: ins_encode %{
>> 2803: __ rvv_vsetvli(T_FLOAT, Matcher::vector_length(this, $src));
>> 2804: __ vfcvt_x_f_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg));
>
> The language spec [1] specifies: "The round toward zero rounding policy applies to (i) conversion of a floating-point value to an integer value ([§5.1.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3)), and (ii) floating-point remainder ([§15.17.3](https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.17.3))."
>
> So it looks to me that we should use the 'rtz' variant (vfcvt.rtz.x.f.v) here to do the conversion to integer. Please also check other places where we do conversion from a floating-point value to an integer value.
>
> [1] https://docs.oracle.com/javase/specs/jls/se20/html/jls-15.html#jls-15.4

Fixed.
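The round-toward-zero rule from the JLS quoted above can be observed at the Java level. The following is only a minimal sketch with hypothetical class and method names; it says nothing about which RVV instruction a given build emits, it just shows the result any correct lowering of a float-to-int cast has to reproduce, next to `Math.round` for contrast.

```java
// Hypothetical demo class; not part of the JDK or of the patch under review.
class RoundTowardZeroDemo {
    // Narrowing primitive conversion: the fractional part is discarded,
    // i.e. the value is rounded toward zero.
    static int castToInt(float f) {
        return (int) f;
    }

    // Math.round rounds to the nearest integer instead, so it can differ
    // from the cast for the same input.
    static int roundNearest(float f) {
        return Math.round(f);
    }

    public static void main(String[] args) {
        System.out.println(castToInt(2.7f));      // 2
        System.out.println(castToInt(-2.7f));     // -2
        System.out.println(roundNearest(-2.7f));  // -3
        System.out.println(castToInt(Float.NaN)); // 0
        // Out-of-range values saturate rather than wrap.
        System.out.println(castToInt(1e20f) == Integer.MAX_VALUE);  // true
        System.out.println(castToInt(-1e20f) == Integer.MIN_VALUE); // true
    }
}
```

A conversion performed under a round-to-nearest dynamic rounding mode would give -3 for -2.7f, which is the kind of mismatch the review comment is guarding against.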
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1183633130
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1183632961
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1183632617
PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1183632804

From qamai at openjdk.org Wed May 3 14:58:23 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 3 May 2023 14:58:23 GMT
Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized
In-Reply-To:
References:
Message-ID:

On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz wrote:

> This patch aims to optimize a case where an And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register.
> Before this patch, an "and" followed by a "test" would be emitted, so the removed "and" means 2 fewer bytes have to be emitted.
> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers of my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score    Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ±  0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ±  0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ±  0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ±  0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ±  0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score    Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ±  0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ±  0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ±  0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ±  0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ±  0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a shorter instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.

You can add other `Cmp` patterns: `cmp_mem_imm`, `test_mem` (which is basically `cmp_mem_imm` but matching `test_reg` helps cisc spilling), `test_mem_imm`. `cmp` patterns may need unsigned versions too, while `test` patterns do not, since `CmpU x 0` should be idealised into `CmpI x 0` already.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13587#issuecomment-1532593107

From thartmann at openjdk.org Wed May 3 14:58:24 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 3 May 2023 14:58:24 GMT
Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized
In-Reply-To:
References:
Message-ID: <9Zv0-ZBghaCMrS0y7LXWeHNHnnkTYNl9sDqcwddLZQk=.1122eaa8-6720-44ff-97de-0ccf07883417@github.com>

On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz wrote:

> This patch aims to optimize a case where an And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register.
> Before this patch, an "and" followed by a "test" would be emitted, so the removed "and" means 2 fewer bytes have to be emitted.
> I've attached a JMH Benchmark to demonstrate the performance improvements.
> Here are the numbers of my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score    Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ±  0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ±  0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ±  0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ±  0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ±  0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score    Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ±  0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ±  0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ±  0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ±  0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ±  0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a shorter instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.
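For orientation, the description above is about an `AndI`/`AndL` of two non-constant values feeding a compare. At the Java level that shape plausibly comes from code like the sketch below. The class and method names are hypothetical, the comparison against zero is an assumption based on the use of `test`, and the snippet does not demonstrate what any particular JVM build emits; it only shows the source pattern and its result.

```java
// Hypothetical illustration of the source shape; not the benchmark attached to the PR.
class AndCmpShape {
    // Both operands are run-time values, so the mask cannot be folded
    // into an immediate operand of the compare.
    static boolean noCommonBitsInt(int a, int b) {
        return (a & b) == 0;
    }

    // Same shape for long operands.
    static boolean noCommonBitsLong(long a, long b) {
        return (a & b) == 0L;
    }

    public static void main(String[] args) {
        System.out.println(noCommonBitsInt(0b1010, 0b0101));    // true
        System.out.println(noCommonBitsInt(0b1010, 0b0010));    // false
        System.out.println(noCommonBitsLong(1L << 40, 1L << 41)); // true
        System.out.println(noCommonBitsLong(1L << 40, 1L << 40)); // false
    }
}
```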
A bug report was created that you can link to this PR by updating the title accordingly: https://bugs.openjdk.org/browse/JDK-8307351

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13587#issuecomment-1532871243

From duke at openjdk.org Wed May 3 14:58:22 2023
From: duke at openjdk.org (Tobias Hotz)
Date: Wed, 3 May 2023 14:58:22 GMT
Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized
Message-ID:

This patch aims to optimize a case where an And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register.
Before this patch, an "and" followed by a "test" would be emitted, so the removed "and" means 2 fewer bytes have to be emitted.
I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers of my Windows 11 machine:
Before:

Benchmark                                                   Mode  Cnt   Score    Error  Units
AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ±  0,131  ns/op
AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ±  0,610  ns/op
AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ±  0,056  ns/op
AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ±  0,030  ns/op
AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ±  0,107  ns/op

After:

Benchmark                                                   Mode  Cnt   Score    Error  Units  Improvement
AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ±  0,170  ns/op  (~18%)
AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ±  0,123  ns/op  (~29%)
AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ±  0,126  ns/op  (unchanged)
AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ±  0,079  ns/op  (unchanged)
AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ±  0,168  ns/op  (unchanged)

As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a shorter instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
I've tested my changes using the Tier1 jtreg Tests on Windows.

-------------

Commit messages:
 - Merge remote-tracking branch 'upstream/master' into ImproveAndTestMatching
 - Convert line ending from CRLF to LF
 - Add new matching rule to match CMP(AND) to test on x86

Changes: https://git.openjdk.org/jdk/pull/13587/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13587&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8307351
Stats: 125 lines in 2 files changed: 125 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/13587.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13587/head:pull/13587

PR: https://git.openjdk.org/jdk/pull/13587

From chumer at openjdk.org Wed May 3 15:03:20 2023
From: chumer at openjdk.org (Christian Humer)
Date: Wed, 3 May 2023 15:03:20 GMT
Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 12:43:29 GMT, Doug Simon wrote:

> This PR adds JVMCI API to reflect the fact that deferred locals are not supported on virtual threads.

test/hotspot/jtreg/compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java line 346:

> 344: }
> 345:
> 346: static class MaterializationNotSupported extends RuntimeException {

any reason this is not just an UnsupportedOperationException?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13777#discussion_r1183691261

From dnsimon at openjdk.org Wed May 3 15:03:22 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 3 May 2023 15:03:22 GMT
Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 13:30:32 GMT, Christian Humer wrote:

>> This PR adds JVMCI API to reflect the fact that deferred locals are not supported on virtual threads.
>
> test/hotspot/jtreg/compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java line 346:
>
>> 344:     }
>> 345:
>> 346:     static class MaterializationNotSupported extends RuntimeException {
>
> any reason this is not just an UnsupportedOperationException?

To be 100% sure that it's the exception thrown by the test as opposed to some interleaving code that happens to also use UnsupportedOperationException.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13777#discussion_r1183719820

From dnsimon at openjdk.org Wed May 3 15:03:18 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 3 May 2023 15:03:18 GMT
Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations
Message-ID:

This PR adds JVMCI API to reflect the fact that deferred locals are not supported on virtual threads.
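
The reply earlier in this thread argues for a test-private exception type so that the test cannot confuse its own signal with an UnsupportedOperationException thrown by unrelated code. A minimal Java sketch of that reasoning follows. It is not the code from the PR: the class `DedicatedExceptionSketch`, the helper `classify`, and `unrelatedLibraryCall` are hypothetical names chosen for illustration, and only the nested exception name is borrowed from the quoted diff.

```java
import java.util.List;

public class DedicatedExceptionSketch {
    // Stand-in for the test-private exception in the quoted diff (illustrative copy).
    static class MaterializationNotSupported extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical unrelated code that throws the generic JDK exception:
    // List.of(...) returns an immutable list, so add() is unsupported.
    static void unrelatedLibraryCall() {
        List.of(1).add(2);
    }

    // Distinguishes the test's own signal from any other unsupported-operation failure.
    static String classify(Runnable body) {
        try {
            body.run();
            return "completed";
        } catch (MaterializationNotSupported e) {
            return "skipped by test";
        } catch (UnsupportedOperationException e) {
            return "unexpected failure";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> { throw new MaterializationNotSupported(); }));
        System.out.println(classify(DedicatedExceptionSketch::unrelatedLibraryCall));
    }
}
```

If the test threw UnsupportedOperationException itself, both calls in `main` would land in the same catch block and the second, unrelated failure could be silently treated as an expected skip.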
------------- Commit messages: - materializing frames on virtual threads is not supported Changes: https://git.openjdk.org/jdk/pull/13777/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13777&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307125 Stats: 48 lines in 6 files changed: 39 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13777/head:pull/13777 PR: https://git.openjdk.org/jdk/pull/13777 From cslucas at openjdk.org Wed May 3 20:28:32 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 3 May 2023 20:28:32 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> <8AmU_ta4meiUmO99Em5bV7XLAV4H9fAcil519yh70fU=.1a28f4a9-a992-43a7-8c4a-d1cf96835963@github.com> Message-ID: <8kDrmtWQJ9oAdm-sM916KB96TqI6HpAHrxjLFn_fRZU=.2d3d9d8e-3eb3-482e-9d1c-416908fa39ac@github.com> On Fri, 21 Apr 2023 19:23:37 GMT, Vladimir Kozlov wrote: >>> Again got failures in the test on Aarch64 running with -XX:-UseTLAB: >>> >>> ``` >>> testCmpMergeWithNull(boolean,int,int): >>> - Failed comparison: [found] 0 = 2 [given] >>> testCmpMergeWithNull_Second(boolean,int,int) >>> - Failed comparison: [found] 0 = 1 [given] >>> testMergedAccessAfterCallNoWrite(boolean,int,int) >>> - Failed comparison: [found] 2 = 3 [given] >>> testMergedAccessAfterCallWithWrite(boolean,int,int) >>> - Failed comparison: [found] 2 = 3 [given] >>> testNestedObjectsArray(boolean,int,int) >>> - Failed comparison: [found] 2 = 4 [given] >>> ``` >> >> @vnkozlov - The reason for these failures is due to an issue in the test framework ALLOC Regex: https://bugs.openjdk.org/browse/JDK-8306625 . 
Since only the tests added in this PR are failing due to that problem do you think I should create a separate PR to fix the Regex or just include the fix in this PR? > >> Since only the tests added in this PR are failing due to that problem do you think I should create a separate PR to fix the Regex or just include the fix in this PR? > > Create separate PR and fix it first. This PR still need review from @iwanowww and it may take time to address additional comments. @vnkozlov - Please let me know if you have further questions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1533687457 From eliu at openjdk.org Thu May 4 01:22:16 2023 From: eliu at openjdk.org (Eric Liu) Date: Thu, 4 May 2023 01:22:16 GMT Subject: RFR: 8304948: [vectorapi] C2 crashes when expanding VectorBox [v3] In-Reply-To: References: <1wE09YSA0YEQ5CdPJkBHiyO7HFOUF1dh9mbMOuT5W04=.22b77c9b-3b7a-4107-bd6d-44f9a3a6e5d5@github.com> Message-ID: <4_4_ddu1QCxtoRH-0VPZIxM0pKsCDkIlMBO3Qg_CkHo=.9ba5ce15-04ac-4969-b6b6-98cbb709fd61@github.com> On Tue, 2 May 2023 13:59:46 GMT, Quan Anh Mai wrote: > May I ask why is this needed? Thanks. It is a constraint that nodes must be removed from hash table before modifying their inputs. 
https://github.com/openjdk/jdk/blob/f4630cdc6f49812d147b9ba8a4ea4009968f0db2/src/hotspot/share/opto/node.hpp#L434 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13489#discussion_r1184448660 From qamai at openjdk.org Thu May 4 03:48:17 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 4 May 2023 03:48:17 GMT Subject: RFR: 8304948: [vectorapi] C2 crashes when expanding VectorBox [v3] In-Reply-To: <1wE09YSA0YEQ5CdPJkBHiyO7HFOUF1dh9mbMOuT5W04=.22b77c9b-3b7a-4107-bd6d-44f9a3a6e5d5@github.com> References: <1wE09YSA0YEQ5CdPJkBHiyO7HFOUF1dh9mbMOuT5W04=.22b77c9b-3b7a-4107-bd6d-44f9a3a6e5d5@github.com> Message-ID: <9xRjMEp-WMeFGU_Ax66uE8L6Sfl8eOC6kaGOoIWKK5c=.0701afe4-a59e-4298-b0ff-127aecf2a7a1@github.com> On Tue, 25 Apr 2023 15:12:12 GMT, Eric Liu wrote: >> This patch fixes C2 failure with SIGSEGV due to endless recursion. >> >> With test case VectorBoxExpandTest.java in this patch, C2 would generate IR graph like below: >> >> >> ------------ >> / \ >> Region | VectorBox | >> \ | / | >> Phi | >> | | >> | | >> Region | VectorBox | >> \ | / | >> Phi | >> | | >> |------------/ >> | >> >> >> >> This Phi will be optimized by merge_through_phi [1], which transforms `Phi (VectorBox VectorBox)` into `VectorBox (Phi Phi)` to pursue opportunity of combining VectorBox with VectorUnbox. In this process, either the pre type check [2] or the process cloning Phi nodes [3], the circle case is well considered to avoid falling into endless loop. >> >> After merge_through_phi, each input Phi of new VectorBox has the same shape with original root Phi before merging (only VectorBox has been replaced). After several other optimizations, C2 would expand VectorBox [4] on a graph like below: >> >> >> ------------ >> / \ >> Region | Proj | >> \ | / | >> Phi | >> | | >> | | >> Region | Proj | >> \ | / | >> Phi | >> | | >> |------------/ >> | >> | Phi >> | / >> VectorBox >> >> >> which the circle case should be taken into consideration as well. 
>> >> [TEST] >> Full Jtreg passed without new failure. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2554 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2571 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2531 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L311 > > Eric Liu has updated the pull request incrementally with one additional commit since the last revision: > > expand vector box in local > > Change-Id: Ie7bddc049b479aad4f953ec920d83b91e7de2152 Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13489#pullrequestreview-1412242854 From qamai at openjdk.org Thu May 4 03:48:19 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 4 May 2023 03:48:19 GMT Subject: RFR: 8304948: [vectorapi] C2 crashes when expanding VectorBox [v3] In-Reply-To: <4_4_ddu1QCxtoRH-0VPZIxM0pKsCDkIlMBO3Qg_CkHo=.9ba5ce15-04ac-4969-b6b6-98cbb709fd61@github.com> References: <1wE09YSA0YEQ5CdPJkBHiyO7HFOUF1dh9mbMOuT5W04=.22b77c9b-3b7a-4107-bd6d-44f9a3a6e5d5@github.com> <4_4_ddu1QCxtoRH-0VPZIxM0pKsCDkIlMBO3Qg_CkHo=.9ba5ce15-04ac-4969-b6b6-98cbb709fd61@github.com> Message-ID: On Thu, 4 May 2023 01:19:04 GMT, Eric Liu wrote: >> src/hotspot/share/opto/vector.cpp line 345: >> >>> 343: box_type, vect_type, visited); >>> 344: if (!new_box->is_Phi()) { >>> 345: C->initial_gvn()->hash_delete(vbox); >> >> May I ask why is this needed? Thanks. > >> May I ask why is this needed? Thanks. > > It is a constraint that nodes must be removed from hash table before modifying their inputs. https://github.com/openjdk/jdk/blob/f4630cdc6f49812d147b9ba8a4ea4009968f0db2/src/hotspot/share/opto/node.hpp#L434 Ah yes silly me, thanks a lot for your answers, I have no more questions. 
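
The constraint discussed in this exchange (a node must be removed from the hash table before its inputs are modified) is the same hazard as mutating an object while it is stored in a hash-based set. The following Java sketch is an analogy only, not HotSpot code: the `Node` class, its single `input` field, and both helper methods are hypothetical, and `HashSet.remove` merely plays the role that `hash_delete` plays in the quoted C++ line.

```java
import java.util.HashSet;
import java.util.Set;

public class HashBeforeMutateSketch {
    // Hypothetical node whose hash and equality depend on its input, as with value numbering.
    static final class Node {
        int input;
        Node(int input) { this.input = input; }
        @Override public boolean equals(Object o) {
            return o instanceof Node && ((Node) o).input == input;
        }
        @Override public int hashCode() { return Integer.hashCode(input); }
    }

    // Wrong order: the input changes while the node is still in the table,
    // so the stored entry is filed under a stale hash and lookups miss it.
    static boolean mutateWhileHashed() {
        Set<Node> table = new HashSet<>();
        Node n = new Node(1);
        table.add(n);
        n.input = 2;
        return table.contains(n);
    }

    // Right order: remove first, then modify, then re-insert under the new hash.
    static boolean removeThenMutate() {
        Set<Node> table = new HashSet<>();
        Node n = new Node(1);
        table.add(n);
        table.remove(n);
        n.input = 2;
        table.add(n);
        return table.contains(n);
    }

    public static void main(String[] args) {
        System.out.println("mutate while hashed, still found: " + mutateWhileHashed());
        System.out.println("remove then mutate, still found:  " + removeThenMutate());
    }
}
```

In the first helper the node becomes unreachable through the table even though it is still physically stored there, which is the kind of inconsistency the removal step in the patch avoids.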

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13489#discussion_r1184515136

From yyang at openjdk.org Thu May 4 08:08:30 2023
From: yyang at openjdk.org (Yi Yang)
Date: Thu, 4 May 2023 08:08:30 GMT
Subject: Withdrawn: 8303970: C2 can not merge homogeneous adjacent two If
In-Reply-To:
References:
Message-ID: <3FL0UH1cNSRSswYgo23lw86Lw9QEShQi4ANuvmapi-s=.4e749f8d-3110-4250-83c5-3be08c499ddd@github.com>

On Fri, 10 Mar 2023 14:37:06 GMT, Yi Yang wrote:

> Hi, can I have a review for this patch? It adds a new Identity for BoolNode to look up homogeneous integer comparisons, i.e. `Bool (CmpX a b)` is identical to `Bool (CmpX b a)`. In this way, we are able to merge the following two "identical" Ifs, which was not possible before.
>
>
> public static void test(int a, int b) { // ok, identical ifs, apply split_if
>     if (a == b) {
>         int_field = 0x42;
>     } else {
>         int_field = 42;
>     }
>     if (a == b) {
>         int_field = 0x42;
>     } else {
>         int_field = 42;
>     }
> }
>
> public static void test(int a, int b) { // do nothing
>     if (a == b) {
>         int_field = 0x42;
>     } else {
>         int_field = 42;
>     }
>     if (b == a) {
>         int_field = 0x42;
>     } else {
>         int_field = 42;
>     }
> }
>
>
> Testing: tier1, application/ctw/modules

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/12978

From yyang at openjdk.org Thu May 4 08:08:31 2023
From: yyang at openjdk.org (Yi Yang)
Date: Thu, 4 May 2023 08:08:31 GMT
Subject: Withdrawn: 8304049: C2 can not merge trivial Ifs due to CastII
In-Reply-To:
References:
Message-ID:

On Wed, 15 Mar 2023 10:37:03 GMT, Yi Yang wrote:

> Hi, can I have a review for this patch? C2 can not apply Split If for the attached trivial case.
PhiNode::Ideal removes itself by unique_input but introduces a new CastII > > https://github.com/openjdk/jdk/blob/e3777b0c49abb9cc1925f4044392afadf3adef61/src/hotspot/share/opto/cfgnode.cpp#L1470-L1474 > > https://github.com/openjdk/jdk/blob/e3777b0c49abb9cc1925f4044392afadf3adef61/src/hotspot/share/opto/cfgnode.cpp#L2078-L2079 > > Therefore we have two Cmp, which is not identical for split_if. > > ![image](https://user-images.githubusercontent.com/5010047/225285449-b41dc939-1d3f-45f3-b6d6-a9b9445c2f6a.png) > (Fig1. Phi#41 is removed during ideal, create CastII#58 then) > > ![image](https://user-images.githubusercontent.com/5010047/225285493-30471f1c-97b0-452b-9218-3b5f09f09859.png) > (Fig2. CmpI#42 and CmpI#23 are different comparisons, they are not identical_backtoback_ifs ) > > This patch adds Cmp identity to find existing Cmp node, i.e. Cmp#42 is identity to Cmp#23 > > > public static void test5(int a, int b){ > > if( b!=0) { > int_field = 35; > } else { > int_field =222; > } > > if( b!=0) { > int_field = 35; > } else { > int_field =222; > } > } > > > > Test: tier1, application/ctw/modules This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13039 From yyang at openjdk.org Thu May 4 08:14:44 2023 From: yyang at openjdk.org (Yi Yang) Date: Thu, 4 May 2023 08:14:44 GMT Subject: Withdrawn: 8304034: Remove redundant and meaningless comments in opto In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 07:15:11 GMT, Yi Yang wrote: > Please help review this trivial change to remove redundant and meaningless comments in `hotspot/share/opto` directory. > > They are either > 1. Repeat the function name that the function they comment for. > 2. Makes no sense, e.g. `//----Idealize----` > > And I think original CC-style code (`if( test )`,`call( arg )`) can be formatted in one go, instead of formatting the near code when someone touches them. 
But this may form a big patch, and it confuses code blame, so I left this work until we reach a consensus. > > Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12995 From fgao at openjdk.org Thu May 4 10:25:19 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 4 May 2023 10:25:19 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 14:55:55 GMT, Emanuel Peter wrote: > `SuperWord:schedule`, and specifically `SuperWord::co_locate_pack` is broken. > The problem is with the basic approach of it, as far as I know. > Hence, I had to completely re-design the `schedule` algorithm, based on the `PacksetGraph` ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). > > **The current approach** > > The idea is to leave the non-vectorized memory ops in their place, and find the right place for the vectorized memops to be "sandwiched" into. The logic is very complex and has already had a few bugs fixed. > > **Why this does not work** > > However, in some rare cases, we have to reorder non-vectorized operations. See this example that I added as a regression test: > > https://github.com/openjdk/jdk/blob/a771a61005aea272cc51fa3f3e1637c217582fce/test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java#L82-L109 > > I found this issue during work on https://github.com/openjdk/jdk/pull/13078, where I had to restrict/disable some tests that are now passing. > > **Solution** > > Abandon the idea of "sandwiching" memops. Rewrite `SuperWord:schedule`: > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2567-L2576 > > We first schedule all memops into a linear order. 
> We do this scheduling based on the `PacksetGraph`, which gives us a `DAG` based on the `packset` and the dependency-graph (which in turn respects the data use-defs, as well as the memory dependencies, unless we can prove that they do not reference the same memory). > In other words: we have a linearization that respects all dependencies that must be respected. > Further, we make sure that ops from the same pack are scheduled as a block (all adjacent to each other), and in order that the packset has internally. > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2489-L2493 > > Now that we have this order (and we have not aborted because we found a cycle in the `PacksetGraph`), we must apply this schedule to each memory slice, and reorder the memops in the slices accordingly. > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2617-L2619 > > This scheduling has the nice side-effect of simplifying `SuperWord::output` a little. > We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). > > **Discussion** > > This seems to me to be a much more straight forward approach, and it uses the code I recently added for verification of cyclic dependencies in the packset ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). > > One potential improvement to my fix: > We now sometimes re-order the non-vectorized memory slices, even though it may not be necessary. > This is not wrong, but it makes updates to the graph that may be confusing when debugging. > Further, the re-ordering may have performance impacts. 
> I could use a priority-queue (min-heap, would have to implement it since it does not yet exist), and schedule the `PacksetGraph` whenever possible with the lower `bb_idx` first. This would make the new linear order the same/closer to the old one. However, I am not sure if this is worth the effort and overhead of a priority-queue. > > **Testing** > Github-actions pass. tier1-6 + stress testing passes. > Performance testing showed no significant performance change. Great work! src/hotspot/share/opto/superword.cpp line 2766: > 2764: > 2765: for (int i = 0; i < _block.length(); i++) { > 2766: Node* n = _block.at(i); // last in pack Nit: how about moving this comment line to the upward side of the if clause on L2768? A little bit confusing now. src/hotspot/share/opto/superword.cpp line 2768: > 2766: Node* n = _block.at(i); // last in pack > 2767: Node_List* p = my_pack(n); > 2768: if (p != nullptr && n == p->at(p->size()-1)) { Sorry, I don't quite understand why the mem ops in the pack are internally in order. Maybe I missed somewhere you reordered these ops in the same pack using linearized memops_schedule. Could you please point it out for me? Thanks. test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java line 32: > 30: * be reordered during SuperWord::schedule. > 31: * @requires vm.compiler2.enabled > 32: * @requires vm.cpu.features ~= ".*avx2.*" | vm.cpu.features ~= ".*asimd.*" Can we drop this line just like you did in the files above? Seems we have cpu feature check separately for each test. 
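
The scheduling idea in the quoted PR description (linearize a dependency DAG, keep every pack together as one block in pack order, and bail out on a cycle) can be sketched in plain Java. This is an illustration under stated assumptions, not the SuperWord implementation: ops are plain integers, scalar ops are modeled as singleton packs, and the class `PackScheduleSketch` and its `schedule` method are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PackScheduleSketch {
    /**
     * Computes a linear order of ops that respects all dependencies and emits
     * each pack as one contiguous block, in the pack's internal order.
     *
     * @param opCount number of ops, identified as 0..opCount-1
     * @param packs   every op appears in exactly one pack; scalar ops are singleton packs
     * @param deps    each {from, to} pair means op "from" must precede op "to"
     * @return the linear order, or null if the pack-level graph has a cycle
     */
    static List<Integer> schedule(int opCount, int[][] packs, int[][] deps) {
        int[] packOf = new int[opCount];
        for (int p = 0; p < packs.length; p++) {
            for (int op : packs[p]) {
                packOf[op] = p;
            }
        }
        // Contract each pack to a single vertex and count incoming edges.
        List<List<Integer>> successors = new ArrayList<>();
        for (int p = 0; p < packs.length; p++) {
            successors.add(new ArrayList<>());
        }
        int[] inDegree = new int[packs.length];
        for (int[] d : deps) {
            int from = packOf[d[0]];
            int to = packOf[d[1]];
            if (from != to) {
                successors.get(from).add(to);
                inDegree[to]++;
            }
        }
        // Kahn's algorithm on the contracted graph.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int p = 0; p < packs.length; p++) {
            if (inDegree[p] == 0) {
                ready.add(p);
            }
        }
        List<Integer> order = new ArrayList<>();
        int scheduledPacks = 0;
        while (!ready.isEmpty()) {
            int p = ready.poll();
            scheduledPacks++;
            for (int op : packs[p]) {
                order.add(op); // the whole pack is emitted as one block
            }
            for (int s : successors.get(p)) {
                if (--inDegree[s] == 0) {
                    ready.add(s);
                }
            }
        }
        return scheduledPacks == packs.length ? order : null;
    }

    public static void main(String[] args) {
        // Ops 1 and 2 form a pack; op 2 must precede scalar op 0, which must precede op 3.
        int[][] packs = {{0}, {1, 2}, {3}};
        int[][] deps = {{2, 0}, {0, 3}};
        System.out.println(schedule(4, packs, deps)); // prints [1, 2, 0, 3]
    }
}
```

In this sketch the scalar op 0 ends up after the pack even though it has the lower index, which corresponds to the case in the description where non-vectorized ops have to be reordered. Replacing the FIFO `ArrayDeque` with a `java.util.PriorityQueue` would prefer the lowest-numbered ready pack, roughly the min-heap variant that the description mentions as a possible refinement.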
------------- PR Review: https://git.openjdk.org/jdk/pull/13354#pullrequestreview-1412534697 PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1184824754 PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1184819886 PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1184680686 From fjiang at openjdk.org Thu May 4 12:41:21 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 4 May 2023 12:41:21 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v6] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 12:46:15 GMT, Gui Cao wrote: >> Hi, >> >> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. >> >> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. >> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ >> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> >> >> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X >> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: >> >> ``` >> 1ba0 ld R28, [R23, #280] # ptr, #@loadP >> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm >> 1ba8 reinterpretResize V1, V5 >> 1bb0 vcvtBtoX V4, V1 >> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch 
P=0.000100 C=-1.000000 >> ``` >> >> #### VectorRearrange >> >> When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. >> >> The compilation log for the `VectorRearrange` node: >> >> ``` >> 1f6 spill R7 -> [sp, #320] # spill size = 64 >> 1f8 spill [sp, #128] -> V1 # vector spill size = 256 >> 200 spill [sp, #160] -> V2 # vector spill size = 256 >> 208 rearrange V3, V1, V2 >> 210 spill V3 -> [sp, #96] # vector spill size = 256 >> 218 li R11, #4 # int, #@loadConI >> ``` >> >> #### VectorReinterpret >> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. >> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 >> The compilation log for the `VectorReinterpret` node: >> >> >> 1218 spill [sp, #32] -> V4 # vector spill size = 256 >> 1220 spill [sp, #176] -> V3 # vector spill size = 256 >> 1228 rearrange V2, V4, V3 >> 1230 spill [sp, #72] -> V0 # vmask spill size = 32 >> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend >> 1244 reinterpretResize V2, V1 >> 124c vcvtStoX_extend V5, V2 >> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> #### LShiftCntV/RShiftCntV >> >> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types >> >> The compilation log for the LShiftCntV/RShiftCntV node: >> >> >> 24c vasrB V3, V1, V2 >> 260 storeV [R19], V3 # vector (rvv) >> 268 lbu R19, [R29, #48] # byte, #@loadUB >> 26c andi R19, R19, #7 #@andI_reg_imm >> 270 loadV V1, [R25] # vector (rvv) >> 278 vshiftcnt V2, R19 >> 280 vasrB V3, V1, V2 >> 294 storeV [R26], V3 # vector (rvv) >> 29c lbu R19, [R29, #80] # byte, #@loadUB >> 2a0 andi R19, R19, #7 #@andI_reg_imm >> 2a4 loadV V1, [R22] # vector 
(rvv) >> 2ac vshiftcnt V2, R19 >> >> >> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc >> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> Testing: >> qemu with UseRVV: >> >> - [ ] Tier1 tests (release) >> - [ ] Tier2 tests (release) >> - [ ] Tier3 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Fix round mode and optimize widen/narrow vcast Overall looks good, with some suggestions: src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1673: > 1671: } > 1672: > 1673: void C2_MacroAssembler::rvv_reduce_integral(Register dst, VectorRegister tmp, Could you please rename this to `reduce_integral_v`, we already got `xxxxx_v` naming style. Suggestion: void C2_MacroAssembler::reduce_integral_v(Register dst, VectorRegister tmp, src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1711: > 1709: // Set vl and vtype for full and partial vector operations. > 1710: // (vma = mu, vta = tu, vill = false) > 1711: void C2_MacroAssembler::rvv_vsetvli(BasicType bt, int vector_length, LMUL vlmul, Register tmp) { Same here: Suggestion: void C2_MacroAssembler::vsetvli_v(BasicType bt, int vector_length, LMUL vlmul, Register tmp) { src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1784: > 1782: } > 1783: > 1784: void C2_MacroAssembler::vector_integer_extend(VectorRegister dst, BasicType dst_bt, int vector_length, Suggestion: void C2_MacroAssembler::integer_extend_v(VectorRegister dst, BasicType dst_bt, int vector_length, src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1822: > 1820: // Vector narrow from src to dst with specified element sizes. > 1821: // High part of dst vector will be filled with zero. 
> 1822: void C2_MacroAssembler::vector_integer_narrow(VectorRegister dst, BasicType dst_bt, int vector_length, Suggestion: void C2_MacroAssembler::integer_narrow_v(VectorRegister dst, BasicType dst_bt, int vector_length, ------------- Changes requested by fjiang (Author). PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1412980702 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1184948465 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1184948943 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1184945401 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1184945681 From gcao at openjdk.org Thu May 4 16:15:58 2023 From: gcao at openjdk.org (Gui Cao) Date: Thu, 4 May 2023 16:15:58 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v7] In-Reply-To: References: Message-ID: > Hi, > > we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > > We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > > > #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X > There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: > > ``` > 1ba0 ld R28, [R23, #280] # ptr, #@loadP > 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm > 1ba8 reinterpretResize V1, V5 > 1bb0 vcvtBtoX V4, V1 > 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 > ``` > > #### VectorRearrange > > When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. > > The compilation log for the `VectorRearrange` node: > > ``` > 1f6 spill R7 -> [sp, #320] # spill size = 64 > 1f8 spill [sp, #128] -> V1 # vector spill size = 256 > 200 spill [sp, #160] -> V2 # vector spill size = 256 > 208 rearrange V3, V1, V2 > 210 spill V3 -> [sp, #96] # vector spill size = 256 > 218 li R11, #4 # int, #@loadConI > ``` > > #### VectorReinterpret > If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. 
> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 > The compilation log for the `VectorReinterpret` node: > > > 1218 spill [sp, #32] -> V4 # vector spill size = 256 > 1220 spill [sp, #176] -> V3 # vector spill size = 256 > 1228 rearrange V2, V4, V3 > 1230 spill [sp, #72] -> V0 # vmask spill size = 32 > 123c vmerge_vvm V1, V1, V2, v0 #@vector blend > 1244 reinterpretResize V2, V1 > 124c vcvtStoX_extend V5, V2 > 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 > > > #### LShiftCntV/RShiftCntV > > We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types > > The compilation log for the LShiftCntV/RShiftCntV node: > > > 24c vasrB V3, V1, V2 > 260 storeV [R19], V3 # vector (rvv) > 268 lbu R19, [R29, #48] # byte, #@loadUB > 26c andi R19, R19, #7 #@andI_reg_imm > 270 loadV V1, [R25] # vector (rvv) > 278 vshiftcnt V2, R19 > 280 vasrB V3, V1, V2 > 294 storeV [R26], V3 # vector (rvv) > 29c lbu R19, [R29, #80] # byte, #@loadUB > 2a0 andi R19, R19, #7 #@andI_reg_imm > 2a4 loadV V1, [R22] # vector (rvv) > 2ac vshiftcnt V2, R19 > > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > Testing: > qemu with UseRVV: > > - [ ] Tier1 tests (release) > - [ ] Tier2 tests (release) > - [ ] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: rename rvv_vsetvli to vsetvli_helper ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13684/files - new: https://git.openjdk.org/jdk/pull/13684/files/642b25a8..7efb9dfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=05-06 Stats: 218 lines in 3 files changed: 0 ins; 0 del; 218 mod Patch: 
https://git.openjdk.org/jdk/pull/13684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13684/head:pull/13684 PR: https://git.openjdk.org/jdk/pull/13684 From gcao at openjdk.org Thu May 4 16:16:06 2023 From: gcao at openjdk.org (Gui Cao) Date: Thu, 4 May 2023 16:16:06 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v6] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 12:31:03 GMT, Feilong Jiang wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix round mode and optimize widen/narrow vcast > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1673: > >> 1671: } >> 1672: >> 1673: void C2_MacroAssembler::rvv_reduce_integral(Register dst, VectorRegister tmp, > > Could you please rename this to `reduce_integral_v`, we already got `xxxxx_v` naming style. > Suggestion: > > void C2_MacroAssembler::reduce_integral_v(Register dst, VectorRegister tmp, Fixed. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1711: > >> 1709: // Set vl and vtype for full and partial vector operations. >> 1710: // (vma = mu, vta = tu, vill = false) >> 1711: void C2_MacroAssembler::rvv_vsetvli(BasicType bt, int vector_length, LMUL vlmul, Register tmp) { > > Same here: > Suggestion: > > void C2_MacroAssembler::vsetvli_v(BasicType bt, int vector_length, LMUL vlmul, Register tmp) { Fixed. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1784: > >> 1782: } >> 1783: >> 1784: void C2_MacroAssembler::vector_integer_extend(VectorRegister dst, BasicType dst_bt, int vector_length, > > Suggestion: > > void C2_MacroAssembler::integer_extend_v(VectorRegister dst, BasicType dst_bt, int vector_length, Thanks for the review. Fixed as suggested. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1822: > >> 1820: // Vector narrow from src to dst with specified element sizes. >> 1821: // High part of dst vector will be filled with zero. 
>> 1822: void C2_MacroAssembler::vector_integer_narrow(VectorRegister dst, BasicType dst_bt, int vector_length, > > Suggestion: > > void C2_MacroAssembler::integer_narrow_v(VectorRegister dst, BasicType dst_bt, int vector_length, Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1185223652 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1185223073 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1185222520 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1185222820 From eliu at openjdk.org Fri May 5 00:38:28 2023 From: eliu at openjdk.org (Eric Liu) Date: Fri, 5 May 2023 00:38:28 GMT Subject: RFR: 8304948: [vectorapi] C2 crashes when expanding VectorBox [v2] In-Reply-To: <4NjSkOwLqjp_TkdMwYLo2GcxEACHHlBwcRjUTW50fVg=.18b2de9d-75a5-42f2-a0f7-3324c19323ba@github.com> References: <4NjSkOwLqjp_TkdMwYLo2GcxEACHHlBwcRjUTW50fVg=.18b2de9d-75a5-42f2-a0f7-3324c19323ba@github.com> Message-ID: On Wed, 19 Apr 2023 08:14:46 GMT, Tobias Hartmann wrote: >> Eric Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge jdk:master >> >> Change-Id: I63c06b87d5b0c20ddaec0aa43031872b8ebb5362 >> - fix typo >> >> Change-Id: I1b84c4957398178bf234f71242a1cdd044181a79 >> - 8304948: [vectorapi] C2 crashes when expanding VectorBox >> >> This patch fixes C2 failure with SIGSEGV due to endless recursion. 
>> >> With test case VectorBoxExpandTest.java in this patch, C2 would generate >> IR graph like below: >> >> ``` >> ------------ >> / \ >> Region | VectorBox | >> \ | / | >> Phi | >> | | >> | | >> Region | VectorBox | >> \ | / | >> Phi | >> | | >> |\------------/ >> | >> >> ``` >> >> This Phi will be optimized by merge_through_phi [1], which transforms >> `Phi (VectorBox VectorBox)` into `VectorBox (Phi Phi)` to pursue >> opportunity of combining VectorBox with VectorUnbox. In this process, >> either the pre type check [2] or the process cloning Phi nodes [3], the >> circle case is well considered to avoid falling into endless loop. >> >> After merge_through_phi, each input Phi of new VectorBox has the same >> shape with original root Phi before merging (only VectorBox has been >> replaced). After several other optimizations, C2 would expand VectorBox >> [4] on a graph like below: >> >> ``` >> ------------ >> / \ >> Region | Proj | >> \ | / | >> Phi | >> | | >> | | >> Region | Proj | >> \ | / | >> Phi | >> | | >> |\------------/ >> | >> | Phi >> | / >> VectorBox >> >> ``` >> which the circle case should be taken into consideration as well. >> >> [TEST] >> Full Jtreg passed without new failure. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2557 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2574 >> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2534 >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L316 >> >> Change-Id: I381b1ba7e0865814d97535e365db6d9d72ef1949 > > Another review would be good. Thanks your kindly review. 
@TobiHartmann @merykitty @jatin-bhateja ------------- PR Comment: https://git.openjdk.org/jdk/pull/13489#issuecomment-1535559681 From sspitsyn at openjdk.org Fri May 5 00:39:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 5 May 2023 00:39:17 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: <_xH4KdRJRcDHNkNtyzIFjdO_IiMqyV-DLwFwDqlX4kA=.e964e7a0-14a1-49c7-bc29-128c0f87d419@github.com> On Thu, 4 May 2023 15:12:43 GMT, Leonid Mesnik wrote: > 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects > > caused significant regressions in some benchmarks and should be reverted. > > This fix backout changes and update problemlist bugs to new issue. > Tier1 passed > Running also tier5 to check other builds and more svc testing src/hotspot/share/opto/runtime.hpp line 219: > 217: static address register_finalizer_Java() { return _register_finalizer_Java; } > 218: #if INCLUDE_JVMTI > 219: static address notify_jvmti_object_alloc() { return _notify_jvmti_object_alloc; } This line has to be also removed: `312 static const TypeFunc* notify_jvmti_object_alloc_Type();` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13806#discussion_r1185622347 From eliu at openjdk.org Fri May 5 00:38:30 2023 From: eliu at openjdk.org (Eric Liu) Date: Fri, 5 May 2023 00:38:30 GMT Subject: Integrated: 8304948: [vectorapi] C2 crashes when expanding VectorBox In-Reply-To: References: Message-ID: <856L4qRA7-DhVKOUGtK-IytjkxzDB0qrldgLh411QGM=.918fd952-aa97-410a-a259-0c216d5e9829@github.com> On Mon, 17 Apr 2023 08:43:28 GMT, Eric Liu wrote: > This patch fixes C2 failure with SIGSEGV due to endless recursion. 
>
> With test case VectorBoxExpandTest.java in this patch, C2 would generate IR graph like below:
>
>
>           ------------
>          /            \
>   Region |  VectorBox  |
>        \ | /           |
>         Phi            |
>          |             |
>          |             |
>   Region |  VectorBox  |
>        \ | /           |
>         Phi            |
>          |             |
>          |\------------/
>          |
>
>
>
> This Phi will be optimized by merge_through_phi [1], which transforms `Phi (VectorBox VectorBox)` into `VectorBox (Phi Phi)` to pursue opportunity of combining VectorBox with VectorUnbox. In this process, either the pre type check [2] or the process cloning Phi nodes [3], the circle case is well considered to avoid falling into endless loop.
>
> After merge_through_phi, each input Phi of new VectorBox has the same shape with original root Phi before merging (only VectorBox has been replaced). After several other optimizations, C2 would expand VectorBox [4] on a graph like below:
>
>
>           ------------
>          /            \
>   Region |  Proj       |
>        \ | /           |
>         Phi            |
>          |             |
>          |             |
>   Region |  Proj       |
>        \ | /           |
>         Phi            |
>          |             |
>          |\------------/
>          |
>          |   Phi
>          |  /
>         VectorBox
>
>
> which the circle case should be taken into consideration as well.
>
> [TEST]
> Full Jtreg passed without new failure.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2554
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2571
> [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L2531
> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vector.cpp#L311

This pull request has now been integrated.
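The cycle handling described in the patch text above can be illustrated outside of HotSpot. The following is only a minimal sketch, assuming a hypothetical `Node` class with a `kind` string and an input list; none of these names come from C2, and the actual fix in `vector.cpp` may be structured differently. It shows the general technique of walking Phi inputs with a visited set so that a Phi which feeds back into itself through another Phi is processed once instead of recursing forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PhiCycleSketch {
    /** Hypothetical stand-in for an IR node; this is not a HotSpot class. */
    static final class Node {
        final String kind;                          // e.g. "Phi" or "Proj"
        final List<Node> inputs = new ArrayList<>();
        Node(String kind) { this.kind = kind; }
    }

    /** Collects every Phi reachable from root through Phi inputs, visiting each Phi once. */
    static Set<Node> reachablePhis(Node root) {
        Set<Node> visited = new HashSet<>();        // identity-based: Node does not override equals
        Deque<Node> worklist = new ArrayDeque<>();
        worklist.push(root);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            // Skip non-Phi nodes and Phis that were already seen (this breaks the cycle).
            if (!n.kind.equals("Phi") || !visited.add(n)) {
                continue;
            }
            for (Node in : n.inputs) {
                worklist.push(in);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Node outer = new Node("Phi");
        Node inner = new Node("Phi");
        outer.inputs.add(new Node("Proj"));
        outer.inputs.add(inner);
        inner.inputs.add(new Node("Proj"));
        inner.inputs.add(outer);                    // back edge that closes the cycle
        System.out.println(reachablePhis(outer).size()); // prints 2
    }
}
```

A naive recursive walk over the same two-Phi graph would never terminate, which is the shape of failure the bug title describes.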
Changeset: 46df171d
Author: Eric Liu
URL: https://git.openjdk.org/jdk/commit/46df171d537c0d9cb1df2d7915cc745a7f524557
Stats: 165 lines in 3 files changed: 136 ins; 9 del; 20 mod

8304948: [vectorapi] C2 crashes when expanding VectorBox

Reviewed-by: thartmann, qamai

-------------

PR: https://git.openjdk.org/jdk/pull/13489

From sspitsyn at openjdk.org Fri May 5 00:43:15 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 5 May 2023 00:43:15 GMT
Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
In-Reply-To:
References:
Message-ID: <3-xnvJQ9SgsTQAMEko8IEp42n7bMnLXQ-xIuv2aGD_c=.11041bc7-d7ae-4d42-be31-09a9c55b6876@github.com>

On Thu, 4 May 2023 15:12:43 GMT, Leonid Mesnik wrote:

> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
>
> caused significant regressions in some benchmarks and should be reverted.
>
> This fix backout changes and update problemlist bugs to new issue.
> Tier1 passed
> Running also tier5 to check other builds and more svc testing

The `notify_jvmti_object_alloc_Type` declaration also needs to be removed from the runtime.hpp file.
Other than that, the BACKOUT looks clean.

Thanks,
Serguei

-------------

PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1414075226

From fjiang at openjdk.org Fri May 5 00:53:15 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Fri, 5 May 2023 00:53:15 GMT
Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 16:15:58 GMT, Gui Cao wrote:

>> Hi,
>>
>> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot.
>>
>> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes.
>> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ >> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> >> >> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X >> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: >> >> ``` >> 1ba0 ld R28, [R23, #280] # ptr, #@loadP >> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm >> 1ba8 reinterpretResize V1, V5 >> 1bb0 vcvtBtoX V4, V1 >> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 >> ``` >> >> #### VectorRearrange >> >> When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. >> >> The compilation log for the `VectorRearrange` node: >> >> ``` >> 1f6 spill R7 -> [sp, #320] # spill size = 64 >> 1f8 spill [sp, #128] -> V1 # vector spill size = 256 >> 200 spill [sp, #160] -> V2 # vector spill size = 256 >> 208 rearrange V3, V1, V2 >> 210 spill V3 -> [sp, #96] # vector spill size = 256 >> 218 li R11, #4 # int, #@loadConI >> ``` >> >> #### VectorReinterpret >> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. 
>> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 >> The compilation log for the `VectorReinterpret` node: >> >> >> 1218 spill [sp, #32] -> V4 # vector spill size = 256 >> 1220 spill [sp, #176] -> V3 # vector spill size = 256 >> 1228 rearrange V2, V4, V3 >> 1230 spill [sp, #72] -> V0 # vmask spill size = 32 >> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend >> 1244 reinterpretResize V2, V1 >> 124c vcvtStoX_extend V5, V2 >> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> #### LShiftCntV/RShiftCntV >> >> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types >> >> The compilation log for the LShiftCntV/RShiftCntV node: >> >> >> 24c vasrB V3, V1, V2 >> 260 storeV [R19], V3 # vector (rvv) >> 268 lbu R19, [R29, #48] # byte, #@loadUB >> 26c andi R19, R19, #7 #@andI_reg_imm >> 270 loadV V1, [R25] # vector (rvv) >> 278 vshiftcnt V2, R19 >> 280 vasrB V3, V1, V2 >> 294 storeV [R26], V3 # vector (rvv) >> 29c lbu R19, [R29, #80] # byte, #@loadUB >> 2a0 andi R19, R19, #7 #@andI_reg_imm >> 2a4 loadV V1, [R22] # vector (rvv) >> 2ac vshiftcnt V2, R19 >> >> >> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc >> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> Testing: >> qemu with UseRVV: >> >> - [ ] Tier1 tests (release) >> - [ ] Tier2 tests (release) >> - [ ] Tier3 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > rename rvv_vsetvli to vsetvli_helper Marked as reviewed by fjiang (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1414079655 From lmesnik at openjdk.org Fri May 5 01:06:09 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 5 May 2023 01:06:09 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: > 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects > > caused significant regressions in some benchmarks and should be reverted. > > This fix backout changes and update problemlist bugs to new issue. > Tier1 passed > Running also tier5 to check other builds and more svc testing Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: removed notify_jvmti_object_alloc_Type line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13806/files - new: https://git.openjdk.org/jdk/pull/13806/files/72e42170..fed4d98a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13806&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13806&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13806.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13806/head:pull/13806 PR: https://git.openjdk.org/jdk/pull/13806 From sspitsyn at openjdk.org Fri May 5 01:25:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 5 May 2023 01:25:21 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 01:06:09 GMT, Leonid Mesnik wrote: >> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects >> >> caused significant regressions in some benchmarks and should be reverted. >> >> This fix backout changes and update problemlist bugs to new issue. 
>> Tier1 passed >> Running also tier5 to check other builds and more svc testing > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed notify_jvmti_object_alloc_Type line Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1414092133 From epeter at openjdk.org Fri May 5 03:53:15 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 May 2023 03:53:15 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph In-Reply-To: References: Message-ID: On Thu, 4 May 2023 10:11:52 GMT, Fei Gao wrote: >> `SuperWord:schedule`, and specifically `SuperWord::co_locate_pack` is broken. >> The problem is with the basic approach of it, as far as I know. >> Hence, I had to completely re-design the `schedule` algorithm, based on the `PacksetGraph` ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). >> >> **The current approach** >> >> The idea is to leave the non-vectorized memory ops in their place, and find the right place for the vectorized memops to be "sandwiched" into. The logic is very complex and has already had a few bugs fixed. >> >> **Why this does not work** >> >> However, in some rare cases, we have to reorder non-vectorized operations. See this example that I added as a regression test: >> >> https://github.com/openjdk/jdk/blob/a771a61005aea272cc51fa3f3e1637c217582fce/test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java#L82-L109 >> >> I found this issue during work on https://github.com/openjdk/jdk/pull/13078, where I had to restrict/disable some tests that are now passing. >> >> **Solution** >> >> Abandon the idea of "sandwiching" memops. 
Rewrite `SuperWord:schedule`: >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2567-L2576 >> >> We first schedule all memops into a linear order. >> We do this scheduling based on the `PacksetGraph`, which gives us a `DAG` based on the `packset` and the dependency-graph (which in turn respects the data use-defs, as well as the memory dependencies, unless we can prove that they do not reference the same memory). >> In other words: we have a linearization that respects all dependencies that must be respected. >> Further, we make sure that ops from the same pack are scheduled as a block (all adjacent to each other), and in order that the packset has internally. >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2489-L2493 >> >> Now that we have this order (and we have not aborted because we found a cycle in the `PacksetGraph`), we must apply this schedule to each memory slice, and reorder the memops in the slices accordingly. >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2617-L2619 >> >> This scheduling has the nice side-effect of simplifying `SuperWord::output` a little. >> We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). >> >> **Discussion** >> >> This seems to me to be a much more straight forward approach, and it uses the code I recently added for verification of cyclic dependencies in the packset ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). >> >> One potential improvement to my fix: >> We now sometimes re-order the non-vectorized memory slices, even though it may not be necessary. 
>> This is not wrong, but it makes updates to the graph that may be confusing when debugging. >> Further, the re-ordering may have performance impacts. >> I could use a priority-queue (min-heap, would have to implement it since it does not yet exist), and schedule the `PacksetGraph` whenever possible with the lower `bb_idx` first. This would make the new linear order the same/closer to the old one. However, I am not sure if this is worth the effort and overhead of a priority-queue. >> >> **Testing** >> Github-actions pass. tier1-6 + stress testing passes. >> Performance testing showed no significant performance change. > > src/hotspot/share/opto/superword.cpp line 2768: > >> 2766: Node* n = _block.at(i); // last in pack >> 2767: Node_List* p = my_pack(n); >> 2768: if (p != nullptr && n == p->at(p->size()-1)) { > > Sorry, I don't quite understand why the mem ops in the pack are internally in order. Maybe I missed somewhere you reordered these ops in the same pack using linearized memops_schedule. Could you please point it out for me? Thanks. Thanks for the question. It is what I mentioned in the PR description: > This scheduling has the nice side-effect of simplifying SuperWord::output a little. We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). 
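The scheduling idea restated above (one linear order that respects every dependency and keeps each pack contiguous, in pack order) can be sketched as a topological sort over groups of memops. This is only an illustration under stated assumptions: `PackScheduleSketch`, `schedule`, the string memop names, and the index-pair edge encoding are all hypothetical and are not taken from `superword.cpp`. It uses a plain FIFO ready queue (Kahn's algorithm), not the `bb_idx`-ordered priority queue floated in the PR discussion.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PackScheduleSketch {
    /**
     * Linearizes memops so that dependencies are respected and each pack stays contiguous.
     * groups: each inner list is either a pack (in pack order) or a single scalar memop.
     * deps:   edges {from, to} between group indices; group "from" must precede group "to".
     * Returns null if the group graph contains a cycle (no valid schedule exists).
     */
    static List<String> schedule(List<List<String>> groups, int[][] deps) {
        int n = groups.size();
        int[] indegree = new int[n];
        List<List<Integer>> successors = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            successors.add(new ArrayList<>());
        }
        for (int[] edge : deps) {
            successors.get(edge[0]).add(edge[1]);
            indegree[edge[1]]++;
        }

        // Groups with no unscheduled predecessors are ready to be emitted.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            if (indegree[i] == 0) {
                ready.add(i);
            }
        }

        List<String> order = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            int g = ready.poll();
            order.addAll(groups.get(g));            // the whole pack is emitted as one block
            scheduled++;
            for (int s : successors.get(g)) {
                if (--indegree[s] == 0) {
                    ready.add(s);
                }
            }
        }
        return scheduled == n ? order : null;       // leftover groups mean there was a cycle
    }

    public static void main(String[] args) {
        List<List<String>> groups = List.of(
            List.of("store_a0", "store_a1"),        // group 0: a store pack
            List.of("load_b"),                      // group 1: a scalar load
            List.of("load_a0", "load_a1"));         // group 2: a load pack
        int[][] deps = { {2, 0}, {1, 0} };          // both loads must precede the store pack
        System.out.println(schedule(groups, deps));
        // prints [load_b, load_a0, load_a1, store_a0, store_a1]
    }
}
```

Because every pack is appended as a unit, the first element of a pack is also the first of that pack in the resulting order and the last element is last, which is the property the reply relies on for `SuperWord::output`.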
**Details** We add the pack to `memops_schedule`, in the order of the pack: https://github.com/openjdk/jdk/blob/677400bbcd1921b280a63de2ce60aefa1c835241/src/hotspot/share/opto/superword.cpp#L2506-L2518 And then we reorder all memops according to this schedule: https://github.com/openjdk/jdk/blob/677400bbcd1921b280a63de2ce60aefa1c835241/src/hotspot/share/opto/superword.cpp#L2617-L2619 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1185676113 From epeter at openjdk.org Fri May 5 04:04:13 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 May 2023 04:04:13 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph [v2] In-Reply-To: References: Message-ID: > `SuperWord:schedule`, and specifically `SuperWord::co_locate_pack` is broken. > The problem is with the basic approach of it, as far as I know. > Hence, I had to completely re-design the `schedule` algorithm, based on the `PacksetGraph` ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). > > **The current approach** > > The idea is to leave the non-vectorized memory ops in their place, and find the right place for the vectorized memops to be "sandwiched" into. The logic is very complex and has already had a few bugs fixed. > > **Why this does not work** > > However, in some rare cases, we have to reorder non-vectorized operations. See this example that I added as a regression test: > > https://github.com/openjdk/jdk/blob/a771a61005aea272cc51fa3f3e1637c217582fce/test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java#L82-L109 > > I found this issue during work on https://github.com/openjdk/jdk/pull/13078, where I had to restrict/disable some tests that are now passing. > > **Solution** > > Abandon the idea of "sandwiching" memops. 
Rewrite `SuperWord:schedule`: > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2567-L2576 > > We first schedule all memops into a linear order. > We do this scheduling based on the `PacksetGraph`, which gives us a `DAG` based on the `packset` and the dependency-graph (which in turn respects the data use-defs, as well as the memory dependencies, unless we can prove that they do not reference the same memory). > In other words: we have a linearization that respects all dependencies that must be respected. > Further, we make sure that ops from the same pack are scheduled as a block (all adjacent to each other), and in order that the packset has internally. > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2489-L2493 > > Now that we have this order (and we have not aborted because we found a cycle in the `PacksetGraph`), we must apply this schedule to each memory slice, and reorder the memops in the slices accordingly. > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2617-L2619 > > This scheduling has the nice side-effect of simplifying `SuperWord::output` a little. > We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). > > **Discussion** > > This seems to me to be a much more straight forward approach, and it uses the code I recently added for verification of cyclic dependencies in the packset ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). > > One potential improvement to my fix: > We now sometimes re-order the non-vectorized memory slices, even though it may not be necessary. 
> This is not wrong, but it makes updates to the graph that may be confusing when debugging. > Further, the re-ordering may have performance impacts. > I could use a priority-queue (min-heap, would have to implement it since it does not yet exist), and schedule the `PacksetGraph` whenever possible with the lower `bb_idx` first. This would make the new linear order the same/closer to the old one. However, I am not sure if this is worth the effort and overhead of a priority-queue. > > **Testing** > Github-actions pass. tier1-6 + stress testing passes. > Performance testing showed no significant performance change. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Addressed Fei's review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13354/files - new: https://git.openjdk.org/jdk/pull/13354/files/677400bb..edf80202 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13354&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13354&range=00-01 Stats: 9 lines in 2 files changed: 4 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13354/head:pull/13354 PR: https://git.openjdk.org/jdk/pull/13354 From gcao at openjdk.org Fri May 5 06:18:23 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 5 May 2023 06:18:23 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v8] In-Reply-To: References: Message-ID: > Hi, > > we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > > We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > > > #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X > There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: > > ``` > 1ba0 ld R28, [R23, #280] # ptr, #@loadP > 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm > 1ba8 reinterpretResize V1, V5 > 1bb0 vcvtBtoX V4, V1 > 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 > ``` > > #### VectorRearrange > > When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. > > The compilation log for the `VectorRearrange` node: > > ``` > 1f6 spill R7 -> [sp, #320] # spill size = 64 > 1f8 spill [sp, #128] -> V1 # vector spill size = 256 > 200 spill [sp, #160] -> V2 # vector spill size = 256 > 208 rearrange V3, V1, V2 > 210 spill V3 -> [sp, #96] # vector spill size = 256 > 218 li R11, #4 # int, #@loadConI > ``` > > #### VectorReinterpret > If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. 
> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 > The compilation log for the `VectorReinterpret` node: > > > 1218 spill [sp, #32] -> V4 # vector spill size = 256 > 1220 spill [sp, #176] -> V3 # vector spill size = 256 > 1228 rearrange V2, V4, V3 > 1230 spill [sp, #72] -> V0 # vmask spill size = 32 > 123c vmerge_vvm V1, V1, V2, v0 #@vector blend > 1244 reinterpretResize V2, V1 > 124c vcvtStoX_extend V5, V2 > 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 > > > #### LShiftCntV/RShiftCntV > > We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types > > The compilation log for the LShiftCntV/RShiftCntV node: > > > 24c vasrB V3, V1, V2 > 260 storeV [R19], V3 # vector (rvv) > 268 lbu R19, [R29, #48] # byte, #@loadUB > 26c andi R19, R19, #7 #@andI_reg_imm > 270 loadV V1, [R25] # vector (rvv) > 278 vshiftcnt V2, R19 > 280 vasrB V3, V1, V2 > 294 storeV [R26], V3 # vector (rvv) > 29c lbu R19, [R29, #80] # byte, #@loadUB > 2a0 andi R19, R19, #7 #@andI_reg_imm > 2a4 loadV V1, [R22] # vector (rvv) > 2ac vshiftcnt V2, R19 > > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > Testing: > qemu with UseRVV: > > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8306966 - rename rvv_vsetvli to vsetvli_helper - Fix round mode and optimize widen/narrow vcast - Small refactoring of rvv_vsetvli - Fix VectorCastF2X - During the conversion, specify the number of vectors - Use zr register instead of x0 - 8306966: RISC-V: Support vector cast node for Vector API ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13684/files - new: https://git.openjdk.org/jdk/pull/13684/files/7efb9dfd..80beb6a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=06-07 Stats: 39048 lines in 799 files changed: 24210 ins; 8877 del; 5961 mod Patch: https://git.openjdk.org/jdk/pull/13684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13684/head:pull/13684 PR: https://git.openjdk.org/jdk/pull/13684 From thartmann at openjdk.org Fri May 5 06:48:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 May 2023 06:48:16 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 01:06:09 GMT, Leonid Mesnik wrote: >> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects >> >> caused significant regressions in some benchmarks and should be reverted. >> >> This fix backout changes and update problemlist bugs to new issue. >> Tier1 passed >> Running also tier5 to check other builds and more svc testing > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed notify_jvmti_object_alloc_Type line Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1414249585

From epeter at openjdk.org Fri May 5 07:39:25 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 May 2023 07:39:25 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 07:29:44 GMT, Emanuel Peter wrote:

> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>
> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>
> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>
> I disabled `turbo-boost`.
> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> ---------------------------------------------------------------
> int add        2063   2085    660    530    415    283 |      |
> int mul        2272   2257   1189    733    908    439 |      |
> int min        2527   2520   2516   2579   2585   2542 |    1 |
> int max        2548   2525   2551   2516   2515   2517 |    1 |
> int and        2410   2414    602    480    353    263 |      |
> int or         2149   2151    597    498    354    262 |      |
> int xor        2059   2062    605    476    364    263 |      |
> long add       1776   1790   2000   1000   1683    591 |    2 |
> long mul       2135   2199   2185   2001   2176   1307 |    2 |
> long min       1439   1424   1421   1420   1430   1427 |    3 |
> long max       2299   2287   2303   2305   1433   1425 |    3 |
> long and       1657   1667   2015   1003   1679    568 |    4 |
> long or        1776   1783   2032   1009   1680    569 |    4 |
> long xor       1834   1783   2012   1024   1679    570 |    4 |
> float add      2779   2644   2633   2648   2632   2639 |    5 |
> float mul      2779   2871   2810   2776   2732   2791 |    5 |
> float min      2294   2620   1725   1286    872    672 |      |
> float max      2371   2519   1697   1265    841    468 |      |
> double add     2634   2636   2635   2650   2635   2648 |    5 |
> double mul     3053   2955   2881   3030   2979   2927 |    5 |
> double min     2364   2400   2439   2399   2486   2398 |    6 |
> double max     2488   2459   2501   2451   2493   2498 |    6 |
>
> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>
> The lines without note show clear speedup as expected.
>
> Notes:
> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>
> **Testing**
>
> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>
> Passes up to tier5 and stress-testing.
> **TODO report performance testing (running)**
> **TODO** can someone benchmark on `aarch64`?
>
> **Discussion**
>
> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>
> So far, I did not work on `byte, char, short`, we can investigate this in the future.
>
> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.

@jatin-bhateja @sviswa7 What do you think about the performance numbers I measured? Do they make sense to you?

A few questions:
- `long min/max`: Why do we require `avx512vlbwdq` in `Matcher::match_rule_supported_vector`? Would `avx512f` not be sufficient? `C2_MacroAssembler::reduceL` leads me to `vextracti64x4` (that should only require `avx512f`) and `reduce_operation_256` (where `vpminsq` only requires `avx2` via assert).
- What do you think about the `double min/max` performance?
What do you think could be the reason it is not similar to the behavior of `float min/max`?

@jatin-bhateja @vnkozlov @sviswa7 I substantially reworked this RFE, have it working now, and included your suggestions. The algorithm now sits in `PhaseIdealLoop::build_and_optimize` after `SuperWord`. It can now handle chains of `UnorderedReduction`, so that it is more robust against unrolling.

The only things missing for me are:
1. Benchmark on `aarch64`. @fg1417 Would you want to have a look at that?
2. Wait for the performance testing results.

src/hotspot/share/opto/superword.cpp line 2670:

> 2668: const Type *arith_type = n->bottom_type();
> 2669: vn = ReductionNode::make(opc, nullptr, in1, in2, arith_type->basic_type());
> 2670: if (vn->is_UnorderedReduction()) {

@jatin-bhateja I want to check if `vn` is an `UnorderedReduction`, so I want a `bool` answer. If I ask for `isa_UnorderedReduction()`, I would get an `UnorderedReductionNode*`, and `nullptr` if it is not an `UnorderedReduction`. Maybe I did not understand your suggestion.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1479124966
PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1535844780
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1158362742

From sviswanathan at openjdk.org Fri May 5 07:39:26 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 5 May 2023 07:39:26 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID: <7Fop4qaVfok0Jg2x3yqUto2shQQ7U9Bmd6mCGiGAGWc=.d61ca8ef-33d9-4687-89e0-649d744faef5@github.com>

On Wed, 22 Mar 2023 08:49:11 GMT, Emanuel Peter wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`).
All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`. >> >> >> operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | >> --------------------------------------------------------------- >> int add 2063 2085 660 530 415 283 | | >> int mul 2272 2257 1189 733 908 439 | | >> int min 2527 2520 2516 2579 2585 2542 | 1 | >> int max 2548 2525 2551 2516 2515 2517 | 1 | >> int and 2410 2414 602 480 353 263 | | >> int or 2149 2151 597 498 354 262 | | >> int xor 2059 2062 605 476 364 263 | | >> long add 1776 1790 2000 1000 1683 591 | 2 | >> long mul 2135 2199 2185 2001 2176 1307 | 2 | >> long min 1439 1424 1421 1420 1430 1427 | 3 | >> long max 2299 2287 2303 2305 1433 1425 | 3 | >> long and 1657 1667 2015 1003 1679 568 | 4 | >> long or 1776 1783 2032 1009 1680 569 | 4 | >> long xor 1834 1783 2012 1024 1679 570 | 4 | >> float add 2779 2644 2633 2648 2632 2639 | 5 | >> float mul 2779 2871 2810 2776 2732 2791 | 5 | >> float min 2294 2620 1725 1286 872 672 | | >> float max 2371 2519 1697 1265 841 468 | | >> double add 2634 2636 2635 2650 2635 2648 | 5 | >> double mul 3053 2955 2881 3030 2979 2927 | 5 | >> double min 2364 2400 2439 2399 2486 2398 | 6 | >> double max 2488 2459 2501 2451 2493 2498 | 6 | >> >> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512. 
>> >> The lines without note show clear speedup as expected. >> >> Notes: >> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673) >> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`. >> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513). >> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now. >> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples. >> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865). >> >> **Testing** >> >> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop. >> >> Passes up to tier5 and stress-testing. >> **TODO report performance testing (running)** >> **TODO** can someone benchmark on `aarch64`? >> >> **Discussion** >> >> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this: >> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 >> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516). >> >> So far, I did not work on `byte, char, short`, we can investigate this in the future. >> >> FYI: I investigated if this may be helpful for the Vector API. 
As far as I can see, Reductions are only introduced with a vector-iunput, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now. > > @jatin-bhateja @sviswa7 > What do you think about the performance numbers I measured? Do they make sense to you? > > A few questions: > - `long min/max`: Why do we require `avx512vlbwdq` in `Matcher::match_rule_supported_vector`? Would `avx512f` not be sufficient? `C2_MacroAssembler::reduceL` leads me to `vextracti64x4` (that should only require `avx512f`) and `reduce_operation_256` (where `vpminsq` only requires `avx2` via assert). > - What do you think about the `double min/max` performance? What do you think could be the reason it is not similar to the behavior of `float min/max`? @eme64 The double min/max reduction is also affected by [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865). With the following patch, I see double min/max reduction happening and good perf gain (> 2x) with your PR: diff --git a/src/hotspot/share/opto/superword.cpp b/src/hotspot/share/opto/superword.cpp index baf880aac20..04b20904cf4 100644 --- a/src/hotspot/share/opto/superword.cpp +++ b/src/hotspot/share/opto/superword.cpp @@ -3353,7 +3353,8 @@ bool SuperWord::construct_bb() { // First see if we can map the reduction on the given system we are on, then // make a data entry operation for each reduction we see. BasicType bt = use->bottom_type()->basic_type(); - if (ReductionNode::implemented(use->Opcode(), Matcher::min_vector_size(bt), bt)) { + int min_vec_size = Matcher::min_vector_size(bt); + if (ReductionNode::implemented(use->Opcode(), min_vec_size < 2 ? 2 : min_vec_size, bt)) { reduction_uses++; } } Hope this helps. @eme64 For long min/max, currently Math.min(long, long) is not getting intrinsified. Only int/float/double are getting intrinsified. 
No scalar intrinsification for Math.min(long, long) leads to no MinL scalar node generation and in turn no vectorization and no reduction. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1480200297 PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1480330263 From kvn at openjdk.org Fri May 5 07:39:22 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 May 2023 07:39:22 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 07:29:44 GMT, Emanuel Peter wrote: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`. 
> > > operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | > --------------------------------------------------------------- > int add 2063 2085 660 530 415 283 | | > int mul 2272 2257 1189 733 908 439 | | > int min 2527 2520 2516 2579 2585 2542 | 1 | > int max 2548 2525 2551 2516 2515 2517 | 1 | > int and 2410 2414 602 480 353 263 | | > int or 2149 2151 597 498 354 262 | | > int xor 2059 2062 605 476 364 263 | | > long add 1776 1790 2000 1000 1683 591 | 2 | > long mul 2135 2199 2185 2001 2176 1307 | 2 | > long min 1439 1424 1421 1420 1430 1427 | 3 | > long max 2299 2287 2303 2305 1433 1425 | 3 | > long and 1657 1667 2015 1003 1679 568 | 4 | > long or 1776 1783 2032 1009 1680 569 | 4 | > long xor 1834 1783 2012 1024 1679 570 | 4 | > float add 2779 2644 2633 2648 2632 2639 | 5 | > float mul 2779 2871 2810 2776 2732 2791 | 5 | > float min 2294 2620 1725 1286 872 672 | | > float max 2371 2519 1697 1265 841 468 | | > double add 2634 2636 2635 2650 2635 2648 | 5 | > double mul 3053 2955 2881 3030 2979 2927 | 5 | > double min 2364 2400 2439 2399 2486 2398 | 6 | > double max 2488 2459 2501 2451 2493 2498 | 6 | > > Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512. > > The lines without note show clear speedup as expected. > > Notes: > 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673) > 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`. > 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513). > 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now. > 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples. > 6. 
`double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865). > > **Testing** > > I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop. > > Passes up to tier5 and stress-testing. > **TODO report performance testing (running)** > **TODO** can someone benchmark on `aarch64`? > > **Discussion** > > We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this: > https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 > I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516). > > So far, I did not work on `byte, char, short`, we can investigate this in the future. > > FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-iunput, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now. In general it looks good. src/hotspot/share/opto/vectornode.cpp line 1348: > 1346: in1 != nullptr && in1->is_Phi() && in1->in(2) == this && in1->outcnt() == 1 && > 1347: in1->in(0)->is_CountedLoop() && > 1348: in2->is_Vector()) { Should you also check that this reduction node doesn't have users inside loop? test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java line 28: > 26: * @bug 8074981 8302652 > 27: * @summary Test SuperWord Reduction Perf. > 28: * @requires vm.compiler2.enabled This is not enough. Yes, we need to check for C2 presence. 
But you need to skip `arm`, `ppc` and `s390`, which have C2. You need a second `@requires`, as in the original, but you can reduce the checks for x86:

    * @requires vm.simpleArch == "x86" | vm.simpleArch == "x64" | vm.simpleArch == "aarch64" | vm.simpleArch == "riscv64"

test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java line 45:

> 43: public static void main(String args[]) {
> 44: int iter_warmup = 2_000;
> 45: int iter_perf = 5_000;

Did you measure the total execution time of this test? A warmup of 2_000 iterations is too large, I think. You already have 8192 iterations in the tested methods; 10 should be enough to trigger compilation. Maybe add `-Xbatch` if you want to make sure C2 does compile it.
If the reduction is not supported (MulReduction on RISC-V and AArch64), the test will be slow.

-------------

PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1351296122
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1143957774
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1143929762
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1143941974

From sviswanathan at openjdk.org Fri May 5 07:39:30 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 5 May 2023 07:39:30 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To: <7Fop4qaVfok0Jg2x3yqUto2shQQ7U9Bmd6mCGiGAGWc=.d61ca8ef-33d9-4687-89e0-649d744faef5@github.com>
References: <7Fop4qaVfok0Jg2x3yqUto2shQQ7U9Bmd6mCGiGAGWc=.d61ca8ef-33d9-4687-89e0-649d744faef5@github.com>
Message-ID:

On Wed, 22 Mar 2023 22:17:29 GMT, Sandhya Viswanathan wrote:

>> @jatin-bhateja @sviswa7
>> What do you think about the performance numbers I measured? Do they make sense to you?
>>
>> A few questions:
>> - `long min/max`: Why do we require `avx512vlbwdq` in `Matcher::match_rule_supported_vector`? Would `avx512f` not be sufficient?
`C2_MacroAssembler::reduceL` leads me to `vextracti64x4` (that should only require `avx512f`) and `reduce_operation_256` (where `vpminsq` only requires `avx2` via assert). >> - What do you think about the `double min/max` performance? What do you think could be the reason it is not similar to the behavior of `float min/max`? > > @eme64 For long min/max, currently Math.min(long, long) is not getting intrinsified. Only int/float/double are getting intrinsified. No scalar intrinsification for Math.min(long, long) leads to no MinL scalar node generation and in turn no vectorization and no reduction. > @sviswa7 thanks for your quick response! > > I can confirm: we do not "intrinsify" (ie turn into `MinL/MaxL`), rather we just inline the `java.lang.Math::Min/Max` methods, implemented with `CmpL` / `If`-branching. Do you think this makes sense, or should we intrinsify, at least when the hardware supports it? @eme64 We should intrinsify MinL/MaxL when the hardware supports it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1481977784 From epeter at openjdk.org Fri May 5 07:39:17 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 May 2023 07:39:17 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible Message-ID: https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. **Performance results** I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. 
I also increased the array length to `RANGE = 16*1024`.

I disabled `turbo-boost`.
Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
Full `avx512` support, including `avx512dq` required for `MulReductionVL`.

operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
---------------------------------------------------------------
int add        2063   2085    660    530    415    283 |      |
int mul        2272   2257   1189    733    908    439 |      |
int min        2527   2520   2516   2579   2585   2542 | 1    |
int max        2548   2525   2551   2516   2515   2517 | 1    |
int and        2410   2414    602    480    353    263 |      |
int or         2149   2151    597    498    354    262 |      |
int xor        2059   2062    605    476    364    263 |      |
long add       1776   1790   2000   1000   1683    591 | 2    |
long mul       2135   2199   2185   2001   2176   1307 | 2    |
long min       1439   1424   1421   1420   1430   1427 | 3    |
long max       2299   2287   2303   2305   1433   1425 | 3    |
long and       1657   1667   2015   1003   1679    568 | 4    |
long or        1776   1783   2032   1009   1680    569 | 4    |
long xor       1834   1783   2012   1024   1679    570 | 4    |
float add      2779   2644   2633   2648   2632   2639 | 5    |
float mul      2779   2871   2810   2776   2732   2791 | 5    |
float min      2294   2620   1725   1286    872    672 |      |
float max      2371   2519   1697   1265    841    468 |      |
double add     2634   2636   2635   2650   2635   2648 | 5    |
double mul     3053   2955   2881   3030   2979   2927 | 5    |
double min     2364   2400   2439   2399   2486   2398 | 6    |
double max     2488   2459   2501   2451   2493   2498 | 6    |

Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.

The lines without a note show a clear speedup, as expected.

Notes:
1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
4. `long and/or/xor`: without the patch on AVX2, vectorization is slower. With the patch, it is always faster now.
5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside the loop. Vectorization has no benefit in these examples.
6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).

**Testing**

I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.

Passes up to tier5 and stress-testing.
**TODO report performance testing (running)**
**TODO** can someone benchmark on `aarch64`?

**Discussion**

We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).

So far, I did not work on `byte, char, short`; we can investigate this in the future.

FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
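
To illustrate the idea behind moving an unordered reduction out of the loop, here is a minimal scalar sketch in plain Java. It is not the C2 implementation: the class name, the method names and the lane count `LANES = 8` are assumptions made up for this illustration, and the inner lane loops merely stand in for single vector instructions. The first method folds all lanes into the scalar on every vector iteration; the second keeps a lane-wise accumulator that starts at the identity element and reduces it once after the loop.

```java
// Hypothetical illustration only; none of these names exist in the JDK sources.
final class ReductionSketch {
    // Assumed lane count, e.g. 8 ints in a 256-bit vector.
    static final int LANES = 8;

    // Conceptually "before": each vector iteration performs a cross-lane
    // reduction and folds the result into the scalar accumulator.
    static int sumReduceInLoop(int[] a) {
        int sum = 0;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int chunk = 0;
            for (int l = 0; l < LANES; l++) {
                chunk += a[i + l];          // stands in for one in-loop reduction
            }
            sum += chunk;
        }
        for (; i < a.length; i++) {         // scalar tail
            sum += a[i];
        }
        return sum;
    }

    // Conceptually "after": the loop only does lane-wise adds into a vector
    // accumulator; the cross-lane reduction happens once, after the loop.
    static int sumReduceAfterLoop(int[] a) {
        int[] acc = new int[LANES];         // identity element of add is 0
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l];         // stands in for one lane-wise vector add
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l];                  // the single reduction after the loop
        }
        for (; i < a.length; i++) {         // scalar tail
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[1000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 31 + 7;
        }
        // Both orders agree because int addition is associative and commutative.
        System.out.println(sumReduceInLoop(a) == sumReduceAfterLoop(a));
    }
}
```

The two methods agree only because `int` addition can be reordered freely. The same rewrite applied to `float/double add/mul` could change rounding, which is why those reductions are excluded (note 5).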
-------------

Commit messages:
 - Merge branch 'master' into JDK-8302652
 - small bug fix
 - generalized the algorithm to handle a chain of UnorderedReductions
 - Moved move_unordered_reduction_out_of_loop from SuperWord to PhaseIdealLoop
 - neutral -> identity element
 - Moved code from Ideal to SuperWord
 - Vladimir's suggestions for ReductionPerf.java
 - pushed updated ReductionPerf.java
 - added IR rules to validate reduced use of Reduce node
 - fix for and, or, xor, min, max
 - ... and 2 more: https://git.openjdk.org/jdk/compare/ccf91f88...cc9e7e8e

Changes: https://git.openjdk.org/jdk/pull/13056/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8302652
  Stats: 831 lines in 14 files changed: 627 ins; 27 del; 177 mod
  Patch: https://git.openjdk.org/jdk/pull/13056.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056

PR: https://git.openjdk.org/jdk/pull/13056

From jbhateja at openjdk.org Fri May 5 07:39:33 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 5 May 2023 07:39:33 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 07:29:44 GMT, Emanuel Peter wrote:

> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>
> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>
> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations.
I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`. > > > operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | > --------------------------------------------------------------- > int add 2063 2085 660 530 415 283 | | > int mul 2272 2257 1189 733 908 439 | | > int min 2527 2520 2516 2579 2585 2542 | 1 | > int max 2548 2525 2551 2516 2515 2517 | 1 | > int and 2410 2414 602 480 353 263 | | > int or 2149 2151 597 498 354 262 | | > int xor 2059 2062 605 476 364 263 | | > long add 1776 1790 2000 1000 1683 591 | 2 | > long mul 2135 2199 2185 2001 2176 1307 | 2 | > long min 1439 1424 1421 1420 1430 1427 | 3 | > long max 2299 2287 2303 2305 1433 1425 | 3 | > long and 1657 1667 2015 1003 1679 568 | 4 | > long or 1776 1783 2032 1009 1680 569 | 4 | > long xor 1834 1783 2012 1024 1679 570 | 4 | > float add 2779 2644 2633 2648 2632 2639 | 5 | > float mul 2779 2871 2810 2776 2732 2791 | 5 | > float min 2294 2620 1725 1286 872 672 | | > float max 2371 2519 1697 1265 841 468 | | > double add 2634 2636 2635 2650 2635 2648 | 5 | > double mul 3053 2955 2881 3030 2979 2927 | 5 | > double min 2364 2400 2439 2399 2486 2398 | 6 | > double max 2488 2459 2501 2451 2493 2498 | 6 | > > Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512. > > The lines without note show clear speedup as expected. > > Notes: > 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673) > 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`. > 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513). > 4. 
`long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now. > 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples. > 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865). > > **Testing** > > I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop. > > Passes up to tier5 and stress-testing. > **TODO report performance testing (running)** > **TODO** can someone benchmark on `aarch64`? > > **Discussion** > > We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this: > https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 > I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516). > > So far, I did not work on `byte, char, short`, we can investigate this in the future. > > FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-iunput, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now. 
src/hotspot/share/opto/superword.cpp line 2890:

> 2888: UnorderedReductionNode* ur = unordered_reductions.at(i)->as_UnorderedReduction();
> 2889: move_unordered_reduction_out_of_loop(ur);
> 2890: }

Hi @eme64, if we move this processing post SLP to a standalone pass, we can also handle vector IR created through VectorAPI.

src/hotspot/share/opto/superword.cpp line 2967:

> 2965: const Type* bt_t = Type::get_const_basic_type(bt);
> 2966:
> 2967: // Create vector of neutral elements (zero for add, one for mul, etc)

A minor nomenclature fix: we can use the name identity_scalar instead of neutral_scalar, since 0 is an additive identity and 1 is a multiplicative identity.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1152866650
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1156993568

From epeter at openjdk.org Fri May 5 07:39:35 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 May 2023 07:39:35 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID:

On Thu, 30 Mar 2023 07:55:28 GMT, Jatin Bhateja wrote:

>> src/hotspot/share/opto/superword.cpp line 2890:
>>
>>> 2888: UnorderedReductionNode* ur = unordered_reductions.at(i)->as_UnorderedReduction();
>>> 2889: move_unordered_reduction_out_of_loop(ur);
>>> 2890: }
>>
>> Hi @eme64, if we move this processing post SLP to a standalone pass, we can also handle vector IR created through VectorAPI.
>
> We can also relax the following limitation with your patch: since the loop body will now consist of lane-wise vector operations, with the reduction moved out of the loop, it may allow vectorizing patterns like `res += a[i];`, which is composed of a single load and a reduction operation. Unrolling will then create multiple vector operations within the loop, which may improve performance.
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L2265 @jatin-bhateja > if we move this processing post SLP to a stand alone pass, we can also handler vector IR created through VectorAPI. Where exactly would you put it? We need a location during LoopOpts, so that we have the ctrl information. I previously suggested in `split_if`, but @vnkozlov seemed not very excited. Additionally, I have not seen any case where VectorAPI could make use of it. I gave it a quick look, so maybe you can find something. Maybe in the long run, we should have a node-by-node pass during loop-opts, and allow all sorts of peep-hole optimizations that require ctrl/idom information. We already have a number of non-split-if optimizations that have snuck into the `split_if` code. Maybe a refactoring would be a good idea there. What do you think? And about: https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 Yes, that is the hope, that we could allow things like that to vectorize. The question is if we can guarantee that my new optimization will happen. But probably it is ok to be a bit optimistic here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1153134345 From jbhateja at openjdk.org Fri May 5 07:39:35 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 May 2023 07:39:35 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Thu, 30 Mar 2023 11:41:48 GMT, Emanuel Peter wrote: > Where exactly would you put it? We need a location during LoopOpts, so that we have the ctrl information. I previously suggested in `split_if`, but @vnkozlov seemed not very excited. Additionally, I have not seen any case where VectorAPI could make use of it. I gave it a quick look, so maybe you can find something. 
May I know what penalty you see if we do this as a separate pass towards the end of _PhaseIdealLoop::build_and_optimize_, where we can iterate over __ltree_root_ and, for each counted loop marked as a vector loop, do this processing for all the reduction nodes that are part of the loop body?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1157004962

From epeter at openjdk.org Fri May 5 07:39:28 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 May 2023 07:39:28 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To: <7Fop4qaVfok0Jg2x3yqUto2shQQ7U9Bmd6mCGiGAGWc=.d61ca8ef-33d9-4687-89e0-649d744faef5@github.com>
References: <7Fop4qaVfok0Jg2x3yqUto2shQQ7U9Bmd6mCGiGAGWc=.d61ca8ef-33d9-4687-89e0-649d744faef5@github.com>
Message-ID:

On Wed, 22 Mar 2023 22:17:29 GMT, Sandhya Viswanathan wrote:

>> @jatin-bhateja @sviswa7
>> What do you think about the performance numbers I measured? Do they make sense to you?
>>
>> A few questions:
>> - `long min/max`: Why do we require `avx512vlbwdq` in `Matcher::match_rule_supported_vector`? Would `avx512f` not be sufficient? `C2_MacroAssembler::reduceL` leads me to `vextracti64x4` (that should only require `avx512f`) and `reduce_operation_256` (where `vpminsq` only requires `avx2` via assert).
>> - What do you think about the `double min/max` performance? What do you think could be the reason it is not similar to the behavior of `float min/max`?
>
> @eme64 For long min/max, currently Math.min(long, long) is not getting intrinsified. Only int/float/double are getting intrinsified. No scalar intrinsification for Math.min(long, long) leads to no MinL scalar node generation and in turn no vectorization and no reduction.

@sviswa7 thanks for your quick response!

I can confirm: we do not "intrinsify" (i.e. turn into `MinL/MaxL`), rather we just inline the `java.lang.Math::min/max` methods, implemented with `CmpL` / `If`-branching.
Do you think this makes sense, or should we intrinsify, at least when the hardware supports it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1480770656 From jbhateja at openjdk.org Fri May 5 07:39:35 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 May 2023 07:39:35 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Tue, 4 Apr 2023 09:49:28 GMT, Jatin Bhateja wrote: >> @jatin-bhateja >> >>> if we move this processing post SLP to a stand alone pass, we can also handler vector IR created through VectorAPI. >> >> Where exactly would you put it? We need a location during LoopOpts, so that we have the ctrl information. I previously suggested in `split_if`, but @vnkozlov seemed not very excited. Additionally, I have not seen any case where VectorAPI could make use of it. I gave it a quick look, so maybe you can find something. >> >> Maybe in the long run, we should have a node-by-node pass during loop-opts, and allow all sorts of peep-hole optimizations that require ctrl/idom information. We already have a number of non-split-if optimizations that have snuck into the `split_if` code. Maybe a refactoring would be a good idea there. What do you think? >> >> And about: >> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 >> >> Yes, that is the hope, that we could allow things like that to vectorize. The question is if we can guarantee that my new optimization will happen. But probably it is ok to be a bit optimistic here. > >> Where exactly would you put it? We need a location during LoopOpts, so that we have the ctrl information. I previously suggested in `split_if`, but @vnkozlov seemed not very excited. Additionally, I have not seen any case where VectorAPI could make use of it. I gave it a quick look, so maybe you can find something. 
> > May I know the penalty which you see if we do this as a separate pass towards the end of _PhaseIdealLoop::build_and_optimize_, where we can iterate over __ltree_root_ and for each counted loop marked as a vector loop we can do this processing for all the reduction nodes part of loop body. There is also an opportunity to support reduction involving non-commutative bytecodes like isub and lsub, but it may need explicit backend support and can be taken up separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1157021127 From sviswanathan at openjdk.org Fri May 5 07:39:31 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 5 May 2023 07:39:31 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: <7Fop4qaVfok0Jg2x3yqUto2shQQ7U9Bmd6mCGiGAGWc=.d61ca8ef-33d9-4687-89e0-649d744faef5@github.com> Message-ID: On Thu, 23 Mar 2023 08:25:30 GMT, Emanuel Peter wrote: >> @eme64 For long min/max, currently Math.min(long, long) is not getting intrinsified. Only int/float/double are getting intrinsified. No scalar intrinsification for Math.min(long, long) leads to no MinL scalar node generation and in turn no vectorization and no reduction. > > @sviswa7 thanks for your quick response! > > I can confirm: we do not "intrinsify" (ie turn into `MinL/MaxL`), rather we just inline the `java.lang.Math::Min/Max` methods, implemented with `CmpL` / `If`-branching. Do you think this makes sense, or should we intrinsify, at least when the hardware supports it? @eme64 The MinI doesn't vectorize due to the rewrite as right-spline graph in MinINode::Ideal. 
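
For readers following the `min/max` sub-thread, here is a minimal sketch of the two loop shapes being compared. The class and method names are made up for illustration. Per the messages above, the `int` call is intrinsified (the thread mentions `MinINode::Ideal`), while the `long` call is inlined as `CmpL` / `If` branching. Whether either loop actually vectorizes depends on the JDK build and the CPU, so the sketch makes no claim about that.

```java
// Hypothetical reduction kernels; the names are not taken from ReductionPerf.java.
class MinReductionLoops {
    // int min reduction: Math.min(int, int) is intrinsified according to the thread.
    static int intMin(int[] a) {
        int result = Integer.MAX_VALUE; // identity element for min
        for (int i = 0; i < a.length; i++) {
            result = Math.min(result, a[i]);
        }
        return result;
    }

    // long min reduction: Math.min(long, long) is inlined as compare-and-branch
    // according to the thread, so no MinL node is generated.
    static long longMin(long[] a) {
        long result = Long.MAX_VALUE; // identity element for min
        for (int i = 0; i < a.length; i++) {
            result = Math.min(result, a[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intMin(new int[] {4, -3, 9}));
        System.out.println(longMin(new long[] {4L, -3L, 9L}));
    }
}
```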
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1482015255

From jbhateja at openjdk.org Fri May 5 07:39:34 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 5 May 2023 07:39:34 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID:

On Thu, 30 Mar 2023 07:49:43 GMT, Jatin Bhateja wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add        2063   2085    660    530    415    283 |      |
>> int mul        2272   2257   1189    733    908    439 |      |
>> int min        2527   2520   2516   2579   2585   2542 |  1   |
>> int max        2548   2525   2551   2516   2515   2517 |  1   |
>> int and        2410   2414    602    480    353    263 |      |
>> int or         2149   2151    597    498    354    262 |      |
>> int xor        2059   2062    605    476    364    263 |      |
>> long add       1776   1790   2000   1000   1683    591 |  2   |
>> long mul       2135   2199   2185   2001   2176   1307 |  2   |
>> long min       1439   1424   1421   1420   1430   1427 |  3   |
>> long max       2299   2287   2303   2305   1433   1425 |  3   |
>> long and       1657   1667   2015   1003   1679    568 |  4   |
>> long or        1776   1783   2032   1009   1680    569 |  4   |
>> long xor       1834   1783   2012   1024   1679    570 |  4   |
>> float add      2779   2644   2633   2648   2632   2639 |  5   |
>> float mul      2779   2871   2810   2776   2732   2791 |  5   |
>> float min      2294   2620   1725   1286    872    672 |      |
>> float max      2371   2519   1697   1265    841    468 |      |
>> double add     2634   2636   2635   2650   2635   2648 |  5   |
>> double mul     3053   2955   2881   3030   2979   2927 |  5   |
>> double min     2364   2400   2439   2399   2486   2398 |  6   |
>> double max     2488   2459   2501   2451   2493   2498 |  6   |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> **TODO report performance testing (running)**
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`, we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> src/hotspot/share/opto/superword.cpp line 2890:
>
>> 2888: UnorderedReductionNode* ur = unordered_reductions.at(i)->as_UnorderedReduction();
>> 2889: move_unordered_reduction_out_of_loop(ur);
>> 2890: }
>
> Hi @eme64, if we move this processing post SLP to a standalone pass, we can also handle vector IR created through VectorAPI.
We can also relax the following limitation with your patch. Since the loop body will now consist of lane-wise vector operations, with the reduction moved out of the loop, it may allow vectorizing patterns like `res += a[i];`, which is composed of a single load and a reduction operation. Unrolling will then create multiple vector operations within the loop, which may improve performance.

https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/superword.cpp#L2265

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1152872919

From epeter at openjdk.org Fri May 5 07:39:36 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 May 2023 07:39:36 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID: <6GkZlWGbed85KSXCba7yw6hW5v-N1vDtCn-vCFseOrA=.803e637f-6971-4f62-9da9-d414178d7a4f@github.com>

On Tue, 4 Apr 2023 09:39:42 GMT, Jatin Bhateja wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add        2063   2085    660    530    415    283 |      |
>> int mul        2272   2257   1189    733    908    439 |      |
>> int min        2527   2520   2516   2579   2585   2542 |  1   |
>> int max        2548   2525   2551   2516   2515   2517 |  1   |
>> int and        2410   2414    602    480    353    263 |      |
>> int or         2149   2151    597    498    354    262 |      |
>> int xor        2059   2062    605    476    364    263 |      |
>> long add       1776   1790   2000   1000   1683    591 |  2   |
>> long mul       2135   2199   2185   2001   2176   1307 |  2   |
>> long min       1439   1424   1421   1420   1430   1427 |  3   |
>> long max       2299   2287   2303   2305   1433   1425 |  3   |
>> long and       1657   1667   2015   1003   1679    568 |  4   |
>> long or        1776   1783   2032   1009   1680    569 |  4   |
>> long xor       1834   1783   2012   1024   1679    570 |  4   |
>> float add      2779   2644   2633   2648   2632   2639 |  5   |
>> float mul      2779   2871   2810   2776   2732   2791 |  5   |
>> float min      2294   2620   1725   1286    872    672 |      |
>> float max      2371   2519   1697   1265    841    468 |      |
>> double add     2634   2636   2635   2650   2635   2648 |  5   |
>> double mul     3053   2955   2881   3030   2979   2927 |  5   |
>> double min     2364   2400   2439   2399   2486   2398 |  6   |
>> double max     2488   2459   2501   2451   2493   2498 |  6   |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> **TODO report performance testing (running)**
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`, we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> src/hotspot/share/opto/superword.cpp line 2967:
>
>> 2965: const Type* bt_t = Type::get_const_basic_type(bt);
>> 2966:
>> 2967: // Create vector of neutral elements (zero for add, one for mul, etc)
>
> A minor nomenclature fix: we can use the name identity_scalar instead of neutral_scalar; 0 is the additive identity, 1 is the multiplicative identity.

@jatin-bhateja Ok, "neutral element" and "identity element" seem to be synonyms. I'll change it to "identity", since that is what we seem to use in the code already.
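
A minimal scalar model of the idea discussed in this thread may help: accumulate lane-wise inside the loop, starting from a vector of identity elements, and do the horizontal reduction once after the loop. This is only a sketch. `LANES`, the class name, and the method names are hypothetical, the "vector" is modelled as a plain `int[]`, and none of this is the C2 implementation. It also assumes the array length is a multiple of `LANES` and that `op` may be reordered (as for `int` add/mul/min/max), which is exactly why `float/double add/mul` are excluded in the PR description.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Scalar model of a vectorized reduction; all names here are hypothetical.
class ReductionAfterLoopSketch {
    static final int LANES = 8; // stand-in for the vector width

    // Shape before the change: every vector iteration folds all lanes
    // into the scalar accumulator (one horizontal reduction per iteration).
    static int reduceInLoop(int[] a, int init, IntBinaryOperator op) {
        int acc = init;
        for (int i = 0; i < a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc = op.applyAsInt(acc, a[i + l]);
            }
        }
        return acc;
    }

    // Shape after the change: lane-wise accumulation in the loop, starting
    // from a vector of identity elements; one horizontal reduction afterwards.
    // Only valid when op can be reordered (associative and commutative).
    static int reduceAfterLoop(int[] a, int init, int identity, IntBinaryOperator op) {
        int[] vacc = new int[LANES];
        Arrays.fill(vacc, identity); // 0 for add, 1 for mul, MAX_VALUE for min, ...
        for (int i = 0; i < a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                vacc[l] = op.applyAsInt(vacc[l], a[i + l]); // element-wise, no cross-lane work
            }
        }
        int acc = init;
        for (int l = 0; l < LANES; l++) {
            acc = op.applyAsInt(acc, vacc[l]); // single reduction after the loop
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] a = new int[4 * LANES]; // length is a multiple of LANES by construction
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 1;
        }
        System.out.println(reduceInLoop(a, 10, Integer::sum));
        System.out.println(reduceAfterLoop(a, 10, 0, Integer::sum));
    }
}
```

Both methods return the same value for reorderable operations; the second one simply does the cross-lane work once instead of once per iteration.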
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1158371364

From epeter at openjdk.org Fri May 5 07:39:37 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 May 2023 07:39:37 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID:

On Tue, 21 Mar 2023 20:31:21 GMT, Vladimir Kozlov wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add        2063   2085    660    530    415    283 |      |
>> int mul        2272   2257   1189    733    908    439 |      |
>> int min        2527   2520   2516   2579   2585   2542 |  1   |
>> int max        2548   2525   2551   2516   2515   2517 |  1   |
>> int and        2410   2414    602    480    353    263 |      |
>> int or         2149   2151    597    498    354    262 |      |
>> int xor        2059   2062    605    476    364    263 |      |
>> long add       1776   1790   2000   1000   1683    591 |  2   |
>> long mul       2135   2199   2185   2001   2176   1307 |  2   |
>> long min       1439   1424   1421   1420   1430   1427 |  3   |
>> long max       2299   2287   2303   2305   1433   1425 |  3   |
>> long and       1657   1667   2015   1003   1679    568 |  4   |
>> long or        1776   1783   2032   1009   1680    569 |  4   |
>> long xor       1834   1783   2012   1024   1679    570 |  4   |
>> float add      2779   2644   2633   2648   2632   2639 |  5   |
>> float mul      2779   2871   2810   2776   2732   2791 |  5   |
>> float min      2294   2620   1725   1286    872    672 |      |
>> float max      2371   2519   1697   1265    841    468 |      |
>> double add     2634   2636   2635   2650   2635   2648 |  5   |
>> double mul     3053   2955   2881   3030   2979   2927 |  5   |
>> double min     2364   2400   2439   2399   2486   2398 |  6   |
>> double max     2488   2459   2501   2451   2493   2498 |  6   |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> **TODO report performance testing (running)**
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`, we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> src/hotspot/share/opto/vectornode.cpp line 1348:
>
>> 1346: in1 != nullptr && in1->is_Phi() && in1->in(2) == this && in1->outcnt() == 1 &&
>> 1347: in1->in(0)->is_CountedLoop() &&
>> 1348: in2->is_Vector()) {
>
> Should you also check that this reduction node doesn't have users inside loop?

@vnkozlov How should I do that? Can that even be done during IGVN? Or should I move the implementation to loopopts?

> test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java line 45:
>
>> 43: public static void main(String args[]) {
>> 44: int iter_warmup = 2_000;
>> 45: int iter_perf = 5_000;
>
> Did you measure total execution time of this test?
> A warmup of 2_000 iterations is too big a number, I think. You have 8192 iterations already in tested methods. 10 should be enough to trigger compilation. Maybe add `-Xbatch` if you want to make sure C2 does compile it.
> If reduction is not supported (MulReduction on RISC-V and AArch64) the test will be slow.

I reduced the iteration count to `100` and `1000`. For performance measurement they can be increased. I also added `-Xbatch`. On my laptop, the test now definitely runs in less than 2 seconds, with or without SuperWord. So even platforms that do not support a feature, or SuperWord as a whole, could run it decently fast.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1144435233
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1144374526

From epeter at openjdk.org Fri May 5 07:39:38 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 May 2023 07:39:38 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible
In-Reply-To:
References:
Message-ID:

On Wed, 22 Mar 2023 09:03:28 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/vectornode.cpp line 1348:
>>
>>> 1346: in1 != nullptr && in1->is_Phi() && in1->in(2) == this && in1->outcnt() == 1 &&
>>> 1347: in1->in(0)->is_CountedLoop() &&
>>> 1348: in2->is_Vector()) {
>>
>> Should you also check that this reduction node doesn't have users inside loop?
>
> @vnkozlov How should I do that? Can that even be done during IGVN? Or should I move the implementation to loopopts?

For example, I could put it before or after `try_sink_out_of_loop` inside `PhaseIdealLoop::split_if_with_blocks_post`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1144654995 From kvn at openjdk.org Fri May 5 07:39:38 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 May 2023 07:39:38 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 11:29:09 GMT, Emanuel Peter wrote: >> @vnkozlov How should I do that? Can that even be done during IGVN? Or should I move the implementation to loopopts? > > For example, I could put it before or after `try_sink_out_of_loop` inside `PhaseIdealLoop::split_if_with_blocks_post`. As we discussed offline, you may check and mark Reduction node if it has users (other than Phi) inside loop in SuperWord code where we are creating vector referenced by Reduction node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1145240288 From fgao at openjdk.org Fri May 5 07:59:19 2023 From: fgao at openjdk.org (Fei Gao) Date: Fri, 5 May 2023 07:59:19 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 03:50:27 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2768: >> >>> 2766: Node* n = _block.at(i); // last in pack >>> 2767: Node_List* p = my_pack(n); >>> 2768: if (p != nullptr && n == p->at(p->size()-1)) { >> >> Sorry, I don't quite understand why the mem ops in the pack are internally in order. Maybe I missed somewhere you reordered these ops in the same pack using linearized memops_schedule. Could you please point it out for me? Thanks. > > Thanks for the question. It is what I mentioned in the PR description: > >> This scheduling has the nice side-effect of simplifying SuperWord::output a little. 
> We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order).
>
> **Details**
> We add the pack to `memops_schedule`, in the order of the pack:
> https://github.com/openjdk/jdk/blob/677400bbcd1921b280a63de2ce60aefa1c835241/src/hotspot/share/opto/superword.cpp#L2506-L2518
>
> And then we reorder all memops according to this schedule:
> https://github.com/openjdk/jdk/blob/677400bbcd1921b280a63de2ce60aefa1c835241/src/hotspot/share/opto/superword.cpp#L2617-L2619

Hi @eme64 thanks for your reply. Since all these nodes in a pack are executed at the same time, we don't really care about the first or last one, i.e., position in the pack. What we really care about is if we can get the right memory input from the first or last one for the pack, correctly connecting to other mem ops in the loop body. Did I get that right? Thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1185801012

From roland at openjdk.org Fri May 5 08:57:25 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 May 2023 08:57:25 GMT
Subject: RFR: 8307131: C2: assert(false) failed: malformed control flow
Message-ID:

The IR graph has a loop nest with 2 loops and 2 safepoints. Both safepoints are in the inner loop. One is on the backedge of the inner loop. The inner loop is transformed into a counted loop and that safepoint is removed. The other safepoint is right above the inner loop's exit condition. The outer strip mined loop is constructed and the safepoint is moved to the outer strip mined loop even though that safepoint is marked as non deleteable. The inner loop is later removed, and so are the outer strip mined loop and the safepoint. What was the outer loop of the 2 loop nest becomes an infinite loop without a safepoint and is considered dead code, which in turn causes the assert to fire.
The fix I propose is to only build the strip mined loop if the safepoint that's moved to the outer strip mined loop is deleteable. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/13826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307131 Stats: 58 lines in 2 files changed: 57 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13826/head:pull/13826 PR: https://git.openjdk.org/jdk/pull/13826 From epeter at openjdk.org Fri May 5 09:00:02 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 May 2023 09:00:02 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 07:56:29 GMT, Fei Gao wrote: >> Thanks for the question. It is what I mentioned in the PR description: >> >>> This scheduling has the nice side-effect of simplifying SuperWord::output a little. >> We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). >> >> **Details** >> We add the pack to `memops_schedule`, in the order of the pack: >> https://github.com/openjdk/jdk/blob/677400bbcd1921b280a63de2ce60aefa1c835241/src/hotspot/share/opto/superword.cpp#L2506-L2518 >> >> And then we reorder all memops according to this schedule: >> https://github.com/openjdk/jdk/blob/677400bbcd1921b280a63de2ce60aefa1c835241/src/hotspot/share/opto/superword.cpp#L2617-L2619 > > Hi @eme64 thanks for your reply. Since all these nodes in a pack are executed at the same time, we don't really care about the first or last one, i.e., position in the pack. 
What we really care about is if we can get the right memory input from the first or last one for the pack, correctly connecting to other mem ops in the loop body. Did I get that right? Thanks. @fg1417 Yes, that is the reason. This was like that before and after my patch. I did not want to change too much here. I wanted to avoid refactoring the whole of `SuperWord::output`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1185854548 From shade at openjdk.org Fri May 5 09:19:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 May 2023 09:19:15 GMT Subject: RFR: 8307527: MacOS Zero builds fail with undefined FFI_GO_CLOSURES after JDK-8304265 Message-ID: See the bug. Actually, I am not sure why JDK-8304265 changed the `#ifndef FFI_GO_CLOSURES` to `#ifdef _APPLE_`. That seems too intrusive if `FFI_GO_CLOSURES` *is* enabled. So I rewrote the block to something more safe. Additional testing: - [x] macos-aarch64-zero-fastdebug `make images` passes ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/13827/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13827&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307527 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13827/head:pull/13827 PR: https://git.openjdk.org/jdk/pull/13827 From duke at openjdk.org Fri May 5 09:25:42 2023 From: duke at openjdk.org (Chang Peng) Date: Fri, 5 May 2023 09:25:42 GMT Subject: RFR: 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE [v4] In-Reply-To: References: Message-ID: > We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. 
> > The following instruction sequence > > > movi v17.16b, #12 > cmpgt p0.b, p7/z, z16.b, z17.b > > > can be optimized to: > > > cmpgt p0.b, p7/z, z16.b, #12 > > > This patch does the following: > 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. > SVE cmp(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to > 127)or 5bit signed integer immediate (range from -16 to 15). > > 2. Add optimized match rules to generate the compare-with-immediate instructions. > > [1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into add_sve_cmpU - Refactor some code - Merge branch 'openjdk:master' into add_sve_cmpU - 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. The following instruction sequence ``` movi v17.16b, #12 cmpgt p0.b, p7/z, z16.b, z17.b ``` can be optimized to: ``` cmpgt p0.b, p7/z, z16.b, #12 ``` This patch does the following: 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. SVE cmp(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to 127) or 5bit signed integer immediate (range from -16 to 15). 2. Add optimized match rules to generate the compare-with-immediate instructions. 
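
The two immediate ranges quoted in item 1 can be written down as a small range check. This is a hedged sketch with made-up names, not the predicate HotSpot uses in its match rules. It assumes, following the Arm SVE documentation linked as [1], that unsigned comparisons take the 7-bit unsigned immediate and signed comparisons take the 5-bit signed immediate.

```java
// Hypothetical helper; it only mirrors the ranges stated in the commit message.
class SveCmpImmediateSketch {
    // 7-bit unsigned immediate: 0 to 127.
    static boolean fitsUnsigned7(long value) {
        return value >= 0 && value <= 127;
    }

    // 5-bit signed immediate: -16 to 15.
    static boolean fitsSigned5(long value) {
        return value >= -16 && value <= 15;
    }

    // Assumption: unsigned compares use the 7-bit form, signed compares the 5-bit form.
    static boolean canUseImmediateForm(long value, boolean unsignedCompare) {
        return unsignedCompare ? fitsUnsigned7(value) : fitsSigned5(value);
    }

    public static void main(String[] args) {
        // The #12 from the example sequence fits either form.
        System.out.println(canUseImmediateForm(12, false));
        System.out.println(canUseImmediateForm(12, true));
    }
}
```

A constant outside the relevant range would still need the vector broadcast shown in the unoptimized sequence.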
[1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- TEST_LABEL: v1 || n2, aarch64&&ubuntu&&conformance-enabled JDK_SCOPE: hotspot:compiler/vectorapi, jdk:jdk/incubator/vector/ Jira: ENTLLT-5294 Change-Id: I6b915864308faf9a8ec6e35ca1b4948666d75dca ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13200/files - new: https://git.openjdk.org/jdk/pull/13200/files/d9d861ea..2bcd41ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13200&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13200&range=02-03 Stats: 69484 lines in 1244 files changed: 47473 ins; 13552 del; 8459 mod Patch: https://git.openjdk.org/jdk/pull/13200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13200/head:pull/13200 PR: https://git.openjdk.org/jdk/pull/13200 From duke at openjdk.org Fri May 5 09:39:25 2023 From: duke at openjdk.org (Chang Peng) Date: Fri, 5 May 2023 09:39:25 GMT Subject: RFR: 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE [v5] In-Reply-To: References: Message-ID: <8IbGP73SEE8FZ5foX9_1i5cZi83pgVhQNwSvsCMzxv8=.efd624bb-97b1-431c-a046-3c59a4d7fc86@github.com> > We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. > > The following instruction sequence > > > movi v17.16b, #12 > cmpgt p0.b, p7/z, z16.b, z17.b > > > can be optimized to: > > > cmpgt p0.b, p7/z, z16.b, #12 > > > This patch does the following: > 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. > SVE cmp(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to > 127)or 5bit signed integer immediate (range from -16 to 15). > > 2. Add optimized match rules to generate the compare-with-immediate instructions. 
> > [1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- Chang Peng has updated the pull request incrementally with one additional commit since the last revision: Merge match rules ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13200/files - new: https://git.openjdk.org/jdk/pull/13200/files/2bcd41ed..14905623 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13200&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13200&range=03-04 Stats: 136 lines in 4 files changed: 78 ins; 26 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/13200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13200/head:pull/13200 PR: https://git.openjdk.org/jdk/pull/13200 From fgao at openjdk.org Fri May 5 10:17:15 2023 From: fgao at openjdk.org (Fei Gao) Date: Fri, 5 May 2023 10:17:15 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Fri, 5 May 2023 07:33:03 GMT, Emanuel Peter wrote: > The only thing missing for me is: > > 1. Benchmark on `aarch64`. @fg1417 Would you want to have a look at that? > 2. Wait for the performance testing results. Nice Optimization! Sure, I'll test the benchmark on aarch64 machines. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1536038577 From qamai at openjdk.org Fri May 5 12:21:21 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 May 2023 12:21:21 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 01:45:37 GMT, Xiaohong Gong wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style > > test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 466: > >> 464: @IR(counts = {IRNode.VECTOR_SLICE, "17"}) >> 465: static void testB128(byte[][] dst, byte[] src1, byte[] src2) { >> 466: var species = ByteVector.SPECIES_128; > > Suggest to define the species as a "`private static final`" field of this test class. It may make the intrinsification fail if the species is not a constant to the compiler. This local is final and is loaded from a `static final` field so it should be equivalent to referring to `ByteVector.SPECIES_128` directly ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1186024843 From qamai at openjdk.org Fri May 5 12:25:19 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 May 2023 12:25:19 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 19:03:21 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ShortVector.java line 2295: > >> 2293: // to be performant >> 2294: @ForceInline >> 2295: public ShortVector apply(ShortVector v1, ShortVector v2, int o) { > > Have you considered matching the corresponding IR during GVN to produce VectorSlice nodes rather than going through VM intrinsic? 
I have thought about this, but it would require C2 to track the values of individual elements in a vector and to constant-fold vector loads from stable fields, neither of which is available right now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1186028378 From jvernee at openjdk.org Fri May 5 12:26:12 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 5 May 2023 12:26:12 GMT Subject: RFR: 8307527: MacOS Zero builds fail with undefined FFI_GO_CLOSURES after JDK-8304265 In-Reply-To: References: Message-ID: On Fri, 5 May 2023 09:11:40 GMT, Aleksey Shipilev wrote: > See the bug. Actually, I am not sure why JDK-8304265 changed the `#ifndef FFI_GO_CLOSURES` to `#ifdef __APPLE__`. That seems too intrusive if `FFI_GO_CLOSURES` *is* enabled. So I rewrote the block to something safer. > > Additional testing: > - [x] macos-aarch64-zero-fastdebug `make images` passes Hi Aleksey. The original change came in from: https://github.com/openjdk/panama-foreign/pull/770 with the motivation: > Finally, I had to slightly modify globalDefinitions_zero.hpp to conditionally define FFI_GO_CLOSURES, since I was getting a redefinition error using GCC. The macro is also defined in ffitarget.h which is included by ffi.h. The define in globalDefinitions comes from this PR: https://github.com/openjdk/jdk/pull/8195 which indicates that the define is only needed on macOS. So I switched out the guard to check for `__APPLE__` instead. (The check whether it is already defined doesn't really do anything, since FFI_GO_CLOSURES is defined by including ffi.h.) I kept the check, but I'm still not sure how this define is supposed to work, since as far as I can tell `ffitarget.h` (which is included by `ffi.h`) will define `FFI_GO_CLOSURES`. I guess the `#ifndef` is needed in case it is defined on the command line? I'll run this patch through our CI as well. P.S. hmm, looks like the aarch64 ffitarget.h doesn't define `FFI_GO_CLOSURES`.
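For readers following the guard discussion, here is a minimal sketch of the `#ifndef` pattern in question. It is not the actual content of `globalDefinitions_zero.hpp`: the fallback value `0` and the names `kGoClosuresEnabled` and `check_guard` are assumptions made for illustration. It models the case from the P.S., where no header has defined the macro before the guard is reached.

```cpp
#include <cassert>

// A target header such as ffitarget.h may or may not define FFI_GO_CLOSURES.
// In this sketch nothing has defined it yet, as in the aarch64 case above.

#ifndef FFI_GO_CLOSURES
// Assumed fallback value (closures disabled). A definition that is already
// present, from a header or from -D on the command line, is left untouched.
#define FFI_GO_CLOSURES 0
#endif

// Code that tests the macro with #if now sees a defined value instead of
// an undefined identifier.
#if FFI_GO_CLOSURES
constexpr bool kGoClosuresEnabled = true;
#else
constexpr bool kGoClosuresEnabled = false;
#endif

inline void check_guard() {
  assert(!kGoClosuresEnabled);
}
```

The guard only matters when the macro is undefined at that point. If the target header has already defined it, the block is skipped, which is what avoids the redefinition error mentioned in the quoted motivation.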
------------- PR Comment: https://git.openjdk.org/jdk/pull/13827#issuecomment-1536178625 From aph at openjdk.org Fri May 5 12:31:24 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 May 2023 12:31:24 GMT Subject: RFR: 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE [v5] In-Reply-To: <8IbGP73SEE8FZ5foX9_1i5cZi83pgVhQNwSvsCMzxv8=.efd624bb-97b1-431c-a046-3c59a4d7fc86@github.com> References: <8IbGP73SEE8FZ5foX9_1i5cZi83pgVhQNwSvsCMzxv8=.efd624bb-97b1-431c-a046-3c59a4d7fc86@github.com> Message-ID: On Fri, 5 May 2023 09:39:25 GMT, Chang Peng wrote: >> We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. >> >> The following instruction sequence >> >> >> movi v17.16b, #12 >> cmpgt p0.b, p7/z, z16.b, z17.b >> >> >> can be optimized to: >> >> >> cmpgt p0.b, p7/z, z16.b, #12 >> >> >> This patch does the following: >> 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. >> SVE cmp(immediate) instructions can support vector comparing with 7bit unsigned integer immediate (range from 0 to >> 127)or 5bit signed integer immediate (range from -16 to 15). >> >> 2. Add optimized match rules to generate the compare-with-immediate instructions. >> >> [1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Merge match rules OK! Sorry for the delay. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13200#pullrequestreview-1414693746 From qamai at openjdk.org Fri May 5 12:31:26 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 May 2023 12:31:26 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 11:59:30 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 1914: >> >>> 1912: if (vector_klass->const_oop() == NULL || elem_klass->const_oop() == NULL || >>> 1913: !vlen->is_con() || !origin_type->is_con()) { >>> 1914: if (C->print_intrinsics()) { >> >> Hi @merykitty , your inline expander is not handling non-constant origin case, this will introduce performance regressions w.r.t to existing implementation. > > You can extend expander to generate IR corresponding to fallback implementation to handle non-constant origin case. Yes it seems that `ForceInline` is not respected if intrinsification fails, which results in regressions. I will try to look at both approaches, I kind of like falling back to Java code more since it is cleaner and avoids duplication between Hotspot intrinsic kit and Java implementation, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1186033536 From duke at openjdk.org Fri May 5 13:15:29 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 5 May 2023 13:15:29 GMT Subject: Integrated: 8305080: Suppress the 'removal' warning for finalize() from test/hotspot/jtreg/compiler/jvmci/common/testcases that used in compiler/jvmci/compilerToVM/ tests In-Reply-To: References: Message-ID: <0pDKnT0zkpOXf2usbfVAJfSIG3FKgZC5utnni9cIHtQ=.a1c1adcf-0bfd-4cc9-b739-ac7cce65d527@github.com> On Tue, 11 Apr 2023 07:58:35 GMT, Afshin Zafari wrote: > The finalize() methods are removed and replaced by Cleaner callbacks. 
> > Note: > `test/hotspot/jtreg/compiler/jvmci/compilerToVM/HasFinalizableSubclassTest.java` may be removed since there is no need to test whether finalize() exists in the subclasses. This pull request has now been integrated. Changeset: 1a1ce66d Author: Afshin Zafari Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/1a1ce66dc9976b8f44de613e81e557a8ae698135 Stats: 20 lines in 9 files changed: 8 ins; 0 del; 12 mod 8305080: Suppress the 'removal' warning for finalize() from test/hotspot/jtreg/compiler/jvmci/common/testcases that used in compiler/jvmci/compilerToVM/ tests Reviewed-by: dnsimon, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/13419 From jvernee at openjdk.org Fri May 5 13:26:16 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 5 May 2023 13:26:16 GMT Subject: RFR: 8307527: MacOS Zero builds fail with undefined FFI_GO_CLOSURES after JDK-8304265 In-Reply-To: References: Message-ID: On Fri, 5 May 2023 09:11:40 GMT, Aleksey Shipilev wrote: > See the bug. Actually, I am not sure why JDK-8304265 changed the `#ifndef FFI_GO_CLOSURES` to `#ifdef __APPLE__`. That seems too intrusive if `FFI_GO_CLOSURES` *is* enabled. So I rewrote the block to something safer. > > Additional testing: > - [x] macos-aarch64-zero-fastdebug `make images` passes After thinking a bit, I think I'd prefer this to be addressed in the build system instead, e.g. something like:

diff --git a/make/autoconf/lib-ffi.m4 b/make/autoconf/lib-ffi.m4
index 0905c3cd225..83de5a4abf7 100644
--- a/make/autoconf/lib-ffi.m4
+++ b/make/autoconf/lib-ffi.m4
@@ -106,6 +106,13 @@ AC_DEFUN_ONCE([LIB_SETUP_LIBFFI],
       AC_MSG_ERROR([Could not find libffi! $HELP_MSG])
     fi
 
+    if test "x${OPENJDK_TARGET_CPU}" = "xaarch64" && test "x${OPENJDK_TARGET_OS}" = xmacosx; then
+      # ffi.h checks '#if FFI_GO_CLOSURES' which throws a warning in xcode on aarch64 because the aarch64
+      # ffitarget.h (included from ffi.h) doesn't explicitly define FFI_GO_CLOSURES (like it does on e.g. x64).
+      # define it explicitly here to avoid compilation errors
+      LIBFFI_CFLAGS="$LIBFFI_CFLAGS -DFFI_GO_CLOSURES=0"
+    fi
+
     AC_MSG_CHECKING([if libffi works])
     AC_LANG_PUSH(C)
     OLD_CFLAGS="$CFLAGS"

And then remove the workaround from the source code. (`LIBFFI_CFLAGS` is used to build both relevant libraries, and should also be used when a new library is added that needs libffi. So this would avoid a repeat of this issue.) Either way, let's thoroughly document the issue this time around, so future editors won't have to guess why this is needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13827#issuecomment-1536257130 From ysuenaga at openjdk.org Fri May 5 14:27:25 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 5 May 2023 14:27:25 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v2] In-Reply-To: References: Message-ID: > `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). For a process running outside a container, that value is based on `freeram` from sysinfo(2). > > `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However, it only counts free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernels, and it has been backported to some older kernels (e.g. RHEL). `sar` from sysstat reads that value and shows it as `kbavail` [3]. > > AFAIK the PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So a JFR/JMC user could wrongly conclude that physical memory was exhausted even though enough memory was available.
> > [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 > [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable > [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Introduce os::free_memory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13398/files - new: https://git.openjdk.org/jdk/pull/13398/files/6e10eb9c..5bcac06b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=00-01 Stats: 58 lines in 10 files changed: 45 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13398/head:pull/13398 PR: https://git.openjdk.org/jdk/pull/13398 From kvn at openjdk.org Fri May 5 15:47:26 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 May 2023 15:47:26 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: <6tYsgjIL9o6s6POMCWYayzjkjAmgCUo5wiF1G8nGUj0=.2f9b129c-5813-4a88-9afe-470927f08f94@github.com> On Fri, 5 May 2023 01:06:09 GMT, Leonid Mesnik wrote: >> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects >> >> caused significant regressions in some benchmarks and should be reverted. >> >> This fix backout changes and update problemlist bugs to new issue. >> Tier1 passed >> Running also tier5 to check other builds and more svc testing > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed notify_jvmti_object_alloc_Type line Agree. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1415035282 From kvn at openjdk.org Fri May 5 15:50:18 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 May 2023 15:50:18 GMT Subject: RFR: 8307131: C2: assert(false) failed: malformed control flow In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:48:06 GMT, Roland Westrelin wrote: > The IR graph has a loop nest with 2 loops and 2 safepoints. Both > safepoints are in the inner loop. One is on the backedge of the inner > loop. The inner loop is transformed into a counted loop and that > safepoint is removed. The other safepoint is right above the inner > loop's exit condition. The outer strip mined loop is constructed and > the safepoint is moved to the outer strip mined loop even though that > safepoint is marked as non-deletable. The inner loop is later > removed, and so are the outer strip mined loop and the safepoint. What > was the outer loop of the 2-loop nest becomes an infinite loop without > a safepoint and is considered dead code, which in turn causes the > assert to fire. > > The fix I propose is to only build the strip mined loop if the > safepoint that's moved to the outer strip mined loop is deletable. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13826#pullrequestreview-1415039959 From lmesnik at openjdk.org Fri May 5 19:02:26 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 5 May 2023 19:02:26 GMT Subject: Integrated: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: On Thu, 4 May 2023 15:12:43 GMT, Leonid Mesnik wrote: > 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects > > caused significant regressions in some benchmarks and should be reverted. > > This fix backs out the changes and updates the problem-listed bugs to point to the new issue.
> Tier1 passed > Running also tier5 to check other builds and more svc testing This pull request has now been integrated. Changeset: e2b1013f Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/e2b1013f11fc605501c3bf77976facb9b870d28e Stats: 73 lines in 11 files changed: 5 ins; 64 del; 4 mod 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects Reviewed-by: sspitsyn, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13806 From ysuenaga at openjdk.org Fri May 5 23:30:24 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 5 May 2023 23:30:24 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 13:12:51 GMT, Thomas Stuefe wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. 
>> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > We could also just bypass the compiler thread creation question for now. Let the compiler continue to use the old metric when calculating its thread count, but let all other users of os::available_memory() the new one. @tstuefe @robcasloz I updated this PR to implement both `free_memory` and `available_memory`. In Linux, `free_memory` refers MemFree (equivalent with older `available_memory`), and `available_memory` refers MemAvailable. In other platforms, `free_memory` proxies `available_memory`. And also `CompileBroker` uses `free_memory` rather than `available_memory`. Some GHA checks were failed, but I think they are not caused by this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13398#issuecomment-1536901248 From fgao at openjdk.org Sat May 6 01:40:15 2023 From: fgao at openjdk.org (Fei Gao) Date: Sat, 6 May 2023 01:40:15 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph [v2] In-Reply-To: References: Message-ID: <69Et_PaFnKjKYm_ILdMNs-eIooTG6eCNNgy4mxYax3w=.695a58e6-664b-4403-b2c0-1d143fa60a2a@github.com> On Fri, 5 May 2023 04:04:13 GMT, Emanuel Peter wrote: >> `SuperWord:schedule`, and specifically `SuperWord::co_locate_pack` is broken. >> The problem is with the basic approach of it, as far as I know. >> Hence, I had to completely re-design the `schedule` algorithm, based on the `PacksetGraph` ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). >> >> **The current approach** >> >> The idea is to leave the non-vectorized memory ops in their place, and find the right place for the vectorized memops to be "sandwiched" into. 
The logic is very complex and has already had a few bugs fixed. >> >> **Why this does not work** >> >> However, in some rare cases, we have to reorder non-vectorized operations. See this example that I added as a regression test: >> >> https://github.com/openjdk/jdk/blob/a771a61005aea272cc51fa3f3e1637c217582fce/test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java#L82-L109 >> >> I found this issue during work on https://github.com/openjdk/jdk/pull/13078, where I had to restrict/disable some tests that are now passing. >> >> **Solution** >> >> Abandon the idea of "sandwiching" memops. Rewrite `SuperWord:schedule`: >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2567-L2576 >> >> We first schedule all memops into a linear order. >> We do this scheduling based on the `PacksetGraph`, which gives us a `DAG` based on the `packset` and the dependency-graph (which in turn respects the data use-defs, as well as the memory dependencies, unless we can prove that they do not reference the same memory). >> In other words: we have a linearization that respects all dependencies that must be respected. >> Further, we make sure that ops from the same pack are scheduled as a block (all adjacent to each other), and in order that the packset has internally. >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2489-L2493 >> >> Now that we have this order (and we have not aborted because we found a cycle in the `PacksetGraph`), we must apply this schedule to each memory slice, and reorder the memops in the slices accordingly. >> >> https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2617-L2619 >> >> This scheduling has the nice side-effect of simplifying `SuperWord::output` a little. 
>> We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). >> >> **Discussion** >> >> This seems to me to be a much more straight forward approach, and it uses the code I recently added for verification of cyclic dependencies in the packset ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). >> >> One potential improvement to my fix: >> We now sometimes re-order the non-vectorized memory slices, even though it may not be necessary. >> This is not wrong, but it makes updates to the graph that may be confusing when debugging. >> Further, the re-ordering may have performance impacts. >> I could use a priority-queue (min-heap, would have to implement it since it does not yet exist), and schedule the `PacksetGraph` whenever possible with the lower `bb_idx` first. This would make the new linear order the same/closer to the old one. However, I am not sure if this is worth the effort and overhead of a priority-queue. >> >> **Testing** >> Github-actions pass. tier1-6 + stress testing passes. >> Performance testing showed no significant performance change. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Addressed Fei's review suggestions Tier 1-3 on aarch64 machines passed. ------------- Marked as reviewed by fgao (Committer). PR Review: https://git.openjdk.org/jdk/pull/13354#pullrequestreview-1415615485 From fgao at openjdk.org Sat May 6 01:40:16 2023 From: fgao at openjdk.org (Fei Gao) Date: Sat, 6 May 2023 01:40:16 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:56:28 GMT, Emanuel Peter wrote: >> Hi @eme64 thanks for your reply. 
Since all these nodes in a pack are executed at the same time, we don't really care about the first or last one, i.e., position in the pack. What we really care about is if we can get the right memory input from the first or last one for the pack, correctly connecting to other mem ops in the loop body. Did I get that right? Thanks. > > @fg1417 Yes, that is the reason. This was like that before and after my patch. > I did not want to change too much here. I wanted to avoid refactoring the whole of `SuperWord::output`. Make sense to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13354#discussion_r1186579455 From fyang at openjdk.org Sat May 6 03:08:18 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 6 May 2023 03:08:18 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v8] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 06:18:23 GMT, Gui Cao wrote: >> Hi, >> >> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. >> >> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
>> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ >> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> >> >> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X >> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: >> >> ``` >> 1ba0 ld R28, [R23, #280] # ptr, #@loadP >> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm >> 1ba8 reinterpretResize V1, V5 >> 1bb0 vcvtBtoX V4, V1 >> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 >> ``` >> >> #### VectorRearrange >> >> When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. >> >> The compilation log for the `VectorRearrange` node: >> >> ``` >> 1f6 spill R7 -> [sp, #320] # spill size = 64 >> 1f8 spill [sp, #128] -> V1 # vector spill size = 256 >> 200 spill [sp, #160] -> V2 # vector spill size = 256 >> 208 rearrange V3, V1, V2 >> 210 spill V3 -> [sp, #96] # vector spill size = 256 >> 218 li R11, #4 # int, #@loadConI >> ``` >> >> #### VectorReinterpret >> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. 
>> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 >> The compilation log for the `VectorReinterpret` node: >> >> >> 1218 spill [sp, #32] -> V4 # vector spill size = 256 >> 1220 spill [sp, #176] -> V3 # vector spill size = 256 >> 1228 rearrange V2, V4, V3 >> 1230 spill [sp, #72] -> V0 # vmask spill size = 32 >> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend >> 1244 reinterpretResize V2, V1 >> 124c vcvtStoX_extend V5, V2 >> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> #### LShiftCntV/RShiftCntV >> >> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types >> >> The compilation log for the LShiftCntV/RShiftCntV node: >> >> >> 24c vasrB V3, V1, V2 >> 260 storeV [R19], V3 # vector (rvv) >> 268 lbu R19, [R29, #48] # byte, #@loadUB >> 26c andi R19, R19, #7 #@andI_reg_imm >> 270 loadV V1, [R25] # vector (rvv) >> 278 vshiftcnt V2, R19 >> 280 vasrB V3, V1, V2 >> 294 storeV [R26], V3 # vector (rvv) >> 29c lbu R19, [R29, #80] # byte, #@loadUB >> 2a0 andi R19, R19, #7 #@andI_reg_imm >> 2a4 loadV V1, [R22] # vector (rvv) >> 2ac vshiftcnt V2, R19 >> >> >> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc >> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> Testing: >> qemu with UseRVV: >> >> - [x] Tier1 tests (release) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8306966 > - rename rvv_vsetvli to vsetvli_helper > - Fix round mode and optimize widen/narrow vcast > - Small refactoring of rvv_vsetvli > - Fix VectorCastF2X > - During the conversion, specify the number of vectors > - Use zr register instead of x0 > - 8306966: RISC-V: Support vector cast node for Vector API Thanks for the update. Would you mind a few more tweaks? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1823: > 1821: // High part of dst vector will be filled with zero. > 1822: void C2_MacroAssembler::integer_narrow_v(VectorRegister dst, BasicType dst_bt, int vector_length, > 1823: VectorRegister src, BasicType src_bt, VectorRegister tmp) { If you allocate different vector registers for 'dst' and 'src' on the callsite, then we should be able to eliminate the 'tmp' register parameter for this function. That is saving the intermediate result in 'dst' instead. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp line 239: > 237: VectorRegister src, BasicType src_bt, VectorRegister tmp); > 238: > 239: void vfcvt_rtz_xu_f_v_safe(VectorRegister dst, VectorRegister src); I don't think we need the unsigned version. Could you please remove them? ------------- Changes requested by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1415641580 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186604752 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186605732 From duke at openjdk.org Sat May 6 03:14:13 2023 From: duke at openjdk.org (Chang Peng) Date: Sat, 6 May 2023 03:14:13 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 02:22:16 GMT, Quan Anh Mai wrote: > Storing into a boolean array should be safer as `trueCount` can be implemented as `bitCount(toLong())`. Thanks. I think intoArray() may have heavier cost. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1537028801 From qamai at openjdk.org Sat May 6 05:41:17 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 6 May 2023 05:41:17 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 03:11:50 GMT, Chang Peng wrote: >> Storing into a boolean array should be safer as `trueCount` can be implemented as `bitCount(toLong())`. Thanks. > >> Storing into a boolean array should be safer as `trueCount` can be implemented as `bitCount(toLong())`. Thanks. > > I think intoArray() may have heavier cost. 
@changpeng1997 Yes that's right, I think ideally we can make `Blackhole` receive a `VectorNode`, but this patch is good as it is ------------- PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1537057615 From qamai at openjdk.org Sat May 6 05:41:16 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 6 May 2023 05:41:16 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: <8YEG5Ln240AMXcb8eBep_wSQo7gexYo59C0DdGQm6Vk=.4d2cbf71-26dc-48f5-b936-aa4d6ce291f8@github.com> On Sat, 6 May 2023 02:01:20 GMT, Chang Peng wrote: > To avoid dead code elimination, a use-point laneIsSet() is added in each benchmark method in MaskFromLongBenchmark.java. > > However, currently laneIsSet() [1] is implemented by toLong(). So it may generate a fromLong-toLong pair [2], making this benchmark to be noneffective after inlining laneIsSet() into the outer method. The assembly of maskFromLong_byte128 benchmark on SVE2 is shown in [3]. We cannot see the bdep instruction used by fromLong on AArch64 [4]. So, in this case, we cannot measure fromLong()'s performance by using this benchmark. > > This patch uses trueCount() [5] instead of toLong() to measure the fromLong()'s performance effectively. After this patch, we can see the bdep instruction in the hot loop [6] of maskFromLong_byte128 benchmark. 
> > [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70 > [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736 > [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa > [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099 > [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount() > [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13851#pullrequestreview-1415700121 From duke at openjdk.org Sat May 6 05:59:15 2023 From: duke at openjdk.org (Chang Peng) Date: Sat, 6 May 2023 05:59:15 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 03:11:50 GMT, Chang Peng wrote: >> Storing into a boolean array should be safer as `trueCount` can be implemented as `bitCount(toLong())`. Thanks. > >> Storing into a boolean array should be safer as `trueCount` can be implemented as `bitCount(toLong())`. Thanks. > > I think intoArray() may have heavier cost. > @changpeng1997 Yes that's right, I think ideally we can make `Blackhole` receive a `VectorNode`, but this patch is good as it is Thanks for your review. 
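A minimal sketch of the relationship discussed in this thread, not code from the patch. It models a mask as the low bits of a `long` in plain Java, so the class `MaskModel`, its 16-lane width, and every method body are hypothetical stand-ins for the Vector API methods of the same names. Under those assumptions it shows why a `laneIsSet()` use point reduces to a fromLong-toLong round trip, and why `trueCount()` could in principle be expressed as `bitCount(toLong())`, as the review points out.

```java
// Hypothetical scalar model of a 16-lane mask: bit i of the long is lane i.
// This is NOT the jdk.incubator.vector implementation, only an illustration.
final class MaskModel {
    static final int LANES = 16;

    // Models VectorMask.fromLong: keep only the bits that map to lanes.
    static long fromLong(long bits) {
        return bits & ((1L << LANES) - 1);
    }

    // Models VectorMask.toLong: in this model the mask already is the long,
    // so toLong(fromLong(x)) does no work beyond the initial masking.
    static long toLong(long mask) {
        return mask;
    }

    // Models laneIsSet implemented on top of toLong, as described in the thread.
    static boolean laneIsSet(long mask, int lane) {
        return ((toLong(mask) >>> lane) & 1L) != 0;
    }

    // Models trueCount expressed as bitCount(toLong()), the case the reviewer raises.
    static int trueCount(long mask) {
        return Long.bitCount(toLong(mask));
    }

    public static void main(String[] args) {
        long mask = fromLong(0b1010_0000_0000_0101L); // lanes 0, 2, 13 and 15 set
        System.out.println(laneIsSet(mask, 0)); // prints true
        System.out.println(laneIsSet(mask, 1)); // prints false
        System.out.println(trueCount(mask));    // prints 4
    }
}
```

In this model both use points go through `toLong()`, which is the concern behind the suggestion to store into a boolean array; whether a given JIT actually folds the real Vector API calls this way has to be checked in the generated assembly, as the thread does.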
------------- PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1537060606 From eliu at openjdk.org Sat May 6 06:04:16 2023 From: eliu at openjdk.org (Eric Liu) Date: Sat, 6 May 2023 06:04:16 GMT Subject: RFR: 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE [v5] In-Reply-To: <8IbGP73SEE8FZ5foX9_1i5cZi83pgVhQNwSvsCMzxv8=.efd624bb-97b1-431c-a046-3c59a4d7fc86@github.com> References: <8IbGP73SEE8FZ5foX9_1i5cZi83pgVhQNwSvsCMzxv8=.efd624bb-97b1-431c-a046-3c59a4d7fc86@github.com> Message-ID: On Fri, 5 May 2023 09:39:25 GMT, Chang Peng wrote: >> We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. >> >> The following instruction sequence >> >> >> movi v17.16b, #12 >> cmpgt p0.b, p7/z, z16.b, z17.b >> >> >> can be optimized to: >> >> >> cmpgt p0.b, p7/z, z16.b, #12 >> >> >> This patch does the following: >> 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. >> SVE cmp(immediate) instructions support comparing a vector with a 7-bit unsigned integer immediate (range from 0 to 127) or a 5-bit signed integer immediate (range from -16 to 15). >> >> 2. Add optimized match rules to generate the compare-with-immediate instructions. >> >> [1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Merge match rules LGTM. ------------- Marked as reviewed by eliu (Committer).
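The immediate ranges quoted above can be sketched as a small range check. This is an illustration only, not the match-rule predicate from the patch: the class `SveCmpImmediate` and its methods are hypothetical names, and the pairing of the 7-bit unsigned range with unsigned compares and the 5-bit signed range with signed compares is an assumption taken from the SVE instruction set rather than from this thread.

```java
// Hypothetical helper, not HotSpot code: decides whether a compare constant
// could use an SVE compare-with-immediate form instead of being broadcast
// into a vector register first.
final class SveCmpImmediate {
    // 5-bit signed immediate, range -16..15 (range quoted in the thread).
    static boolean fitsSigned5(long value) {
        return value >= -16 && value <= 15;
    }

    // 7-bit unsigned immediate, range 0..127 (range quoted in the thread).
    static boolean fitsUnsigned7(long value) {
        return value >= 0 && value <= 127;
    }

    // Assumption: unsigned compares take the 7-bit form, signed compares the 5-bit form.
    static boolean canUseImmediateForm(long value, boolean unsignedCompare) {
        return unsignedCompare ? fitsUnsigned7(value) : fitsSigned5(value);
    }

    public static void main(String[] args) {
        System.out.println(canUseImmediateForm(12, false));  // prints true, like "#12" in the example above
        System.out.println(canUseImmediateForm(100, false)); // prints false, outside -16..15
        System.out.println(canUseImmediateForm(100, true));  // prints true, inside 0..127
    }
}
```

When the check fails, the constant would still have to be materialized in a vector register, which is the two-instruction sequence the patch description starts from.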
PR Review: https://git.openjdk.org/jdk/pull/13200#pullrequestreview-1415706643 From duke at openjdk.org Sat May 6 06:12:15 2023 From: duke at openjdk.org (Chang Peng) Date: Sat, 6 May 2023 06:12:15 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 05:56:08 GMT, Chang Peng wrote: > @changpeng1997 Yes, that's right. I think ideally we can make `Blackhole` receive a `VectorNode`, but this patch is good as it is. @merykitty I have updated the commit message to explain why we should not use Blackhole to fix this benchmark. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1537062956 From xgong at openjdk.org Sat May 6 07:23:16 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Sat, 6 May 2023 07:23:16 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 02:01:20 GMT, Chang Peng wrote: > To avoid dead code elimination, a use point, laneIsSet(), is added to each benchmark method in MaskFromLongBenchmark.java. > > However, laneIsSet() [1] is currently implemented with toLong(), so it may generate a fromLong-toLong pair [2], which makes this benchmark ineffective once laneIsSet() is inlined into the outer method. The assembly of the maskFromLong_byte128 benchmark on SVE2 is shown in [3]; the bdep instruction used by fromLong on AArch64 [4] does not appear in it. So, in this case, we cannot measure fromLong()'s performance with this benchmark. > > This patch uses trueCount() [5] instead of toLong() to measure fromLong()'s performance effectively.
After this patch, we can see the bdep instruction in the hot loop [6] of maskFromLong_byte128 benchmark. > > Since using Blackhole to consume VectorMask will generate a heavy vector box, we don't use Blackhole to fix this benchmark. > > [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70 > [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736 > [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa > [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099 > [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount() > [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d Looks good to me! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/13851#pullrequestreview-1415734300 From duke at openjdk.org Sat May 6 07:23:28 2023 From: duke at openjdk.org (Chang Peng) Date: Sat, 6 May 2023 07:23:28 GMT Subject: Integrated: 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 02:06:41 GMT, Chang Peng wrote: > We can use SVE compare-with-integer-immediate instructions like cmpgt(immediate)[1] to avoid the extra scalar2vector operations. > > The following instruction sequence > > > movi v17.16b, #12 > cmpgt p0.b, p7/z, z16.b, z17.b > > > can be optimized to: > > > cmpgt p0.b, p7/z, z16.b, #12 > > > This patch does the following: > 1. Add SVE compare-with-7bit-unsigned-immediate instructions to C2's backend. 
> SVE cmp(immediate) instructions support comparing a vector with a 7-bit unsigned integer immediate (range from 0 to 127) or a 5-bit signed integer immediate (range from -16 to 15). > > 2. Add optimized match rules to generate the compare-with-immediate instructions. > > [1]: https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/CMP-cc---immediate---Compare-vector-to-immediate- This pull request has now been integrated. Changeset: 0dca573c Author: changpeng1997 Committer: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/0dca573ca5d357157565072e22e24d6a9bee717a Stats: 1010 lines in 10 files changed: 653 ins; 13 del; 344 mod 8301739: AArch64: Add optimized rules for vector compare with immediate for SVE Reviewed-by: aph, eliu ------------- PR: https://git.openjdk.org/jdk/pull/13200 From duke at openjdk.org Sat May 6 07:30:14 2023 From: duke at openjdk.org (Chang Peng) Date: Sat, 6 May 2023 07:30:14 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 02:01:20 GMT, Chang Peng wrote: > To avoid dead code elimination, a use point, laneIsSet(), is added to each benchmark method in MaskFromLongBenchmark.java. > > However, laneIsSet() [1] is currently implemented with toLong(), so it may generate a fromLong-toLong pair [2], which makes this benchmark ineffective once laneIsSet() is inlined into the outer method. The assembly of the maskFromLong_byte128 benchmark on SVE2 is shown in [3]; the bdep instruction used by fromLong on AArch64 [4] does not appear in it. So, in this case, we cannot measure fromLong()'s performance with this benchmark. > > This patch uses trueCount() [5] instead of toLong() to measure fromLong()'s performance effectively.
After this patch, we can see the bdep instruction in the hot loop [6] of maskFromLong_byte128 benchmark. > > Since using Blackhole to consume VectorMask will generate a heavy vector box, we don't use Blackhole to fix this benchmark. > > [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70 > [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736 > [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa > [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099 > [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount() > [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d @jatin-bhateja Could you please help to review this patch? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1537077855 From gcao at openjdk.org Sat May 6 13:20:19 2023 From: gcao at openjdk.org (Gui Cao) Date: Sat, 6 May 2023 13:20:19 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v9] In-Reply-To: References: Message-ID: <37dh65GyGBXyLRQr_qAD5Y7ffzYEiuokdm_n-AMGXLk=.aee221eb-4826-4437-a13b-1086b704f0ba@github.com> > Hi, > > we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > > We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > > > #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X > There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: > > ``` > 1ba0 ld R28, [R23, #280] # ptr, #@loadP > 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm > 1ba8 reinterpretResize V1, V5 > 1bb0 vcvtBtoX V4, V1 > 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 > ``` > > #### VectorRearrange > > When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. > > The compilation log for the `VectorRearrange` node: > > ``` > 1f6 spill R7 -> [sp, #320] # spill size = 64 > 1f8 spill [sp, #128] -> V1 # vector spill size = 256 > 200 spill [sp, #160] -> V2 # vector spill size = 256 > 208 rearrange V3, V1, V2 > 210 spill V3 -> [sp, #96] # vector spill size = 256 > 218 li R11, #4 # int, #@loadConI > ``` > > #### VectorReinterpret > If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. 
> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 > The compilation log for the `VectorReinterpret` node: > > > 1218 spill [sp, #32] -> V4 # vector spill size = 256 > 1220 spill [sp, #176] -> V3 # vector spill size = 256 > 1228 rearrange V2, V4, V3 > 1230 spill [sp, #72] -> V0 # vmask spill size = 32 > 123c vmerge_vvm V1, V1, V2, v0 #@vector blend > 1244 reinterpretResize V2, V1 > 124c vcvtStoX_extend V5, V2 > 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 > > > #### LShiftCntV/RShiftCntV > > We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types > > The compilation log for the LShiftCntV/RShiftCntV node: > > > 24c vasrB V3, V1, V2 > 260 storeV [R19], V3 # vector (rvv) > 268 lbu R19, [R29, #48] # byte, #@loadUB > 26c andi R19, R19, #7 #@andI_reg_imm > 270 loadV V1, [R25] # vector (rvv) > 278 vshiftcnt V2, R19 > 280 vasrB V3, V1, V2 > 294 storeV [R26], V3 # vector (rvv) > 29c lbu R19, [R29, #80] # byte, #@loadUB > 2a0 andi R19, R19, #7 #@andI_reg_imm > 2a4 loadV V1, [R22] # vector (rvv) > 2ac vshiftcnt V2, R19 > > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > Testing: > qemu with UseRVV: > > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: - Update integer_narrow_v and remove unused code - Merge branch 'master' into JDK-8306966 - Merge remote-tracking branch 'upstream/master' into JDK-8306966 - rename rvv_vsetvli to vsetvli_helper - Fix round mode and optimize widen/narrow vcast - Small refactoring of rvv_vsetvli - Fix VectorCastF2X - During the conversion, specify the number of vectors - Use zr register instead of x0 - 8306966: RISC-V: Support vector cast node for Vector API ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13684/files - new: https://git.openjdk.org/jdk/pull/13684/files/80beb6a1..ca5b4ae8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=07-08 Stats: 8211 lines in 218 files changed: 6374 ins; 613 del; 1224 mod Patch: https://git.openjdk.org/jdk/pull/13684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13684/head:pull/13684 PR: https://git.openjdk.org/jdk/pull/13684 From gcao at openjdk.org Sat May 6 13:20:24 2023 From: gcao at openjdk.org (Gui Cao) Date: Sat, 6 May 2023 13:20:24 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v8] In-Reply-To: References: Message-ID: <3xhO7dlKkhplL0XScHa0cTMu7WacNd3CSahIaa8vHqY=.9cec9360-a377-4ce0-88d2-20fde666ba71@github.com> On Sat, 6 May 2023 03:01:38 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream/master' into JDK-8306966 >> - rename rvv_vsetvli to vsetvli_helper >> - Fix round mode and optimize widen/narrow vcast >> - Small refactoring of rvv_vsetvli >> - Fix VectorCastF2X >> - During the conversion, specify the number of vectors >> - Use zr register instead of x0 >> - 8306966: RISC-V: Support vector cast node for Vector API > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1823: > >> 1821: // High part of dst vector will be filled with zero. >> 1822: void C2_MacroAssembler::integer_narrow_v(VectorRegister dst, BasicType dst_bt, int vector_length, >> 1823: VectorRegister src, BasicType src_bt, VectorRegister tmp) { > > If you allocate different vector registers for 'dst' and 'src' on the callsite, then we should be able to eliminate the 'tmp' register parameter for this function. That is saving the intermediate result in 'dst' instead. Fixed. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.hpp line 239: > >> 237: VectorRegister src, BasicType src_bt, VectorRegister tmp); >> 238: >> 239: void vfcvt_rtz_xu_f_v_safe(VectorRegister dst, VectorRegister src); > > I don't think we need the unsigned version. Could you please remove them? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186696455 PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186696488 From gcao at openjdk.org Sat May 6 14:19:15 2023 From: gcao at openjdk.org (Gui Cao) Date: Sat, 6 May 2023 14:19:15 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v10] In-Reply-To: References: Message-ID: > Hi, > > we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > > We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: remove unused code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13684/files - new: https://git.openjdk.org/jdk/pull/13684/files/ca5b4ae8..63905dff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=08-09 Stats: 17 lines in 1 file changed: 0 ins; 17 del; 0 mod Patch:
https://git.openjdk.org/jdk/pull/13684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13684/head:pull/13684 PR: https://git.openjdk.org/jdk/pull/13684 From fjiang at openjdk.org Sun May 7 02:05:27 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 7 May 2023 02:05:27 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v8] In-Reply-To: <3xhO7dlKkhplL0XScHa0cTMu7WacNd3CSahIaa8vHqY=.9cec9360-a377-4ce0-88d2-20fde666ba71@github.com> References: <3xhO7dlKkhplL0XScHa0cTMu7WacNd3CSahIaa8vHqY=.9cec9360-a377-4ce0-88d2-20fde666ba71@github.com> Message-ID: On Sat, 6 May 2023 13:15:15 GMT, Gui Cao wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1823: >> >>> 1821: // High part of dst vector will be filled with zero. >>> 1822: void C2_MacroAssembler::integer_narrow_v(VectorRegister dst, BasicType dst_bt, int vector_length, >>> 1823: VectorRegister src, BasicType src_bt, VectorRegister tmp) { >> >> If you allocate different vector registers for 'dst' and 'src' on the callsite, then we should be able to eliminate the 'tmp' register parameter for this function. That is saving the intermediate result in 'dst' instead. > > Fixed. Should we add `assert_different_registers` to prevent the same vector register from being used by other people? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186769795 From gcao at openjdk.org Sun May 7 04:31:31 2023 From: gcao at openjdk.org (Gui Cao) Date: Sun, 7 May 2023 04:31:31 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v11] In-Reply-To: References: Message-ID: <12UgYqWF68hGRwVO20ZZRSyhc63hy7VwxSWs8X1xCQ4=.e4c17f66-6d13-47ec-83de-b71a15503a96@github.com> > Hi, > > We have added some implementations related to vector cast, implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot.
Gui Cao has updated the pull request incrementally with one additional commit since the last revision: use vxor_vv to clear register ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13684/files - new: https://git.openjdk.org/jdk/pull/13684/files/63905dff..a7bb3f32 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13684&range=09-10 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch:
https://git.openjdk.org/jdk/pull/13684.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13684/head:pull/13684 PR: https://git.openjdk.org/jdk/pull/13684 From gcao at openjdk.org Sun May 7 05:12:32 2023 From: gcao at openjdk.org (Gui Cao) Date: Sun, 7 May 2023 05:12:32 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v8] In-Reply-To: References: <3xhO7dlKkhplL0XScHa0cTMu7WacNd3CSahIaa8vHqY=.9cec9360-a377-4ce0-88d2-20fde666ba71@github.com> Message-ID: On Sun, 7 May 2023 02:02:56 GMT, Feilong Jiang wrote: > Should we add `assert_different_registers` to prevent the same vector register from being used by other people? This has been modified, and the src and dst registers can now be the same. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13684#discussion_r1186783569 From fyang at openjdk.org Sun May 7 05:28:25 2023 From: fyang at openjdk.org (Fei Yang) Date: Sun, 7 May 2023 05:28:25 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v11] In-Reply-To: <12UgYqWF68hGRwVO20ZZRSyhc63hy7VwxSWs8X1xCQ4=.e4c17f66-6d13-47ec-83de-b71a15503a96@github.com> References: <12UgYqWF68hGRwVO20ZZRSyhc63hy7VwxSWs8X1xCQ4=.e4c17f66-6d13-47ec-83de-b71a15503a96@github.com> Message-ID: On Sun, 7 May 2023 04:31:31 GMT, Gui Cao wrote: >> Hi, >> >> We have added some implementations related to vector cast, implemented with reference to RVV v1.0 [1]. Please take a look and review. Thanks a lot. >> >> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes.
> > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > use vxor_vv to clear register Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/13684#pullrequestreview-1415846613 From gcao at openjdk.org Mon May 8 01:14:16 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 8 May 2023 01:14:16 GMT Subject: RFR: 8306966: RISC-V: Support vector cast node for Vector API [v11] In-Reply-To: <12UgYqWF68hGRwVO20ZZRSyhc63hy7VwxSWs8X1xCQ4=.e4c17f66-6d13-47ec-83de-b71a15503a96@github.com> References: <12UgYqWF68hGRwVO20ZZRSyhc63hy7VwxSWs8X1xCQ4=.e4c17f66-6d13-47ec-83de-b71a15503a96@github.com> Message-ID: On Sun, 7 May 2023 04:31:31 GMT, Gui Cao wrote: >> Hi, >> >> we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. >> >> We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes. >> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ >> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> >> >> #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X >> There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: >> >> ``` >> 1ba0 ld R28, [R23, #280] # ptr, #@loadP >> 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm >> 1ba8 reinterpretResize V1, V5 >> 1bb0 vcvtBtoX V4, V1 >> 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 >> ``` >> >> #### VectorRearrange >> >> 
When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. >> >> The compilation log for the `VectorRearrange` node: >> >> ``` >> 1f6 spill R7 -> [sp, #320] # spill size = 64 >> 1f8 spill [sp, #128] -> V1 # vector spill size = 256 >> 200 spill [sp, #160] -> V2 # vector spill size = 256 >> 208 rearrange V3, V1, V2 >> 210 spill V3 -> [sp, #96] # vector spill size = 256 >> 218 li R11, #4 # int, #@loadConI >> ``` >> >> #### VectorReinterpret >> If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. >> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 >> The compilation log for the `VectorReinterpret` node: >> >> >> 1218 spill [sp, #32] -> V4 # vector spill size = 256 >> 1220 spill [sp, #176] -> V3 # vector spill size = 256 >> 1228 rearrange V2, V4, V3 >> 1230 spill [sp, #72] -> V0 # vmask spill size = 32 >> 123c vmerge_vvm V1, V1, V2, v0 #@vector blend >> 1244 reinterpretResize V2, V1 >> 124c vcvtStoX_extend V5, V2 >> 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> #### LShiftCntV/RShiftCntV >> >> We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types >> >> The compilation log for the LShiftCntV/RShiftCntV node: >> >> >> 24c vasrB V3, V1, V2 >> 260 storeV [R19], V3 # vector (rvv) >> 268 lbu R19, [R29, #48] # byte, #@loadUB >> 26c andi R19, R19, #7 #@andI_reg_imm >> 270 loadV V1, [R25] # vector (rvv) >> 278 vshiftcnt V2, R19 >> 280 vasrB V3, V1, V2 >> 294 storeV [R26], V3 # vector (rvv) >> 29c lbu R19, [R29, #80] # byte, #@loadUB >> 2a0 andi R19, R19, #7 #@andI_reg_imm >> 2a4 loadV V1, [R22] # vector (rvv) >> 2ac vshiftcnt V2, R19 >> >> >> [1] 
https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc >> [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java >> Testing: >> qemu with UseRVV: >> >> - [x] Tier1 tests (release) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > use vxor_vv to clear register Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13684#issuecomment-1537604067 From gcao at openjdk.org Mon May 8 01:18:47 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 8 May 2023 01:18:47 GMT Subject: Integrated: 8306966: RISC-V: Support vector cast node for Vector API In-Reply-To: References: Message-ID: <69nNsTTo5_zM1fF9rMN65YUzh914Jfyz_0d8wajGRh4=.fca242be-af2e-41e7-a086-367f40cb40a7@github.com> On Thu, 27 Apr 2023 03:37:00 GMT, Gui Cao wrote: > Hi, > > we have added some implementations related to vector cast, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > > We can use the VectorReshapeTests.java[2] to print the compilation log, verify and observe the generation of nodes.
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/VectorReshapeTests_PrintOptoAssembly_20230426.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > > > #### VectorCast/VectorCastB2X/VectorCastD2X/VectorCastF2X/VectorCastI2X/VectorCastL2X/VectorCastS2X > There are too many nodes here, and the following shows the log of `VectorCastB2X` nodes: > > ``` > 1ba0 ld R28, [R23, #280] # ptr, #@loadP > 1ba4 addi R29, R7, #32 # ptr, #@addP_reg_imm > 1ba8 reinterpretResize V1, V5 > 1bb0 vcvtBtoX V4, V1 > 1bb8 far_bgeu R29, R28, B465 #@far_cmpP_branch P=0.000100 C=-1.000000 > ``` > > #### VectorRearrange > > When the original vector is converted to the target vector, if the actual number of elements of the original vector is greater than the number of elements of the target vector, a slicing action is performed to provide data for subsequent cast nodes. The slicing action depends on the VectorRearrange node. > > The compilation log for the `VectorRearrange` node: > > ``` > 1f6 spill R7 -> [sp, #320] # spill size = 64 > 1f8 spill [sp, #128] -> V1 # vector spill size = 256 > 200 spill [sp, #160] -> V2 # vector spill size = 256 > 208 rearrange V3, V1, V2 > 210 spill V3 -> [sp, #96] # vector spill size = 256 > 218 li R11, #4 # int, #@loadConI > ``` > > #### VectorReinterpret > If num_elem_from and num_elem_to are not equal, Reinterpret is needed to reset the correct number. 
> https://github.com/openjdk/jdk/blob/3554e7a3ffb879c7e5ef7547eb053e484d09d12b/src/hotspot/share/opto/vectorIntrinsics.cpp#L2374-L2376 > The compilation log for the `VectorReinterpret` node: > > > 1218 spill [sp, #32] -> V4 # vector spill size = 256 > 1220 spill [sp, #176] -> V3 # vector spill size = 256 > 1228 rearrange V2, V4, V3 > 1230 spill [sp, #72] -> V0 # vmask spill size = 32 > 123c vmerge_vvm V1, V1, V2, v0 #@vector blend > 1244 reinterpretResize V2, V1 > 124c vcvtStoX_extend V5, V2 > 1254 bgeu R28, R7, B169 #@cmpP_branch P=0.000100 C=-1.000000 > > > #### LShiftCntV/RShiftCntV > > We have merged `LShiftCntV`, `RShiftCntV` nodes and support boolean types > > The compilation log for the LShiftCntV/RShiftCntV node: > > > 24c vasrB V3, V1, V2 > 260 storeV [R19], V3 # vector (rvv) > 268 lbu R19, [R29, #48] # byte, #@loadUB > 26c andi R19, R19, #7 #@andI_reg_imm > 270 loadV V1, [R25] # vector (rvv) > 278 vshiftcnt V2, R19 > 280 vasrB V3, V1, V2 > 294 storeV [R26], V3 # vector (rvv) > 29c lbu R19, [R29, #80] # byte, #@loadUB > 2a0 andi R19, R19, #7 #@andI_reg_imm > 2a4 loadV V1, [R22] # vector (rvv) > 2ac vshiftcnt V2, R19 > > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/VectorReshapeTests.java > Testing: > qemu with UseRVV: > > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. 
Changeset: 495f2688 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/495f2688d64ca0393906487a0b9ac6ed4c679ffa Stats: 756 lines in 5 files changed: 457 ins; 83 del; 216 mod 8306966: RISC-V: Support vector cast node for Vector API Co-authored-by: Dingli Zhang Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/13684 From epeter at openjdk.org Mon May 8 06:12:41 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 May 2023 06:12:41 GMT Subject: RFR: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph [v2] In-Reply-To: <69Et_PaFnKjKYm_ILdMNs-eIooTG6eCNNgy4mxYax3w=.695a58e6-664b-4403-b2c0-1d143fa60a2a@github.com> References: <69Et_PaFnKjKYm_ILdMNs-eIooTG6eCNNgy4mxYax3w=.695a58e6-664b-4403-b2c0-1d143fa60a2a@github.com> Message-ID: On Sat, 6 May 2023 01:37:50 GMT, Fei Gao wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed Fei's review suggestions > > Tier 1-3 on aarch64 machines passed. Thanks @fg1417 @vnkozlov for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13354#issuecomment-1537804362 From epeter at openjdk.org Mon May 8 06:12:43 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 May 2023 06:12:43 GMT Subject: Integrated: 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 14:55:55 GMT, Emanuel Peter wrote: > `SuperWord:schedule`, and specifically `SuperWord::co_locate_pack` is broken. > The problem is with the basic approach of it, as far as I know. > Hence, I had to completely re-design the `schedule` algorithm, based on the `PacksetGraph` ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). 
> > **The current approach** > > The idea is to leave the non-vectorized memory ops in their place, and find the right place for the vectorized memops to be "sandwiched" into. The logic is very complex and has already had a few bugs fixed. > > **Why this does not work** > > However, in some rare cases, we have to reorder non-vectorized operations. See this example that I added as a regression test: > > https://github.com/openjdk/jdk/blob/a771a61005aea272cc51fa3f3e1637c217582fce/test/hotspot/jtreg/compiler/loopopts/superword/TestScheduleReordersScalarMemops.java#L82-L109 > > I found this issue during work on https://github.com/openjdk/jdk/pull/13078, where I had to restrict/disable some tests that are now passing. > > **Solution** > > Abandon the idea of "sandwiching" memops. Rewrite `SuperWord:schedule`: > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2567-L2576 > > We first schedule all memops into a linear order. > We do this scheduling based on the `PacksetGraph`, which gives us a `DAG` based on the `packset` and the dependency-graph (which in turn respects the data use-defs, as well as the memory dependencies, unless we can prove that they do not reference the same memory). > In other words: we have a linearization that respects all dependencies that must be respected. > Further, we make sure that ops from the same pack are scheduled as a block (all adjacent to each other), and in order that the packset has internally. > > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2489-L2493 > > Now that we have this order (and we have not aborted because we found a cycle in the `PacksetGraph`), we must apply this schedule to each memory slice, and reorder the memops in the slices accordingly. 
> > https://github.com/openjdk/jdk/blob/6bb2da3da988618803823e905f23cb106cd9d6b2/src/hotspot/share/opto/superword.cpp#L2617-L2619 > > This scheduling has the nice side-effect of simplifying `SuperWord::output` a little. > We know now that the first element in a pack is also first in the slice order, and the last element in the pack is last in the slice (because we schedule the packs as a block, i.e. in the pack order). > > **Discussion** > > This seems to me to be a much more straight forward approach, and it uses the code I recently added for verification of cyclic dependencies in the packset ([JDK-8304042](https://bugs.openjdk.org/browse/JDK-8304042), https://git.openjdk.org/jdk/pull/13078). > > One potential improvement to my fix: > We now sometimes re-order the non-vectorized memory slices, even though it may not be necessary. > This is not wrong, but it makes updates to the graph that may be confusing when debugging. > Further, the re-ordering may have performance impacts. > I could use a priority-queue (min-heap, would have to implement it since it does not yet exist), and schedule the `PacksetGraph` whenever possible with the lower `bb_idx` first. This would make the new linear order the same/closer to the old one. However, I am not sure if this is worth the effort and overhead of a priority-queue. > > **Testing** > Github-actions pass. tier1-6 + stress testing passes. > Performance testing showed no significant performance change. This pull request has now been integrated. 
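The scheduling idea described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions, not the HotSpot code: the class `PackScheduleSketch`, the method `schedule`, and the plain integer ids for memops are all hypothetical. Every pack is treated as one group whose members stay adjacent and in pack order, scalar memops form groups of size one, dependencies inside a group are ignored, and a cycle between groups makes the sketch give up by returning `null`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: linearize memops so that each pack is emitted as a block.
final class PackScheduleSketch {

    // groups: each inner list is a pack (in pack order) or a single scalar memop.
    // deps:   pairs {from, to}, meaning memop `from` must come before memop `to`.
    // Returns the linear order of memop ids, or null if the group graph has a cycle.
    static List<Integer> schedule(List<List<Integer>> groups, int[][] deps, int numOps) {
        int[] groupOf = new int[numOps];
        for (int g = 0; g < groups.size(); g++) {
            for (int op : groups.get(g)) {
                groupOf[op] = g;
            }
        }

        // Lift the memop dependencies to edges between groups.
        int n = groups.size();
        List<Set<Integer>> successors = new ArrayList<>();
        for (int g = 0; g < n; g++) {
            successors.add(new LinkedHashSet<>());
        }
        int[] indegree = new int[n];
        for (int[] d : deps) {
            int from = groupOf[d[0]];
            int to = groupOf[d[1]];
            if (from != to && successors.get(from).add(to)) {
                indegree[to]++;
            }
        }

        // Kahn's algorithm: repeatedly emit a group with no unscheduled predecessor.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int g = 0; g < n; g++) {
            if (indegree[g] == 0) {
                ready.add(g);
            }
        }
        List<Integer> order = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            int g = ready.poll();
            scheduled++;
            order.addAll(groups.get(g)); // the whole pack is emitted adjacently
            for (int s : successors.get(g)) {
                if (--indegree[s] == 0) {
                    ready.add(s);
                }
            }
        }
        return scheduled == n ? order : null; // null: cyclic dependency between groups
    }
}
```

With the pack `[0, 2]`, the scalar memops `1` and `3`, and the dependencies `1 before 0` and `2 before 3`, the sketch has to move the scalar memop `1` ahead of the pack, which is the kind of scalar reordering that the "sandwiching" approach could not express.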
Changeset: ad0e5a99 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/ad0e5a99ca1ad9dd04105f502985735a3536c3f4 Stats: 617 lines in 5 files changed: 224 ins; 272 del; 121 mod 8304720: SuperWord::schedule should rebuild C2-graph from SuperWord dependency-graph Reviewed-by: kvn, fgao ------------- PR: https://git.openjdk.org/jdk/pull/13354 From chagedorn at openjdk.org Mon May 8 06:23:25 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 May 2023 06:23:25 GMT Subject: RFR: 8307131: C2: assert(false) failed: malformed control flow In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:48:06 GMT, Roland Westrelin wrote: > The IR graph has a loop nest with 2 loops and 2 safepoints. Both > safepoints are in the inner loop. One is on the backedge of the inner > loop. The inner loop is transformed into a counted loop and that > safepoint is removed. The other safepoint is right above the inner > loop's exit condition. The outer strip mined loop is constructed and > the safepoint is moved to the outer strip mined loop even though that > safepoint is marked as non-deletable. The inner loop is later > removed, and so are the outer strip mined loop and the safepoint. What > was the outer loop of the 2 loop nest becomes an infinite loop without > a safepoint and is considered dead code, which in turn causes the > assert to fire. > > The fix I propose is to only build the strip mined loop if the > safepoint that's moved to the outer strip mined loop is deletable. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13826#pullrequestreview-1416185530 From pli at openjdk.org Mon May 8 06:31:26 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 8 May 2023 06:31:26 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 07:29:44 GMT, Emanuel Peter wrote: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> -------------------------------------------------------------
> int add      2063   2085    660    530    415    283 |      |
> int mul      2272   2257   1189    733    908    439 |      |
> int min      2527   2520   2516   2579   2585   2542 |    1 |
> int max      2548   2525   2551   2516   2515   2517 |    1 |
> int and      2410   2414    602    480    353    263 |      |
> int or       2149   2151    597    498    354    262 |      |
> int xor      2059   2062    605    476    364    263 |      |
> long add     1776   1790   2000   1000   1683    591 |    2 |
> long mul     2135   2199   2185   2001   2176   1307 |    2 |
> long min     1439   1424   1421   1420   1430   1427 |    3 |
> long max     2299   2287   2303   2305   1433   1425 |    3 |
> long and     1657   1667   2015   1003   1679    568 |    4 |
> long or      1776   1783   2032   1009   1680    569 |    4 |
> long xor     1834   1783   2012   1024   1679    570 |    4 |
> float add    2779   2644   2633   2648   2632   2639 |    5 |
> float mul    2779   2871   2810   2776   2732   2791 |    5 |
> float min    2294   2620   1725   1286    872    672 |      |
> float max    2371   2519   1697   1265    841    468 |      |
> double add   2634   2636   2635   2650   2635   2648 |    5 |
> double mul   3053   2955   2881   3030   2979   2927 |    5 |
> double min   2364   2400   2439   2399   2486   2398 |    6 |
> double max   2488   2459   2501   2451   2493   2498 |    6 |
>
> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>
> The lines without note show clear speedup as expected.
>
> Notes:
> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside the loop. Vectorization has no benefit in these examples.
> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>
> **Testing**
>
> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>
> Passes up to tier5 and stress-testing.
> **TODO report performance testing (running)**
> **TODO** can someone benchmark on `aarch64`?
>
> **Discussion**
>
> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>
> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>
> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
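To make the effect of moving an unordered reduction out of the loop concrete, here is a small scalar Java model. It is a sketch under stated assumptions, not what C2 emits: the class name `UnorderedReductionSketch`, both method names, and `LANES = 8` as the vector width are hypothetical, and plain `int[]` arrays stand in for vector registers. `reduceInsideLoop` folds every vector into the scalar sum in each iteration (the shape before the patch), while `reduceAfterLoop` keeps a lane-wise accumulator and performs a single reduction after the loop.

```java
// Hypothetical scalar model of a vectorized int-add reduction.
final class UnorderedReductionSketch {
    static final int LANES = 8; // assumed number of int lanes per vector

    // Reduction inside the loop: one cross-lane reduction per iteration.
    static int reduceInsideLoop(int[] a) {
        int sum = 0;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int folded = 0;
            for (int l = 0; l < LANES; l++) {
                folded += a[i + l]; // models the cross-lane reduction node
            }
            sum += folded;
        }
        for (; i < a.length; i++) {
            sum += a[i]; // scalar tail
        }
        return sum;
    }

    // Reduction after the loop: lane-wise accumulation, reduced once at the end.
    static int reduceAfterLoop(int[] a) {
        int[] acc = new int[LANES]; // vector accumulator, starts at the identity 0
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l]; // models an element-wise vector add
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l]; // the only cross-lane reduction
        }
        for (; i < a.length; i++) {
            sum += a[i]; // scalar tail
        }
        return sum;
    }
}
```

Both methods return the same value for `int` because wrapping integer addition is associative and commutative, so the summation order does not matter. The same rewrite is not valid for `float/double add/mul`, where reordering can change the rounded result; that matches the description's statement that those reductions cannot be moved outside the loop.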
src/hotspot/share/opto/loopopts.cpp line 4273: > 4271: VectorNode* vector_accumulator = current->make_normal_vector_op(last_vector_accumulator, vector_input, vec_t); > 4272: _igvn.register_new_node_with_optimizer(vector_accumulator); > 4273: C->copy_node_notes_to(vector_accumulator, current); Copying node notes may be combined with registering the new node by calling `_igvn.register_new_node_with_optimizer(vector_accumulator, current);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187069088 From epeter at openjdk.org Mon May 8 06:53:21 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 May 2023 06:53:21 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> -------------------------------------------------------------
> int add      2063   2085    660    530    415    283 |      |
> int mul      2272   2257   1189    733    908    439 |      |
> int min      2527   2520   2516   2579   2585   2542 |    1 |
> int max      2548   2525   2551   2516   2515   2517 |    1 |
> int and      2410   2414    602    480    353    263 |      |
> int or       2149   2151    597    498    354    262 |      |
> int xor      2059   2062    605    476    364    263 |      |
> long add     1776   1790   2000   1000   1683    591 |    2 |
> long mul     2135   2199   2185   2001   2176   1307 |    2 |
> long min     1439   1424   1421   1420   1430   1427 |    3 |
> long max     2299   2287   2303   2305   1433   1425 |    3 |
> long and     1657   1667   2015   1003   1679    568 |    4 |
> long or      1776   1783   2032   1009   1680    569 |    4 |
> long xor     1834   1783   2012   1024   1679    570 |    4 |
> float add    2779   2644   2633   2648   2632   2639 |    5 |
> float mul    2779   2871   2810   2776   2732   2791 |    5 |
> float min    2294   2620   1725   1286    872    672 |      |
> float max    2371   2519   1697   1265    841    468 |      |
> double add   2634   2636   2635   2650   2635   2648 |    5 |
> double mul   3053   2955   2881   3030   2979   2927 |    5 |
> double min   2364   2400   2439   2399   2486   2398 |    6 |
> double max   2488   2459   2501   2451   2493   2498 |    6 |
>
> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>
> The lines without note show clear speedup as expected.
>
> Notes:
> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside the loop. Vectorization has no benefit in these examples.
> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>
> **Testing**
>
> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>
> Passes up to tier5 and stress-testing.
> **TODO report performance testing (running)**
> **TODO** can someone benchmark on `aarch64`?
>
> **Discussion**
>
> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>
> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>
> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: copy node notes with igvn registering ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/cc9e7e8e..5a51ac37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From epeter at openjdk.org Mon May 8 06:53:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 May 2023 06:53:22 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 06:26:56 GMT, Pengfei Li wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> copy node notes with igvn registering > > src/hotspot/share/opto/loopopts.cpp line 4273: > >> 4271: VectorNode* vector_accumulator = current->make_normal_vector_op(last_vector_accumulator, vector_input, vec_t); >> 4272: _igvn.register_new_node_with_optimizer(vector_accumulator); >> 4273: C->copy_node_notes_to(vector_accumulator, current); > > Copying node notes may be combined with registering new node by calling `_igvn.register_new_node_with_optimizer(vector_accumulator, current);` @pfustc thanks for the hint, I did not know that! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187084497 From pli at openjdk.org Mon May 8 06:57:25 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 8 May 2023 06:57:25 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 06:53:21 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> -------------------------------------------------------------
>> int add      2063   2085    660    530    415    283 |      |
>> int mul      2272   2257   1189    733    908    439 |      |
>> int min      2527   2520   2516   2579   2585   2542 |    1 |
>> int max      2548   2525   2551   2516   2515   2517 |    1 |
>> int and      2410   2414    602    480    353    263 |      |
>> int or       2149   2151    597    498    354    262 |      |
>> int xor      2059   2062    605    476    364    263 |      |
>> long add     1776   1790   2000   1000   1683    591 |    2 |
>> long mul     2135   2199   2185   2001   2176   1307 |    2 |
>> long min     1439   1424   1421   1420   1430   1427 |    3 |
>> long max     2299   2287   2303   2305   1433   1425 |    3 |
>> long and     1657   1667   2015   1003   1679    568 |    4 |
>> long or      1776   1783   2032   1009   1680    569 |    4 |
>> long xor     1834   1783   2012   1024   1679    570 |    4 |
>> float add    2779   2644   2633   2648   2632   2639 |    5 |
>> float mul    2779   2871   2810   2776   2732   2791 |    5 |
>> float min    2294   2620   1725   1286    872    672 |      |
>> float max    2371   2519   1697   1265    841    468 |      |
>> double add   2634   2636   2635   2650   2635   2648 |    5 |
>> double mul   3053   2955   2881   3030   2979   2927 |    5 |
>> double min   2364   2400   2439   2399   2486   2398 |    6 |
>> double max   2488   2459   2501   2451   2493   2498 |    6 |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside the loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> Performance testing did not show any regressions.
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   copy node notes with igvn registering

src/hotspot/share/opto/loopopts.cpp line 4305:

> 4303:
> 4304: // Turn the scalar phi into a vector phi.
> 4305: _igvn.rehash_node_delayed(phi);

Is it possible to set up the vector phi first, and then replace all reduction nodes with vector accumulators by calling `_igvn.replace_node()`? The current code here looks a bit wordy.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187089152

From pli at openjdk.org Mon May 8 07:01:19 2023
From: pli at openjdk.org (Pengfei Li)
Date: Mon, 8 May 2023 07:01:19 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2]
In-Reply-To:
References:
Message-ID: <3mH6mAq6VP2frur2HqexmQCUVSj4gB7MG82u-31dccI=.6b69ae8e-6a39-4d33-9242-1c44a5e82ef3@github.com>

On Mon, 8 May 2023 06:53:21 GMT, Emanuel Peter wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All reductions that can be reordered extend this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and is called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add        2063   2085    660    530    415    283 |      |
>> int mul        2272   2257   1189    733    908    439 |      |
>> int min        2527   2520   2516   2579   2585   2542 | 1    |
>> int max        2548   2525   2551   2516   2515   2517 | 1    |
>> int and        2410   2414    602    480    353    263 |      |
>> int or         2149   2151    597    498    354    262 |      |
>> int xor        2059   2062    605    476    364    263 |      |
>> long add       1776   1790   2000   1000   1683    591 | 2    |
>> long mul       2135   2199   2185   2001   2176   1307 | 2    |
>> long min       1439   1424   1421   1420   1430   1427 | 3    |
>> long max       2299   2287   2303   2305   1433   1425 | 3    |
>> long and       1657   1667   2015   1003   1679    568 | 4    |
>> long or        1776   1783   2032   1009   1680    569 | 4    |
>> long xor       1834   1783   2012   1024   1679    570 | 4    |
>> float add      2779   2644   2633   2648   2632   2639 | 5    |
>> float mul      2779   2871   2810   2776   2732   2791 | 5    |
>> float min      2294   2620   1725   1286    872    672 |      |
>> float max      2371   2519   1697   1265    841    468 |      |
>> double add     2634   2636   2635   2650   2635   2648 | 5    |
>> double mul     3053   2955   2881   3030   2979   2927 | 5    |
>> double min     2364   2400   2439   2399   2486   2398 | 6    |
>> double max     2488   2459   2501   2451   2493   2498 | 6    |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> Performance testing did not show any regressions.
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> copy node notes with igvn registering

src/hotspot/share/opto/loopopts.cpp line 4318:

> 4316:
> 4317: #ifdef ASSERT
> 4318: if (TraceNewVectors) {

Not sure if this is required. I haven't seen the creation of a vector phi traced in other places.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187091733

From thartmann at openjdk.org Mon May 8 07:10:13 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 8 May 2023 07:10:13 GMT
Subject: RFR: 8307131: C2: assert(false) failed: malformed control flow
In-Reply-To:
References:
Message-ID: <3cv2GfKx_K4DhIlVH8958AKtrrEj2ySxJQQ9hBlLZSU=.7dee413e-d517-46aa-b84f-e18b51afba95@github.com>

On Fri, 5 May 2023 08:48:06 GMT, Roland Westrelin wrote:

> The IR graph has a loop nest with 2 loops and 2 safepoints. Both
> safepoints are in the inner loop. One is on the backedge of the inner
> loop. The inner loop is transformed into a counted loop and that
> safepoint is removed. The other safepoint is right above the inner
> loop's exit condition. The outer strip mined loop is constructed and
> the safepoint is moved to the outer strip mined loop even though that
> safepoint is marked as non-deletable. The inner loop is later on
> removed, the outer strip mined loop is too, so is the safepoint. What
> was the outer loop of the 2 loop nest becomes an infinite loop without
> a safepoint and is considered dead code which in turn causes the
> assert to fire.
>
> The fix I propose is to only build the strip mined loop if the
> safepoint that's moved to the outer strip mined loop is deletable.

Looks good to me too.

-------------

Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13826#pullrequestreview-1416239761

From epeter at openjdk.org Mon May 8 07:15:24 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 May 2023 07:15:24 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 8 May 2023 06:54:43 GMT, Pengfei Li wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> copy node notes with igvn registering
>
> src/hotspot/share/opto/loopopts.cpp line 4305:
>
>> 4303:
>> 4304: // Turn the scalar phi into a vector phi.
>> 4305: _igvn.rehash_node_delayed(phi);
>
> Is it possible to set up the vector phi first, and then replace all reduction nodes with vector accumulators by calling `_igvn.replace_node()`? The current code here looks a bit wordy.

If I set up the phi first, then where do I keep the `init` value? I have to make sure it has a use throughout the transformation, otherwise it may be considered `dead`.

> src/hotspot/share/opto/loopopts.cpp line 4318:
>
>> 4316:
>> 4317: #ifdef ASSERT
>> 4318: if (TraceNewVectors) {
>
> Not sure if this is required. I haven't seen the creation of a vector phi traced in other places.

Do we even have any other Vector Phis? SuperWord never creates one. Maybe in the Vector API?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187104346
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187102352

From stuefe at openjdk.org Mon May 8 07:47:30 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 8 May 2023 07:47:30 GMT
Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 14:27:25 GMT, Yasumasa Suenaga wrote:

>> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2).
In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. >> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Introduce os::free_memory Changes requested by stuefe (Reviewer). src/hotspot/os/linux/os_linux.cpp line 224: > 222: > 223: julong os::Linux::available_memory() { > 224: julong avail_mem = 0UL; I would use a signed value, e.g. ssize_t, instead, and use -1 as a marker value. 0 may possibly be a real value you read. src/hotspot/os/linux/os_linux.cpp line 245: > 243: julong mem_available; > 244: if (fscanf(fp, "MemAvailable: " JULONG_FORMAT " kB", &mem_available) == 1) { > 245: avail_mem = mem_available; I think this mem_available var is not needed, you can pass avail_mem directly. src/hotspot/os/linux/os_linux.cpp line 253: > 251: if (avail_mem == 0UL) { > 252: avail_mem = free_memory(); > 253: } Don't you need to multiply the returned value with 1024? 
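The three review points above (a `-1` marker instead of `0`, reading straight into the result, and the kB-to-bytes conversion) can be sketched in standalone C++ as follows. This is a minimal sketch, not the patch under review: the function name `parse_mem_available_bytes` and the use of `std::istream` in place of HotSpot's `os::fopen`/`fscanf` are assumptions made so that the example can run outside the JVM.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Returns the MemAvailable value in bytes, or -1 if the field is absent.
// A signed result with -1 as the marker keeps "field missing" distinct
// from a genuine reading of 0.
int64_t parse_mem_available_bytes(std::istream& meminfo) {
  std::string line;
  while (std::getline(meminfo, line)) {
    std::istringstream fields(line);
    std::string key;
    int64_t kib = 0;
    // Each /proc/meminfo line looks like "MemAvailable:   123456 kB".
    if ((fields >> key >> kib) && key == "MemAvailable:") {
      return kib * 1024;  // the file reports kB, callers expect bytes
    }
  }
  return -1;  // caller falls back to another source, e.g. free memory
}
```

A caller on Linux could pass a `std::ifstream` opened on `/proc/meminfo` and fall back to the `sysinfo(2)`-based value whenever the function returns `-1`.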
------------- PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1416274312 PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1187119446 PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1187131817 PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1187132435 From pli at openjdk.org Mon May 8 07:49:27 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 8 May 2023 07:49:27 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 07:12:46 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 4305: >> >>> 4303: >>> 4304: // Turn the scalar phi into a vector phi. >>> 4305: _igvn.rehash_node_delayed(phi); >> >> Is it possible to setup the vector phi first, and then replace all reduction nodes by vector accumulators via calling `_igvn.replace_node()`? Current code here looks a bit wordy. > > If I set up the phi first, then where do I keep the `init` value? I have to make sure it has a use throughout the transformation, other wise it may think it is `dead`. Why can the `init` node be dead? I think it should be ok as long as `init` node is connected well at the end of this transformation function. Registering new nodes just puts nodes into igvn worklist and no actual igvn is performed in the middle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187134726 From pli at openjdk.org Mon May 8 07:54:22 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 8 May 2023 07:54:22 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 07:10:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 4318: >> >>> 4316: >>> 4317: #ifdef ASSERT >>> 4318: if (TraceNewVectors) { >> >> Not sure if this is required. 
But I haven't seen the creation of vector phi is traced in other places. > > Do we even have any other Vector Phi's? SuperWord never does it. Maybe in the Vector API? My colleague working on VectorAPI just told me yes. A scalar phi may eventually become a vector phi after several transformations in vector intrinsics and vector box/unbox optimizations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187139644 From epeter at openjdk.org Mon May 8 08:06:26 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 May 2023 08:06:26 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v2] In-Reply-To: References: Message-ID: <75k3jJWrNOKe48phhuVALq4UUfxgQxivOge7UfgEm5k=.b112d73f-cde4-436c-af02-10eece5181ca@github.com> On Mon, 8 May 2023 07:51:52 GMT, Pengfei Li wrote: >> Do we even have any other Vector Phi's? SuperWord never does it. Maybe in the Vector API? > > My colleague working on VectorAPI just told me yes. A scalar phi may eventually become a vector phi after several transformations in vector intrinsics and vector box/unbox optimizations. Ok, I can remove it. The user probably does not care about that so much, more about the vector instructions that actually end up in the code-gen. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1187150209 From jeisl at openjdk.org Mon May 8 08:42:23 2023 From: jeisl at openjdk.org (Josef Eisl) Date: Mon, 8 May 2023 08:42:23 GMT Subject: RFR: 8307588: HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 Message-ID: As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. 
------------- Commit messages: - Fix HotSpotConstantPool#lookupBootstrapMethodInvocation (JDK-8307588) - Test for HotSpotConstantPool#lookupBootstrapMethodInvocation (JDK-8307588) Changes: https://git.openjdk.org/jdk/pull/13858/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13858&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307588 Stats: 20 lines in 2 files changed: 3 ins; 13 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13858/head:pull/13858 PR: https://git.openjdk.org/jdk/pull/13858 From sgehwolf at openjdk.org Mon May 8 08:46:30 2023 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 8 May 2023 08:46:30 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v2] In-Reply-To: References: Message-ID: <1IB3l4_DimiOYMACS_dEGE8OzrtfUlCy5QwPahT-Bx8=.8af56e59-ce08-4160-8e3a-3899d012e46f@github.com> On Fri, 5 May 2023 14:27:25 GMT, Yasumasa Suenaga wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. 
>> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Introduce os::free_memory src/hotspot/os/linux/os_linux.cpp line 278: > 276: return free_mem; > 277: } > 278: } We now have this code twice in `os_linux.cpp`. Could we parameterize this and extract as a function so that we avoid the duplication from lines 226-236? src/hotspot/share/runtime/os.hpp line 314: > 312: > 313: static julong available_memory(); > 314: static julong free_memory(); It would probably be a good idea to add a comment describing what the actual difference is between free/available (i.e. Linux only and usually `available memory > free memory` on such systems). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1187187253 PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1187189400 From dnsimon at openjdk.org Mon May 8 08:46:17 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 8 May 2023 08:46:17 GMT Subject: RFR: 8307588: HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 In-Reply-To: References: Message-ID: On Mon, 8 May 2023 08:34:15 GMT, Josef Eisl wrote: > As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. > > This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. > > I've manually verified that this solved the native image issues that uncovered the problem. 
Marked as reviewed by dnsimon (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13858#pullrequestreview-1416388280 From shade at openjdk.org Mon May 8 10:18:35 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 May 2023 10:18:35 GMT Subject: RFR: 8307527: MacOS Zero builds fail with undefined FFI_GO_CLOSURES after JDK-8304265 [v2] In-Reply-To: References: Message-ID: > See the bug. Actually, I am not sure why JDK-8304265 changed the `#ifndef FFI_GO_CLOSURES` to `#ifdef _APPLE_`. That seems too intrusive if `FFI_GO_CLOSURES` *is* enabled. So I rewrote the block to something more safe. > > Additional testing: > - [x] macos-aarch64-zero-fastdebug `make images` passes Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Use a build system fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13827/files - new: https://git.openjdk.org/jdk/pull/13827/files/30029824..72f978ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13827&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13827&range=00-01 Stats: 30 lines in 3 files changed: 20 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13827/head:pull/13827 PR: https://git.openjdk.org/jdk/pull/13827 From ysuenaga at openjdk.org Mon May 8 12:20:26 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 8 May 2023 12:20:26 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v3] In-Reply-To: References: Message-ID: <8WK3jGh4_472qAmHahCGApTgxhVmxAUucGhlMF-R7aY=.9e34251e-a42c-45c6-96fa-5277b01a6fe8@github.com> > `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). > > `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. 
However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. > > AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. > > [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 > [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable > [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Fix comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13398/files - new: https://git.openjdk.org/jdk/pull/13398/files/5bcac06b..1660c0f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=01-02 Stats: 46 lines in 2 files changed: 20 ins; 17 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13398/head:pull/13398 PR: https://git.openjdk.org/jdk/pull/13398 From stuefe at openjdk.org Mon May 8 12:49:38 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 May 2023 12:49:38 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v3] In-Reply-To: <8WK3jGh4_472qAmHahCGApTgxhVmxAUucGhlMF-R7aY=.9e34251e-a42c-45c6-96fa-5277b01a6fe8@github.com> References: <8WK3jGh4_472qAmHahCGApTgxhVmxAUucGhlMF-R7aY=.9e34251e-a42c-45c6-96fa-5277b01a6fe8@github.com> Message-ID: 
<1R_pABdsmCLpFs3u8yEvIG85bxWpjwWLATmq9kGOuhY=.d8f9ae44-52cc-4b2a-b3bf-0f98407629a1@github.com> On Mon, 8 May 2023 12:20:26 GMT, Yasumasa Suenaga wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. >> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments LGTM. Please make sure that GHAs are green, or if we have infrastructure problems that you verify the windows build manually. Thanks, Thomas ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1416732946 From chagedorn at openjdk.org Mon May 8 13:11:26 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 May 2023 13:11:26 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates Message-ID: This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. To make reviewing the entire change easier, I've decided to split the work into several PRs. This first PR includes the following _semantic-preserving_ changes: - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: - Updating the code (variables, method names etc.) accordingly. - Renaming "Skeleton Predicates" to "Assertion Predicates". - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). - Change `class Predicates` -> `class ParsePredicates`. - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). - Removing unused variables. - Removing unnecessary checks. - Code style fixes in touched code. 
Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. The blog post can be found on my Github page at: https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. Thanks, Christian ------------- Commit messages: - 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates Changes: https://git.openjdk.org/jdk/pull/13864/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305634 Stats: 704 lines in 19 files changed: 202 ins; 40 del; 462 mod Patch: https://git.openjdk.org/jdk/pull/13864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13864/head:pull/13864 PR: https://git.openjdk.org/jdk/pull/13864 From sgehwolf at openjdk.org Mon May 8 13:11:33 2023 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 8 May 2023 13:11:33 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v3] In-Reply-To: <8WK3jGh4_472qAmHahCGApTgxhVmxAUucGhlMF-R7aY=.9e34251e-a42c-45c6-96fa-5277b01a6fe8@github.com> References: <8WK3jGh4_472qAmHahCGApTgxhVmxAUucGhlMF-R7aY=.9e34251e-a42c-45c6-96fa-5277b01a6fe8@github.com> Message-ID: On Mon, 8 May 2023 12:20:26 GMT, Yasumasa Suenaga wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. 
We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. >> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments Seems fine. One nit about implicit booleans. src/hotspot/os/linux/os_linux.cpp line 246: > 244: > 245: FILE *fp = os::fopen("/proc/meminfo", "r"); > 246: if (fp) { Please don't use implicit booleans. `fp != nullptr` should be fine. See: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#miscellaneous ------------- Marked as reviewed by sgehwolf (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1416769276 PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1187424558 From sgehwolf at openjdk.org Mon May 8 13:17:28 2023 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 8 May 2023 13:17:28 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo In-Reply-To: References: Message-ID: On Fri, 5 May 2023 23:27:50 GMT, Yasumasa Suenaga wrote: >> We could also just bypass the compiler thread creation question for now. 
Let the compiler continue to use the old metric when calculating its thread count, but let all other users of os::available_memory() the new one. > > @tstuefe @robcasloz > > I updated this PR to implement both `free_memory` and `available_memory`. In Linux, `free_memory` refers MemFree (equivalent with older `available_memory`), and `available_memory` refers MemAvailable. In other platforms, `free_memory` proxies `available_memory`. And also `CompileBroker` uses `free_memory` rather than `available_memory`. Some GHA checks were failed, but I think they are not caused by this change. @YaSuenag Windows GHA issue should go away if you merge with latest master. See https://bugs.openjdk.org/browse/JDK-8306543 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13398#issuecomment-1538343451 From rcastanedalo at openjdk.org Mon May 8 13:59:53 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 8 May 2023 13:59:53 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v4] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 13:54:53 GMT, Yasumasa Suenaga wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. 
>> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8305770 > - Use nullptr in condition > - Fix comments > - Introduce os::free_memory > - Use JULONG_FORMAT in format string > - 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo Compiler changes look good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1416855041 From ysuenaga at openjdk.org Mon May 8 13:59:53 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 8 May 2023 13:59:53 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v4] In-Reply-To: References: Message-ID: > `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). > > `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. > > AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. 
So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. > > [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 > [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable > [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8305770 - Use nullptr in condition - Fix comments - Introduce os::free_memory - Use JULONG_FORMAT in format string - 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13398/files - new: https://git.openjdk.org/jdk/pull/13398/files/1660c0f3..13e3dcb9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=02-03 Stats: 319696 lines in 3203 files changed: 272077 ins; 26150 del; 21469 mod Patch: https://git.openjdk.org/jdk/pull/13398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13398/head:pull/13398 PR: https://git.openjdk.org/jdk/pull/13398 From chagedorn at openjdk.org Mon May 8 14:43:20 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 May 2023 14:43:20 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v2] In-Reply-To: References: Message-ID: <25lksRBizIIfNL3HxxyG7YUCm1KF1FdgvRocnrlxtqI=.9d604245-3f20-49fd-bfae-f9a2b9e336c6@github.com> > This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates 
([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. > > To make reviewing the entire change easier, I've decided to split the work into several PRs. > > This first PR includes the following _semantic-preserving_ changes: > - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: > - Updating the code (variables, method names etc.) accordingly. > - Renaming "Skeleton Predicates" to "Assertion Predicates". > - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. > - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). > - Change `class Predicates` -> `class ParsePredicates`. > - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). > - Removing unused variables. > - Removing unnecessary checks. > - Code style fixes in touched code. > > Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. 
> > The blog post can be found on my Github page at: > https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html > > Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix summary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13864/files - new: https://git.openjdk.org/jdk/pull/13864/files/26f11820..cf4525e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=00-01 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13864/head:pull/13864 PR: https://git.openjdk.org/jdk/pull/13864 From thartmann at openjdk.org Mon May 8 14:59:27 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 May 2023 14:59:27 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet Message-ID: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). 
One thread can set `_is_loaded_computed` before setting `_is_loaded`: https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 A loaded type can therefore be replaced by an unloaded type during GVN. In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. 
I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. Thanks, Tobias ------------- Commit messages: - Re-ordering of _computed fields initialization - Reverted unrelated change - 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet Changes: https://git.openjdk.org/jdk/pull/13868/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303512 Stats: 53 lines in 2 files changed: 13 ins; 24 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13868.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13868/head:pull/13868 PR: https://git.openjdk.org/jdk/pull/13868 From qamai at openjdk.org Mon May 8 15:55:43 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 May 2023 15:55:43 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet In-Reply-To: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Mon, 8 May 2023 14:52:21 GMT, Tobias Hartmann wrote: > [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). 
One thread can set `_is_loaded_computed` before setting `_is_loaded`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 > > while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 > > As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. > > Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 > > A loaded type can therefore be replaced by an unloaded type during GVN. > > In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). > > Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. > > The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. > > In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. 
I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. > > Thanks, > Tobias It seems to me that if 2 threads access a variable racily without using atomic, it results in undefined behaviours. This patch removes the logic and therefore removes the race condition. Is there any risk with that in the surrounding code, too? I see you reordered `_hash_computed` and `_exact_klass_computed` assignment, if these reorders matter, should they be at least releasing stores instead? Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1538632666 From vlivanov at openjdk.org Mon May 8 18:02:33 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 8 May 2023 18:02:33 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <_QCxNp2slZ7n9AQvfzl_a8ftbokD6fD44f6a538jsO0=.b7c658df-5a6e-42f6-b80b-4e09398f3d79@github.com> On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. > - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 It took longer than I expected, but I finished looking into debug info. A couple of minor comments first: * Please, ensure that the AllocationMergesTests.java has cases to trigger the case when SRs and NSRs meet at a merge point. I was not able to provoke it with the unit test. * diagnostic output becomes much harder to read (sample output follows). Sample output: - ordniary SR case Expression stack - @0: obj: ID=1335, only_merge_candidate=0, skip_field_assignment=0, N.Fields=4, klass: java.lang.String Fields: 0, 0, 0, nullptr ... Objects obj: ID=1335, only_merge_candidate=0, skip_field_assignment=0, N.Fields=4, klass: java.lang.String Fields: 0, 0, 0, nullptr - mixed merge case: ScopeDesc(pc=0x00000001080bc664 offset=1824): java.lang.String::substring at 8 (line 2830) Locals - l0: merge: ID=1781, N.Candidates=1 ... 
Objects merge: ID=1781, N.Candidates=1obj: ID=1782, only_merge_candidate=1, skip_field_assignment=0, N.Fields=4, klass: java.lang.String Fields: 0, 0, 0, reg rfp [58],oop ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1538801137 From vlivanov at openjdk.org Mon May 8 18:24:24 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 8 May 2023 18:24:24 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. 
> - Add support for SR'ing some inputs of merges used for field loads
> - Fix some typos and do some small refactorings.
> - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1

Speaking of debug info design, it seems there's a need for an additional transformation step now.

Originally, all the operations were performed right on the deserialized debug info representation. It was well-justified at first, but it slowly accrued special cases (nulls, autobox, vectors), and merges push it over the limit IMO.

I propose to introduce an additional pass which takes the original debug info and, based on the current JVM state (`frame` + `RegisterMap`), transforms it into a list of objects to be materialized and a graph of `ScopeValue`s which depend on them. It would isolate the preprocessing logic you have scattered across multiple places, simplify rematerialization, and make it easier to find out what happens during deoptimization in each particular case. Moreover, it'll enable support for more complex scenarios (e.g., nested merges) which I expect to eventually emerge in followup enhancements.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1538835019

From never at openjdk.org Mon May 8 19:00:15 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Mon, 8 May 2023 19:00:15 GMT
Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995
In-Reply-To:
References:
Message-ID: <8i3Jx9Gb2kLvnD52YJHDUKq1pGpLDCeL6NKd4EaqhU4=.3d8fd44e-29e5-4c6e-9c4e-1d73d2712ef6@github.com>

On Mon, 8 May 2023 08:34:15 GMT, Josef Eisl wrote:

> As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic.
>
> This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question.
> > I've manually verified that this solved the native image issues that uncovered the problem. Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13858#pullrequestreview-1417338946 From jeisl at openjdk.org Mon May 8 19:09:27 2023 From: jeisl at openjdk.org (Josef Eisl) Date: Mon, 8 May 2023 19:09:27 GMT Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 In-Reply-To: References: Message-ID: On Mon, 8 May 2023 08:34:15 GMT, Josef Eisl wrote: > As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. > > This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. > > I've manually verified that this solved the native image issues that uncovered the problem. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13858#issuecomment-1538896451 From kvn at openjdk.org Mon May 8 19:43:23 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 May 2023 19:43:23 GMT Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 In-Reply-To: References: Message-ID: On Mon, 8 May 2023 08:34:15 GMT, Josef Eisl wrote: > As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. > > This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. > > I've manually verified that this solved the native image issues that uncovered the problem. Please, update Copyright year in test file. No need for re-testing. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13858#pullrequestreview-1417397266 From jeisl at openjdk.org Mon May 8 19:51:25 2023 From: jeisl at openjdk.org (Josef Eisl) Date: Mon, 8 May 2023 19:51:25 GMT Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 [v2] In-Reply-To: References: Message-ID: > As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. > > This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. > > I've manually verified that this solved the native image issues that uncovered the problem. Josef Eisl has updated the pull request incrementally with one additional commit since the last revision: Update copyright date in TestDynamicConstant.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13858/files - new: https://git.openjdk.org/jdk/pull/13858/files/b271e842..239a368f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13858&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13858&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13858.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13858/head:pull/13858 PR: https://git.openjdk.org/jdk/pull/13858 From jeisl at openjdk.org Mon May 8 19:51:26 2023 From: jeisl at openjdk.org (Josef Eisl) Date: Mon, 8 May 2023 19:51:26 GMT Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 [v2] In-Reply-To: References: Message-ID: <-AZZ80O2I3TB2d5_gRXwXq6Ek0mU9OOg8KCqjsgTrys=.c8b87c34-47e1-41c6-8c32-afa6e2f90abb@github.com> On Mon, 8 May 2023 19:40:26 GMT, Vladimir Kozlov wrote: >> Josef Eisl has updated the pull request incrementally with one 
additional commit since the last revision: >> >> Update copyright date in TestDynamicConstant.java > > Please, update Copyright year in test file. No need for re-testing. Thanks @vnkozlov, done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13858#issuecomment-1538945477 From kvn at openjdk.org Mon May 8 19:54:22 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 May 2023 19:54:22 GMT Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 19:51:25 GMT, Josef Eisl wrote: >> As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. >> >> This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. >> >> I've manually verified that this solved the native image issues that uncovered the problem. > > Josef Eisl has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright date in TestDynamicConstant.java Good. Please, wait GHA finish before integration. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13858#pullrequestreview-1417411343 From kvn at openjdk.org Mon May 8 19:54:24 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 May 2023 19:54:24 GMT Subject: RFR: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 In-Reply-To: References: Message-ID: On Mon, 8 May 2023 08:34:15 GMT, Josef Eisl wrote: > As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. > > This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. > > I've manually verified that this solved the native image issues that uncovered the problem. And, please activate GH Actions testing in your repo. Never mind. You did it already. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13858#issuecomment-1538950802 From sviswanathan at openjdk.org Mon May 8 22:22:25 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 8 May 2023 22:22:25 GMT Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz wrote: > This patch aims to optimize a case where a And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register. > Before this patch, a "and" followed by a "test" would be emitted, so the removed "and" means 2 bytes less have to be emitted. > I've attached a JMH Benchmark to demonstrate the performance improvements. 
Here are the numbers of my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a sorted instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.

src/hotspot/cpu/x86/x86_64.ad line 12456:

> 12454:   match(Set cr (CmpI (AndI src1 src2) zero));
> 12455:
> 12456:   format %{ "testl $src1, $src2\t# long" %}

The format string says "# long"; it should be "# int" here, as this is an integer operation.

test/micro/org/openjdk/bench/vm/compiler/x86/AndCmpTestInstruction.java line 2:

> 1: /*
> 2:  * Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.

Copyright year should be 2023.
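
For readers following the review, here is a minimal Java sketch of the source shape this match rule appears to be aimed at. It assumes that the `(CmpI (AndI src1 src2) zero)` pattern quoted above corresponds to an `(a & b) == 0` test where neither operand is a compile-time constant. The class and method names are hypothetical and are not taken from the PR's `AndCmpTestInstruction` benchmark; whether C2 actually emits a single `test` instruction depends on the JDK build and on inlining decisions.

```java
// Illustrative sketch only: names are hypothetical, not from the PR.
public class AndCmpSketch {
    // Both operands are plain variables, so the compiler is expected to see
    // an AndI of two register inputs feeding a CmpI against zero.
    public static boolean noCommonBitsInt(int a, int b) {
        return (a & b) == 0;
    }

    // Long variant of the same shape (AndL feeding CmpL against zero).
    public static boolean noCommonBitsLong(long a, long b) {
        return (a & b) == 0L;
    }

    public static void main(String[] args) {
        System.out.println(noCommonBitsInt(0b1010, 0b0101));      // disjoint bit sets
        System.out.println(noCommonBitsInt(0b1010, 0b0010));      // one shared bit
        System.out.println(noCommonBitsLong(1L << 40, 1L << 41)); // disjoint bit sets
    }
}
```

By contrast, a test such as `(a & 0xFF) == 0` uses a constant mask, which according to the PR description was already covered by an existing match rule.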

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13587#discussion_r1187950806
PR Review Comment: https://git.openjdk.org/jdk/pull/13587#discussion_r1187951090

From sviswanathan at openjdk.org Mon May 8 22:26:20 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 8 May 2023 22:26:20 GMT
Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized
In-Reply-To:
References:
Message-ID: <8pLz7m1fIKnPzR-C-loD7I_vAU-zMbQd6IOPZvPGnAw=.26d172a5-8ee9-4d60-9f8a-a9af5bcc9d53@github.com>

On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz wrote:

> This patch aims to optimize a case where a And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register.
> Before this patch, a "and" followed by a "test" would be emitted, so the removed "and" means 2 bytes less have to be emitted.
> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers of my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a sorted instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.

@ichttt Nice performance gain for opaque operations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13587#issuecomment-1539142398

From cslucas at openjdk.org Mon May 8 22:53:31 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 8 May 2023 22:53:31 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12]
In-Reply-To:
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID: <4PBnXq7Eci77beY5cjMGEiuqpRfDcQF9Hwln0ADgDb4=.20c74eb7-f7f8-46be-a005-34dbfd5cdd96@github.com>

On Mon, 8 May 2023 18:21:09 GMT, Vladimir Ivanov wrote:

>> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>>
>> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>> - Address part of PR review 4 & fix a bug setting only_candidate
>> - Catching up with master
>>
>>    Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>> - Fix tests. Remember previous reducible Phis.
>> - Address PR review 3. Some comments and be able to abort compilation.
>> - Merge with Master
>> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method.
>> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 > > Speaking of debug info design, it seems there's a need for an additional transformation step now. > > Originally, all the operations were performed right on the deserialized debug info representation. It was well-justified at first, but slowly accrued with special cases (nulls, autobox, vectors) and merges push it over the limit IMO. > > I propose to introduce an additional pass which takes original debug info and, based on current JVM state (`frame` + `RegisterMap`), transforms it into a list of objects to be materialized and a graph of `ScopeValue`s which depend on them. It would isolate preprocessing logic you have scattered across multiple places, simplify rematerialization, make it easier to find out what happens during deoptimization in each particular case. Moreover, it'll enable support for more complex scenarios (e.g., nested merges) which I expect to eventually emerge in followup enhancements. Thank you @iwanowww for taking the time to review this! Please let me ask you some clarifying questions. > A couple of minor comments first [...] I'll address those asap! Thanks. > I propose to introduce an additional pass which takes original debug info [...] What kind of pass are you referring to exactly? When would this pass run? By "original debug info" you mean the debug information stream? > It would isolate preprocessing logic you have scattered across multiple places [...] Which preprocessing logic are you referring to exactly? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1539161396 From vlivanov at openjdk.org Tue May 9 00:06:33 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 9 May 2023 00:06:33 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. 
> - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 The new pass over deserialized debug info would adapt `ScopeDesc::objects()` (initialized by `decode_object_values(obj_decode_offset)` and accessed through `chunk->at(0)->scope()->objects()`) and produce 2 lists: * a new list of objects which enumerates all scalarized instances that need to be rematerialized; * the complete set of objects referenced in the current scope (the purpose `chunk->at(0)->scope()->objects()` serves now). It should be performed before `rematerialize_objects`. By preprocessing I mean all the conditional checks performed before attempting to reallocate an `ObjectValue`. By the end of the new pass, it should be enough to just iterate over the new list of scalarized instances in `Deoptimization::realloc_objects`. And after `Deoptimization::realloc_objects` and `Deoptimization::reassign_fields` are over, debug info should be ready to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1539210279 From dholmes at openjdk.org Tue May 9 01:46:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 01:46:28 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v4] In-Reply-To: References: Message-ID: <4Z9Bp-O7stgl1vfZNxha6IQpqvRwp6HmJ3H7mo39YRk=.6050a499-fe30-42e5-aefe-c26340207204@github.com> On Mon, 8 May 2023 13:59:53 GMT, Yasumasa Suenaga wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. 
`MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. >> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into JDK-8305770 > - Use nullptr in condition > - Fix comments > - Introduce os::free_memory > - Use JULONG_FORMAT in format string > - 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo Just trying to understand where this has ended up. 
IIUC the change: - modifies `os::Linux::available_memory` to use `MemAvailable` - adds `os::Linux_free_memory` to do what `os::Linux::available_memory` used to do - changes compileBroker to use `os::free_memory` instead of `os::available_memory` so that it is unaffected by the change ------------- PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1417712138 From dholmes at openjdk.org Tue May 9 01:46:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 01:46:30 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v2] In-Reply-To: <1IB3l4_DimiOYMACS_dEGE8OzrtfUlCy5QwPahT-Bx8=.8af56e59-ce08-4160-8e3a-3899d012e46f@github.com> References: <1IB3l4_DimiOYMACS_dEGE8OzrtfUlCy5QwPahT-Bx8=.8af56e59-ce08-4160-8e3a-3899d012e46f@github.com> Message-ID: On Mon, 8 May 2023 08:42:53 GMT, Severin Gehwolf wrote: >> Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: >> >> Introduce os::free_memory > > src/hotspot/share/runtime/os.hpp line 314: > >> 312: >> 313: static julong available_memory(); >> 314: static julong free_memory(); > > It would probably be a good idea to add a comment describing what the actual difference is between free/available (i.e. Linux only and usually `available memory > free memory` on such systems). Yes some detailed commentary is very necessary. It looks very odd to have two functions that do the same thing on every platform but Linux. 
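To make the free/available distinction discussed here concrete, the following is a small Java sketch that extracts `MemFree` and `MemAvailable` from `/proc/meminfo`-style text. It is not the HotSpot implementation (which is C++ in `os_linux.cpp`); the class and method names are hypothetical, and the fallback to `MemFree` when `MemAvailable` is missing is an assumption made for illustration.

```java
// Illustrative sketch only: not HotSpot code; names are hypothetical.
final class MemInfoSketch {
    // Returns the value of a /proc/meminfo field in bytes, or -1 if the
    // field is absent. Lines look like "MemAvailable:    9000000 kB".
    static long fieldBytes(String meminfo, String key) {
        for (String line : meminfo.split("\n")) {
            if (line.startsWith(key + ":")) {
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]) * 1024L; // values are reported in kB
            }
        }
        return -1L;
    }

    // Assumption for this sketch: prefer MemAvailable and fall back to
    // MemFree on kernels that do not report MemAvailable.
    static long availableBytes(String meminfo) {
        long available = fieldBytes(meminfo, "MemAvailable");
        return available >= 0 ? available : fieldBytes(meminfo, "MemFree");
    }
}
```

On a typical system `MemAvailable` is larger than `MemFree` because it also counts memory the kernel can reclaim, which is the difference the requested comment in `os.hpp` is meant to document.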
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1188033176 From fgao at openjdk.org Tue May 9 02:30:28 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 9 May 2023 02:30:28 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Fri, 5 May 2023 07:33:03 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`. 
>>
>>
>> operation    M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add       2063   2085    660    530    415    283 |      |
>> int mul       2272   2257   1189    733    908    439 |      |
>> int min       2527   2520   2516   2579   2585   2542 |    1 |
>> int max       2548   2525   2551   2516   2515   2517 |    1 |
>> int and       2410   2414    602    480    353    263 |      |
>> int or        2149   2151    597    498    354    262 |      |
>> int xor       2059   2062    605    476    364    263 |      |
>> long add      1776   1790   2000   1000   1683    591 |    2 |
>> long mul      2135   2199   2185   2001   2176   1307 |    2 |
>> long min      1439   1424   1421   1420   1430   1427 |    3 |
>> long max      2299   2287   2303   2305   1433   1425 |    3 |
>> long and      1657   1667   2015   1003   1679    568 |    4 |
>> long or       1776   1783   2032   1009   1680    569 |    4 |
>> long xor      1834   1783   2012   1024   1679    570 |    4 |
>> float add     2779   2644   2633   2648   2632   2639 |    5 |
>> float mul     2779   2871   2810   2776   2732   2791 |    5 |
>> float min     2294   2620   1725   1286    872    672 |      |
>> float max     2371   2519   1697   1265    841    468 |      |
>> double add    2634   2636   2635   2650   2635   2648 |    5 |
>> double mul    3053   2955   2881   3030   2979   2927 |    5 |
>> double min    2364   2400   2439   2399   2486   2398 |    6 |
>> double max    2488   2459   2501   2451   2493   2498 |    6 |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865). >> >> **Testing** >> >> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop. >> >> Passes up to tier5 and stress-testing. >> Performance testing did not show any regressions. >> **TODO** can someone benchmark on `aarch64`? >> >> **Discussion** >> >> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this: >> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 >> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516). >> >> So far, I did not work on `byte, char, short`, we can investigate this in the future. >> >> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now. > > @jatin-bhateja @vnkozlov @sviswa7 I substantially reworked this RFE, and have it working now, and included your suggestions. > > The algorithm now sits in `PhaseIdealLoop::build_and_optimize` after `SuperWord`. It can now handle chains of `UnorderedReduction`, so that it is more robust against unrolling. > > The only thing missing for me is: > 1. Benchmark on `aarch64`. @fg1417 Would you want to have a look at that? > 2. Wait for the performance testing results. 
Hi @eme64 , I tested your benchmark on neon and 128-bit sve machines, also with `2_000` warmup, `100_000` perf iterations and `16*1024` array length. I preprocessed the raw data by dividing each result by the corresponding result on master.

M: master
M-N: master with -XX:-SuperWordReductions
P: with your patch

128-bit-sve
--------------------------------------------
type    op    M     M-N       P
int     add   1   2.222   0.787
int     mul   1   0.931   0.465
int     min   1   1.000   0.999
int     max   1   0.999   0.999
int     and   1   1.928   0.691
int     or    1   1.866   0.685
int     xor   1   1.924   0.738
long    add   1   1.036   1.001
long    mul   1   0.983   1.001
long    min   1   1.001   0.999
long    max   1   1.002   0.996
long    and   1   1.037   1.001
long    or    1   1.017   1.002
long    xor   1   1.037   1.002
float   add   1   1.894   1.000
float   mul   1   0.973   1.000
float   min   1   2.926   0.758
float   max   1   2.925   0.758
double  add   1   1.472   1.005
double  mul   1   1.105   1.000
double  min   1   1.670   0.866
double  max   1   1.669   0.865

NEON
--------------------------------------------
type    op    M     M-N       P
int     add   1   1.991   0.892
int     mul   1   1.212   0.605
int     min   1   1.007   1.007
int     max   1   1.002   1.004
int     and   1   1.597   0.717
int     or    1   1.566   0.716
int     xor   1   1.594   0.715
long    add   1   1.001   1.000
long    mul   1   1.000   1.000
long    min   1   1.015   1.001
long    max   1   1.001   1.000
long    and   1   1.001   1.000
long    or    1   1.001   1.000
long    xor   1   1.001   1.000
float   add   1   1.000   1.000
float   mul   1   1.000   0.999
float   min   1   2.875   0.765
float   max   1   2.873   0.762
double  add   1   0.999   0.996
double  mul   1   1.001   0.996
double  min   1   1.607   0.862
double  max   1   1.609   0.865

We can see obvious uplift brought by your patch, and almost no regression. Since superword doesn't support 2-element long reduction, see
https://github.com/openjdk/jdk/blob/d9052b946682d1c0f2629455d73fe4e6b95b29db/src/hotspot/share/opto/superword.cpp#L2314
no benefit on NEON/128-bit sve machines for long type is expected. Once I can access more than 128 bit sve machines, I can verify it later.
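As a rough illustration of what "reduction after the loop" means, here is a scalar Java emulation of the two shapes. This is not C2 code: `LANES` stands in for a hypothetical vector width, and the class and method names are made up for this sketch. It relies on `int` addition being associative and commutative (with wrap-around), which is what makes the reordering legal for the unordered reductions named in the PR; it would not be valid for `float/double add/mul`.

```java
// Illustrative sketch only: emulates vector lanes with a small int array.
final class ReductionSketch {
    static final int LANES = 8; // hypothetical vector width

    // Shape before the patch: each iteration of the vectorized loop reduces
    // a whole "vector" horizontally into the scalar accumulator.
    static int sumReduceInLoop(int[] a) {
        int sum = 0;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int partial = 0;
            for (int l = 0; l < LANES; l++) {
                partial += a[i + l]; // horizontal reduction on every iteration
            }
            sum += partial;
        }
        for (; i < a.length; i++) {
            sum += a[i]; // scalar post-loop for the remaining elements
        }
        return sum;
    }

    // Shape after the patch: the loop only does lane-wise accumulation, and a
    // single horizontal reduction happens once after the loop.
    static int sumReduceAfterLoop(int[] a) {
        int[] acc = new int[LANES]; // lane-wise accumulator, starts at identity 0
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l]; // element-wise vector add, no reduction
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l]; // one horizontal reduction after the loop
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }
}
```

Both methods return the same value for any input, including on overflow; the second simply moves the expensive horizontal step out of the hot loop.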
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1539298861 From ysuenaga at openjdk.org Tue May 9 04:40:17 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Tue, 9 May 2023 04:40:17 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v4] In-Reply-To: <4Z9Bp-O7stgl1vfZNxha6IQpqvRwp6HmJ3H7mo39YRk=.6050a499-fe30-42e5-aefe-c26340207204@github.com> References: <4Z9Bp-O7stgl1vfZNxha6IQpqvRwp6HmJ3H7mo39YRk=.6050a499-fe30-42e5-aefe-c26340207204@github.com> Message-ID: On Tue, 9 May 2023 01:43:39 GMT, David Holmes wrote: > Just trying to understand where this has ended up. IIUC the change: Yes, all of them are correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13398#issuecomment-1539394920 From ysuenaga at openjdk.org Tue May 9 04:40:19 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Tue, 9 May 2023 04:40:19 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v2] In-Reply-To: References: <1IB3l4_DimiOYMACS_dEGE8OzrtfUlCy5QwPahT-Bx8=.8af56e59-ce08-4160-8e3a-3899d012e46f@github.com> Message-ID: <5xy2JOZ1lQY25wI1cLAM3noFSOAB0JtT3L1zjXpwHPU=.1c0736cf-13b2-4702-b2df-c37892a955d3@github.com> On Tue, 9 May 2023 01:37:21 GMT, David Holmes wrote: >> src/hotspot/share/runtime/os.hpp line 314: >> >>> 312: >>> 313: static julong available_memory(); >>> 314: static julong free_memory(); >> >> It would probably be a good idea to add a comment describing what the actual difference is between free/available (i.e. Linux only and usually `available memory > free memory` on such systems). > > Yes some detailed commentary is very necessary. It looks very odd to have two functions that do the same thing on every platform but Linux. I added comments into os_linux.hpp . I will move them to os.hpp . 
https://github.com/openjdk/jdk/pull/13398/files#diff-b6c4026228694834053813a8a8ea4795b5edb29eeb9cc46cd234c8e3b92336f4R61-R64 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1188121999 From epeter at openjdk.org Tue May 9 06:20:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 May 2023 06:20:24 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D Message-ID: **Bug** In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN`s correctly (ordered vs unordered comparison, leads to wrong results). The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973) On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching: https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394 The wrong results with `NaN` are because of a bug in `x`: https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106 The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for Java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078). **Solution** @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. 
So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions.

This has a few benefits:
- `VectorMaskCmp + VectorBlend` is more powerful:
  - `CMoveVF/D` required the same inputs for the compare as for the move itself.
  - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize.
  - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`).
- We need less code (I completely removed all code for `CMoveVF/D`).

I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`. As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, the CMove code did not properly maintain the `packset` / `my_pack`. I now added some verification here, since I also just removed the problematic code, and the verification passes now with this patch. I was also able to remove the unwanted `UseVectorCmov` in an assert.

Most of the changes come from the regression tests in `TestVectorConditionalMove.java`: I generalized it from `aarch64` to all platforms, the IR rules only apply with `avx/asimd`. And I added many new tests to cover the newly implemented cases. Further, I modified the tests to include `NaN`s among the random numbers, to verify that the ordered/unordered comparisons are correct.

**Discussion / Context / Future Work**

1. From what I understand, we currently never introduce a `CMoveF/D`, unless asked for by `UseCMoveUnconditionally` (`C->use_c_move()`). If the flag is set, we attribute no cost to the CMove, else we take `Matcher::float_cmove_cost()`, which seems to be `ConditionalMoveLimit`, and so the Phi is never converted into a CMove.
And then if one wants to convert these scalar-CMove into a vector-CMove, one needs to activate the flag `UseVectorCmov`. @vnkozlov did some research: the goal was always to have this be on by default eventually. I see 2 paths here: either we obsolete `UseVectorCmov`, and implicitly have it on. Or we keep it, but make it on by default. I can do some performance measurements in a follow-up **RFE**.
2. I also saw that `int` and `long` are also CMove'd in `PhaseIdealLoop::conditional_move`. Especially `int` can currently be CMove'd without the `UseCMoveUnconditionally` flag. It would be nice to allow them to be vectorized. This is a small fix, but I'd like to do the testing and performance analysis for it. So a separate **RFE**. A slightly more involved idea: also allow cmp / blend with types of different widths (e.g. compare `int` but cmove `double`). That would require a cast on the vector-mask.
3. It is a shame that scalar-CMove is on its own usually not profitable. But together with `SuperWord` it would be profitable. But if it is not scalar-CMove'd first, we fail to vectorize, since the loop has control-flow. It is one of my dreams: allow `SuperWord` to handle control flow, and do the `If-conversion` (CMove) directly with vectorization.

Let me know if you have any thoughts or ideas.

-------------

Commit messages:
- remove UseVectorCmov condition in assert from JDK-8304720
- Merge branch 'master' into JDK-8306088
- fix type for VectorMaskCmpNode
- whitespaces: replaced tabs with spaces
- small refactoring / improved comments
- Restrict Cmp and Bool for CMove
- Add NaN test, and fix the NaN bug (it also existed with CMoveVF/D)
- remove useless and buggy cmpOp_vcmppd
- Merge branch 'master' into JDK-8306088
- UseVectorCmov can block Cmp in SuperWord::implemented
- ...
and 14 more: https://git.openjdk.org/jdk/compare/bb3e44d8...81d4de72 Changes: https://git.openjdk.org/jdk/pull/13493/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13493&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306302 Stats: 1370 lines in 13 files changed: 787 ins; 550 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/13493.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13493/head:pull/13493 PR: https://git.openjdk.org/jdk/pull/13493 From fjiang at openjdk.org Tue May 9 08:11:31 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 9 May 2023 08:11:31 GMT Subject: RFR: 8307651: RISC-V: Correct format typo for stringL_indexof_char instruction Message-ID: Hi. Can I have reviews for this trivial patch that fixes a typo in the format of `stringL_indexof_char` instruction? It should be `StringLatin1` instead of `StringUTF16` for `StrIntrinsicNode::L`. instruct stringL_indexof_char(iRegP_R11 str1, iRegI_R12 cnt1, iRegI_R13 ch, iRegI_R10 result, iRegINoSp tmp1, iRegINoSp tmp2, iRegINoSp tmp3, iRegINoSp tmp4, rFlagsReg cr) %{ match(Set result (StrIndexOfChar (Binary str1 cnt1) ch)); predicate(!UseRVV && (((StrIndexOfCharNode*)n)->encoding() == StrIntrinsicNode::L)); effect(USE_KILL str1, USE_KILL cnt1, USE_KILL ch, TEMP_DEF result, TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, KILL cr); format %{ "StringUTF16 IndexOf char[] $str1,$cnt1,$ch -> $result" %} ====> Should be StringLatin1 here. 
ins_encode %{ __ string_indexof_char($str1$$Register, $cnt1$$Register, $ch$$Register, $result$$Register, $tmp1$$Register, $tmp2$$Register, $tmp3$$Register, $tmp4$$Register, true /* isL */); %} ins_pipe(pipe_class_memory); %} ------------- Commit messages: - Fix typo in stringL_indexof_char instruction Changes: https://git.openjdk.org/jdk/pull/13881/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13881&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307651 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13881.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13881/head:pull/13881 PR: https://git.openjdk.org/jdk/pull/13881 From fyang at openjdk.org Tue May 9 08:31:17 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 May 2023 08:31:17 GMT Subject: RFR: 8307651: RISC-V: Correct format typo for stringL_indexof_char instruction In-Reply-To: References: Message-ID: On Tue, 9 May 2023 08:02:11 GMT, Feilong Jiang wrote: > Hi. > > Can I have reviews for this trivial patch that fixes a typo in the format of `stringL_indexof_char` instruction? It should be `StringLatin1` instead of `StringUTF16` for `StrIntrinsicNode::L`. > > > instruct stringL_indexof_char(iRegP_R11 str1, iRegI_R12 cnt1, iRegI_R13 ch, > iRegI_R10 result, iRegINoSp tmp1, iRegINoSp tmp2, > iRegINoSp tmp3, iRegINoSp tmp4, rFlagsReg cr) > %{ > match(Set result (StrIndexOfChar (Binary str1 cnt1) ch)); > predicate(!UseRVV && (((StrIndexOfCharNode*)n)->encoding() == StrIntrinsicNode::L)); > effect(USE_KILL str1, USE_KILL cnt1, USE_KILL ch, TEMP_DEF result, > TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, KILL cr); > > format %{ "StringUTF16 IndexOf char[] $str1,$cnt1,$ch -> $result" %} ====> Should be StringLatin1 here. 
> ins_encode %{ > __ string_indexof_char($str1$$Register, $cnt1$$Register, $ch$$Register, > $result$$Register, $tmp1$$Register, $tmp2$$Register, > $tmp3$$Register, $tmp4$$Register, true /* isL */); > %} > ins_pipe(pipe_class_memory); > %} Looks good. You might want to leave a space among the operands both for 'stringU_indexof_char' and 'stringL_indexof_char'. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13881#pullrequestreview-1418134947 From ysuenaga at openjdk.org Tue May 9 08:36:35 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Tue, 9 May 2023 08:36:35 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v5] In-Reply-To: References: Message-ID: > `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). > > `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. > > AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. 
>
> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59
> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable
> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328

Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision:

  Move description for MemFree and MemAvailable to os.hpp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13398/files
  - new: https://git.openjdk.org/jdk/pull/13398/files/13e3dcb9..6aba3440

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=03-04

  Stats: 9 lines in 2 files changed: 5 ins; 4 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13398.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13398/head:pull/13398

PR: https://git.openjdk.org/jdk/pull/13398

From thartmann at openjdk.org Tue May 9 08:53:33 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 9 May 2023 08:53:33 GMT
Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v2]
In-Reply-To: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com>
References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com>
Message-ID:

> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`:
>
> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482
>
> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`:
>
> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468
>
> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded.
>
> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account:
>
> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301
>
> A loaded type can therefore be replaced by an unloaded type during GVN.
>
> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to dereference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert).
>
> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them.
>
> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running.
>
> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined.
I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Eager computation to avoid racy update of remaining fields ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13868/files - new: https://git.openjdk.org/jdk/pull/13868/files/ad9863c5..27d502d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=00-01 Stats: 49 lines in 2 files changed: 17 ins; 17 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/13868.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13868/head:pull/13868 PR: https://git.openjdk.org/jdk/pull/13868 From thartmann at openjdk.org Tue May 9 08:53:35 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 May 2023 08:53:35 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Mon, 8 May 2023 15:52:56 GMT, Quan Anh Mai wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). 
One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. 
>>
>> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them with `#ifdef ASSERT` and fixed the verification code that failed to build.
>>
>> Thanks,
>> Tobias
>
> It seems to me that if 2 threads access a variable racily without using atomics, it results in undefined behavior. This patch removes the logic and therefore removes the race condition. Is there any risk with that in the surrounding code, too? I see you reordered the `_hash_computed` and `_exact_klass_computed` assignments; if these reorders matter, should they at least be releasing stores instead? Thanks a lot.

Thanks for looking at this, @merykitty.

I did these reorders because I found it weird that we set `..._computed` before initializing the corresponding fields. I thought that for `_hash` and `_exact_klass` it shouldn't really matter if they are racily accessed. On second thought though, I think you are right that this could lead to undefined behavior, especially if the C++ compiler decides to reorder again. One thread could then observe `_exact_klass_computed == true` and read garbage from `_exact_klass` because another thread is still in the process of computing and setting `_exact_klass`.

Since lock-free data structures are hard to get right, and the overhead would only be needed for a few shared types, I propose to eagerly compute `_hash` and `_exact_klass`. I pushed an updated fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1539709643

From fjiang at openjdk.org Tue May 9 09:08:26 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Tue, 9 May 2023 09:08:26 GMT
Subject: RFR: 8307651: RISC-V: Correct format typo for stringL_indexof_char instruction [v2]
In-Reply-To:
References:
Message-ID:

> Hi.
>
> Can I have reviews for this trivial patch that fixes a typo in the format of `stringL_indexof_char` instruction? It should be `StringLatin1` instead of `StringUTF16` for `StrIntrinsicNode::L`.
> > > instruct stringL_indexof_char(iRegP_R11 str1, iRegI_R12 cnt1, iRegI_R13 ch, > iRegI_R10 result, iRegINoSp tmp1, iRegINoSp tmp2, > iRegINoSp tmp3, iRegINoSp tmp4, rFlagsReg cr) > %{ > match(Set result (StrIndexOfChar (Binary str1 cnt1) ch)); > predicate(!UseRVV && (((StrIndexOfCharNode*)n)->encoding() == StrIntrinsicNode::L)); > effect(USE_KILL str1, USE_KILL cnt1, USE_KILL ch, TEMP_DEF result, > TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, KILL cr); > > format %{ "StringUTF16 IndexOf char[] $str1,$cnt1,$ch -> $result" %} ====> Should be StringLatin1 here. > ins_encode %{ > __ string_indexof_char($str1$$Register, $cnt1$$Register, $ch$$Register, > $result$$Register, $tmp1$$Register, $tmp2$$Register, > $tmp3$$Register, $tmp4$$Register, true /* isL */); > %} > ins_pipe(pipe_class_memory); > %} Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: add spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13881/files - new: https://git.openjdk.org/jdk/pull/13881/files/3f386f95..fab8b5ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13881&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13881&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13881.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13881/head:pull/13881 PR: https://git.openjdk.org/jdk/pull/13881 From fjiang at openjdk.org Tue May 9 09:08:27 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 9 May 2023 09:08:27 GMT Subject: RFR: 8307651: RISC-V: Correct format typo for stringL_indexof_char instruction [v2] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 08:28:17 GMT, Fei Yang wrote: > Looks good. You might want to leave a space among the operands both for 'stringU_indexof_char' and 'stringL_indexof_char'. Thanks for the review! I have added some spaces among the operands. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13881#issuecomment-1539737680

From duke at openjdk.org Tue May 9 09:24:11 2023
From: duke at openjdk.org (Tobias Hotz)
Date: Tue, 9 May 2023 09:24:11 GMT
Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized [v2]
In-Reply-To:
References:
Message-ID:

> This patch aims to optimize a case where an And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are registers.
> Before this patch, an "and" followed by a "test" would be emitted, so the removed "and" means 2 fewer bytes have to be emitted.
> I've attached a JMH benchmark to demonstrate the performance improvements. Here are the numbers from my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a shorter instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.

Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision:

  Update benchmark copyright and remove invalid copypasted comment

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13587/files
  - new: https://git.openjdk.org/jdk/pull/13587/files/b7cc690d..04a4118e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13587&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13587&range=00-01

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13587.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13587/head:pull/13587

PR: https://git.openjdk.org/jdk/pull/13587

From duke at openjdk.org Tue May 9 09:26:25 2023
From: duke at openjdk.org (Tobias Hotz)
Date: Tue, 9 May 2023 09:26:25 GMT
Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized
In-Reply-To:
References:
Message-ID: <894mJ_qWaSSwun8ctVwiaMgpucpUZNRD64G6rReeTdE=.13fbaff3-8512-4153-bddf-2e5c75ce38d9@github.com>

On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz wrote:

> This patch aims to optimize a case where an And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are registers.
> Before this patch, an "and" followed by a "test" would be emitted, so the removed "and" means 2 fewer bytes have to be emitted.
> I've attached a JMH benchmark to demonstrate the performance improvements. Here are the numbers from my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a shorter instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.

Regarding operations on memory: I am not sure whether this would be beneficial, as (if I understand it correctly) these operations cannot be macro-fused. C++ compilers like clang or gcc also don't seem to emit test instructions with memory operands and immediates. So I would like to omit that from the pull request and limit it to the register case, which never results in performance degradation.
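
For illustration only, here is a minimal Java sketch of the source shape the register-register rule discussed above is aimed at: an `&` of two non-constant values whose result is only compared against zero. The class and method names are hypothetical and are not taken from the attached JMH benchmark, and whether C2 really emits a single `test` for these methods depends on the build and on the surrounding code.

```java
public class AndCmpSketch {
    // Both operands are non-constant, so the comparison is expected to have the
    // CmpI(AndI reg1 reg2) shape named in the PR title (an assumption about the
    // ideal graph, not something this sketch verifies).
    static boolean noCommonBitsInt(int a, int b) {
        return (a & b) == 0;
    }

    // Long variant, expected to correspond to the CmpL(AndL reg1 reg2) shape.
    static boolean noCommonBitsLong(long a, long b) {
        return (a & b) == 0L;
    }

    public static void main(String[] args) {
        int hits = 0;
        // Call the methods often enough that JIT compilation becomes likely.
        for (int i = 0; i < 100_000; i++) {
            if (noCommonBitsInt(i, i << 1)) hits++;
            if (noCommonBitsLong(i, ~((long) i))) hits++;
        }
        System.out.println(hits);
    }
}
```

Running such a class on a fastdebug build with `-XX:+PrintOptoAssembly` (the flag used earlier in this archive) would be one way to check which instructions are actually selected.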
------------- PR Comment: https://git.openjdk.org/jdk/pull/13587#issuecomment-1539768370 From fyang at openjdk.org Tue May 9 09:52:25 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 May 2023 09:52:25 GMT Subject: RFR: 8307651: RISC-V: Correct format typo for stringL_indexof_char instruction [v2] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 09:08:26 GMT, Feilong Jiang wrote: >> Hi. >> >> Can I have reviews for this trivial patch that fixes a typo in the format of `stringL_indexof_char` instruction? It should be `StringLatin1` instead of `StringUTF16` for `StrIntrinsicNode::L`. >> >> >> instruct stringL_indexof_char(iRegP_R11 str1, iRegI_R12 cnt1, iRegI_R13 ch, >> iRegI_R10 result, iRegINoSp tmp1, iRegINoSp tmp2, >> iRegINoSp tmp3, iRegINoSp tmp4, rFlagsReg cr) >> %{ >> match(Set result (StrIndexOfChar (Binary str1 cnt1) ch)); >> predicate(!UseRVV && (((StrIndexOfCharNode*)n)->encoding() == StrIntrinsicNode::L)); >> effect(USE_KILL str1, USE_KILL cnt1, USE_KILL ch, TEMP_DEF result, >> TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, KILL cr); >> >> format %{ "StringUTF16 IndexOf char[] $str1,$cnt1,$ch -> $result" %} ====> Should be StringLatin1 here. >> ins_encode %{ >> __ string_indexof_char($str1$$Register, $cnt1$$Register, $ch$$Register, >> $result$$Register, $tmp1$$Register, $tmp2$$Register, >> $tmp3$$Register, $tmp4$$Register, true /* isL */); >> %} >> ins_pipe(pipe_class_memory); >> %} > > Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision: > > add spaces Thanks for the update. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13881#pullrequestreview-1418291479

From roland at openjdk.org Tue May 9 09:57:37 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 May 2023 09:57:37 GMT
Subject: RFR: 8307131: C2: assert(false) failed: malformed control flow
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 15:47:33 GMT, Vladimir Kozlov wrote:

>> The IR graph has a loop nest with 2 loops and 2 safepoints. Both
>> safepoints are in the inner loop. One is on the backedge of the inner
>> loop. The inner loop is transformed into a counted loop and that
>> safepoint is removed. The other safepoint is right above the inner
>> loop's exit condition. The outer strip mined loop is constructed and
>> the safepoint is moved to the outer strip mined loop even though that
>> safepoint is marked as non-deletable. The inner loop is later on
>> removed, the outer strip mined loop is too, so is the safepoint. What
>> was the outer loop of the 2 loop nest becomes an infinite loop without
>> a safepoint and is considered dead code which in turn causes the
>> assert to fire.
>>
>> The fix I propose is to only build the strip mined loop if the
>> safepoint that's moved to the outer strip mined loop is deletable.
>
> Good.

@vnkozlov @chhagedorn @TobiHartmann thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13826#issuecomment-1539816409

From roland at openjdk.org Tue May 9 09:57:39 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 9 May 2023 09:57:39 GMT
Subject: Integrated: 8307131: C2: assert(false) failed: malformed control flow
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 08:48:06 GMT, Roland Westrelin wrote:

> The IR graph has a loop nest with 2 loops and 2 safepoints. Both
> safepoints are in the inner loop. One is on the backedge of the inner
> loop. The inner loop is transformed into a counted loop and that
> safepoint is removed. The other safepoint is right above the inner
> loop's exit condition. The outer strip mined loop is constructed and
> the safepoint is moved to the outer strip mined loop even though that
> safepoint is marked as non-deletable. The inner loop is later on
> removed, the outer strip mined loop is too, so is the safepoint. What
> was the outer loop of the 2 loop nest becomes an infinite loop without
> a safepoint and is considered dead code which in turn causes the
> assert to fire.
>
> The fix I propose is to only build the strip mined loop if the
> safepoint that's moved to the outer strip mined loop is deletable.

This pull request has now been integrated.

Changeset: d2b3eef0
Author:    Roland Westrelin
URL:       https://git.openjdk.org/jdk/commit/d2b3eef0f2d48446613955cabe69cb4236042878
Stats:     58 lines in 2 files changed: 57 ins; 0 del; 1 mod

8307131: C2: assert(false) failed: malformed control flow

Reviewed-by: kvn, chagedorn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/13826

From thartmann at openjdk.org Tue May 9 11:56:14 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 9 May 2023 11:56:14 GMT
Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v2]
In-Reply-To:
References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com>
Message-ID:

On Tue, 9 May 2023 08:53:33 GMT, Tobias Hartmann wrote:

>> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)).
One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. 
>> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Eager computation to avoid racy update of remaining fields I have another set of refactoring changes. Will push these later today after some more cleanup / testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1540012712 From jeisl at openjdk.org Tue May 9 12:32:29 2023 From: jeisl at openjdk.org (Josef Eisl) Date: Tue, 9 May 2023 12:32:29 GMT Subject: Integrated: 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 In-Reply-To: References: Message-ID: On Mon, 8 May 2023 08:34:15 GMT, Josef Eisl wrote: > As a result of the changes for [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), `HotSpotConstantPool#lookupBootstrapMethodIntrospection` is broken as it no longer decodes the correct constant pool index for looking up the bootstrap method invocation for invokedynamic. > > This fixes the problem and modifies `TestDynamicConstant` to exercise the code in question. > > I've manually verified that this solved the native image issues that uncovered the problem. This pull request has now been integrated. 
Changeset: 040cb7b5 Author: Josef Eisl Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/040cb7b5a9d0d11c601749951df8ff3089250049 Stats: 21 lines in 2 files changed: 3 ins; 13 del; 5 mod 8307588: [JVMCI] HotSpotConstantPool#lookupBootstrapMethodInvocation broken by JDK-8301995 Reviewed-by: dnsimon, never, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13858 From duke at openjdk.org Tue May 9 13:11:33 2023 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 9 May 2023 13:11:33 GMT Subject: RFR: 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 Message-ID: - The `finalize()` method is replaced with `cleanup()`. - A new constructor is added to register the cleanup method. - A static `Cleaner` is defined to have only one cleaner thread for all the 15000 instances. Otherwise, we get an `OutOfMemoryException` on cleaner thread creation. ------------- Commit messages: - 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 Changes: https://git.openjdk.org/jdk/pull/13886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13886&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305081 Stats: 13 lines in 1 file changed: 11 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13886/head:pull/13886 PR: https://git.openjdk.org/jdk/pull/13886 From thartmann at openjdk.org Tue May 9 14:14:15 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 May 2023 14:14:15 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v3] In-Reply-To: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: > [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing 
`TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 > > while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 > > As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. > > Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 > > A loaded type can therefore be replaced by an unloaded type during GVN. > > In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). > > Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. 
> > The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. > > In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13868/files - new: https://git.openjdk.org/jdk/pull/13868/files/27d502d0..d0dce7b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=01-02 Stats: 52 lines in 3 files changed: 19 ins; 15 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/13868.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13868/head:pull/13868 PR: https://git.openjdk.org/jdk/pull/13868 From thartmann at openjdk.org Tue May 9 14:14:18 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 May 2023 14:14:18 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v2] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Tue, 9 May 2023 08:53:33 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). 
One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. 
>> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Eager computation to avoid racy update of remaining fields Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1540202441 From fjiang at openjdk.org Tue May 9 14:32:16 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 9 May 2023 14:32:16 GMT Subject: RFR: 8307758: RISC-V: Improve bit test code introduced by JDK-8291555 Message-ID: [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) introduced some single-bit tests that use `andi`, we can replace it with `test_bit` to avoid using the temp register when UseZbs is enabled. Testing: - [x] tier1-tier3 on Unmatched board (release build) ------------- Commit messages: - more test_bit Changes: https://git.openjdk.org/jdk/pull/13882/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13882&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307758 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13882.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13882/head:pull/13882 PR: https://git.openjdk.org/jdk/pull/13882 From sviswanathan at openjdk.org Tue May 9 18:40:21 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 May 2023 18:40:21 GMT Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized [v2] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 09:24:11 GMT, Tobias Hotz wrote: >> This patch aims to optimize a case where a And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. 
The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not the case where both operands are registers.
>> Before this patch, an "and" followed by a "test" would be emitted, so removing the "and" means 2 fewer bytes have to be emitted.
>> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers from my Windows 11 machine:
>> Before:
>>
>> Benchmark                                                   Mode  Cnt   Score   Error  Units
>> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
>> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
>> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
>> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
>> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>>
>> After:
>>
>> Benchmark                                                   Mode  Cnt   Score   Error  Units  Improvement
>> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
>> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
>> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
>> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
>> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>>
>> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a sorted instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
>> I've tested my changes using the Tier1 jtreg Tests on Windows.
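
For readers unfamiliar with the pattern, here is a minimal Java sketch of the source-level shape the quoted description is about. It is an illustration only, not code from the patch or its benchmark: the class and method names are hypothetical, and the node shape mentioned in the comments is the one named in the PR title. The key point is that both operands of `&` are variables rather than constants.

```java
public class AndCmpSketch {
    // Both operands are non-constant, so the compiler would be expected to see
    // a compare of an AndI of two registers against zero.
    static boolean noCommonBitsInt(int a, int b) {
        return (a & b) == 0;
    }

    // Same shape for long operands (AndL feeding the compare).
    static boolean noCommonBitsLong(long a, long b) {
        return (a & b) == 0L;
    }

    public static void main(String[] args) {
        System.out.println(noCommonBitsInt(0b1010, 0b0101)); // no shared bits
        System.out.println(noCommonBitsLong(0xF0L, 0x30L));  // bits 4 and 5 shared
    }
}
```

The result of the `&` is only used for the comparison, which is why a single flag-setting "test" of the two registers can stand in for the separate "and".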
> > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Update benchmark copyright and remove invalid copypasted comment Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13587#pullrequestreview-1419252721 From qamai at openjdk.org Tue May 9 18:53:23 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 May 2023 18:53:23 GMT Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized [v2] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 09:24:11 GMT, Tobias Hotz wrote: >> This patch aims to optimize a case where a And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register. >> Before this patch, a "and" followed by a "test" would be emitted, so the removed "and" means 2 bytes less have to be emitted. >> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers of my Windows 11 machine: >> Before: >> >> Benchmark Mode Cnt Score Error Units >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt avgt 8 26,736 ? 0,131 ns/op >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong avgt 8 24,305 ? 0,610 ns/op >> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt avgt 8 33,052 ? 0,056 ns/op >> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong avgt 8 18,355 ? 0,030 ns/op >> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong avgt 8 25,587 ? 0,107 ns/op >> >> After: >> >> Benchmark Mode Cnt Score Error Units Improvement >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt avgt 8 22,665 ? 0,170 ns/op (~18%) >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong avgt 8 18,880 ? 0,123 ns/op (~29%) >> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt avgt 8 33,198 ? 
0,126 ns/op (unchanged) >> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong avgt 8 18,427 ? 0,079 ns/op (unchanged) >> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong avgt 8 25,641 ? 0,168 ns/op (unchanged) >> >> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a sorted instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here. >> I've tested my changes using the Tier1 jtreg Tests on Windows. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Update benchmark copyright and remove invalid copypasted comment Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13587#pullrequestreview-1419272003 From cslucas at openjdk.org Tue May 9 19:06:36 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 9 May 2023 19:06:36 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 9 May 2023 00:03:26 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. 
>> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 > > The new pass over deserialized debug info would adapt `ScopeDesc::objects()` (initialized by `decode_object_values(obj_decode_offset)` and accesses through `chunk->at(0)->scope()->objects()`) and produce 2 lists: > * new list of objects which enumerates all scalarized instances which needs to be rematerialized; > * complete set of objects referenced in the current scope (the purpose `chunk->at(0)->scope()->objects()` serves now). > > It should be performed before `rematerialize_objects`. > > By preprocessing I mean all the conditional checks before it is attempted to reallocate an `ObjectValue`. By the end of the new pass, it should be enough to just iterate over the new list of scalarized instances in `Deoptimization::realloc_objects`. And after `Deoptimization::realloc_objects` and `Deoptimization::reassign_fields` are over, debug info should be ready to go. Thanks a lot for clarifying @iwanowww . I'll start working on that. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1540738709

From sviswanathan at openjdk.org Wed May 10 01:03:19 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 10 May 2023 01:03:19 GMT
Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v6]
In-Reply-To: <2ODJH1IFMOVjRgjQIeobF2eb_nxTCgnxcV__ttNz9nw=.7cbf388a-0a65-4d1c-8b60-d29ae3502123@github.com>
References: <2ODJH1IFMOVjRgjQIeobF2eb_nxTCgnxcV__ttNz9nw=.7cbf388a-0a65-4d1c-8b60-d29ae3502123@github.com>
Message-ID: <26rzNFCK7HGbnD6uJkCZ7niSgLDZz4fEPl7OeVkxqrQ=.8622554c-eefb-4a72-8682-297ec3c27cf3@github.com>

On Tue, 25 Apr 2023 14:43:23 GMT, Jasmine Karthikeyan wrote:

>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                            Baseline                   Patch                   Improvement
>> Benchmark                      Mode  Cnt   Score   Error  Units       Score   Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op   + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op   + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op   (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op   + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op   + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>
> - Merge branch 'master' into conv2b-x86-lowering
> - Whitespace tweak
> - Make transform conditional
> - Remove Conv2B from backend as it's macro expanded now
> - Re-work transform to happen in macro expansion
> - Fix whitespace and add bug tag to IR test
> - Merge branch 'master' into conv2b-x86-lowering
> - Merge branch 'master' into conv2b-x86-lowering
> - Merge branch 'master' into conv2b-x86-lowering
> - Merge branch 'master' into conv2b-x86-lowering
> - ... and 1 more: https://git.openjdk.org/jdk/compare/bad6aa68...295b9a67

src/hotspot/share/opto/cfgnode.cpp line 1530:

> 1528:   if (phase->C->post_loop_opts_phase()) {
> 1529:     return nullptr;
> 1530:   }

Should this only be done if (!Matcher::match_rule_supported(Op_Conv2B))?

src/hotspot/share/opto/convertnode.hpp line 36:

> 34: class Conv2BNode : public Node {
> 35: public:
> 36:   Conv2BNode(Node* i) : Node(nullptr, i) {}

Need to also update the copyright year to 2023 for convertnode.hpp.

src/hotspot/share/opto/movenode.cpp line 213:

> 211:   if (phase->C->post_loop_opts_phase()) {
> 212:     return nullptr;
> 213:   }

Should this only be done if (!Matcher::match_rule_supported(Op_Conv2B))?
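
As a rough illustration of the pattern quoted above (not code from the patch), the following Java sketch treats `Conv2B` as "1 if the input is non-zero, else 0" and shows that flipping that bit with an xor of 1 gives the same result as the inverted comparison. The class and method names are hypothetical and exist only for this example.

```java
public class Conv2BSketch {
    // Conv2B-like conversion: 1 if x is non-zero, otherwise 0.
    static int conv2b(int x) {
        return x != 0 ? 1 : 0;
    }

    // The conversion followed immediately by an xor of 1 (the bit flip).
    static int conv2bXor1(int x) {
        return conv2b(x) ^ 1;
    }

    // The same value expressed as a single inverted comparison.
    static int isZero(int x) {
        return x == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE};
        for (int x : samples) {
            // Both forms agree for every input, so the xor can be folded away.
            System.out.println(x + ": " + (conv2bXor1(x) == isZero(x)));
        }
    }
}
```

Because the two forms agree for every input, a conditional move on the inverted condition can absorb the bit flip, which is the effect the description attributes to the change.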
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1189253370
PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1189254778
PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1189253774

From fjiang at openjdk.org Wed May 10 03:13:35 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Wed, 10 May 2023 03:13:35 GMT
Subject: Integrated: 8307651: RISC-V: stringL_indexof_char instruction has wrong format string
In-Reply-To:
References:
Message-ID:

On Tue, 9 May 2023 08:02:11 GMT, Feilong Jiang wrote:

> Hi.
>
> Can I have reviews for this trivial patch that fixes a typo in the format of `stringL_indexof_char` instruction? It should be `StringLatin1` instead of `StringUTF16` for `StrIntrinsicNode::L`.
>
>
> instruct stringL_indexof_char(iRegP_R11 str1, iRegI_R12 cnt1, iRegI_R13 ch,
>                               iRegI_R10 result, iRegINoSp tmp1, iRegINoSp tmp2,
>                               iRegINoSp tmp3, iRegINoSp tmp4, rFlagsReg cr)
> %{
>   match(Set result (StrIndexOfChar (Binary str1 cnt1) ch));
>   predicate(!UseRVV && (((StrIndexOfCharNode*)n)->encoding() == StrIntrinsicNode::L));
>   effect(USE_KILL str1, USE_KILL cnt1, USE_KILL ch, TEMP_DEF result,
>          TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, KILL cr);
>
>   format %{ "StringUTF16 IndexOf char[] $str1,$cnt1,$ch -> $result" %} ====> Should be StringLatin1 here.
>   ins_encode %{
>     __ string_indexof_char($str1$$Register, $cnt1$$Register, $ch$$Register,
>                            $result$$Register, $tmp1$$Register, $tmp2$$Register,
>                            $tmp3$$Register, $tmp4$$Register, true /* isL */);
>   %}
>   ins_pipe(pipe_class_memory);
> %}

This pull request has now been integrated.
Changeset: d3e6d04e Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/d3e6d04e3eddfd26433f9cb95cfa9bff05b14bd6 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8307651: RISC-V: stringL_indexof_char instruction has wrong format string Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/13881 From fyang at openjdk.org Wed May 10 03:50:23 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 10 May 2023 03:50:23 GMT Subject: RFR: 8307758: RISC-V: Improve bit test code introduced by JDK-8291555 In-Reply-To: References: Message-ID: On Tue, 9 May 2023 08:14:30 GMT, Feilong Jiang wrote: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) introduced some single-bit tests that use `andi`, we can replace it with `test_bit` to avoid using the temp register when UseZbs is enabled. > > Testing: > - [x] tier1-tier3 on Unmatched board (release build) Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13882#pullrequestreview-1419736700 From ysuenaga at openjdk.org Wed May 10 04:14:27 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Wed, 10 May 2023 04:14:27 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo In-Reply-To: References: Message-ID: On Mon, 8 May 2023 13:14:26 GMT, Severin Gehwolf wrote: >> @tstuefe @robcasloz >> >> I updated this PR to implement both `free_memory` and `available_memory`. In Linux, `free_memory` refers MemFree (equivalent with older `available_memory`), and `available_memory` refers MemAvailable. In other platforms, `free_memory` proxies `available_memory`. And also `CompileBroker` uses `free_memory` rather than `available_memory`. Some GHA checks were failed, but I think they are not caused by this change. > > @YaSuenag Windows GHA issue should go away if you merge with latest master. 
See https://bugs.openjdk.org/browse/JDK-8306543

@jerboaa @dholmes-ora I added comment for `available_memory` and `free_memory`. Is it enough?
https://github.com/openjdk/jdk/pull/13398/commits/6aba3440131936a82daccdb65e7b241c022a21c7

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13398#issuecomment-1541326841

From epeter at openjdk.org Wed May 10 04:57:39 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 May 2023 04:57:39 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v3]
In-Reply-To:
References:
Message-ID: <-JXzmyqpa8WptOh0CpaAk_4DpTzJrQ8KQre1vy4eH1M=.a4f7730b-7d80-4d80-a335-817ccd8df71b@github.com>

> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>
> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>
> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>
> I disabled `turbo-boost`.
> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> ---------------------------------------------------------------
> int add        2063   2085    660    530    415    283 |      |
> int mul        2272   2257   1189    733    908    439 |      |
> int min        2527   2520   2516   2579   2585   2542 |  1   |
> int max        2548   2525   2551   2516   2515   2517 |  1   |
> int and        2410   2414    602    480    353    263 |      |
> int or         2149   2151    597    498    354    262 |      |
> int xor        2059   2062    605    476    364    263 |      |
> long add       1776   1790   2000   1000   1683    591 |  2   |
> long mul       2135   2199   2185   2001   2176   1307 |  2   |
> long min       1439   1424   1421   1420   1430   1427 |  3   |
> long max       2299   2287   2303   2305   1433   1425 |  3   |
> long and       1657   1667   2015   1003   1679    568 |  4   |
> long or        1776   1783   2032   1009   1680    569 |  4   |
> long xor       1834   1783   2012   1024   1679    570 |  4   |
> float add      2779   2644   2633   2648   2632   2639 |  5   |
> float mul      2779   2871   2810   2776   2732   2791 |  5   |
> float min      2294   2620   1725   1286    872    672 |      |
> float max      2371   2519   1697   1265    841    468 |      |
> double add     2634   2636   2635   2650   2635   2648 |  5   |
> double mul     3053   2955   2881   3030   2979   2927 |  5   |
> double min     2364   2400   2439   2399   2486   2398 |  6   |
> double max     2488   2459   2501   2451   2493   2498 |  6   |
>
> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>
> The lines without note show clear speedup as expected.
>
> Notes:
> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>
> **Testing**
>
> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>
> Passes up to tier5 and stress-testing.
> Performance testing did not show any regressions.
> **TODO** can someone benchmark on `aarch64`?
>
> **Discussion**
>
> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>
> So far, I did not work on `byte, char, short`, we can investigate this in the future.
>
> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
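
To make the idea in the quoted description concrete, here is a minimal scalar Java sketch. It is not code from the patch: the class name, method names and the lane count are assumptions chosen for illustration. It mimics what moving an unordered reduction out of the loop amounts to: keep one partial result per lane inside the loop, and collapse the lanes to a scalar once after the loop. This is only valid for operations that may be reordered, such as `int` addition.

```java
public class ReductionAfterLoopSketch {
    // Hypothetical vector width, in int elements.
    static final int LANES = 8;

    // Reference version: one scalar accumulation per element, in order.
    static int sumInOrder(int[] a) {
        int sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    // Per-lane partial sums inside the loop, one horizontal reduction afterwards.
    static int sumReduceAfterLoop(int[] a) {
        int[] acc = new int[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l]; // element-wise "vector" add, no reduction here
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l]; // the single reduction, after the loop
        }
        for (; i < a.length; i++) {
            sum += a[i]; // leftover elements that do not fill a full vector
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[16 * 1024 + 3];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 31 + 7;
        }
        System.out.println(sumInOrder(a) == sumReduceAfterLoop(a));
    }
}
```

Both methods agree for `int` addition because wrap-around addition is associative and commutative. The same rewrite applied to `float` or `double` addition could change rounding, which matches note 5 above.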
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Address review suggestion by @pfustc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/5a51ac37..56990bde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=01-02 Stats: 66 lines in 3 files changed: 21 ins; 29 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From thartmann at openjdk.org Wed May 10 05:48:20 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 May 2023 05:48:20 GMT Subject: RFR: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized [v2] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 09:24:11 GMT, Tobias Hotz wrote: >> This patch aims to optimize a case where a And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not if both are a register. >> Before this patch, a "and" followed by a "test" would be emitted, so the removed "and" means 2 bytes less have to be emitted. >> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers of my Windows 11 machine: >> Before: >> >> Benchmark Mode Cnt Score Error Units >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt avgt 8 26,736 ? 0,131 ns/op >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong avgt 8 24,305 ? 0,610 ns/op >> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt avgt 8 33,052 ? 0,056 ns/op >> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong avgt 8 18,355 ? 0,030 ns/op >> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong avgt 8 25,587 ? 
0,107 ns/op >> >> After: >> >> Benchmark Mode Cnt Score Error Units Improvement >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt avgt 8 22,665 ? 0,170 ns/op (~18%) >> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong avgt 8 18,880 ? 0,123 ns/op (~29%) >> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt avgt 8 33,198 ? 0,126 ns/op (unchanged) >> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong avgt 8 18,427 ? 0,079 ns/op (unchanged) >> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong avgt 8 25,641 ? 0,168 ns/op (unchanged) >> >> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a sorted instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here. >> I've tested my changes using the Tier1 jtreg Tests on Windows. > > Tobias Hotz has updated the pull request incrementally with one additional commit since the last revision: > > Update benchmark copyright and remove invalid copypasted comment Looks good to me too. Testing passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13587#pullrequestreview-1419816581 From duke at openjdk.org Wed May 10 05:51:37 2023 From: duke at openjdk.org (Tobias Hotz) Date: Wed, 10 May 2023 05:51:37 GMT Subject: Integrated: 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized In-Reply-To: References: Message-ID: <0NToBymPGqYFIaxMitn0LXypKxFbQvtzz7EAS_m2ZcA=.2596b124-03f3-4547-8c94-6a210227aa14@github.com> On Fri, 21 Apr 2023 17:30:39 GMT, Tobias Hotz wrote: > This patch aims to optimize a case where a And-Node followed by a Cmp-Node would not be converted into a single "test" instruction. 
The Architecture Description file currently only handles the cases where the right operand of the And-Node is a constant, but not the case where both operands are registers.
> Before this patch, an "and" followed by a "test" would be emitted, so removing the "and" means 2 fewer bytes have to be emitted.
> I've attached a JMH Benchmark to demonstrate the performance improvements. Here are the numbers from my Windows 11 machine:
> Before:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  26,736 ± 0,131  ns/op
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  24,305 ± 0,610  ns/op
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,052 ± 0,056  ns/op
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,355 ± 0,030  ns/op
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,587 ± 0,107  ns/op
>
> After:
>
> Benchmark                                                   Mode  Cnt   Score   Error  Units  Improvement
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsInt        avgt    8  22,665 ± 0,170  ns/op  (~18%)
> AndCmpTestInstruction.benchmarkOpaqueAndCmpEqualsLong       avgt    8  18,880 ± 0,123  ns/op  (~29%)
> AndCmpTestInstruction.benchmarkStaticAndCmpEqualsInt        avgt    8  33,198 ± 0,126  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticLargeAndCmpEqualsLong  avgt    8  18,427 ± 0,079  ns/op  (unchanged)
> AndCmpTestInstruction.benchmarkStaticSmallAndCmpEqualsLong  avgt    8  25,641 ± 0,168  ns/op  (unchanged)
>
> As you can see, the cases with a small static mask have not improved as they have already been covered by another match rule. The test with the large static mask (benchmarkStaticLargeAndCmpEqualsLong) has a sorted instruction sequence as the value is moved to a register and it is not being used directly in the instruction, but there is no measurable performance uplift here.
> I've tested my changes using the Tier1 jtreg Tests on Windows.

This pull request has now been integrated.
Changeset: 4b4c80bb Author: Tobias Hotz Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/4b4c80bb3171c0ab3377f1cbf62a62289ef55817 Stats: 125 lines in 2 files changed: 125 ins; 0 del; 0 mod 8307351: (CmpI/L(AndI/L reg1 reg2)) on x86 can be optimized Reviewed-by: sviswanathan, qamai, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13587 From epeter at openjdk.org Wed May 10 06:30:13 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 06:30:13 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v4] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`. 
> > > operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | > --------------------------------------------------------------- > int add 2063 2085 660 530 415 283 | | > int mul 2272 2257 1189 733 908 439 | | > int min 2527 2520 2516 2579 2585 2542 | 1 | > int max 2548 2525 2551 2516 2515 2517 | 1 | > int and 2410 2414 602 480 353 263 | | > int or 2149 2151 597 498 354 262 | | > int xor 2059 2062 605 476 364 263 | | > long add 1776 1790 2000 1000 1683 591 | 2 | > long mul 2135 2199 2185 2001 2176 1307 | 2 | > long min 1439 1424 1421 1420 1430 1427 | 3 | > long max 2299 2287 2303 2305 1433 1425 | 3 | > long and 1657 1667 2015 1003 1679 568 | 4 | > long or 1776 1783 2032 1009 1680 569 | 4 | > long xor 1834 1783 2012 1024 1679 570 | 4 | > float add 2779 2644 2633 2648 2632 2639 | 5 | > float mul 2779 2871 2810 2776 2732 2791 | 5 | > float min 2294 2620 1725 1286 872 672 | | > float max 2371 2519 1697 1265 841 468 | | > double add 2634 2636 2635 2650 2635 2648 | 5 | > double mul 3053 2955 2881 3030 2979 2927 | 5 | > double min 2364 2400 2439 2399 2486 2398 | 6 | > double max 2488 2459 2501 2451 2493 2498 | 6 | > > Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512. > > The lines without note show clear speedup as expected. > > Notes: > 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673) > 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`. > 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513). > 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now. > 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples. > 6. 
`double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865). > > **Testing** > > I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop. > > Passes up to tier5 and stress-testing. > Performance testing did not show any regressions. > **TODO** can someone benchmark on `aarch64`? > > **Discussion** > > We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this: > https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265 > I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516). > > So far, I did not work on `byte, char, short`, we can investigate this in the future. > > FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-iunput, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: small bugfix. 
And put TraceNewVector in VectorNode::trace_new_vector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/56990bde..72fa58e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=02-03 Stats: 54 lines in 4 files changed: 11 ins; 35 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From epeter at openjdk.org Wed May 10 06:32:26 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 06:32:26 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v3] In-Reply-To: <-JXzmyqpa8WptOh0CpaAk_4DpTzJrQ8KQre1vy4eH1M=.a4f7730b-7d80-4d80-a335-817ccd8df71b@github.com> References: <-JXzmyqpa8WptOh0CpaAk_4DpTzJrQ8KQre1vy4eH1M=.a4f7730b-7d80-4d80-a335-817ccd8df71b@github.com> Message-ID: On Wed, 10 May 2023 04:57:39 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. 
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation    M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> --------------------------------------------------------------
>> int add       2063   2085    660    530    415    283 |      |
>> int mul       2272   2257   1189    733    908    439 |      |
>> int min       2527   2520   2516   2579   2585   2542 |  1   |
>> int max       2548   2525   2551   2516   2515   2517 |  1   |
>> int and       2410   2414    602    480    353    263 |      |
>> int or        2149   2151    597    498    354    262 |      |
>> int xor       2059   2062    605    476    364    263 |      |
>> long add      1776   1790   2000   1000   1683    591 |  2   |
>> long mul      2135   2199   2185   2001   2176   1307 |  2   |
>> long min      1439   1424   1421   1420   1430   1427 |  3   |
>> long max      2299   2287   2303   2305   1433   1425 |  3   |
>> long and      1657   1667   2015   1003   1679    568 |  4   |
>> long or       1776   1783   2032   1009   1680    569 |  4   |
>> long xor      1834   1783   2012   1024   1679    570 |  4   |
>> float add     2779   2644   2633   2648   2632   2639 |  5   |
>> float mul     2779   2871   2810   2776   2732   2791 |  5   |
>> float min     2294   2620   1725   1286    872    672 |      |
>> float max     2371   2519   1697   1265    841    468 |      |
>> double add    2634   2636   2635   2650   2635   2648 |  5   |
>> double mul    3053   2955   2881   3030   2979   2927 |  5   |
>> double min    2364   2400   2439   2399   2486   2398 |  6   |
>> double max    2488   2459   2501   2451   2493   2498 |  6   |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction.
This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> Performance testing did not show any regressions.
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address review suggestion by @pfustc

@pfustc I refactored the reshaping-code drastically. I hope it is now a bit clearer and more concise.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1541425233

From pli at openjdk.org Wed May 10 08:08:19 2023
From: pli at openjdk.org (Pengfei Li)
Date: Wed, 10 May 2023 08:08:19 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 10 May 2023 06:30:13 GMT, Emanuel Peter wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation    M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> --------------------------------------------------------------
>> int add       2063   2085    660    530    415    283 |      |
>> int mul       2272   2257   1189    733    908    439 |      |
>> int min       2527   2520   2516   2579   2585   2542 |  1   |
>> int max       2548   2525   2551   2516   2515   2517 |  1   |
>> int and       2410   2414    602    480    353    263 |      |
>> int or        2149   2151    597    498    354    262 |      |
>> int xor       2059   2062    605    476    364    263 |      |
>> long add      1776   1790   2000   1000   1683    591 |  2   |
>> long mul      2135   2199   2185   2001   2176   1307 |  2   |
>> long min      1439   1424   1421   1420   1430   1427 |  3   |
>> long max      2299   2287   2303   2305   1433   1425 |  3   |
>> long and      1657   1667   2015   1003   1679    568 |  4   |
>> long or       1776   1783   2032   1009   1680    569 |  4   |
>> long xor      1834   1783   2012   1024   1679    570 |  4   |
>> float add     2779   2644   2633   2648   2632   2639 |  5   |
>> float mul     2779   2871   2810   2776   2732   2791 |  5   |
>> float min     2294   2620   1725   1286    872    672 |      |
>> float max     2371   2519   1697   1265    841    468 |      |
>> double add    2634   2636   2635   2650   2635   2648 |  5   |
>> double mul    3053   2955   2881   3030   2979   2927 |  5   |
>> double min    2364   2400   2439   2399   2486   2398 |  6   |
>> double max    2488   2459   2501   2451   2493   2498 |  6   |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> Performance testing did not show any regressions.
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   small bugfix. And put TraceNewVector in VectorNode::trace_new_vector

Thanks for your changes. It generally looks good to me.

src/hotspot/share/opto/loopnode.cpp line 4636:

> 4634:     for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) {
> 4635:       IdealLoopTree* lpt = iter.current();
> 4636:       if (lpt->_head->is_CountedLoop()) {

Using `lpt->is_counted()`?
And how about adding one more condition of `lpt->is_innermost()` in this `if`? All loops in the IdealLoopTree are iterated here, but only reduction operations in innermost loops may be vectorized by the current SuperWord, so we don't need to run into your `move_unordered_reduction_out_of_loop()` function for further analysis if a loop is not innermost.

-------------

PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1420008845
PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1189509929

From sgehwolf at openjdk.org Wed May 10 08:24:16 2023
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Wed, 10 May 2023 08:24:16 GMT
Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo
In-Reply-To:
References:
Message-ID:

On Mon, 8 May 2023 13:14:26 GMT, Severin Gehwolf wrote:

>> @tstuefe @robcasloz
>>
>> I updated this PR to implement both `free_memory` and `available_memory`. On Linux, `free_memory` refers to MemFree (equivalent to the older `available_memory`), and `available_memory` refers to MemAvailable. On other platforms, `free_memory` proxies `available_memory`. Also, `CompileBroker` uses `free_memory` rather than `available_memory`. Some GHA checks failed, but I think they are not caused by this change.
>
> @YaSuenag Windows GHA issue should go away if you merge with latest master. See https://bugs.openjdk.org/browse/JDK-8306543

> @jerboaa @dholmes-ora I added comments for `available_memory` and `free_memory`. Is it enough? [6aba344](https://github.com/openjdk/jdk/commit/6aba3440131936a82daccdb65e7b241c022a21c7)

It's fine with me. I'll let @dholmes-ora have the final say.
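
As a rough illustration of the `MemFree` versus `MemAvailable` distinction discussed in this thread, the following Java sketch parses text in the `/proc/meminfo` format and exposes the two values separately. It is not the HotSpot implementation: the class and method names are hypothetical, and the fallback to `MemFree` when `MemAvailable` is absent is an assumption made for the sketch, not something stated in the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and this is not HotSpot code.
class MemInfoSketch {

    /** Parses lines of the form "Key:   12345 kB" into a map of key to numeric value. */
    static Map<String, Long> parse(String meminfo) {
        Map<String, Long> values = new HashMap<>();
        for (String line : meminfo.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Key: value" line
            }
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            if (parts.length == 0 || parts[0].isEmpty()) {
                continue; // no value present
            }
            values.put(line.substring(0, colon).trim(), Long.parseLong(parts[0]));
        }
        return values;
    }

    /** Memory that is completely unused (the MemFree field). */
    static long freeMemoryKb(Map<String, Long> info) {
        return info.get("MemFree");
    }

    /**
     * Estimate of memory usable without swapping (the MemAvailable field).
     * Assumption for this sketch: fall back to MemFree if the field is missing.
     */
    static long availableMemoryKb(Map<String, Long> info) {
        return info.getOrDefault("MemAvailable", info.get("MemFree"));
    }

    public static void main(String[] args) {
        String sample = "MemTotal:       16000000 kB\n"
                      + "MemFree:         1000000 kB\n"
                      + "MemAvailable:    9000000 kB\n";
        Map<String, Long> info = parse(sample);
        System.out.println("free kB:      " + freeMemoryKb(info));
        System.out.println("available kB: " + availableMemoryKb(info));
    }
}
```

Keeping the two accessors separate mirrors the split proposed in the PR: a caller that wants strictly unused memory and a caller that wants an estimate including reclaimable memory can each ask for the value it means.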
------------- PR Comment: https://git.openjdk.org/jdk/pull/13398#issuecomment-1541560821 From sgehwolf at openjdk.org Wed May 10 08:24:18 2023 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 10 May 2023 08:24:18 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v2] In-Reply-To: <5xy2JOZ1lQY25wI1cLAM3noFSOAB0JtT3L1zjXpwHPU=.1c0736cf-13b2-4702-b2df-c37892a955d3@github.com> References: <1IB3l4_DimiOYMACS_dEGE8OzrtfUlCy5QwPahT-Bx8=.8af56e59-ce08-4160-8e3a-3899d012e46f@github.com> <5xy2JOZ1lQY25wI1cLAM3noFSOAB0JtT3L1zjXpwHPU=.1c0736cf-13b2-4702-b2df-c37892a955d3@github.com> Message-ID: On Tue, 9 May 2023 04:37:41 GMT, Yasumasa Suenaga wrote: >> Yes some detailed commentary is very necessary. It looks very odd to have two functions that do the same thing on every platform but Linux. > > I added comments into os_linux.hpp . I will move them to os.hpp . > https://github.com/openjdk/jdk/pull/13398/files#diff-b6c4026228694834053813a8a8ea4795b5edb29eeb9cc46cd234c8e3b92336f4R61-R64 This looks fine to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1189528184 From epeter at openjdk.org Wed May 10 09:41:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 09:41:31 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v2] In-Reply-To: <25lksRBizIIfNL3HxxyG7YUCm1KF1FdgvRocnrlxtqI=.9d604245-3f20-49fd-bfae-f9a2b9e336c6@github.com> References: <25lksRBizIIfNL3HxxyG7YUCm1KF1FdgvRocnrlxtqI=.9d604245-3f20-49fd-bfae-f9a2b9e336c6@github.com> Message-ID: On Mon, 8 May 2023 14:43:20 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
To achieve this, we've decided to clean up the naming scheme of predicates, clean up the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned-up classes and methods to do the job. This redesign simplifies the code and is what makes it possible to fix the remaining known issues in a clean way.
>>
>> To make reviewing the entire change easier, I've decided to split the work into several PRs.
>>
>> This first PR includes the following _semantic-preserving_ changes:
>> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates:
>> - Updating the code (variables, method names etc.) accordingly.
>> - Renaming "Skeleton Predicates" to "Assertion Predicates".
>> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`.
>> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate").
>> - Change `class Predicates` -> `class ParsePredicates`.
>> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.).
>> - Removing unused variables.
>> - Removing unnecessary checks.
>> - Code style fixes in touched code.
>>
>> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level.
>>
>> The blog post can be found on my Github page at:
>> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html
>>
>> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstorming, and pre-reviewing some of the changes here and in upcoming PRs.
>>
>> Thanks,
>> Christian
>
> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix summary

Nice work! The blog was a pleasure to read. I left a few comments.

src/hotspot/share/opto/loopPredicate.cpp line 44:

> 42: /*
> 43:  * The general idea of Loop Predication is to insert a predicate on the entry
> 44:  * path to a loop, and raise a uncommon trap if the check of the condition fails.

Maybe say `uncommon trap` for runtime checks (ok if they fail), and `HaltNode` for assertion predicates (cannot fail)?

src/hotspot/share/opto/loopPredicate.cpp line 73:

> 71:  * predicate is created during Loop Predication and is inserted above the Profiled Loop
> 72:  * Parse Predicate.
> 73:  * - Loop Limit Check Predicate: This predicate is created when transforming a loop to a counted loop. It does not replace

What does it do? Check that the counter never overflows?

src/hotspot/share/opto/loopPredicate.cpp line 82:

> 80:  * Parse and Assertion Predicates are always removed before code generation (except for Initialized
> 81:  * Assertion Predicates which are kept in debug builds while being removed in product builds).
> 82:  * - Regular Predicate: This term is used to refer to a Runtime Predicate or a Parse Predicate and can be used to

Is this exactly the class of predicates that lead to uncommon traps? You could consider making this list more hierarchical. Name superclasses first.

src/hotspot/share/opto/loopPredicate.cpp line 93:

> 91:  * away to avoid a broken graph.
Assertion Predicates are left in the graph as a sanity checks in > 92: * debug builds (they must never fail at runtime) while they are being removed in product builds. > 93: * We use special Opaque4 nodes to block some optimizations and replace the Assertion Predicates Have you considered giving `Opaque4` a more descriptive name? src/hotspot/share/opto/loopPredicate.cpp line 101: > 99: * This predicate does not represent an actual check, yet, and > 100: * just serves as a template to create an Initialized Assertion > 101: * Predicate from for a (sub) loop. Suggestion: * Predicate for a (sub) loop. src/hotspot/share/opto/loopPredicate.cpp line 112: > 110: * - Loop Predication: A range check inside a loop is replaced by a Hoisted Predicate before > 111: * the loop. We add two additional Template Assertion Predicates which > 112: * are later used to create Initialized Assertion Predicates from. One Suggestion: * are later used to create Initialized Assertion Predicates. One Or: `We add two additional Template Assertion Predicates from which we can later create the Initialized Assertion Predicates.` src/hotspot/share/opto/loopPredicate.cpp line 137: > 135: * equal. The Initialized Assertion Predicates are always true because > 136: * their range is covered by a corresponding Hoisted Predicate. > 137: * - Range Check Elimination: A range check is removed from the main-loop by changing the pre Optional: Mention why Loop Predication does not cover this case. src/hotspot/share/opto/loopPredicate.cpp line 145: > 143: * OpaqueLoop* nodes by actual values for the unrolled loop. > 144: * The Initialized Assertion Predicates are always true because their > 145: * range is covered by the main-loop entry guard. As discussed in person: reason is pre and main exit condition. src/hotspot/share/opto/loopPredicate.cpp line 155: > 153: * (JDK-8288981). 
> 154: * - Regular Predicate Block: A Regular Predicate Block consists of a Parse Predicate a Runtime Predicate Block (all its > 155: * Runtime Predicates, if any). There are three such blocks: A Regular Predicate Block consists of a Runtime Predicate Block (all its Runtime Predicates, if any) with a Parse Predicate after it. src/hotspot/share/opto/loopPredicate.cpp line 167: > 165: * ... | Block | Loop Predicate Block > 166: * [Loop Hoisted Predicate n + 2 Template Assertion Predicates] / | > 167: * Loop Parse Predicate / Suggestion: * Loop Parse Predicate / src/hotspot/share/opto/loopPredicate.cpp line 182: > 180: * and applying Range Check Elimination (the order is insignificant): > 181: * > 182: * Main Loop entry guard Suggestion: * Main Loop entry (zero-trip) guard src/hotspot/share/opto/loopPredicate.cpp line 182: > 180: * and applying Range Check Elimination (the order is insignificant): > 181: * > 182: * Main Loop entry guard Suggestion: * Main Loop entry (zero-trip) guard src/hotspot/share/opto/loopPredicate.cpp line 456: > 454: IfProjNode* PhaseIdealLoop::clone_parse_predicate_to_unswitched_loop(ParsePredicateSuccessProj* predicate_proj, > 455: Node* new_entry, Deoptimization::DeoptReason reason, > 456: const bool slow_loop) { Why not just one argument per line? 
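
The idea behind the Hoisted Predicates discussed in these review comments can be sketched at the Java source level. This is only an analogy under stated assumptions: C2 performs the transformation on its IR, the JVM still applies its own bounds checks to the code below, and the class and method names are hypothetical. The explicit `if` in the first method stands in for the per-iteration range check; in the second method a single check before the loop replaces it, and the fallback call stands in for the uncommon trap that resumes in unoptimized code.

```java
// Source-level analogy of Loop Predication; names are hypothetical, not HotSpot code.
class LoopPredicationSketch {

    /** Range check evaluated in every iteration (models the unoptimized loop). */
    static int sumWithChecksInLoop(int[] a, int start, int limit) {
        int sum = 0;
        for (int i = start; i < limit; i++) {
            if (i < 0 || i >= a.length) {
                throw new ArrayIndexOutOfBoundsException(i);
            }
            sum += a[i];
        }
        return sum;
    }

    /** Range check hoisted in front of the loop (models a Hoisted Predicate). */
    static int sumWithHoistedCheck(int[] a, int start, int limit) {
        // For a stride of 1 it is enough to check the first and last index once.
        boolean loopRuns = start < limit;
        if (loopRuns && (start < 0 || limit > a.length)) {
            // Stand-in for the uncommon trap: continue in the fully checked version.
            return sumWithChecksInLoop(a, start, limit);
        }
        int sum = 0;
        for (int i = start; i < limit; i++) {
            sum += a[i]; // no explicit check needed inside the loop
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumWithHoistedCheck(a, 0, 4));
        System.out.println(sumWithHoistedCheck(a, 1, 3));
    }
}
```

The hoisted check only covers the loop as written. Once the loop is split or unrolled, the range it guards no longer matches the sub-loops, which is the situation the Assertion Predicates described in the reviewed comment are meant to address.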
------------- PR Review: https://git.openjdk.org/jdk/pull/13864#pullrequestreview-1420018645 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189516154 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189556637 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189559673 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189562771 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189563920 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189569192 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189575199 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189591532 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189606227 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189594084 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189611843 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189612071 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189613598 From roland at openjdk.org Wed May 10 11:33:30 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 May 2023 11:33:30 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" Message-ID: pre/main/post loops are created for an inner loop of a loop nest but assert predicates cause the main and post loops to be removed. The OpaqueZeroTripGuard nodes for the loops are not removed: there's no logic to trigger removal of the opaque nodes once the loops are no longer there. With the inner loops gone, the outer loop becomes candidate for optimizations and is unrolled which causes the zero trip guards of the now removed loops to be duplicated and the opaque nodes to have more than one use. 
The fix I propose is, using logic similar to `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop opts if every OpaqueZeroTripGuard node guards a loop and if not, remove it.

-------------

Commit messages:
 - fix & test

Changes: https://git.openjdk.org/jdk/pull/13901/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8305189
  Stats: 134 lines in 5 files changed: 134 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13901.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13901/head:pull/13901

PR: https://git.openjdk.org/jdk/pull/13901

From epeter at openjdk.org Wed May 10 11:45:38 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 10 May 2023 11:45:38 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5]
In-Reply-To:
References:
Message-ID: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com>

> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>
> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>
> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>
> **Performance results**
> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>
> I disabled `turbo-boost`.
> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation    M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> --------------------------------------------------------------
> int add       2063   2085    660    530    415    283 |      |
> int mul       2272   2257   1189    733    908    439 |      |
> int min       2527   2520   2516   2579   2585   2542 |  1   |
> int max       2548   2525   2551   2516   2515   2517 |  1   |
> int and       2410   2414    602    480    353    263 |      |
> int or        2149   2151    597    498    354    262 |      |
> int xor       2059   2062    605    476    364    263 |      |
> long add      1776   1790   2000   1000   1683    591 |  2   |
> long mul      2135   2199   2185   2001   2176   1307 |  2   |
> long min      1439   1424   1421   1420   1430   1427 |  3   |
> long max      2299   2287   2303   2305   1433   1425 |  3   |
> long and      1657   1667   2015   1003   1679    568 |  4   |
> long or       1776   1783   2032   1009   1680    569 |  4   |
> long xor      1834   1783   2012   1024   1679    570 |  4   |
> float add     2779   2644   2633   2648   2632   2639 |  5   |
> float mul     2779   2871   2810   2776   2732   2791 |  5   |
> float min     2294   2620   1725   1286    872    672 |      |
> float max     2371   2519   1697   1265    841    468 |      |
> double add    2634   2636   2635   2650   2635   2648 |  5   |
> double mul    3053   2955   2881   3030   2979   2927 |  5   |
> double min    2364   2400   2439   2399   2486   2398 |  6   |
> double max    2488   2459   2501   2451   2493   2498 |  6   |
>
> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>
> The lines without note show clear speedup as expected.
>
> Notes:
> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
> 4. `long and/or/xor`: without patch on AVX2, vectorization is slower. With patch, it is always faster now.
> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside loop. Vectorization has no benefit in these examples.
> 6.
`double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>
> **Testing**
>
> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>
> Passes up to tier5 and stress-testing.
> Performance testing did not show any regressions.
> **TODO** can someone benchmark on `aarch64`?
>
> **Discussion**
>
> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>
> So far, I did not work on `byte, char, short`; we can investigate this in the future.
>
> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
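
To make the transformation described above concrete, here is a scalar Java model of the two loop shapes. It is a sketch under assumptions, not the C2 implementation: the class name, the method names, and the lane count `LANES` are hypothetical, and the inner `for` loops merely stand in for single vector instructions. The first method reduces a vector to a scalar in every iteration; the second keeps a vector accumulator, performs only element-wise adds in the loop, and reduces once after the loop. Both give the same result for `int` addition because it can be reordered freely, which, as the notes above point out, is not the case for `float/double add/mul`.

```java
// Scalar model of "reduction after the loop"; names and LANES are hypothetical.
class ReductionAfterLoopSketch {
    static final int LANES = 8; // assumed vector width in elements

    /** Horizontal reduction inside the loop: one vector-to-scalar reduction per iteration. */
    static int sumReduceInsideLoop(int[] a) {
        int sum = 0;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int partial = 0;
            for (int l = 0; l < LANES; l++) { // models a per-iteration reduction node
                partial += a[i + l];
            }
            sum += partial;
        }
        for (; i < a.length; i++) { // scalar tail (post-loop)
            sum += a[i];
        }
        return sum;
    }

    /** Element-wise accumulation in the loop, single reduction after it. */
    static int sumReduceAfterLoop(int[] a) {
        int[] acc = new int[LANES]; // vector accumulator, starts at the identity (0)
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) { // models one element-wise vector add
                acc[l] += a[i + l];
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) { // the only horizontal reduction
            sum += acc[l];
        }
        for (; i < a.length; i++) { // scalar tail (post-loop)
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = new int[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        System.out.println(sumReduceInsideLoop(a));
        System.out.println(sumReduceAfterLoop(a));
    }
}
```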
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use is_counted and is_innermost ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/72fa58e0..31d977c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From epeter at openjdk.org Wed May 10 11:45:43 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 11:45:43 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v4] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 08:03:57 GMT, Pengfei Li wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> small bugfix. And put TraceNewVector in VectorNode::trace_new_vector > > src/hotspot/share/opto/loopnode.cpp line 4636: > >> 4634: for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { >> 4635: IdealLoopTree* lpt = iter.current(); >> 4636: if (lpt->_head->is_CountedLoop()) { > > Using `lpt->is_counted()`? And how about adding one more condition of `lpt->is_innermost()` in this `if`? As all loops in the IdealLoopTree are iterated here but only reduction operations in innermost loops may be vectorized by current Superword, we don't need to run into your `move_unordered_reduction_out_of_loop()` function for further analysis if a loop is not innermost. 
Thanks for the suggestion, I changed it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1189781256 From roland at openjdk.org Wed May 10 11:57:18 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 10 May 2023 11:57:18 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v3] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Tue, 9 May 2023 14:14:15 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. 
Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert).
>>
>> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them.
>>
>> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interfaces that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running.
>>
>> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them with `#ifdef ASSERT` and fixed the verification code that failed to build.
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
>   Refactoring

Looks good to me. Nice refactoring and simplification.

src/hotspot/share/opto/type.cpp line 3311:

> 3309: }
> 3310:
> 3311: bool TypePtr::InterfaceSet::eq(ciInstanceKlass* k, InterfaceHandling interface_handling) const {

Why not remove the interface_handling parameter? It doesn't seem useful.

-------------

Marked as reviewed by roland (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13868#pullrequestreview-1420451736
PR Review Comment: https://git.openjdk.org/jdk/pull/13868#discussion_r1189788911

From yzheng at openjdk.org Wed May 10 11:58:37 2023
From: yzheng at openjdk.org (Yudi Zheng)
Date: Wed, 10 May 2023 11:58:37 GMT
Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers.
Message-ID: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. ------------- Commit messages: - export markWord::lock_mask_in_place to JVMCI compilers. Changes: https://git.openjdk.org/jdk/pull/13902/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13902&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307813 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13902.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13902/head:pull/13902 PR: https://git.openjdk.org/jdk/pull/13902 From dnsimon at openjdk.org Wed May 10 12:39:29 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 May 2023 12:39:29 GMT Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 11:49:03 GMT, Yudi Zheng wrote: > Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. Marked as reviewed by dnsimon (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13902#pullrequestreview-1420543625 From qamai at openjdk.org Wed May 10 13:10:28 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 10 May 2023 13:10:28 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v3] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Tue, 9 May 2023 14:14:15 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. 
Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to dereference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. >> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them with `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13868#pullrequestreview-1420605016 From epeter at openjdk.org Wed May 10 13:20:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 13:20:24 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 13:14:37 GMT, Emanuel Peter wrote: > **Bug** > In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN` correctly (ordered vs unordered comparison, leads to wrong results).
> > The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973) > On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 > > The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching: > https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394 > > The wrong results with `NaN` are because of a bug in `x`: > https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106 > The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for Java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078). > > **Solution** > @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions. > > This has a few benefits: > - `VectorMaskCmp + VectorBlend` is more powerful: > - `CMoveVF/D` required the same inputs for the compare as for the move itself. > - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize. > - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`). > - We need less code (I completely removed all code for `CMoveVF/D`).
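To make the ordered vs. unordered distinction above concrete, here is a minimal scalar sketch in Java. It is not the C2 implementation: the class and method names (`UnorderedCompareSketch`, `ltUnordered`, `ltOrdered`, `blend`) are hypothetical and exist only for illustration. The one fact it relies on is that `fcmpl` pushes `-1` when either operand is `NaN`, so an `lt` test on that result is true for `NaN` inputs, while an ordered less-than is false. The `blend` helper stands in for a lane-wise select driven by a separately computed mask.

```java
public class UnorderedCompareSketch {
    // Emulates the result of the fcmpl bytecode: 1, 0 or -1, and -1 if either input is NaN.
    static int fcmpl(float a, float b) {
        if (a > b) return 1;
        if (a == b) return 0;
        return -1; // a < b, or at least one operand is NaN
    }

    // "lt" applied to the fcmpl result: an unordered less-than (true for NaN inputs).
    static boolean ltUnordered(float a, float b) {
        return fcmpl(a, b) < 0;
    }

    // Ordered less-than: false as soon as one operand is NaN.
    static boolean ltOrdered(float a, float b) {
        return a < b;
    }

    // Lane-wise select: take onTrue where the mask is set, onFalse elsewhere.
    static float[] blend(boolean[] mask, float[] onTrue, float[] onFalse) {
        float[] r = new float[mask.length];
        for (int i = 0; i < mask.length; i++) {
            r[i] = mask[i] ? onTrue[i] : onFalse[i];
        }
        return r;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, Float.NaN, 3.0f};
        float[] b = {2.0f, 2.0f, 2.0f};
        boolean[] unordered = new boolean[a.length];
        boolean[] ordered = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            unordered[i] = ltUnordered(a[i], b[i]);
            ordered[i] = ltOrdered(a[i], b[i]);
        }
        float[] x = {10f, 10f, 10f};
        float[] y = {20f, 20f, 20f};
        System.out.println(java.util.Arrays.toString(blend(unordered, x, y)));
        System.out.println(java.util.Arrays.toString(blend(ordered, x, y)));
    }
}
```

The two masks differ only in the lane that holds `NaN`, which is the kind of lane where picking the wrong comparison code gives a wrong blend result.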
> > I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`. > > As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, the CMove code did not properly maintain the `packset` / `my_pack`. I now added some verification here, since I also just removed the problematic code, and the verification passes now with this patch. I was also able to remove the unwanted `UseVectorCmov` in an assert. > > Most of the changes come from the regression tests in `TestVectorConditionalMove.java`: I generalized it from `aarch64` to all platforms, the IR rules only apply with `avx/asimd`. And I added many new tests to cover the newly implemented cases. Further, I modified the tests to include `NaN`s among the random numbers, to verify that the ordered/unordered comparisons are correct. > > **Discussion / Context / Future Work** > > 1. From what I understand, we currently never introduce a `CMoveF/D`, unless asked for by `UseCMoveUnconditionally` (`C->use_c_move()`). If the flag is set, we attribute no cost to the CMove, else we take `Matcher::float_cmove_cost()`, which seems to be `ConditionalMoveLimit`, and so the Phi is never converted into a CMove. > > And then if one wants to convert these scalar-CMove into a vector-CMove, one needs to activate the flag `UseVectorCmov`. @vnkozlov did some research: the goal was always to have this be on by default eventually. > > I see 2 paths here: either we obsolete `UseVectorCmov`, and implicitly have it on. Or we keep it, but make it on by default. I can do some performance measurements in a follow-up **RFE**. > > 2. I also saw that `int` and `long` are also CMove'd in `PhaseIdealLoop::conditional_move`. Especially `int` can currently be CMove'd without the `UseCMoveUnconditionally` flag. It would be nice to allow them to be vectorized.
This is a small fix, but I'd like to do the testing and performance analysis for it. So a separate **RFE**. A slightly more involved idea: also allow cmp / blend with types of different widths (e.g. compare `int` but cmove `double`). That would require a cast on the vector-mask. > > 3. It is a shame that scalar-CMove is on its own usually not profitable. But together with `SuperWord` it would be profitable. But if it is not scalar-CMove'd first, we fail to vectorize, since the loop has control-flow. It is one of my dreams: allow `SuperWord` to handle control flow, and do the `If-conversion` (CMove) directly with vectorization. Let me know if you have any thoughts or ideas. @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1542198445 From chagedorn at openjdk.org Wed May 10 13:33:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 May 2023 13:33:10 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v3] In-Reply-To: References: Message-ID: > This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. > > To make reviewing the entire change easier, I've decided to split the work into several PRs.
> > This first PR includes the following _semantic-preserving_ changes: > - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: > - Updating the code (variables, method names etc.) accordingly. > - Renaming "Skeleton Predicates" to "Assertion Predicates". > - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. > - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). > - Change `class Predicates` -> `class ParsePredicates`. > - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). > - Removing unused variables. > - Removing unnecessary checks. > - Code style fixes in touched code. > > Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. > > The blog post can be found on my Github page at: > https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html > > Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Emanuel's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13864/files - new: https://git.openjdk.org/jdk/pull/13864/files/cf4525e9..97207be4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=01-02 Stats: 58 lines in 1 file changed: 21 ins; 8 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/13864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13864/head:pull/13864 PR: https://git.openjdk.org/jdk/pull/13864 From chagedorn at openjdk.org Wed May 10 13:33:11 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 May 2023 13:33:11 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v3] In-Reply-To: References: Message-ID: <11tm4oKKxCDZ0q0y8KFY1w4lAwbcWmevq9EnGkGZ0Q4=.10f6513f-9716-4533-9f76-c2e7ddebb933@github.com> On Wed, 10 May 2023 13:28:04 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. >> >> To make reviewing the entire change easier, I've decided to split the work into several PRs. 
>> >> This first PR includes the following _semantic-preserving_ changes: >> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: >> - Updating the code (variables, method names etc.) accordingly. >> - Renaming "Skeleton Predicates" to "Assertion Predicates". >> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. >> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). >> - Change `class Predicates` -> `class ParsePredicates`. >> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). >> - Removing unused variables. >> - Removing unnecessary checks. >> - Code style fixes in touched code. >> >> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. >> >> The blog post can be found on my Github page at: >> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html >> >> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's review Thanks a lot Emanuel for your careful code and predicate summary review and your offline feedback about the blog! I've pushed an update addressing your comments.
------------- PR Review: https://git.openjdk.org/jdk/pull/13864#pullrequestreview-1420547736 From chagedorn at openjdk.org Wed May 10 13:33:19 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 May 2023 13:33:19 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v2] In-Reply-To: References: <25lksRBizIIfNL3HxxyG7YUCm1KF1FdgvRocnrlxtqI=.9d604245-3f20-49fd-bfae-f9a2b9e336c6@github.com> Message-ID: On Wed, 10 May 2023 08:09:26 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix summary > > src/hotspot/share/opto/loopPredicate.cpp line 44: > >> 42: /* >> 43: * The general idea of Loop Predication is to insert a predicate on the entry >> 44: * path to a loop, and raise a uncommon trap if the check of the condition fails. > > Maybe say `uncommon trap` for runtime checks (ok if they fail), and `HaltNode` for assertion predicates (cannot fail)? I've reworded this paragraph to only reflect the idea of Loop Predication to keep things simple. > src/hotspot/share/opto/loopPredicate.cpp line 73: > >> 71: * predicate is created during Loop Predication and is inserted above the Profiled Loop >> 72: * Parse Predicate. >> 73: * - Loop Limit Check Predicate: This predicate is created when transforming a loop to a counted loop. It does not replace > > What does it do? Check that the counter does never overflow? I've added a short description. > src/hotspot/share/opto/loopPredicate.cpp line 82: > >> 80: * Parse and Assertion Predicates are always removed before code generation (except for Initialized >> 81: * Assertion Predicates which are kept in debug builds while being removed in product builds). >> 82: * - Regular Predicate: This term is used to refer to a Runtime Predicate or a Parse Predicate and can be used to > > Is this exactly the class of predicates that lead to uncommon traps? 
> You could consider making this list more hierarchical. Name superclasses first. Good idea, I've rearranged the terms to better reflect the hierarchy. > src/hotspot/share/opto/loopPredicate.cpp line 93: > >> 91: * away to avoid a broken graph. Assertion Predicates are left in the graph as a sanity checks in >> 92: * debug builds (they must never fail at runtime) while they are being removed in product builds. >> 93: * We use special Opaque4 nodes to block some optimizations and replace the Assertion Predicates > > Have you considered giving `Opaque4` a more descriptive name? Yes, in later commits, I change that name to `OpaqueAssertionPredicateNode`. I could have already done this change here. > src/hotspot/share/opto/loopPredicate.cpp line 155: > >> 153: * (JDK-8288981). >> 154: * - Regular Predicate Block: A Regular Predicate Block consists of a Parse Predicate a Runtime Predicate Block (all its >> 155: * Runtime Predicates, if any). There are three such blocks: > > A Regular Predicate Block consists of a Runtime Predicate Block (all its Runtime Predicates, if any) with a Parse Predicate after it. I've extended the text and switched their order. > src/hotspot/share/opto/loopPredicate.cpp line 456: > >> 454: IfProjNode* PhaseIdealLoop::clone_parse_predicate_to_unswitched_loop(ParsePredicateSuccessProj* predicate_proj, >> 455: Node* new_entry, Deoptimization::DeoptReason reason, >> 456: const bool slow_loop) { > > Why not just one argument per line? I've tried to wrap at 120 characters.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189898040 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189848295 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189848738 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189850859 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189868295 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1189878876 From dnsimon at openjdk.org Wed May 10 14:39:26 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 May 2023 14:39:26 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit Message-ID: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: * Tracks upcalls into libjvmci or creation of libjvmci. * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). When JVMCI compilation is disabled, a warning is emitted: [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. 
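The policy described above (count the upcalls, disable JVMCI compilation once 10% or more of them have failed) can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in the PR, which lives in HotSpot's C++ sources: the class `UpcallFailureTracker`, its methods, and in particular the `minCalls` parameter are hypothetical. The PR text does not say whether a minimum number of calls is required before the ratio is checked; the parameter is included here only so that a single early failure does not trip the threshold.

```java
public class UpcallFailureTracker {
    private final int minCalls; // hypothetical: number of calls before the ratio is evaluated
    private int total;          // all recorded upcalls
    private int failed;         // upcalls that ended with an uncaught exception
    private boolean disabled;   // latched once the threshold is reached

    public UpcallFailureTracker(int minCalls) {
        this.minCalls = minCalls;
    }

    // Records the outcome of one upcall and latches "disabled" once
    // at least 10% of all recorded calls have failed.
    public synchronized void record(boolean callFailed) {
        total++;
        if (callFailed) {
            failed++;
        }
        // failed / total >= 10%, written without division
        if (!disabled && total >= minCalls && failed * 10 >= total) {
            disabled = true;
        }
    }

    public synchronized boolean isDisabled() {
        return disabled;
    }

    public static void main(String[] args) {
        UpcallFailureTracker tracker = new UpcallFailureTracker(10);
        for (int i = 0; i < 9; i++) {
            tracker.record(false);
        }
        tracker.record(true); // 1 failure out of 10 calls reaches 10%
        System.out.println("disabled: " + tracker.isDisabled());
    }
}
```

Once `isDisabled()` returns true, a caller in this sketch would stop submitting work to the failing compiler and fall back to another tier, which corresponds to the "future compilations fall back to Tier 1" behaviour described in the message.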
With `-Xlog:jit+compilation`, the extra detail shown is: [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError java.lang.InternalError: aborting compilation of HotSpotMethod()> at jdk.internal.vm.ci@20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). ------------- Commit messages: - make JMCI more robust in low resource conditions Changes: https://git.openjdk.org/jdk/pull/13905/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306992 Stats: 175 lines in 7 files changed: 112 ins; 18 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/13905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13905/head:pull/13905 PR: https://git.openjdk.org/jdk/pull/13905 From dnsimon at openjdk.org Wed May 10 14:39:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 May 2023 14:39:28 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: <4j7Kdo-Mq6jr74WXinRiWVp2jGVw1oiFVnx3VQI50TI=.08b5ca4a-6958-4180-af4a-cd5b1b853f4f@github.com> On Wed, 10 May
2023 14:00:51 GMT, Doug Simon wrote: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. > * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. > > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci@20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65).
src/hotspot/share/jvmci/jvmci.cpp line 236: > 234: JavaThreadState state = JavaThread::cast(thread)->thread_state(); > 235: if (state == _thread_in_vm || state == _thread_in_Java || state == _thread_new) { > 236: tty->print("JVMCITrace-%d[%s]:%*c", level, thread->name(), level, ' '); This change helps correlate threads in a trace that transition in and out of libgraal (thread name is not available when in libgraal). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13905#discussion_r1189978954 From chagedorn at openjdk.org Wed May 10 16:13:28 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 May 2023 16:13:28 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:27:06 GMT, Roland Westrelin wrote: > pre/main/post loops are created for an inner loop of a loop nest but > assert predicates cause the main and post loops to be removed. The > OpaqueZeroTripGuard nodes for the loops are not removed: there's no > logic to trigger removal of the opaque nodes once the loops are no > longer there. With the inner loops gone, the outer loop becomes > candidate for optimizations and is unrolled which causes the zero trip > guards of the now removed loops to be duplicated and the opaque nodes > to have more than one use. > > The fix I propose is, using logic similar to > `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop > opts if every OpaqueZeroTripGuard node guards a loop and if not, > remove it. That looks reasonable. src/hotspot/share/opto/loopnode.cpp line 6180: > 6178: > 6179: if (!_verify_only && n->Opcode() == Op_OpaqueZeroTripGuard) { > 6180: _zero_trip_guard_opaque_nodes.push(n); That's a good idea to collect them newly here for each loop opts pass. 
src/hotspot/share/opto/opaquenode.cpp line 58: > 56: } > 57: > 58: CountedLoopNode* OpaqueZeroTripGuardNode::guarded_loop() const { Could be guarded with `ifdef ASSERT` since you are only using it for an assertion. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13901#pullrequestreview-1420924277 PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1190130323 PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1190092689 From kvn at openjdk.org Wed May 10 16:27:14 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 May 2023 16:27:14 GMT Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 11:49:03 GMT, Yudi Zheng wrote: > Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. Do you need changes in Java side of JVMCI for this? ------------- PR Review: https://git.openjdk.org/jdk/pull/13902#pullrequestreview-1421001013 From never at openjdk.org Wed May 10 16:27:16 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 10 May 2023 16:27:16 GMT Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 11:49:03 GMT, Yudi Zheng wrote: > Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13902#pullrequestreview-1421003456 From kvn at openjdk.org Wed May 10 16:29:27 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 May 2023 16:29:27 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v3] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Tue, 9 May 2023 14:14:15 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. 
Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to dereference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. >> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them with `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13868#pullrequestreview-1421007888 From dnsimon at openjdk.org Wed May 10 16:45:25 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 May 2023 16:45:25 GMT Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 16:22:24 GMT, Vladimir Kozlov wrote: > Do you need changes in Java side of JVMCI for this? This value is only read by `GraalHotSpotVMConfig` which is not in the JDK.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13902#issuecomment-1542511632 From kvn at openjdk.org Wed May 10 17:05:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 May 2023 17:05:13 GMT Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 11:49:03 GMT, Yudi Zheng wrote: > Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13902#pullrequestreview-1421063988 From kvn at openjdk.org Wed May 10 18:20:19 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 May 2023 18:20:19 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. 
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2 M-N-3   M-2   M-3   P-2   P-3 | note |
>> --------------------------------------------------------
>> int add      2063  2085   660   530   415   283 |      |
>> int mul      2272  2257  1189   733   908   439 |      |
>> int min      2527  2520  2516  2579  2585  2542 |    1 |
>> int max      2548  2525  2551  2516  2515  2517 |    1 |
>> int and      2410  2414   602   480   353   263 |      |
>> int or       2149  2151   597   498   354   262 |      |
>> int xor      2059  2062   605   476   364   263 |      |
>> long add     1776  1790  2000  1000  1683   591 |    2 |
>> long mul     2135  2199  2185  2001  2176  1307 |    2 |
>> long min     1439  1424  1421  1420  1430  1427 |    3 |
>> long max     2299  2287  2303  2305  1433  1425 |    3 |
>> long and     1657  1667  2015  1003  1679   568 |    4 |
>> long or      1776  1783  2032  1009  1680   569 |    4 |
>> long xor     1834  1783  2012  1024  1679   570 |    4 |
>> float add    2779  2644  2633  2648  2632  2639 |    5 |
>> float mul    2779  2871  2810  2776  2732  2791 |    5 |
>> float min    2294  2620  1725  1286   872   672 |      |
>> float max    2371  2519  1697  1265   841   468 |      |
>> double add   2634  2636  2635  2650  2635  2648 |    5 |
>> double mul   3053  2955  2881  3030  2979  2927 |    5 |
>> double min   2364  2400  2439  2399  2486  2398 |    6 |
>> double max   2488  2459  2501  2451  2493  2498 |    6 |
>>
>> Legend: `M` master, `P` with patch, `N` no superword reductions (`-XX:-SuperWordReductions`), `2` AVX2, `3` AVX512.
>>
>> The lines without note show clear speedup as expected.
>>
>> Notes:
>> 1. `int min/max`: bug [JDK-8302673](https://bugs.openjdk.org/browse/JDK-8302673)
>> 2. `long add/mul`: without the patch, it seems that vectorization actually would be slower. Even now, only AVX512 really leads to a speedup. Note: `MulReductionVL` requires `avx512dq`.
>> 3. `long min/max`: `Math.max(long, long)` is currently not intrinsified [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513).
>> 4. `long and/or/xor`: without the patch on AVX2, vectorization is slower. With the patch, it is always faster now.
>> 5. `float/double add/mul`: IEEE requires linear reduction. This cannot be moved outside the loop. Vectorization has no benefit in these examples.
>> 6. `double min/max`: bug [JDK-8300865](https://bugs.openjdk.org/browse/JDK-8300865).
>>
>> **Testing**
>>
>> I modified the reduction IR tests, so that they expect at most 2 Reduction nodes (one per main-loop, and optionally one for the vectorized post-loop). Before my patch, these IR tests would find many Reduction nodes, and would have failed. This is because after SuperWord, we unroll the loop multiple times, and so we clone the Reduction nodes inside the main loop.
>>
>> Passes up to tier5 and stress-testing.
>> Performance testing did not show any regressions.
>> **TODO** can someone benchmark on `aarch64`?
>>
>> **Discussion**
>>
>> We should investigate if we can now allow reductions more eagerly, at least for `UnorderedReduction`, as the overhead is now much lower. @jatin-bhateja pointed to this:
>> https://github.com/openjdk/jdk/blob/941a7ac7dab243c6033a78880fd31faa803e62ab/src/hotspot/share/opto/superword.cpp#L2265
>> I filed [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516).
>>
>> So far, I have not worked on `byte, char, short`; we can investigate this in the future.
>>
>> FYI: I investigated if this may be helpful for the Vector API. As far as I can see, Reductions are only introduced with a vector-input, and the scalar-input is always the identity-element. This optimization here assumes that we have the Phi-loop going through the scalar-input. So I think this optimization here really only helps `SuperWord` for now.
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1421167213 From xuelei at openjdk.org Wed May 10 20:53:56 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 10 May 2023 20:53:56 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils Message-ID: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Hi, May I have this update reviewed? `sprintf` is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for the build failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for the `sprintf` uses in the src/utils directory.
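For readers unfamiliar with this kind of change, here is a minimal sketch of the usual pattern when replacing `sprintf` with the bounded `snprintf`. It is not taken from the patch (the actual diff is only linked from this message); the function name `format_entry` and the format string are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats "name=value" into a caller-provided buffer
// without ever writing more than `size` bytes (including the terminator).
// Returns true if the whole string fit, false if it was truncated or if
// snprintf reported an error.
bool format_entry(char* buf, std::size_t size, const char* name, int value) {
  assert(buf != nullptr && size > 0);
  // The unbounded form this replaces would have been:
  //   sprintf(buf, "%s=%d", name, value);
  int n = std::snprintf(buf, size, "%s=%d", name, value);
  return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

The return value of `snprintf` is the length the full output would have had, so comparing it against the buffer size is one way to detect truncation instead of silently overflowing.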
Thanks, Xuelei ------------- Commit messages: - 8307855: update for deprecated sprintf for src/utils Changes: https://git.openjdk.org/jdk/pull/13915/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13915&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307855 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13915/head:pull/13915 PR: https://git.openjdk.org/jdk/pull/13915 From pli at openjdk.org Thu May 11 01:12:49 2023 From: pli at openjdk.org (Pengfei Li) Date: Thu, 11 May 2023 01:12:49 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`. 
>> >> >> operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | >> --------------------------------------------------------------- >> int add 2063 2085 660 530 415 283 | | >> int mul 2272 2257 1189 733 908 439 | | >> int min 2527 2520 2516 2579 2585 2542 | 1 | >> int max 2548 2525 2551 2516 2515 2517 | 1 | >> int and 2410 2414 602 480 353 263 | | >> int or 2149 2151 597 498 354 262 | | >> int xor 2059 2062 605 476 364 263 | | >> long add 1776 1790 2000 1000 1683 591 | 2 | >> long mul 2135 2199 2185 2001 2176 1307 | 2 | >> long min 1439 1424 1421 1420 1430 1427 | 3 | >> long max 2299 2287 2303 2305 1433 1425 | 3 | >> long and 1657 1667 2015 1003 1679 568 | 4 | >> long or 1776 1783 2032 1009 1680 569 | 4 | >> long xor 1834 1783 2012 1024 1679 570 | 4 | >> float add 2779 2644 2633 2648 2632 2639 | 5 | >> float mul 2779 2871 2810 2776 2732 2791 | 5 | >> float min 2294 2620 1725 1286 872 672 | | >> float max 2371 2519 1697 1265 841 468 | | >> double add 2634 2636 2635 2650 2635 2648 | 5 | >> double mul 3053 2955 2881 3030 2979 2927 | 5 | >> double min 2364 2400 2439 2399 2486 2398 | 6 | >> double max 2488 2459 2501 ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost Marked as reviewed by pli (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1421571863 From fjiang at openjdk.org Thu May 11 01:31:51 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 01:31:51 GMT Subject: RFR: 8307758: RISC-V: Improve bit test code introduced by JDK-8291555 In-Reply-To: References: Message-ID: On Wed, 10 May 2023 03:47:00 GMT, Fei Yang wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) introduced some single-bit tests that use `andi`, we can replace it with `test_bit` to avoid using the temp register when UseZbs is enabled. >> >> Testing: >> - [x] tier1-tier3 on Unmatched board (release build) > > Looks good. 
Thanks. @RealFYang Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13882#issuecomment-1543044930 From fjiang at openjdk.org Thu May 11 01:35:54 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 01:35:54 GMT Subject: Integrated: 8307758: RISC-V: Improve bit test code introduced by JDK-8291555 In-Reply-To: References: Message-ID: On Tue, 9 May 2023 08:14:30 GMT, Feilong Jiang wrote: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) introduced some single-bit tests that use `andi`, we can replace it with `test_bit` to avoid using the temp register when UseZbs is enabled. > > Testing: > - [x] tier1-tier3 on Unmatched board (release build) This pull request has now been integrated. Changeset: 39f4e4d3 Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/39f4e4d3c3450ed8fe314e2abde6a6cecd5fa0a5 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod 8307758: RISC-V: Improve bit test code introduced by JDK-8291555 Co-authored-by: Fei Yang Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/13882 From fgao at openjdk.org Thu May 11 04:01:40 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 11 May 2023 04:01:40 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:17:47 GMT, Emanuel Peter wrote: > @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix? Hi @eme64 , nice rewrite! BTW, have you tested your patch with `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` for all jtreg? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1543296155 From fgao at openjdk.org Thu May 11 04:01:43 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 11 May 2023 04:01:43 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: On Mon, 17 Apr 2023 13:14:37 GMT, Emanuel Peter wrote: > **Bug** > In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN's` correctly (ordered vs unordered comparison, leads to wrong results). > > The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973) > On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 > > The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching: > https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394 > > The wrong results with `NaN` are because of a bug in `x`: > https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106 > The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for Java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078). > > **Solution** > @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. So instead of fixing `CMoveVF/D`, I replaced it.
Performance should be the same, as it goes down to the same assembly instructions. > > This has a few benefits: > - `VectorMaskCmp + VectorBlend` is more powerful: > - `CMoveVF/D` required the same inputs for the compare as for the move itself. > - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize. > - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`). > - We need less code (I completely removed all code for `CMoveVF/D`). > > I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`. > > As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, the CMove code did not prop... src/hotspot/share/opto/superword.cpp line 2855: > 2853: // > 2854: // The VectorMaskCmpNode does a comparison directly on in1 and in2, in the java > 2855: // standard way (all comparisons are ordered, except NEQ is unordered). Sorry, I'm a bit confused by the comment here. Based on your description, are `LT` and `LE` unordered as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1190605859 From epeter at openjdk.org Thu May 11 04:13:42 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 04:13:42 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: On Thu, 11 May 2023 03:47:58 GMT, Fei Gao wrote: >> **Bug** >> In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN's` correctly (ordered vs unordered comparison, leads to wrong results).
>> >> The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973) >> On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 >> >> The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching: >> https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394 >> >> The wrong results with `NaN` are because of a bug in `x`: >> https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106 >> The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for Java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078). >> >> **Solution** >> @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions. >> >> This has a few benefits: >> - `VectorMaskCmp + VectorBlend` is more powerful: >> - `CMoveVF/D` required the same inputs for the compare as for the move itself. >> - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize. >> - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`). >> - We need less code (I completely removed all code for `CMoveVF/D`).
>> >> I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`. >> >> As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, ... > > src/hotspot/share/opto/superword.cpp line 2855: > >> 2853: // >> 2854: // The VectorMaskCmpNode does a comparison directly on in1 and in2, in the java >> 2855: // standard way (all comparisons are ordered, except NEQ is unordered). > > Sorry, I'm a bit confusing about the comment here. Based on your following description, are `LT` and `LE` unordered either? The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`. But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here. How would you improve my comments? @fg1417 thanks for the suggestion about running with the flags over all jtreg. I'll do that now... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1190614478 PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1190614695 From dholmes at openjdk.org Thu May 11 04:21:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 04:21:45 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v5] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 08:36:35 GMT, Yasumasa Suenaga wrote: >> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). >> >> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. 
We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. >> >> AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. >> >> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 >> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable >> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Move description for MemFree and MemAvailable to os.hpp I will pre-approve for expediency but have suggested an alternate comment. Thanks. src/hotspot/share/runtime/os.hpp line 316: > 314: // than free memory (MemFree in /proc/meminfo) because Linux would use > 315: // free memory aggressively (e.g. caches). > 316: // Thus we distinguish free memory and available memory in Linux. May I suggest a slight refocus: // On some platforms there is a distinction between "available" memory and "free" memory. // For example, on Linux, "available" memory (`MemAvailable` in `/proc/meminfo`) is greater // than "free" memory (`MemFree` in `/proc/meminfo`) because Linux can free memory // aggressively (e.g. clear caches) so that it becomes available. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1421728680 PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1190617239 From stuefe at openjdk.org Thu May 11 05:15:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 May 2023 05:15:35 GMT Subject: RFR: JDK-8307869: Remove unnecessary log statements from arm32 fastlocking code Message-ID: Trivial patch to remove some logging. Remnant of [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555): During review it was noted by @shipilev that the logging in arm fastlocking code is superfluous and can be removed. ------------- Commit messages: - JDK-8307869-Remove-unnecessary-log-statements-from-arm32-fastlocking-code Changes: https://git.openjdk.org/jdk/pull/13922/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13922&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307869 Stats: 9 lines in 3 files changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13922/head:pull/13922 PR: https://git.openjdk.org/jdk/pull/13922 From ysuenaga at openjdk.org Thu May 11 06:14:48 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Thu, 11 May 2023 06:14:48 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v6] In-Reply-To: References: Message-ID: <6mKiswwfPHPrmw8w3dgmYwML7tsSUF_Wvmeg0H66odw=.7ef56f19-7be8-4bce-9540-b28c197d6638@github.com> > `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). In case of the process which run on out of container, that value is based on `freeram` from sysinfo(2). > > `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However it means just a free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernel, and it has been backported some older kernels (e.g. RHEL). 
In `sar` from sysstat, it refers that value and shows it as `kbavail` [3]. > > AFAIK PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So the JFR/JMC user could misunderstand physical memory was exhausted even if the memory was available enough. > > [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 > [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable > [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13398/files - new: https://git.openjdk.org/jdk/pull/13398/files/6aba3440..75ab4eb5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13398&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13398/head:pull/13398 PR: https://git.openjdk.org/jdk/pull/13398 From ysuenaga at openjdk.org Thu May 11 06:14:57 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Thu, 11 May 2023 06:14:57 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v5] In-Reply-To: References: Message-ID: <7Efr9qZhH83z393x7iwOaGfkQ85CMMKWk0UMjJjiE90=.1b20913b-7000-4d4a-bb5e-d7687a35af26@github.com> On Thu, 11 May 2023 04:18:06 GMT, David Holmes wrote: >> Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: >> >> Move description for MemFree and MemAvailable to os.hpp > > src/hotspot/share/runtime/os.hpp line 316: > >> 314: // than free memory (MemFree in /proc/meminfo) because Linux would use >> 315: // free memory 
aggressively (e.g. caches). >> 316: // Thus we distinguish free memory and available memory in Linux. > > May I suggest a slight refocus: > > // On some platforms there is a distinction between "available" memory and "free" memory. > // For example, on Linux, "available" memory (`MemAvailable` in `/proc/meminfo`) is greater > // than "free" memory (`MemFree` in `/proc/meminfo`) because Linux can free memory > // aggressively (e.g. clear caches) so that it becomes available. Thanks @dholmes-ora ! I updated comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13398#discussion_r1190676912 From jbhateja at openjdk.org Thu May 11 07:44:51 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 May 2023 07:44:51 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. 
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`. >> >> >> operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | >> --------------------------------------------------------------- >> int add 2063 2085 660 530 415 283 | | >> int mul 2272 2257 1189 733 908 439 | | >> int min 2527 2520 2516 2579 2585 2542 | 1 | >> int max 2548 2525 2551 2516 2515 2517 | 1 | >> int and 2410 2414 602 480 353 263 | | >> int or 2149 2151 597 498 354 262 | | >> int xor 2059 2062 605 476 364 263 | | >> long add 1776 1790 2000 1000 1683 591 | 2 | >> long mul 2135 2199 2185 2001 2176 1307 | 2 | >> long min 1439 1424 1421 1420 1430 1427 | 3 | >> long max 2299 2287 2303 2305 1433 1425 | 3 | >> long and 1657 1667 2015 1003 1679 568 | 4 | >> long or 1776 1783 2032 1009 1680 569 | 4 | >> long xor 1834 1783 2012 1024 1679 570 | 4 | >> float add 2779 2644 2633 2648 2632 2639 | 5 | >> float mul 2779 2871 2810 2776 2732 2791 | 5 | >> float min 2294 2620 1725 1286 872 672 | | >> float max 2371 2519 1697 1265 841 468 | | >> double add 2634 2636 2635 2650 2635 2648 | 5 | >> double mul 3053 2955 2881 3030 2979 2927 | 5 | >> double min 2364 2400 2439 2399 2486 2398 | 6 | >> double max 2488 2459 2501 ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost I agree with the phasing of the optimization as it gives us an opportunity to perform similar optimization for Vectors created at parse time i.e., though VectorAPIs. ![image](https://github.com/openjdk/jdk/assets/59989778/e786cc8b-fca8-49f4-814a-788c429ed473) VectorAPI based kernel will have a different graph shape and proposed pattern matching will not be able to handle it. Also, trip count is feeding into LoadVector and scalar reduction operation (scalar add in above example) is secondary an induction variable. I think we can still handle it in a follow up patch by doing a two pass over loop. 
- Scan the loop body and collect all the `UnorderedReduction` nodes and their users.
- Exit the optimization if any of the following conditions holds:
  - Different `UnorderedReduction` nodes have different reduction opcodes.
  - A reduction node has more than one user.
- If none of these exit conditions applies, your algorithm will have to traverse the scalar operations chain and check that one of its inputs is an `UnorderedReduction` and that the other input is driven by the same graph pattern.
- Once we find a legal graph pattern, replace the reductions with their vector counterparts and move the reduction out of the loop, as is currently being done by your patch.

![image](https://github.com/openjdk/jdk/assets/59989778/ed54aabe-6438-4256-b774-9010c04999ae)
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1543493579 From jbhateja at openjdk.org Thu May 11 07:58:49 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 May 2023 07:58:49 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`. >> >> >> operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | >> --------------------------------------------------------------- >> int add 2063 2085 660 530 415 283 | | >> int mul 2272 2257 1189 733 908 439 | | >> int min 2527 2520 2516 2579 2585 2542 | 1 | >> int max 2548 2525 2551 2516 2515 2517 | 1 | >> int and 2410 2414 602 480 353 263 | | >> int or 2149 2151 597 498 354 262 | | >> int xor 2059 2062 605 476 364 263 | | >> long add 1776 1790 2000 1000 1683 591 | 2 | >> long mul 2135 2199 2185 2001 2176 1307 | 2 | >> long min 1439 1424 1421 1420 1430 1427 | 3 | >> long max 2299 2287 2303 2305 1433 1425 | 3 | >> long and 1657 1667 2015 1003 1679 568 | 4 | >> long or 1776 1783 2032 1009 1680 569 | 4 | >> long xor 1834 1783 2012 1024 1679 570 | 4 | >> float add 2779 2644 2633 2648 2632 2639 | 5 | >> float mul 2779 2871 2810 2776 2732 2791 | 5 | >> float min 2294 2620 1725 1286 872 672 | | >> float max 2371 2519 1697 1265 841 468 | | >> double add 2634 2636 2635 2650 2635 2648 | 5 | >> double mul 3053 2955 2881 3030 2979 2927 | 5 | >> double min 2364 2400 2439 2399 2486 2398 | 6 | >> double max 2488 2459 2501 ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost > I agree with the phasing of the optimization as it gives us an opportunity to perform similar optimization for Vectors created at parse time i.e., though VectorAPIs. 
>
> ![image](https://user-images.githubusercontent.com/59989778/237607853-e786cc8b-fca8-49f4-814a-788c429ed473.png)
>
> A VectorAPI-based kernel will have a different graph shape, and the proposed pattern matching will not be able to handle it. Also, the trip count feeds into LoadVector, and the scalar reduction operation (the scalar add in the above example) is a secondary induction variable. I think we can still handle it in a follow-up patch by doing two passes over the loop.
>
> * Scan the loop body and collect all the UnorderedReduction nodes and their users.
> * Exit the optimization if any of the following conditions holds:
>   * Different UnorderedReduction nodes have different reduction opcodes.
>   * A reduction node has more than one user.
> * Otherwise, your algorithm will have to traverse the scalar operations chain and check that one of its inputs is an UnorderedReduction and that the other input is driven by the same graph pattern.
> * Once we find a legal graph pattern, replace the reductions with their vector counterparts and move the reduction out of the loop, as your patch currently does.
>
> ![image](https://user-images.githubusercontent.com/59989778/237607878-ed54aabe-6438-4256-b774-9010c04999ae.png)

BTW, with the VectorAPI, users are expected to be more intelligent, and your optimization can be implemented directly in a kernel that performs VectorADD operations in the main loop, followed by a reduction outside the loop, e.g.

    outer_loop:
      hand_unrolled_vector_loop:
        v1 = VectorADD(broadcast(0))
        v2 = v1.VectorADD(LoadVector)
        v3 = v2.VectorADD(LoadVector)
        ...
        ...
      inner_loop_end
      res += v3.ReductionAdd()
    outer_loop_end

So it's not a pressing issue for us anyway.
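For readers who want to see the kernel shape above in runnable form, here is a rough plain-Java sketch. It deliberately avoids the incubating Vector API and models the vector accumulator as a small `int[]`; the class name, the `LANES` width and both method names are made up for illustration and are not taken from the patch. It only shows why moving an `int add` reduction out of the loop is legal: integer addition is associative and commutative, so per-lane partial sums followed by one horizontal add give the same result.

```java
public class ReductionAfterLoop {
    // Hypothetical vector width, chosen only for illustration.
    static final int LANES = 8;

    // Reduction inside the loop: a single running scalar updated every iteration.
    static int sumInLoop(int[] a) {
        int res = 0;
        for (int i = 0; i < a.length; i++) {
            res += a[i];
        }
        return res;
    }

    // Reduction after the loop: lane-wise adds inside the loop,
    // one horizontal add of the lanes once the loop is done.
    static int sumAfterLoop(int[] a) {
        int[] acc = new int[LANES]; // stands in for the vector accumulator
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l]; // element-wise "VectorADD"
            }
        }
        int res = 0;
        for (int l = 0; l < LANES; l++) {
            res += acc[l]; // the single "ReductionAdd" outside the loop
        }
        for (; i < a.length; i++) {
            res += a[i]; // scalar tail for the leftover elements
        }
        return res;
    }

    public static void main(String[] args) {
        int[] a = new int[1003];
        for (int i = 0; i < a.length; i++) {
            a[i] = i * 31 - 7;
        }
        System.out.println(sumInLoop(a) == sumAfterLoop(a));
    }
}
```

The same rewrite would not be valid for `float/double add/mul`, where reordering changes rounding, which matches the restriction stated in the PR description.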
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1543507479 From epeter at openjdk.org Thu May 11 07:58:50 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 07:58:50 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: On Thu, 11 May 2023 07:52:10 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > >> I agree with the phasing of the optimization as it gives us an opportunity to perform similar optimization for Vectors created at parse time i.e., though VectorAPIs. >> >> ![image](https://user-images.githubusercontent.com/59989778/237607853-e786cc8b-fca8-49f4-814a-788c429ed473.png) >> >> VectorAPI based kernel will have a different graph shape and proposed pattern matching will not be able to handle it. Also, trip count is feeding into LoadVector and scalar reduction operation (scalar add in above example) is secondary an induction variable. I think we can still handle it in a follow up patch by doing a two pass over loop. >> >> * Scan loop body and collect all the UnorderdReduction and their users. >> * Exist optimization if any of following condition holds good. >> >> * Different UnorderedReduction have different reduction opcodes. >> * Reduction node has more than one user. >> * If above conditions are met, then your algorithm will have to traverse the Scalar operations chain and check if one of its input is UnorderedReduction and other input should be driven by same graph pattern. 
>> * Once we find a legal graph pallet then replace Reductions with Vector counterparts and move reduction out of loop as is being done currently by your patch. >> >> ![image](https://user-images.githubusercontent.com/59989778/237607878-ed54aabe-6438-4256-b774-9010c04999ae.png) > > BTW, with VectorAPI users are expected to be more intelligent and your optimizations can be directly implemented in kernel which perform VectorADD operations in main loop followed by Reduction out of loop e.g. > > > outer_loop : > hand_unrolled_vector_loop: > v1 = VectorADD(broadcast(0)) > v2 = v1.VectorADD(LoadVector) > v3 = v2.VectorADD(LoadVector) > ... > ... > inner_loop_end > res += v3.ReductionAdd() > outer_loop_end > > > So its not a pressing issue anyways for us. @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1543511250 From jbhateja at openjdk.org Thu May 11 07:58:51 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 May 2023 07:58:51 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: On Thu, 11 May 2023 07:52:10 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > >> I agree with the phasing of the optimization as it gives us an opportunity to perform similar optimization for Vectors created at parse time i.e., though VectorAPIs. 
>> >> ![image](https://user-images.githubusercontent.com/59989778/237607853-e786cc8b-fca8-49f4-814a-788c429ed473.png) >> >> VectorAPI based kernel will have a different graph shape and proposed pattern matching will not be able to handle it. Also, trip count is feeding into LoadVector and scalar reduction operation (scalar add in above example) is secondary an induction variable. I think we can still handle it in a follow up patch by doing a two pass over loop. >> >> * Scan loop body and collect all the UnorderdReduction and their users. >> * Exist optimization if any of following condition holds good. >> >> * Different UnorderedReduction have different reduction opcodes. >> * Reduction node has more than one user. >> * If above conditions are met, then your algorithm will have to traverse the Scalar operations chain and check if one of its input is UnorderedReduction and other input should be driven by same graph pattern. >> * Once we find a legal graph pallet then replace Reductions with Vector counterparts and move reduction out of loop as is being done currently by your patch. >> >> ![image](https://user-images.githubusercontent.com/59989778/237607878-ed54aabe-6438-4256-b774-9010c04999ae.png) > > BTW, with VectorAPI users are expected to be more intelligent and your optimizations can be directly implemented in kernel which perform VectorADD operations in main loop followed by Reduction out of loop e.g. > > > outer_loop : > hand_unrolled_vector_loop: > v1 = VectorADD(broadcast(0)) > v2 = v1.VectorADD(LoadVector) > v3 = v2.VectorADD(LoadVector) > ... > ... > inner_loop_end > res += v3.ReductionAdd() > outer_loop_end > > > So its not a pressing issue anyways for us. > @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. Your changes looks good to me. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1543515005 From jbhateja at openjdk.org Thu May 11 08:01:48 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 May 2023 08:01:48 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel? Core? i7-11850H @ 2.50GHz ? 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`. 
>> >> >> operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | >> --------------------------------------------------------------- >> int add 2063 2085 660 530 415 283 | | >> int mul 2272 2257 1189 733 908 439 | | >> int min 2527 2520 2516 2579 2585 2542 | 1 | >> int max 2548 2525 2551 2516 2515 2517 | 1 | >> int and 2410 2414 602 480 353 263 | | >> int or 2149 2151 597 498 354 262 | | >> int xor 2059 2062 605 476 364 263 | | >> long add 1776 1790 2000 1000 1683 591 | 2 | >> long mul 2135 2199 2185 2001 2176 1307 | 2 | >> long min 1439 1424 1421 1420 1430 1427 | 3 | >> long max 2299 2287 2303 2305 1433 1425 | 3 | >> long and 1657 1667 2015 1003 1679 568 | 4 | >> long or 1776 1783 2032 1009 1680 569 | 4 | >> long xor 1834 1783 2012 1024 1679 570 | 4 | >> float add 2779 2644 2633 2648 2632 2639 | 5 | >> float mul 2779 2871 2810 2776 2732 2791 | 5 | >> float min 2294 2620 1725 1286 872 672 | | >> float max 2371 2519 1697 1265 841 468 | | >> double add 2634 2636 2635 2650 2635 2648 | 5 | >> double mul 3053 2955 2881 3030 2979 2927 | 5 | >> double min 2364 2400 2439 2399 2486 2398 | 6 | >> double max 2488 2459 2501 ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1421973635 From roland at openjdk.org Thu May 11 08:05:49 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 May 2023 08:05:49 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: Message-ID: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> > pre/main/post loops are created for an inner loop of a loop nest but > assert predicates cause the main and post loops to be removed. 
The > OpaqueZeroTripGuard nodes for the loops are not removed: there's no > logic to trigger removal of the opaque nodes once the loops are no > longer there. With the inner loops gone, the outer loop becomes > candidate for optimizations and is unrolled which causes the zero trip > guards of the now removed loops to be duplicated and the opaque nodes > to have more than one use. > > The fix I propose is, using logic similar to > `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop > opts if every OpaqueZeroTripGuard node guards a loop and if not, > remove it. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13901/files - new: https://git.openjdk.org/jdk/pull/13901/files/49b2a2f9..d360f92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=00-01 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13901/head:pull/13901 PR: https://git.openjdk.org/jdk/pull/13901 From roland at openjdk.org Thu May 11 08:05:52 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 May 2023 08:05:52 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 16:09:13 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopnode.cpp line 6180: > >> 6178: >> 6179: if (!_verify_only && n->Opcode() == Op_OpaqueZeroTripGuard) { >> 6180: _zero_trip_guard_opaque_nodes.push(n); > > That's a good idea to collect them newly here for each loop opts pass. I'm not sure if you're expecting me to comment on this or not. 
> src/hotspot/share/opto/opaquenode.cpp line 58: > >> 56: } >> 57: >> 58: CountedLoopNode* OpaqueZeroTripGuardNode::guarded_loop() const { > > Could be guarded with `ifdef ASSERT` since you are only using it for an assertion. I made that change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1190781947 PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1190781227 From chagedorn at openjdk.org Thu May 11 08:14:42 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 May 2023 08:14:42 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> References: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> Message-ID: On Thu, 11 May 2023 08:05:49 GMT, Roland Westrelin wrote: >> pre/main/post loops are created for an inner loop of a loop nest but >> assert predicates cause the main and post loops to be removed. The >> OpaqueZeroTripGuard nodes for the loops are not removed: there's no >> logic to trigger removal of the opaque nodes once the loops are no >> longer there. With the inner loops gone, the outer loop becomes >> candidate for optimizations and is unrolled which causes the zero trip >> guards of the now removed loops to be duplicated and the opaque nodes >> to have more than one use. >> >> The fix I propose is, using logic similar to >> `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop >> opts if every OpaqueZeroTripGuard node guards a loop and if not, >> remove it. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Update looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13901#pullrequestreview-1421994114 From roland at openjdk.org Thu May 11 08:14:44 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 May 2023 08:14:44 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: Message-ID: <6MbpjTuLLCrzfIRd-kL7hATvHn6S0raDMqyC_H3zSJc=.7d402f19-7b74-4c05-8914-d4e8e7df7263@github.com> On Wed, 10 May 2023 16:10:27 GMT, Christian Hagedorn wrote: > That looks reasonable. Thanks for reviewing this @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1543539946 From chagedorn at openjdk.org Thu May 11 08:14:46 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 May 2023 08:14:46 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 08:01:02 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 6180: >> >>> 6178: >>> 6179: if (!_verify_only && n->Opcode() == Op_OpaqueZeroTripGuard) { >>> 6180: _zero_trip_guard_opaque_nodes.push(n); >> >> That's a good idea to collect them newly here for each loop opts pass. > > I'm not sure if you're expecting me to comment on this or not. No action required. I was first trying to suggest to move it to `Compile` to the other predicate opaque node lists but then I've thought that this solution here is cleaner. So it was more of a ? (I will refactor these predicate lists in the assertion predicate changes - otherwise, I would have suggested to move them here as well at some point). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1190792057 From roland at openjdk.org Thu May 11 08:14:47 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 May 2023 08:14:47 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: Message-ID: <825CEBUEmco-wR7vYqAvdpP8by-GKQCZgXa6oM5LOsU=.5de46c62-462b-4103-88d4-ca498764d002@github.com> On Thu, 11 May 2023 08:09:52 GMT, Christian Hagedorn wrote: >> I'm not sure if you're expecting me to comment on this or not. > > No action required. I was first trying to suggest to move it to `Compile` to the other predicate opaque node lists but then I've thought that this solution here is cleaner. So it was more of a ? (I will refactor these predicate lists in the assertion predicate changes - otherwise, I would have suggested to move them here as well at some point). Thanks for the clarification. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1190793632 From fgao at openjdk.org Thu May 11 08:26:48 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 11 May 2023 08:26:48 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> On Thu, 11 May 2023 04:11:16 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 2855: >> >>> 2853: // >>> 2854: // The VectorMaskCmpNode does a comparison directly on in1 and in2, in the java >>> 2855: // standard way (all comparisons are ordered, except NEQ is unordered). >> >> Sorry, I'm a bit confusing about the comment here. Based on your following description, are `LT` and `LE` unordered either? > > @fg1417 thanks for the suggestion about running with the flags over all jtreg. I'll do that now... 
> The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`.
>
> But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here.
>
> How would you improve my comments?

Thanks for your clarification.

Your comment is quite clear already. Maybe just highlight the mismatch between `VectorMaskCmp` and `bol_test` here, like:

    //
    // But with these two cases, which `VectorMaskCmp` interprets as ordered,
    // we must convert the unordered into an ordered comparison:
    //   BoolTest::lt:  Case -1    -> LT_U
    //   BoolTest::le:  Case -1, 0 -> LE_U
    //

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1190808991 From epeter at openjdk.org Thu May 11 08:45:45 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 08:45:45 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: References: Message-ID:

> **Bug**
> In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN`s correctly (ordered vs unordered comparison, leads to wrong results).
>
> The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973)
> On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309
>
> The problematic line wants to find a Cmp above the Bool, and compare its inputs.
But we have no Cmp there, just a constant, that we have set during matching:
> https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394
>
> The wrong results with `NaN` are because of a bug in `x`:
> https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106
> The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for the Java bytecodes `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078).
>
> **Solution**
> @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions.
>
> This has a few benefits:
> - `VectorMaskCmp + VectorBlend` is more powerful:
>   - `CMoveVF/D` required the compare to have the same inputs as the move itself.
>   - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize.
>   - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`).
> - We need less code (I completely removed all code for `CMoveVF/D`).
>
> I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`.
>
> As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, the CMove code did not prop...
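As a small aside, the ordered/unordered mismatch described under **Bug** can be reproduced at the Java source level. The sketch below is only an illustration under my own naming (`NanCompare`, `cmpL`, `ltOrdered`, `ltUnordered` are hypothetical helpers, not HotSpot code): `cmpL` mimics the result convention of `fcmpl`, which yields `-1` when either operand is `NaN`, so a test that accepts `-1` behaves like an unordered less-than, whereas Java's `<` operator is ordered.

```java
public class NanCompare {
    // fcmpl-style three-way compare: 1 if a > b, 0 if a == b,
    // and -1 if a < b OR if either operand is NaN.
    static int cmpL(float a, float b) {
        return a > b ? 1 : (a == b ? 0 : -1);
    }

    // Ordered less-than: false whenever either input is NaN.
    static boolean ltOrdered(float a, float b) {
        return a < b;
    }

    // Unordered less-than: also true when either input is NaN.
    // Equivalent to accepting the -1 result of cmpL.
    static boolean ltUnordered(float a, float b) {
        return !(a >= b);
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println("ordered   NaN < 1: " + ltOrdered(nan, 1f));
        System.out.println("unordered NaN < 1: " + ltUnordered(nan, 1f));
        System.out.println("cmpL(NaN, 1)     : " + cmpL(nan, 1f));
    }
}
```

A vectorized compare that uses the ordered predicate where the scalar code tested for `-1` would therefore disagree with the scalar result exactly on `NaN` inputs.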
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Improved comment on request of @fg1417 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13493/files - new: https://git.openjdk.org/jdk/pull/13493/files/81d4de72..89fe29e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13493&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13493&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13493.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13493/head:pull/13493 PR: https://git.openjdk.org/jdk/pull/13493 From epeter at openjdk.org Thu May 11 08:45:47 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 08:45:47 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> References: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> Message-ID: On Thu, 11 May 2023 08:23:46 GMT, Fei Gao wrote: >> @fg1417 thanks for the suggestion about running with the flags over all jtreg. I'll do that now... > >> The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`. >> >> But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here. >> >> How would you improve my comments? > > Thanks for your clarification. > > Your comment is quite clear already. 
Maybe just highlight the mismatch between `VectorMaskCmp` and `bol_test` here, like: > > // > // But with these two cases, which `VectorMaskCmp` interprets as ordered, > // we must convert the unordered into an ordered comparison: > // BoolTest::lt: Case -1 -> LT_U > // BoolTest::le: Case -1, 0 -> LE_U > // @fg1417 Ah yes, that part could be a bit more explicit, thanks for the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1190837473 From shade at openjdk.org Thu May 11 08:46:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 08:46:48 GMT Subject: RFR: JDK-8307869: Remove unnecessary log statements from arm32 fastlocking code In-Reply-To: References: Message-ID: On Thu, 11 May 2023 05:06:47 GMT, Thomas Stuefe wrote: > Trivial patch to remove some logging. > > Remnant of [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555): During review it was noted by @shipilev that the logging in arm fastlocking code is superfluous and can be removed. Looks fine, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13922#pullrequestreview-1422074325 From stuefe at openjdk.org Thu May 11 09:27:53 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 May 2023 09:27:53 GMT Subject: Integrated: JDK-8307869: Remove unnecessary log statements from arm32 fastlocking code In-Reply-To: References: Message-ID: On Thu, 11 May 2023 05:06:47 GMT, Thomas Stuefe wrote: > Trivial patch to remove some logging. > > Remnant of [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555): During review it was noted by @shipilev that the logging in arm fastlocking code is superfluous and can be removed. This pull request has now been integrated. 
Changeset: ecc1d85d Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/ecc1d85dbea84c291c4014f2237ae9326f14cccb Stats: 9 lines in 3 files changed: 0 ins; 9 del; 0 mod 8307869: Remove unnecessary log statements from arm32 fastlocking code Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/13922 From yzheng at openjdk.org Thu May 11 10:09:43 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 11 May 2023 10:09:43 GMT Subject: RFR: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 11:49:03 GMT, Yudi Zheng wrote: > Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13902#issuecomment-1543713552 From yzheng at openjdk.org Thu May 11 10:41:52 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 11 May 2023 10:41:52 GMT Subject: Integrated: 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. In-Reply-To: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> References: <0LroiUtDrEr4flVh_axHbxFQPc5HZwi5HWsdVrUqjhs=.efefd906-5d74-4975-a7a4-f46425e5f86d@github.com> Message-ID: On Wed, 10 May 2023 11:49:03 GMT, Yudi Zheng wrote: > Export markWord::lock_mask_in_place to JVMCI compilers. This field is essential for accessing cached identity hash code. This pull request has now been integrated. Changeset: 0cbfbc40 Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/0cbfbc400aac53b098a3d8a7dda1aec2180a47a7 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8307813: [JVMCI] Export markWord::lock_mask_in_place to JVMCI compilers. 
Reviewed-by: dnsimon, kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/13902 From stuefe at openjdk.org Thu May 11 11:37:54 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 May 2023 11:37:54 GMT Subject: RFR: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo [v6] In-Reply-To: <6mKiswwfPHPrmw8w3dgmYwML7tsSUF_Wvmeg0H66odw=.7ef56f19-7be8-4bce-9540-b28c197d6638@github.com> References: <6mKiswwfPHPrmw8w3dgmYwML7tsSUF_Wvmeg0H66odw=.7ef56f19-7be8-4bce-9540-b28c197d6638@github.com> Message-ID: <_YwRB_yaaq-To7e1TrMfivRA5HhCLeFOtsYhQQ-wRoo=.cec5427f-cf7a-4a5e-865d-d0bf0dee969c@github.com> On Thu, 11 May 2023 06:14:48 GMT, Yasumasa Suenaga wrote:

>> `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). For a process running outside a container, that value is based on `freeram` from sysinfo(2).
>>
>> `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However, it counts only free RAM. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernels, and it has been backported to some older kernels (e.g. RHEL). `sar` from sysstat refers to that value and shows it as `kbavail` [3].
>>
>> AFAIK the PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So a JFR/JMC user could mistakenly conclude that physical memory was exhausted even though enough memory was available.
>>
>> [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59
>> [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable
>> [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328
>
> Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update comments

Still good.
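To make the `MemFree` vs. `MemAvailable` distinction concrete, here is a minimal Java sketch that parses text in the `/proc/meminfo` format. It is not the HotSpot change itself (that lives in the C++ `os::Linux` code); the class and method names are hypothetical, the sample numbers in `main` are made up, and the fallback to `MemFree` when `MemAvailable` is missing is an assumption on my part about what a reasonable reader might do on older kernels.

```java
public class MemInfo {
    // Returns the value (in kB) of the given /proc/meminfo key,
    // or -1 if the key does not appear in the text.
    static long fieldKb(String meminfo, String key) {
        for (String line : meminfo.split("\n")) {
            if (line.startsWith(key + ":")) {
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // Prefer MemAvailable; fall back to MemFree when the kernel does not report it.
    // (The fallback policy is an assumption made for this sketch.)
    static long availableKb(String meminfo) {
        long avail = fieldKb(meminfo, "MemAvailable");
        return avail >= 0 ? avail : fieldKb(meminfo, "MemFree");
    }

    public static void main(String[] args) {
        // Made-up sample text in the /proc/meminfo layout.
        String sample = "MemTotal:       16000000 kB\n"
                      + "MemFree:          500000 kB\n"
                      + "MemAvailable:    9000000 kB\n";
        System.out.println("MemFree      kB: " + fieldKb(sample, "MemFree"));
        System.out.println("available    kB: " + availableKb(sample));
    }
}
```

With numbers like these, a tool that reports only `MemFree` would suggest the machine is nearly out of memory, which is the misreading the PR description warns about.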
------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13398#pullrequestreview-1422379556 From coleenp at openjdk.org Thu May 11 12:36:43 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 11 May 2023 12:36:43 GMT Subject: RFR: 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 In-Reply-To: References: Message-ID: <5gQOMR8BxQ6Tc4eFb2901c5Ig3rIuqmNbZWY8uod2OA=.c09912db-5c46-4c67-a01f-a24871250578@github.com> On Tue, 9 May 2023 13:02:45 GMT, Afshin Zafari wrote: > - The `finalize()` method is replaced with `cleanup()`. > - A new constructor is added to register the cleanup method. > - A static `Cleaner` is defined to have only one cleaner thread for all the 15000 instances. Otherwise, we get an `OutOfMemoryException` on cleaner thread creation. I think this looks good, and still tests what the original test failure was. Unless the bug was with the _return_register_finalizer bytecode, but I don't think that's the case. ------------- Marked as reviewed by coleenp (Reviewer). 
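For context, the `finalize()`-to-`Cleaner` pattern described in the PR summary looks roughly like the sketch below. This is not the test's actual code: `CleanerDemo`, `Resource`, `cleanup()` and the `cleaned` counter are names I made up. It only uses `java.lang.ref.Cleaner` as documented: one shared `Cleaner` (hence one cleaner thread) for all instances, and a cleanup action registered in the constructor that does not capture `this`.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanerDemo {
    // A single shared Cleaner, so only one cleaner thread serves all instances.
    static final Cleaner CLEANER = Cleaner.create();

    // Counts how many times the cleanup action has run (for demonstration only).
    static final AtomicInteger cleaned = new AtomicInteger();

    static class Resource {
        final Cleaner.Cleanable cleanable;

        Resource() {
            // The action is a static method reference, so it does not capture `this`;
            // capturing `this` would keep the object reachable forever.
            cleanable = CLEANER.register(this, CleanerDemo::cleanup);
        }
    }

    // Replacement for the old finalize() body.
    static void cleanup() {
        cleaned.incrementAndGet();
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        r.cleanable.clean(); // explicit cleanup; the action runs at most once
        r.cleanable.clean(); // second call is a no-op
        System.out.println("cleanup runs: " + cleaned.get());
    }
}
```

Creating a separate `Cleaner` per object would start a thread per object, which is consistent with the out-of-memory failure on thread creation mentioned in the summary.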
PR Review: https://git.openjdk.org/jdk/pull/13886#pullrequestreview-1422508567 From bulasevich at openjdk.org Thu May 11 14:23:43 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 11 May 2023 14:23:43 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section Message-ID: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com>

This is another pull request to replace the https://github.com/openjdk/jdk/pull/10025 change, which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330).

The objections to change #10025 were:
- a specialized algorithm for the given data complicates things and makes it hard to learn, test and support
- the algorithm is changed for DebugInfo, and the benefit is only for one type of data
- statistics of the debug info data can (will) change, breaking the optimization

The suggestion was:
- don't change the core algorithm, but add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto

With this change I propose a different approach. Instead of bit coding, a sequence of zeros in a data stream is encoded with a special character that normally never appears in the Unsigned5 encoding, followed by a byte containing the number of zeros. In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before.

Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better compression rate than these.

DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%.
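To illustrate the general idea of an escape byte followed by a zero count, here is a toy Java sketch. It is not the Unsigned5 coder and not the code from this PR: the `MARKER` value `0xFF`, the rule that only runs of three or more zeros are compressed, and all names are assumptions made for the example. The one property it shares with the description above is that a stream containing no marker byte decodes exactly as it would without the extension.

```java
import java.io.ByteArrayOutputStream;

public class ZeroRunCodec {
    // Hypothetical escape byte; input bytes are assumed never to equal it.
    static final int MARKER = 0xFF;

    static byte[] encode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            int b = in[i] & 0xFF;
            if (b == MARKER) {
                throw new IllegalArgumentException("input must not contain the marker byte");
            }
            if (b != 0) {
                out.write(b);
                i++;
                continue;
            }
            // Count the run of zeros, capped at 255 so the count fits in one byte.
            int run = 0;
            while (i < in.length && in[i] == 0 && run < 255) {
                run++;
                i++;
            }
            if (run < 3) {
                // Short runs are cheaper (or equal) written literally.
                for (int k = 0; k < run; k++) {
                    out.write(0);
                }
            } else {
                out.write(MARKER);
                out.write(run);
            }
        }
        return out.toByteArray();
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            if (b == MARKER) {
                int run = in[++i] & 0xFF; // the byte after the marker is the zero count
                for (int k = 0; k < run; k++) {
                    out.write(0);
                }
            } else {
                out.write(b); // anything else is copied through unchanged
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {5, 0, 0, 0, 0, 0, 7, 0, 9};
        byte[] enc = encode(data);
        System.out.println(data.length + " bytes -> " + enc.length + " bytes");
    }
}
```

How much such a scheme saves depends entirely on how many long zero runs the data contains, which fits the observation that only the Debug info section benefits noticeably.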
Performance impact: Renaissance and DaCapo benchmarks do not show any difference.

-------------

Commit messages:
- 8293170: Improve encoding of the debuginfo nmethod section

Changes: https://git.openjdk.org/jdk/pull/12387/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12387&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8293170
Stats: 67 lines in 3 files changed: 63 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/12387.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/12387/head:pull/12387

PR: https://git.openjdk.org/jdk/pull/12387

From bulasevich at openjdk.org Thu May 11 14:55:42 2023
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Thu, 11 May 2023 14:55:42 GMT
Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section
In-Reply-To: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com>
References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com>
Message-ID:

On Thu, 2 Feb 2023 12:54:06 GMT, Boris Ulasevich wrote:

> This is another pull request to replace https://github.com/openjdk/jdk/pull/10025 change which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330)
>
> The objections to change #10025 were:
> - specialized algorithm for given data complicates things, makes it hard to learn, test and support
> - algorithm is changed for DebugInfo, and the benefit is only for one type of data
> - statistics of the debug info data can (will) change, breaking the optimization
>
> The suggestion was:
> - don't change the core algorithm, but add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto
>
> With this change I propose a different approach.
> Instead of bit coding, the sequence of zeros in a data stream is encoded with a special character that normally never appears in Unsigned5 encoding, followed by a byte containing the number of zeros. In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before.
>
> Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better compression rate than these.
>
> DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%.
>
> Performance impact: Renaissance and DaCapo benchmarks do not show any difference.

@rose00 @vnkozlov I would appreciate it if you could find time to review this change. Thank you very much.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12387#issuecomment-1544133625

From epeter at openjdk.org Thu May 11 15:09:38 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 May 2023 15:09:38 GMT
Subject: RFR: 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java
Message-ID:

**The Problem**
During CCP, we get to a state like that:

    x (int:1)   Phi (int:4)
    |           |
    |     +-----+
    |     |
    LShiftI (int:16)
    |
    CastII (top)       ConI (int:3)
    |                  |
    +----+   +---------+
         |   |
         AndI

We call `AndINode::Value` during CCP, and in `MulNode::AndIL_shift_and_mask_is_always_zero` we `uncast` both inputs, which leaves us with `LShiftI` and `ConI` as the "true" inputs. They both have non-top types, and so we determine that this `AndI-LShiftI` combination always leads to `zero`: The `Phi` has a constant type (`int:4`). So this leaves the lowest 4 bits zero after the `LShiftI`.
Then and-ing that with `int:3` means we extract only the lowest 2 bits, which are zero. So the result is provably always zero - that is the idea.

Then, we have some type updates (here of `x` and `Phi` and `LShiftI`), and the graph looks like this:

    x (int)     Phi (int:0..4)
    |           |
    |     +-----+
    |     |
    LShiftI (int)
    |
    CastII (top)       ConI (int:3)
    |                  |
    +----+   +---------+
         |   |
         AndI

This leads to `shift2` failing to have constant type:
https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L1964-L1967

And with that, we fall back to `MulNode::Value`:
https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L559-L566

In `MulNode::Value` we detect that the `CastII` has type `top`, and return `top` for `AndI`. CCP expects the types to become wider over time, so going from `int:0` to `top` is the wrong direction.

**Solution**
The problem is with the `CastII` still being `top` - this seems to be very rare. But the new regression test `TestShiftCastAndNotification.java` seems to create exactly that case, in combination with `-XX:StressCCP`.

We should guard against `top` in one of the `AndI` inputs inside `MulNode::AndIL_shift_and_mask_is_always_zero`. This will prevent it from detecting the zero-case, until `MulNode::Value` gets a chance to compute a non-top type.

**Argument for Solution**
Is there still a threat from `MulNode::AndIL_shift_and_mask_is_always_zero` computing a zero first, and `MulNode::Value` a type that does not include zero afterward? As types only widen during CCP, having a zero first means that all inputs now are non-top - in fact they are all `T_INT`. Since types only widen in the inputs, and a `zero` combination was possible first, it must also be possible later.

**Testing**
It used to reproduce with `-XX:RepeatCompilation=1000` very quickly, by restricting to that single failing method. This seems fixed now, I verified it locally.
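[Editorial sketch, not part of the original message.] The arithmetic behind the "always zero" argument above can be checked in plain Java. This is not the HotSpot implementation of `MulNode::AndIL_shift_and_mask_is_always_zero`; the class and method names below are hypothetical, and the sketch covers only the constant-shift, constant-mask case: `(x << shift) & mask` is zero for every `x` exactly when the mask has no bit set at position `shift` or higher.

```java
public class ShiftMaskSketch {
    // True when (x << shift) & mask == 0 for every int x,
    // i.e. the mask has no bit at position >= shift.
    static boolean shiftAndMaskAlwaysZero(int shift, int mask) {
        int s = shift & 31; // Java uses only the low 5 bits of an int shift count
        return (mask & (-1 << s)) == 0;
    }

    public static void main(String[] args) {
        // Shift by 4 clears bits 0..3; mask 3 keeps only bits 0..1.
        System.out.println(shiftAndMaskAlwaysZero(4, 3));
        // Mask 16 keeps bit 4, which a shift by 4 can set.
        System.out.println(shiftAndMaskAlwaysZero(4, 16));

        // Spot check of the first case over a few inputs.
        int[] samples = {0, 1, -1, 12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (((x << 4) & 3) != 0) {
                throw new AssertionError("unexpected non-zero result for x = " + x);
            }
        }
    }
}
```

Once the shift amount is no longer a known constant (as after the type update to `int:0..4`), this reasoning no longer applies as written, which is the situation the message describes.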
Passes up to tier5 and stress testing. ------------- Commit messages: - 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java Changes: https://git.openjdk.org/jdk/pull/13908/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13908&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307619 Stats: 7 lines in 1 file changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13908/head:pull/13908 PR: https://git.openjdk.org/jdk/pull/13908 From jlu at openjdk.org Thu May 11 20:21:57 2023 From: jlu at openjdk.org (Justin Lu) Date: Thu, 11 May 2023 20:21:57 GMT Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6] In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Convert the merged master changes to UTF-8 - Merge master and fix conflicts - Close streams when finished loading into props - Adjust CF test to read in with UTF-8 to fix failing test - Reconvert CS.properties to UTF-8 - Revert all changes to CurrencySymbols.properties - Bug6204853 should not be converted - Copyright year for CompileProperties - Redo translation for CS.properties - Spot convert CurrencySymbols.properties - ... 
and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a

-------------

Changes: https://git.openjdk.org/jdk/pull/12726/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12726&range=05
Stats: 28877 lines in 493 files changed: 14 ins; 1 del; 28862 mod
Patch: https://git.openjdk.org/jdk/pull/12726.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/12726/head:pull/12726

PR: https://git.openjdk.org/jdk/pull/12726

From jlu at openjdk.org Thu May 11 21:39:50 2023
From: jlu at openjdk.org (Justin Lu)
Date: Thu, 11 May 2023 21:39:50 GMT
Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]
In-Reply-To:
References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com>
Message-ID:

On Thu, 11 May 2023 20:21:57 GMT, Justin Lu wrote:

>> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii.
>>
>> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file.
>
> Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
> - Convert the merged master changes to UTF-8
> - Merge master and fix conflicts
> - Close streams when finished loading into props
> - Adjust CF test to read in with UTF-8 to fix failing test
> - Reconvert CS.properties to UTF-8
> - Revert all changes to CurrencySymbols.properties
> - Bug6204853 should not be converted
> - Copyright year for CompileProperties
> - Redo translation for CS.properties
> - Spot convert CurrencySymbols.properties
> - ... and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a

Wondering if anyone has any thoughts on the consequences of this PR, in relation to IntelliJ's (and other IDEs) default encoding for .properties files.
IntelliJ sets the default encoding for .properties files to ISO-8859-1, which would be the wrong encoding if the .properties files are converted to UTF-8 native. This would cause certain keys and values to be garbled when displayed in the file. Although the default file-encoding for .properties can be switched to UTF-8, it is not the default.

Wondering what some solutions/thoughts to this are.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1544708830

From naoto at openjdk.org Thu May 11 21:51:13 2023
From: naoto at openjdk.org (Naoto Sato)
Date: Thu, 11 May 2023 21:51:13 GMT
Subject: RFR: 8301991: Convert l10n properties resource bundles to UTF-8 native [v6]
In-Reply-To:
References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com>
Message-ID: <2apKDcin5cwY53zz5jOIPhqm7cCWhyYMdsXGU4TauEk=.781d695e-39fe-46f7-bd03-be514ca0b85c@github.com>

On Thu, 11 May 2023 20:21:57 GMT, Justin Lu wrote:

>> This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii.
>>
>> In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file.
>
> Justin Lu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:
>
> - Convert the merged master changes to UTF-8
> - Merge master and fix conflicts
> - Close streams when finished loading into props
> - Adjust CF test to read in with UTF-8 to fix failing test
> - Reconvert CS.properties to UTF-8
> - Revert all changes to CurrencySymbols.properties
> - Bug6204853 should not be converted
> - Copyright year for CompileProperties
> - Redo translation for CS.properties
> - Spot convert CurrencySymbols.properties
> - ...
and 6 more: https://git.openjdk.org/jdk/compare/4386d42d...f15b373a I think this is fine, as those properties files are JDK's own. I believe the benefit of moving to UTF-8 outweighs the issue you wrote, which can be remedied by changing the encoding in the IDEs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12726#issuecomment-1544722480 From shade at openjdk.org Thu May 11 23:16:46 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 23:16:46 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. > > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin Looks reasonable. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13577#pullrequestreview-1395788527 From xlinzheng at openjdk.org Thu May 11 23:16:45 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 11 May 2023 23:16:45 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register Message-ID: The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. 
x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. Testing in progress. Thanks, Xiaolin ------------- Commit messages: - Merge branch 'master' into heapbase-crash - Merge branch 'master' into heapbase-crash - A simple fix Changes: https://git.openjdk.org/jdk/pull/13577/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13577&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306667 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13577/head:pull/13577 PR: https://git.openjdk.org/jdk/pull/13577 From vkempik at openjdk.org Thu May 11 23:16:47 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 23:16:47 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: <_Xjx0iAMGGKjKkLimWSsBJreLVcQ56MbaGhX1_iFYdw=.27e2e126-a9cf-4880-93dc-a07b73dda3d0@github.com> On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. > > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin btw, should this also be backported to 17u-riscv ? 
after this PR is done ------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1517817514 From xlinzheng at openjdk.org Thu May 11 23:16:48 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 11 May 2023 23:16:48 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: <_Xjx0iAMGGKjKkLimWSsBJreLVcQ56MbaGhX1_iFYdw=.27e2e126-a9cf-4880-93dc-a07b73dda3d0@github.com> References: <_Xjx0iAMGGKjKkLimWSsBJreLVcQ56MbaGhX1_iFYdw=.27e2e126-a9cf-4880-93dc-a07b73dda3d0@github.com> Message-ID: On Fri, 21 Apr 2023 13:12:15 GMT, Vladimir Kempik wrote: > btw, should this also be backported to 17u-riscv ? after this PR is done Sure, I can do that after verifying and merging this patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1517893838 From xlinzheng at openjdk.org Thu May 11 23:16:49 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 11 May 2023 23:16:49 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 13:38:00 GMT, Aleksey Shipilev wrote: > Looks reasonable. Thanks for the review, Aleksey! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1517894396 From gli at openjdk.org Thu May 11 23:16:49 2023 From: gli at openjdk.org (Guoxiong Li) Date: Thu, 11 May 2023 23:16:49 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. 
> > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin A potential issue in the comment of [iRegIHeapbase](https://github.com/openjdk/jdk/blob/302bc2fd7fdfc02314e22ecc34ba2c78ef5ca9a1/src/hotspot/cpu/riscv/riscv.ad#L3544): // heap base register -- used for encoding immN0 operand iRegIHeapbase() And unfortunately, `aarch64` has the same issue in [iRegIHeapbase](https://github.com/openjdk/jdk/blob/302bc2fd7fdfc02314e22ecc34ba2c78ef5ca9a1/src/hotspot/cpu/aarch64/aarch64.ad#L5241) since [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449). They can be fixed together in a follow-up patch. Or only fix the `riscv64` part in this patch and fix the `aarch64` part in another patch. >> And unfortunately, aarch64 has the same issue in [iRegIHeapbase](https://github.com/openjdk/jdk/blob/302bc2fd7fdfc02314e22ecc34ba2c78ef5ca9a1/src/hotspot/cpu/aarch64/aarch64.ad#L5241) since [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449). > > Though, I think the AArch64 backend does not have this issue because it has the [CompressedOops::base() == NULL && CompressedKlassPointers::base() == NULL](https://hg.openjdk.org/jdk/jdk/rev/aedc9bf21743#l1.24) guard before JDK-8242449 - if I understand your comments correctly? `the same issue` I mentioned means `the same unnecessary comments`. Sorry for the ambiguity. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1535867825 PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1536010896 From xlinzheng at openjdk.org Thu May 11 23:16:49 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 11 May 2023 23:16:49 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: <57yH43lHLjh-5WgwWUlmpi4UxlJ_-dQn2_2NqAuN_Es=.82938526-097b-497f-af85-1bfc038c6b88@github.com> On Fri, 5 May 2023 07:48:36 GMT, Guoxiong Li wrote: > A potential issue in the comment of [iRegIHeapbase](https://github.com/openjdk/jdk/blob/302bc2fd7fdfc02314e22ecc34ba2c78ef5ca9a1/src/hotspot/cpu/riscv/riscv.ad#L3544): > > ``` > // heap base register -- used for encoding immN0 > operand iRegIHeapbase() > ``` > > And unfortunately, `aarch64` has the same issue in [iRegIHeapbase](https://github.com/openjdk/jdk/blob/302bc2fd7fdfc02314e22ecc34ba2c78ef5ca9a1/src/hotspot/cpu/aarch64/aarch64.ad#L5241) since [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449). > > They can be fixed together in a follow-up patch. Or only fix the `riscv64` part in this patch and fix the `aarch64` part in another patch. Indeed, `iRegIHeapbase` gets used nowhere in both backends and thanks for the catching. I can file one patch to clean them up after this. Though, I think the AArch64 backend does not have this issue because it has the [`CompressedOops::base() == NULL && CompressedKlassPointers::base() == NULL`](https://hg.openjdk.org/jdk/jdk/rev/aedc9bf21743#l1.24) guard before JDK-8242449 - if I understand your comments correctly? P.S. Sorry for the delay in this fix - I have tested hotspot tier1~4 on QEMU but still not yet on an unmatched board since mine has other jobs to test. Will carry this fix on soon. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1536003434 From xlinzheng at openjdk.org Thu May 11 23:16:50 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Thu, 11 May 2023 23:16:50 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. > > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin Verified hotspot tier1~2 (fastdebug) on my machine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1544810350 From sviswanathan at openjdk.org Fri May 12 00:46:59 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 12 May 2023 00:46:59 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: On Thu, 11 May 2023 07:54:23 GMT, Emanuel Peter wrote: >> Removed the noisy comment from the patch! >> >> With VectorAPI users are expected to be more intelligent and your optimizations can be directly implemented in kernel which perform VectorADD operations in main loop followed by Reduction out of loop e.g. >> >> >> outer_loop : >> hand_unrolled_vector_loop: >> v1 = VectorADD(broadcast(0)) >> v2 = v1.VectorADD(LoadVector) >> v3 = v2.VectorADD(LoadVector) >> ... >> ... 
>> inner_loop_end >> res += v3.ReductionAdd() >> outer_loop_end >> >> >> So its not a pressing issue anyways for us. > > @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. @eme64 Very nice and clean work. Thanks a lot for taking this up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1544942484 From gli at openjdk.org Fri May 12 01:03:57 2023 From: gli at openjdk.org (Guoxiong Li) Date: Fri, 12 May 2023 01:03:57 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. > > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin LGTM ------------- Marked as reviewed by gli (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13577#pullrequestreview-1423631219

From sviswanathan at openjdk.org Fri May 12 01:13:49 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 12 May 2023 01:13:49 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5]
In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com>
References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com>
Message-ID:

On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote:

>> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171
>>
>> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation    M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add       2063   2085    660    530    415    283 |      |
>> int mul       2272   2257   1189    733    908    439 |      |
>> int min       2527   2520   2516   2579   2585   2542 |  1   |
>> int max       2548   2525   2551   2516   2515   2517 |  1   |
>> int and       2410   2414    602    480    353    263 |      |
>> int or        2149   2151    597    498    354    262 |      |
>> int xor       2059   2062    605    476    364    263 |      |
>> long add      1776   1790   2000   1000   1683    591 |  2   |
>> long mul      2135   2199   2185   2001   2176   1307 |  2   |
>> long min      1439   1424   1421   1420   1430   1427 |  3   |
>> long max      2299   2287   2303   2305   1433   1425 |  3   |
>> long and      1657   1667   2015   1003   1679    568 |  4   |
>> long or       1776   1783   2032   1009   1680    569 |  4   |
>> long xor      1834   1783   2012   1024   1679    570 |  4   |
>> float add     2779   2644   2633   2648   2632   2639 |  5   |
>> float mul     2779   2871   2810   2776   2732   2791 |  5   |
>> float min     2294   2620   1725   1286    872    672 |      |
>> float max     2371   2519   1697   1265    841    468 |      |
>> double add    2634   2636   2635   2650   2635   2648 |  5   |
>> double mul    3053   2955   2881   3030   2979   2927 |  5   |
>> double min    2364   2400   2439   2399   2486   2398 |  6   |
>> double max    2488   2459   2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> use is_counted and is_innermost

src/hotspot/share/opto/loopopts.cpp line 4210:

> 4208: if (use != phi && ctrl_or_self(use) == cl) {
> 4209: DEBUG_ONLY( current->dump(-1); )
> 4210: assert(false, "reduction has use inside loop");

I have been wondering, it is right to bailout here from the optimization but why do we assert here? It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto vectorize as the reduction is to a scalar. e.g.
a slight modification of the SumRed_Int.java still auto vectorizes and has a use of the last unordered reduction within the loop: public static int sumReductionImplement( int[] a, int[] b, int[] c, int total) { int sum = 0; for (int i = 0; i < a.length; i++) { total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); sum = total + i; } return total + sum; } Do you think this is a valid concern? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191814579 From fyang at openjdk.org Fri May 12 02:44:48 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 12 May 2023 02:44:48 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: <-uD7S30dtpGcQkEHj0G1GkSJuxEiW0XFTcq_BsDY7x8=.feb1f57e-fd4d-48ad-9a0f-ab972022ec67@github.com> On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. > > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin Nice catch. Looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13577#pullrequestreview-1423690221 From xlinzheng at openjdk.org Fri May 12 03:46:45 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Fri, 12 May 2023 03:46:45 GMT Subject: RFR: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 5 May 2023 09:52:00 GMT, Guoxiong Li wrote: >> The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. 
Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. >> >> Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. >> >> x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. >> >> Testing in progress. >> >> Thanks, >> Xiaolin > >>> And unfortunately, aarch64 has the same issue in [iRegIHeapbase](https://github.com/openjdk/jdk/blob/302bc2fd7fdfc02314e22ecc34ba2c78ef5ca9a1/src/hotspot/cpu/aarch64/aarch64.ad#L5241) since [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449). >> >> Though, I think the AArch64 backend does not have this issue because it has the [CompressedOops::base() == NULL && CompressedKlassPointers::base() == NULL](https://hg.openjdk.org/jdk/jdk/rev/aedc9bf21743#l1.24) guard before JDK-8242449 - if I understand your comments correctly? > > `the same issue` I mentioned means `the same unnecessary comments`. Sorry for the ambiguity. Thanks for reviewing! 
@lgxbslgx @RealFYang ------------- PR Comment: https://git.openjdk.org/jdk/pull/13577#issuecomment-1545065224 From epeter at openjdk.org Fri May 12 06:47:52 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 06:47:52 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 01:10:29 GMT, Sandhya Viswanathan wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > > src/hotspot/share/opto/loopopts.cpp line 4210: > >> 4208: if (use != phi && ctrl_or_self(use) == cl) { >> 4209: DEBUG_ONLY( current->dump(-1); ) >> 4210: assert(false, "reduction has use inside loop"); > > I have been wondering, it is right to bailout here from the optimization but why do we assert here? It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto vectorize as the reduction is to a scalar. e.g. a slight modification of the SumRed_Int.java still auto vectorizes and has a use of the last unordered reduction within the loop: > public static int sumReductionImplement( > int[] a, > int[] b, > int[] c, > int total) { > int sum = 0; > for (int i = 0; i < a.length; i++) { > total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); > sum = total + i; > } > return total + sum; > } > Do you think this is a valid concern? I agree, the assert is not very necessary, but I'd rather have an assert more in there and figure out what cases I missed when the fuzzer eventually finds a case. But if it is wished I can also just remove that assert. 
I wrote this `Test.java`:

    class Test {
        static final int RANGE = 1024;
        static final int ITER = 10_000;

        static void init(int[] data) {
            for (int i = 0; i < RANGE; i++) {
                data[i] = i + 1;
            }
        }

        static int test(int[] data, int sum) {
            int x = 0;
            for (int i = 0; i < RANGE; i++) {
                sum += 11 * data[i];
                x = sum & i; // what happens with this AndI ?
            }
            return sum + x;
        }

        public static void main(String[] args) {
            int[] data = new int[RANGE];
            init(data);
            for (int i = 0; i < ITER; i++) {
                test(data, i);
            }
        }
    }

And ran it like this, with my patch:

    ./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceSuperWord Test.java

Everything vectorized as usual. But what happens with the `AndI`? It actually drops outside the loop. Its left input is the `AddReductionVI`, and the right input is `(Phi #tripcount) + 63` (the last `i` thus already drops outside the loop).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191971362

From epeter at openjdk.org Fri May 12 06:52:49 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 12 May 2023 06:52:49 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5]
In-Reply-To:
References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com>
Message-ID:

On Fri, 12 May 2023 06:45:24 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/loopopts.cpp line 4210:
>>
>>> 4208: if (use != phi && ctrl_or_self(use) == cl) {
>>> 4209: DEBUG_ONLY( current->dump(-1); )
>>> 4210: assert(false, "reduction has use inside loop");
>>
>> I have been wondering, it is right to bailout here from the optimization but why do we assert here? It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto vectorize as the reduction is to a scalar. e.g.
a slight modification of the SumRed_Int.java still auto vectorizes and has a use of the last unordered reduction within the loop: >> public static int sumReductionImplement( >> int[] a, >> int[] b, >> int[] c, >> int total) { >> int sum = 0; >> for (int i = 0; i < a.length; i++) { >> total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >> sum = total + i; >> } >> return total + sum; >> } >> Do you think this is a valid concern? > > I agree, the assert is not very necessary, but I'd rather have an assert more in there and figure out what cases I missed when the fuzzer eventually finds a case. But if it is wished I can also just remove that assert. > > I wrote this `Test.java`: > > class Test { > static final int RANGE = 1024; > static final int ITER = 10_000; > > static void init(int[] data) { > for (int i = 0; i < RANGE; i++) { > data[i] = i + 1; > } > } > > static int test(int[] data, int sum) { > int x = 0; > for (int i = 0; i < RANGE; i++) { > sum += 11 * data[i]; > x = sum & i; // what happens with this AndI ? > } > return sum + x; > } > > public static void main(String[] args) { > int[] data = new int[RANGE]; > init(data); > for (int i = 0; i < ITER; i++) { > test(data, i); > } > } > } > > And ran it like this, with my patch: > > ./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceSuperWord Test.java > > > Everything vectorized as usual. But what happens with the `AndI`? It actually drops outside the loop. Its left input is the `AddReductionVI`, and the right input is `(Phi #tripcount) + 63` (the last `i` thus already drops outside the loop). 
Note: If I have uses of the reduction in each iteration, then we already refuse to vectorize the reduction, as in this case: static int test(int[] data, int sum) { int x = 0; for (int i = 0; i < RANGE; i++) { sum += 11 * data[i]; x += sum & i; // vector use of sum prevents vectorization of sum's reduction-vectorization -> whole chain not vectorized } return sum + x; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191973738 From jbhateja at openjdk.org Fri May 12 06:52:51 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 12 May 2023 06:52:51 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add      2063   2085    660    530    415    283 |      |
>> int mul      2272   2257   1189    733    908    439 |      |
>> int min      2527   2520   2516   2579   2585   2542 |    1 |
>> int max      2548   2525   2551   2516   2515   2517 |    1 |
>> int and      2410   2414    602    480    353    263 |      |
>> int or       2149   2151    597    498    354    262 |      |
>> int xor      2059   2062    605    476    364    263 |      |
>> long add     1776   1790   2000   1000   1683    591 |    2 |
>> long mul     2135   2199   2185   2001   2176   1307 |    2 |
>> long min     1439   1424   1421   1420   1430   1427 |    3 |
>> long max     2299   2287   2303   2305   1433   1425 |    3 |
>> long and     1657   1667   2015   1003   1679    568 |    4 |
>> long or      1776   1783   2032   1009   1680    569 |    4 |
>> long xor     1834   1783   2012   1024   1679    570 |    4 |
>> float add    2779   2644   2633   2648   2632   2639 |    5 |
>> float mul    2779   2871   2810   2776   2732   2791 |    5 |
>> float min    2294   2620   1725   1286    872    672 |      |
>> float max    2371   2519   1697   1265    841    468 |      |
>> double add   2634   2636   2635   2650   2635   2648 |    5 |
>> double mul   3053   2955   2881   3030   2979   2927 |    5 |
>> double min   2364   2400   2439   2399   2486   2398 |    6 |
>> double max   2488   2459   2501 ...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost src/hotspot/share/opto/loopopts.cpp line 4284: > 4282: // Create post-loop reduction. > 4283: Node* last_accumulator = phi->in(2); > 4284: Node* post_loop_reduction = ReductionNode::make_from_vopc(first_ur->Opcode(), nullptr, init, last_accumulator, bt); Should this be guarded by a safe _Matcher::match_rule_supported_vector_ .
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191974683 From epeter at openjdk.org Fri May 12 07:05:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 07:05:49 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 06:48:20 GMT, Emanuel Peter wrote: >> I agree, the assert is not very necessary, but I'd rather have an assert more in there and figure out what cases I missed when the fuzzer eventually finds a case. But if it is wished I can also just remove that assert. >> >> I wrote this `Test.java`: >> >> class Test { >> static final int RANGE = 1024; >> static final int ITER = 10_000; >> >> static void init(int[] data) { >> for (int i = 0; i < RANGE; i++) { >> data[i] = i + 1; >> } >> } >> >> static int test(int[] data, int sum) { >> int x = 0; >> for (int i = 0; i < RANGE; i++) { >> sum += 11 * data[i]; >> x = sum & i; // what happens with this AndI ? >> } >> return sum + x; >> } >> >> public static void main(String[] args) { >> int[] data = new int[RANGE]; >> init(data); >> for (int i = 0; i < ITER; i++) { >> test(data, i); >> } >> } >> } >> >> And ran it like this, with my patch: >> >> ./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceSuperWord Test.java >> >> >> Everything vectorized as usual. But what happens with the `AndI`? It actually drops outside the loop. Its left input is the `AddReductionVI`, and the right input is `(Phi #tripcount) + 63` (the last `i` thus already drops outside the loop). 
> > Note: If I have uses of the reduction in each iteration, then we already refuse to vectorize the reduction, as in this case: > > static int test(int[] data, int sum) { > int x = 0; > for (int i = 0; i < RANGE; i++) { > sum += 11 * data[i]; > x += sum & i; // vector use of sum prevents vectorization of sum's reduction-vectorization -> whole chain not vectorized > } > return sum + x; > } My conclusion, given my best understanding: either we have a use of the `sum` in all iterations, which prevents vectorization of the reduction, or we only have a use of the last iteration, and it already drops out of the loop. So if there is such an odd example, I'd rather we run into an assert in debug and look at it again. Maybe it would be perfectly legal, or maybe it reveals a bug here or elsewhere in the reduction code. @sviswa7 what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191981425 From epeter at openjdk.org Fri May 12 07:05:54 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 07:05:54 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 06:49:29 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > > src/hotspot/share/opto/loopopts.cpp line 4284: > >> 4282: // Create post-loop reduction. >> 4283: Node* last_accumulator = phi->in(2); >> 4284: Node* post_loop_reduction = ReductionNode::make_from_vopc(first_ur->Opcode(), nullptr, init, last_accumulator, bt); > > Should this be guarded by a safe _Matcher::match_rule_supported_vector_ . Do you think that is necessary? After all I am just creating the same type of `ReductionNode` that I already replaced in the loop, right?
If anything, we should guard this: https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4272 But what platform would support the reduction if it does not support the normal vector-op? What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1191986280 From xlinzheng at openjdk.org Fri May 12 07:13:51 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Fri, 12 May 2023 07:13:51 GMT Subject: Integrated: 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register In-Reply-To: References: Message-ID: On Fri, 21 Apr 2023 10:16:44 GMT, Xiaolin Zheng wrote: > The `storeImmN0` in the RISC-V backend missed a `CompressedOops::base() == NULL` predication. Under non-zero-based compressed oops mode, the `xheapbase` can be a non-zero value and crashes the VM. > > Reproduced by `/bin/java -Xcomp -XX:HeapBaseMinAddress=72030M -version` simply. A hs_err file is attached in the JBS issue. > > x86 uses `r12` as a zero register in `storeImmN0`, but RISC-V has a zero register so we can use it to implement the matching rule. > > Testing in progress. > > Thanks, > Xiaolin This pull request has now been integrated. 
Changeset: e32de7ef Author: Xiaolin Zheng Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/e32de7efd6f3173a0bba5829e8de3edd01cfdbab Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8306667: RISC-V: Fix storeImmN0 matching rule by using zr register Reviewed-by: shade, gli, fyang ------------- PR: https://git.openjdk.org/jdk/pull/13577 From epeter at openjdk.org Fri May 12 07:26:51 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 07:26:51 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 06:57:41 GMT, Emanuel Peter wrote: >> Note: If I have uses of the reduction in each iteration, then we already refuse to vectorize the reduction, as in this case: >> >> static int test(int[] data, int sum) { >> int x = 0; >> for (int i = 0; i < RANGE; i++) { >> sum += 11 * data[i]; >> x += sum & i; // vector use of sum prevents vectorization of sum's reduction-vectorization -> whole chain not vectorized >> } >> return sum + x; >> } > > My conclusion, given my best understanding: either we have a use of the `sum` in all iterations, which prevents vectorization of the reduction. Or we only have a use of the last iteration, and it drops out of the loop already. > > So if there is such an odd example, I'd rather we run into an assert in debug and look at it again. Maybe it would be perfectly legal, or maybe it reveals a bug here or elsewhere in the reduction code. > > @sviswa7 what do you think?
Ah, but this hits one of my asserts: static int test(int[] data, int sum) { int x = 0; for (int i = 0; i < RANGE; i+=8) { sum += 11 * data[i+0]; sum += 11 * data[i+1]; sum += 11 * data[i+2]; sum += 11 * data[i+3]; x = sum + i; sum += 11 * data[i+4]; sum += 11 * data[i+5]; sum += 11 * data[i+6]; sum += 11 * data[i+7]; } return sum + x; } With ./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:MaxVectorSize=16 Test.java Triggers https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4217 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1192004984 From jbhateja at openjdk.org Fri May 12 07:31:54 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 12 May 2023 07:31:54 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: <9BokGvSp13Iz_8ZTe5mgclCUMjdlVq1YbYwTkQ8AmRE=.b10100a0-a8c9-4d52-9609-5bd45f6e24cb@github.com> On Wed, 10 May 2023 11:45:38 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. 
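To make the ordered/unordered distinction quoted above concrete, here is a minimal Java sketch. The class `ReductionOrderDemo` and its methods are hypothetical and are not part of the patch or of HotSpot. It compares a sequential reduction with a lane-wise one whose partial sums are combined only after the loop. For `int` addition both orders agree, even under overflow, while for `float` addition the result depends on the order, which is presumably why `float/double add/mul` are excluded from the unordered group.

```java
// Hypothetical demo class; it only illustrates reordering, not the C2 transformation itself.
class ReductionOrderDemo {
    // Sequential int reduction: the order the source loop specifies.
    static int sumSequential(int[] a) {
        int sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    // Lane-wise int reduction: keep 'lanes' partial sums, combine them after the loop.
    static int sumLanes(int[] a, int lanes) {
        int[] acc = new int[lanes];
        for (int i = 0; i < a.length; i++) {
            acc[i % lanes] += a[i];
        }
        int sum = 0;
        for (int v : acc) {
            sum += v;
        }
        return sum;
    }

    // Sequential float reduction.
    static float sumSequential(float[] a) {
        float sum = 0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    // Lane-wise float reduction: same reordering as above, but rounding makes it observable.
    static float sumLanes(float[] a, int lanes) {
        float[] acc = new float[lanes];
        for (int i = 0; i < a.length; i++) {
            acc[i % lanes] += a[i];
        }
        float sum = 0f;
        for (float v : acc) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ints = {Integer.MAX_VALUE, 1, -7, 42};
        float[] floats = {1e8f, 1f, -1e8f, 1f};
        System.out.println("int   sequential: " + sumSequential(ints));
        System.out.println("int   two lanes:  " + sumLanes(ints, 2));
        System.out.println("float sequential: " + sumSequential(floats));
        System.out.println("float two lanes:  " + sumLanes(floats, 2));
    }
}
```

With the inputs in `main`, the two `int` results match, whereas the `float` results differ because `1e8f + 1f` rounds back to `1e8f` in the sequential order.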
>> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add      2063   2085    660    530    415    283 |      |
>> int mul      2272   2257   1189    733    908    439 |      |
>> int min      2527   2520   2516   2579   2585   2542 |    1 |
>> int max      2548   2525   2551   2516   2515   2517 |    1 |
>> int and      2410   2414    602    480    353    263 |      |
>> int or       2149   2151    597    498    354    262 |      |
>> int xor      2059   2062    605    476    364    263 |      |
>> long add     1776   1790   2000   1000   1683    591 |    2 |
>> long mul     2135   2199   2185   2001   2176   1307 |    2 |
>> long min     1439   1424   1421   1420   1430   1427 |    3 |
>> long max     2299   2287   2303   2305   1433   1425 |    3 |
>> long and     1657   1667   2015   1003   1679    568 |    4 |
>> long or      1776   1783   2032   1009   1680    569 |    4 |
>> long xor     1834   1783   2012   1024   1679    570 |    4 |
>> float add    2779   2644   2633   2648   2632   2639 |    5 |
>> float mul    2779   2871   2810   2776   2732   2791 |    5 |
>> float min    2294   2620   1725   1286    872    672 |      |
>> float max    2371   2519   1697   1265    841    468 |      |
>> double add   2634   2636   2635   2650   2635   2648 |    5 |
>> double mul   3053   2955   2881   3030   2979   2927 |    5 |
>> double min   2364   2400   2439   2399   2486   2398 |    6 |
>> double max   2488   2459   2501 ...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > use is_counted and is_innermost src/hotspot/share/opto/loopopts.cpp line 4272: > 4270: Node* last_vector_accumulator = current->in(1); > 4271: Node* vector_input = current->in(2); > 4272: VectorNode* vector_accumulator = current->make_normal_vector_op(last_vector_accumulator, vector_input, vec_t); Should this be guarded by a safe _Matcher::match_rule_supported_vector_ . ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1192010289 From jbhateja at openjdk.org Fri May 12 07:31:56 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 12 May 2023 07:31:56 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 07:03:23 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > > src/hotspot/share/opto/loopopts.cpp line 4284: > >> 4282: // Create post-loop reduction. >> 4283: Node* last_accumulator = phi->in(2); >> 4284: Node* post_loop_reduction = ReductionNode::make_from_vopc(first_ur->Opcode(), nullptr, init, last_accumulator, bt); > > Do you think that is necessary? After all I am just creating the same type of `ReductionNode` that I already replaced in the loop, right? If anything, we should guard this: > https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4272 > > But what platform would support the reduction if it does not support the normal vector-op? > > What do you think? Yes, I think my comment just got misplaced , I meant we should guard the vector creation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1192008931 From epeter at openjdk.org Fri May 12 07:41:52 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 07:41:52 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 01:10:29 GMT, Sandhya Viswanathan wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > > src/hotspot/share/opto/loopopts.cpp line 4210: > >> 4208: if (use != phi && ctrl_or_self(use) == cl) { >> 4209: DEBUG_ONLY( current->dump(-1); ) >> 4210: assert(false, "reduction has use inside loop"); > > I have been wondering, it is right to bailout here from the optimization but why do we assert here? It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto vectorize as the reduction is to a scalar. e.g. a slight modification of the SumRed_Int.java still auto vectorizes and has a use of the last unordered reduction within the loop: > public static int sumReductionImplement( > int[] a, > int[] b, > int[] c, > int total) { > int sum = 0; > for (int i = 0; i < a.length; i++) { > total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); > sum = total + i; > } > return total + sum; > } > Do you think this is a valid concern? I will add this as a regression test, and remove that assert. 
Thanks @sviswa7 for making me look at this more closely :) Still, I think it may be valuable to keep these two asserts - both indicate that something strange has happened: https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4210 https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4199 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1192020072 From epeter at openjdk.org Fri May 12 07:44:56 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 07:44:56 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: <9BokGvSp13Iz_8ZTe5mgclCUMjdlVq1YbYwTkQ8AmRE=.b10100a0-a8c9-4d52-9609-5bd45f6e24cb@github.com> References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <9BokGvSp13Iz_8ZTe5mgclCUMjdlVq1YbYwTkQ8AmRE=.b10100a0-a8c9-4d52-9609-5bd45f6e24cb@github.com> Message-ID: On Fri, 12 May 2023 07:29:16 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > > src/hotspot/share/opto/loopopts.cpp line 4272: > >> 4270: Node* last_vector_accumulator = current->in(1); >> 4271: Node* vector_input = current->in(2); >> 4272: VectorNode* vector_accumulator = current->make_normal_vector_op(last_vector_accumulator, vector_input, vec_t); > > Should this be guarded by a safe _Matcher::match_rule_supported_vector_ . Right, makes sense. I'd have to guard before any transformations take place. So maybe I'll have a second method: `UnorderedReductionNode::make_normal_vector_op_supported`, and use `Matcher::match_rule_supported_vector` inside. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1192023372 From rcastanedalo at openjdk.org Fri May 12 07:58:48 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 12 May 2023 07:58:48 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int Message-ID: The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. 
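The two rewrites listed above can be checked on plain Java `int`s. The following is only a sketch under stated assumptions: `MaxIdealDemo` and its method names are hypothetical, it covers one permutation of each pattern rather than all of them, and it only exercises inputs where `x + c` cannot overflow (how the changeset itself deals with overflow is not described in this message).

```java
// Hypothetical demo, not HotSpot code: compares the shapes before and after each rewrite.
class MaxIdealDemo {
    // Shape before rewrite 1: max(x + c0, max(x + c1, z)).
    static int twoLevelBefore(int x, int c0, int c1, int z) {
        return Math.max(x + c0, Math.max(x + c1, z));
    }

    // Shape after rewrite 1: max(x + MAX(c0, c1), z); MAX(c0, c1) would be folded at compile time.
    static int twoLevelAfter(int x, int c0, int c1, int z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // Shape before rewrite 2: max(x + c0, x + c1).
    static int oneLevelBefore(int x, int c0, int c1) {
        return Math.max(x + c0, x + c1);
    }

    // Shape after rewrite 2: x + MAX(c0, c1).
    static int oneLevelAfter(int x, int c0, int c1) {
        return x + Math.max(c0, c1);
    }

    // Exhaustively compares both rewrites on a small range that cannot overflow.
    static boolean rewritesAgreeOnSmallRange() {
        for (int x = -20; x <= 20; x++) {
            for (int c0 = -5; c0 <= 5; c0++) {
                for (int c1 = -5; c1 <= 5; c1++) {
                    if (oneLevelBefore(x, c0, c1) != oneLevelAfter(x, c0, c1)) {
                        return false;
                    }
                    for (int z = -30; z <= 30; z++) {
                        if (twoLevelBefore(x, c0, c1, z) != twoLevelAfter(x, c0, c1, z)) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(twoLevelBefore(100, 100, 150, 200));
        System.out.println(twoLevelAfter(100, 100, 150, 200));
        System.out.println(rewritesAgreeOnSmallRange());
    }
}
```

The MinI case is analogous with `Math.min` in place of `Math.max`.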
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. #### Testing ##### Functionality - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). ##### Performance - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. ------------- Commit messages: - Refine comments - Update copyright header - Uncomment tests - Complement idealization tests with negative ones - Add some comments - Refactor - Simplify - Re-add comment - Flatten nested if - Extract MaxINode::Ideal() and MinINode::Ideal() into MaxNode::IdealI() - ... and 4 more: https://git.openjdk.org/jdk/compare/8ac71863...9fd482b5 Changes: https://git.openjdk.org/jdk/pull/13924/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13924&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302673 Stats: 424 lines in 5 files changed: 266 ins; 103 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/13924.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13924/head:pull/13924 PR: https://git.openjdk.org/jdk/pull/13924 From ysuenaga at openjdk.org Fri May 12 08:51:55 2023 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 12 May 2023 08:51:55 GMT Subject: Integrated: 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo In-Reply-To: References: Message-ID: <1dTcFCCggGtOuspRvSWA7mRZTwR8iD6psfiSD8j-3mE=.40173f45-2a30-4535-a8bf-5a460bbc694b@github.com> On Sat, 8 Apr 2023 02:24:44 GMT, Yasumasa Suenaga wrote: > `os::Linux::available_memory()` returns available memory from cgroups or sysinfo(2). For a process running outside a container, that value is based on `freeram` from sysinfo(2).
> > `freeram` is equivalent to `MemFree` in `/proc/meminfo` [1]. However, it only counts RAM that is completely free. We should use `MemAvailable` when we want to know how much memory is available for the process [2]. `MemAvailable` is available in modern Linux kernels, and it has been backported to some older kernels (e.g. RHEL). `sar` from sysstat reads that value and shows it as `kbavail` [3]. > > AFAIK the PhysicalMemory event in JFR depends on `os::Linux::available_memory()`, and it is used in automated analysis in JMC. So a JFR/JMC user could wrongly conclude that physical memory was exhausted even though enough memory was available. > > [1] https://github.com/torvalds/linux/blob/c9c3395d5e3dcc6daee66c6908354d47bf98cb0c/fs/proc/meminfo.c#L59 > [2] https://docs.kernel.org/filesystems/proc.html?highlight=memavailable > [3] https://github.com/sysstat/sysstat/blob/ac1df71ca252c158e8d418ded93e5ed52f5e8765/rd_stats.c#L325-L328 This pull request has now been integrated. Changeset: b6bcbc0c Author: Yasumasa Suenaga URL: https://git.openjdk.org/jdk/commit/b6bcbc0cbcb3729e4eb298f2198e0b6570e1f566 Stats: 91 lines in 10 files changed: 69 ins; 9 del; 13 mod 8305770: os::Linux::available_memory() should refer MemAvailable in /proc/meminfo Reviewed-by: stuefe, sgehwolf, rcastanedalo, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13398 From thartmann at openjdk.org Fri May 12 09:27:45 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 09:27:45 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v3] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Tue, 9 May 2023 14:14:15 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for
[TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. >> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well.
I added an assert to verify this. Higher tier testing is still running. >> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Refactoring Thanks for the reviews, Roland, Quan Anh and Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1545448269 From thartmann at openjdk.org Fri May 12 09:27:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 09:27:47 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v3] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Wed, 10 May 2023 11:48:24 GMT, Roland Westrelin wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactoring > > src/hotspot/share/opto/type.cpp line 3311: > >> 3309: } >> 3310: >> 3311: bool TypePtr::InterfaceSet::eq(ciInstanceKlass* k, InterfaceHandling interface_handling) const { > > Why not remove the interface_handling parameter? It doesn't seem useful. Right, I'll remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13868#discussion_r1192132501 From epeter at openjdk.org Fri May 12 09:48:47 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 May 2023 09:48:47 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom Message-ID: This is the second step in the `VerifyLoopOptimizations` revival. 
Last step: [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 Next step: [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop ------- There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. 
------------- Commit messages: - 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom Changes: https://git.openjdk.org/jdk/pull/13951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13951&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305073 Stats: 30 lines in 2 files changed: 9 ins; 14 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13951/head:pull/13951 PR: https://git.openjdk.org/jdk/pull/13951 From thartmann at openjdk.org Fri May 12 09:50:46 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 09:50:46 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v4] In-Reply-To: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: > [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 > > while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 > > As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. 
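The interleaving described in the quoted text above can be replayed deterministically. The Java sketch below is an illustration only: `IsLoadedRaceDemo`, `Cache` and their members are hypothetical stand-ins that echo the description, not the HotSpot C++ code, and the two "threads" are simulated by calling the writer's steps and the reader in the problematic order on a single thread.

```java
// Hypothetical replay of the described interleaving; not the HotSpot implementation.
class IsLoadedRaceDemo {
    enum Observed { NOT_COMPUTED, LOADED, NOT_LOADED }

    // Stand-in for the cached pair of fields named in the description.
    static final class Cache {
        boolean isLoadedComputed;
        boolean isLoaded;

        // Writer, step 1: publish the "computed" flag first (the problematic order).
        void markComputed() {
            isLoadedComputed = true;
        }

        // Writer, step 2: store the real answer afterwards.
        void storeResult(boolean loaded) {
            isLoaded = loaded;
        }

        // Reader: trusts isLoaded as soon as the computed flag is set.
        Observed read() {
            if (!isLoadedComputed) {
                return Observed.NOT_COMPUTED;
            }
            return isLoaded ? Observed.LOADED : Observed.NOT_LOADED;
        }
    }

    // Replays: writer step 1, reader, writer step 2, reader.
    static Observed[] replay() {
        Cache cache = new Cache();
        cache.markComputed();
        Observed between = cache.read();
        cache.storeResult(true);
        Observed after = cache.read();
        return new Observed[] { between, after };
    }

    public static void main(String[] args) {
        Observed[] seen = replay();
        System.out.println("between the two writes: " + seen[0]);
        System.out.println("after both writes:      " + seen[1]);
    }
}
```

A reader that runs between the two writes sees a stale "not loaded" answer even though the final value is "loaded", which matches the symptom described in the bug report.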
> > Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 > > A loaded type can therefore be replaced by an unloaded type during GVN. > > In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is `stopped()`. We crash when trying to dereference `GraphKit::_map->_jvms`, which is NULL (in debug we would hit an assert). > > Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. > > The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interfaces that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. > > In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them with `#ifdef ASSERT` and fixed the verification code that failed to build.
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Removed interface_handling argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13868/files - new: https://git.openjdk.org/jdk/pull/13868/files/d0dce7b6..d5781d8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=02-03 Stats: 9 lines in 2 files changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13868.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13868/head:pull/13868 PR: https://git.openjdk.org/jdk/pull/13868 From dzhang at openjdk.org Fri May 12 11:17:47 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 12 May 2023 11:17:47 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API Message-ID: Hi all, We have added support for Extract, Compress, Expand and other nodes for Vector API. It was implemented by referring to RVV v1.0 [1]. Please take a look and have some reviews. Thanks a lot. In this PR, we will support these new nodes: CompressM/CompressV/ExpandV LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked Extract VectorLongToMask/VectorMaskToLong PopulateIndex VectorLongToMask/VectorMaskToLong VectorMaskTrueCount/VectorMaskFirstTrue VectorInsert At the same time, we refactored methods such as `match_rule_supported_vector_mask`. All implemented vector nodes support mask operations by default now, so we also added mask nodes for all implemented nodes. By the way, we will implement the VectorTest node in the next PR. We can use the tests under `test/jdk/jdk/incubator/vector` to print the compilation log for most of the new nodes. 
And we can use the following command to print the compilation log of a jtreg test case: $ jtreg \ -v:default \ -concurrency:16 -timeout:50 \ -javaoption:-XX:+UnlockExperimentalVMOptions \ -javaoption:-XX:+UseRVV \ -javaoption:-XX:+PrintOptoAssembly \ -javaoption:-XX:LogFile=log_name.log \ -jdk:build/linux-riscv64-server-fastdebug/jdk \ -compilejdk:build/linux-x86_64-server-release/images/jdk \ ### CompressM/CompressV/ExpandV There is no inverse vdecompress provided in RVV, as this operation can be readily synthesized using iota and a masked vrgather in `ExpandV`. We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit these nodes and the compilation log is as follows: ## CompressM 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm 2ae mcompress V0, V30 # KILL R30 2c2 vstoremask V2, V0 2ce storeV [R7], V2 # vector (rvv) 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 ## CompressV 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm 0f2 vcompress V1, V2, V0 0fe storeV [R7], V1 # vector (rvv) 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 ## ExpandV 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm 0f2 vexpand V3, V2, V0 102 storeV [R7], V3 # vector (rvv) 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked We use the vsoxei32_v instruction regardless of what sew is set to. The indexMap in fromArray is an int array, so the index is always 32 bits. Because index stores the index value, and vs2 of vsoxei32_v requires an offset, we need to multiply the value corresponding to idx by the number of bytes of data width. 
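The index scaling described above can be sketched in plain Java. This is a scalar emulation under stated assumptions, not the code the PR generates: the class `GatherOffsetSketch` and both helper names are hypothetical, and a `ByteBuffer` stands in for raw memory. The `indexMap` holds 32-bit element indices, while an indexed load or store needs byte offsets, so each index is multiplied by the element size in bytes.

```java
import java.nio.ByteBuffer;

final class GatherOffsetSketch {
    // Convert element indices (always 32-bit ints) into byte offsets
    // by scaling with the element width in bytes.
    static int[] toByteOffsets(int[] indexMap, int elemBytes) {
        int[] offsets = new int[indexMap.length];
        for (int i = 0; i < indexMap.length; i++) {
            offsets[i] = indexMap[i] * elemBytes;
        }
        return offsets;
    }

    // Scalar emulation of a gather of floats: lane i reads from base + offsets[i].
    static float[] gatherFloats(ByteBuffer mem, int base, int[] indexMap) {
        int[] offsets = toByteOffsets(indexMap, Float.BYTES);
        float[] out = new float[indexMap.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = mem.getFloat(base + offsets[i]);
        }
        return out;
    }
}
```

The index width stays at 32 bits whatever the element type is; only the scale factor changes (1 for byte, 4 for int and float, 8 for long and double).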
We can use `test/jdk/jdk/incubator/vector/Float256VectorLoadStoreTests.java` to emit these nodes and the compilation log is as follows: ## LoadVectorGather 7ee B56: # out( B26 ) <- in( B55 ) Freq: 338.569 7ee spill [sp, #144] -> R7 # spill size = 64 7f0 spill [sp, #192] -> V3 # vector spill size = 256 7f8 gather_load V1, [R7], V3 # KILL V2 808 j B26 #@branch ## StoreVectorScatter 290 loadV V1, [R7] # vector (rvv) 298 addi R7, R8, #16 # ptr, #@addP_reg_imm 29c spill [sp, #32] -> V3 # vector spill size = 256 2a4 scatter_store [R7], V3, V1 # KILL V2 2b4 # pop frame 208 ## LoadVectorGatherMasked 41a addi R30, R10, #16 # ptr, #@addP_reg_imm 41e spill [sp, #48] -> V3 # vector spill size = 256 426 gather_load_masked V1, [R7], V3, V0 # KILL V2 43a storeV [R28], V1 # vector (rvv) 442 bgeu R30, R29, B46 #@cmpP_branch P=0.000100 C=-1.000000 ## StoreVectorScatterMasked 2ae vloadmask V0, V1 2b6 spill [sp, #8] -> R7 # spill size = 64 2b8 addi R7, R7, #16 # ptr, #@addP_reg_imm 2ba spill [sp, #48] -> V3 # vector spill size = 256 2c2 scatter_store_masked [R7], V3, V2, V0 # KILL V1 2d2 # pop frame 224 ### Extract Extract is used to return the element from a vector with the given index. 
We can use `test/jdk/jdk/incubator/vector/*MaxVectorTests.java` to emit these nodes and the compilation log is as follows: ## Extract 0fa loadV V1, [R11] # vector (rvv) 102 add R11, R19, R30 # ptr, #@addP_reg_reg 106 extract R15, V1, #0 # KILL V2 112 extract R12, V1, #1 # KILL V2 122 extract R13, V1, #2 # KILL V2 132 bgeu R14, R7, B44 #@cmpU_branch P=0.000001 C=-1.000000 ## ExtractL 0fa loadV V1, [R11] # vector (rvv) 102 add R11, R19, R28 # ptr, #@addP_reg_reg 106 extractL R15, V1, #0 # KILL V2 112 extractL R13, V1, #1 # KILL V2 122 extractL R14, V1, #2 # KILL V2 132 bgeu R7, R10, B44 #@cmpU_branch P=0.000001 C=-1.000000 ## ExtractF 0fa loadV V1, [R12] # vector (rvv) 102 add R12, R19, R28 # ptr, #@addP_reg_reg 106 extractF F0, V1, #0 # KILL V2 112 extractF F2, V1, #1 # KILL V2 122 extractF F1, V1, #2 # KILL V2 132 bgeu R7, R11, B44 #@cmpU_branch P=0.000001 C=-1.000000 ## ExtractD 0fa loadV V1, [R13] # vector (rvv) 102 add R13, R19, R28 # ptr, #@addP_reg_reg 106 extractD F0, V1, #0 # KILL V2 112 extractD F1, V1, #1 # KILL V2 122 extractD F2, V1, #2 # KILL V2 132 bgeu R7, R12, B44 #@cmpU_branch P=0.000001 C=-1.000000 ### AndV/OrV/XorV masked We can use `Byte128VectorTests.java` to emit these nodes and the compilation log is as follows: ## AndV masked 1d0 B30: # out( B57 B31 ) <- in( B29 ) Freq: 75.1104 1d0 loadV V3, [R15] # vector (rvv) 1d8 vloadmask V0, V1 1e0 vand_masked V2, V3, V0 1e8 spill [sp, #48] -> R14 # spill size = 64 1ea add R14, R14, R31 # ptr, #@addP_reg_reg 1ec addi R31, R14, #16 # ptr, #@addP_reg_imm 1f0 bgeu R9, R29, B57 #@cmpU_branch P=0.000001 C=-1.000000 ## OrV masked 1d0 B30: # out( B57 B31 ) <- in( B29 ) Freq: 75.1104 1d0 loadV V3, [R15] # vector (rvv) 1d8 vloadmask V0, V1 1e0 vor_masked V2, V3, V0 1e8 spill [sp, #48] -> R14 # spill size = 64 1ea add R14, R14, R31 # ptr, #@addP_reg_reg 1ec addi R31, R14, #16 # ptr, #@addP_reg_imm 1f0 bgeu R9, R29, B57 #@cmpU_branch P=0.000001 C=-1.000000 ## XorV masked 1d0 B30: # out( B57 B31 ) <- in( B29 ) Freq: 
75.1104 1d0 loadV V3, [R15] # vector (rvv) 1d8 vloadmask V0, V1 1e0 vxor_masked V2, V3, V0 1e8 spill [sp, #48] -> R14 # spill size = 64 1ea add R14, R14, R31 # ptr, #@addP_reg_reg 1ec addi R31, R14, #16 # ptr, #@addP_reg_imm 1f0 bgeu R9, R29, B57 #@cmpU_branch P=0.000001 C=-1.000000 ### VectorLongToMask/VectorMaskToLong We can use `VectorMaskLoadStoreTest.java` and `Float256VectorTests.java` to emit these nodes and the compilation log is as follows: ## VectorLongToMask 05e B3: # out( B29 B4 ) <- in( B22 B2 ) Freq: 1 05e vmask_fromlong V0, R30 066 vstoremask V1, V0 072 addi R7, R10, #16 # ptr, #@addP_reg_imm 076 storeV [R7], V1 # vector (rvv) ## VectorMaskToLong 064 addi R7, R7, #16 # ptr, #@addP_reg_imm 066 loadV V1, [R7] # vector (rvv) 06e vloadmask V0, V1 076 vmask_tolong R7, V0 084 li R29, #8 # int, #@loadConI 086 bgeu R12, R29, B5 #@cmpU_branch P=0.000001 C=-1.000000 ### PopulateIndex We need `PopulateIndexNode` to enable the vectorization of operations with loop induction variable by extending current scope of C2 superword vectorizable packs, just like [JDK-8280510](https://bugs.openjdk.java.net/browse/JDK-8280510). With this we can vectorize some operations in loop with the induction variable operand, such as below. for (int i = 0; i < count; i++) { b[i] = a[i] * i; } Final compilation log for above loop expression is like below. add R16, R12, R15 # ptr, #@addP_reg_reg addi R17, R16, #16 # ptr, #@addP_reg_imm loadV V1, [R17] # vector (rvv) add R15, R14, R15 # ptr, #@addP_reg_reg addi R17, R15, #16 # ptr, #@addP_reg_imm addiw R18, R30, #8 #@addI_reg_imm populateindex V3, R30, #1 # KILL V2, R9 vmul.vv V1, V3, V1 #@vmulI storeV [R17], V1 # vector (rvv) Hotspot jtreg has existing tests in `compiler/c2/cr7192963/Test*Vect.java` and will be all passed. 
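As a rough illustration of what the PopulateIndex section describes, the following Java sketch emulates the vectorized loop with scalar code. The class `PopulateIndexSketch`, the method names and the `lanes` parameter are assumptions for this example only; the real transformation happens inside C2 and works on vector registers. Lane `k` of the index vector holds `start + k * stride`, and the loop `b[i] = a[i] * i` is then processed one block of `lanes` elements at a time.

```java
final class PopulateIndexSketch {
    // Emulates an index vector: lane k holds start + k * stride.
    static int[] populateIndex(int start, int stride, int lanes) {
        int[] idx = new int[lanes];
        for (int k = 0; k < lanes; k++) {
            idx[k] = start + k * stride;
        }
        return idx;
    }

    // Blocked form of: for (i = 0; i < count; i++) b[i] = a[i] * i;
    static void mulByIndex(int[] a, int[] b, int lanes) {
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            int[] idx = populateIndex(i, 1, lanes);   // index vector for this block
            for (int k = 0; k < lanes; k++) {
                b[i + k] = a[i + k] * idx[k];         // element-wise multiply
            }
        }
        for (; i < a.length; i++) {                   // scalar tail loop
            b[i] = a[i] * i;
        }
    }
}
```

In the compilation log above, the induction variable advances by 8 per iteration and the stride passed to `populateindex` is 1, which corresponds to `lanes = 8` and `stride = 1` in this sketch.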
### VectorMaskTrueCount/VectorMaskFirstTrue We can use `Double128VectorTests.java` to emit these nodes and the compilation log is as follows: ## VectorMaskTrueCount 050 addi R7, R7, #16 # ptr, #@addP_reg_imm 052 loadV V1, [R7] # vector (rvv) 05a vloadmask V0, V1 062 vmask_truecount R10, V0 06a # pop frame 32 ## VectorMaskFirstTrue 070 loadV V1, [R7] # vector (rvv) 078 vmv.v.i V2, #0 #@replicateL_imm5 080 spill V1 -> V3 # vector spill size = 256 084 # reinterpret V3 # do nothing 084 vmaskcmp V0, V3, V2, #4 090 vmask_firsttrue R8, V0 # KILL V30 09c li R28, #2 # int, #@loadConI 09e bge R8, R28, B42 #@cmpI_branch P=0.000000 C=5952.000000 ### VectorInsert We can use `test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java` to emit the lt32 node and the compilation log is as follows: 05e B4: # out( B13 B5 ) <- in( B3 ) Freq: 0.999997 05e loadV V1, [R30] # vector (rvv) 066 li R28, #0 # int, #@loadConI 068 lwu R29, [R7, #120] # loadN, compressed ptr, #@loadN ! Field: TestVectorInsertByte.rb 06c decode_heap_oop R29, R29 #@decodeHeapOop 06e insertI_index_lt32 V1, V1, R28, #0 082 lwu R7, [R29, #12] # range, #@loadRange 086 NullCheck R29 In order to cover the case where idx is greater than 31, we need to modify `TestVectorInsertByte.java`:
diff --git a/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java b/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java index 7969b7bea40..480d6bec074 100644 --- a/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java +++ b/test/hotspot/jtreg/compiler/vectorapi/TestVectorInsertByte.java @@ -51,7 +51,7 @@ public class TestVectorInsertByte { static void testByteVectorInsert() { ByteVector av = ByteVector.fromArray(SPECIESb, ab, 0); - av = av.withLane(0, (byte) (0)); + av = av.withLane(32, (byte) (0)); av.intoArray(rb, 0); } Then the compilation log is as follows: 060 B4: # out( B13 B5 ) <- in( B3 ) Freq: 0.999997 060 loadV V1, [R30] # vector (rvv) 068 li R28, #0 # int, #@loadConI 06a lwu R29, [R7, #120] # loadN, compressed ptr, #@loadN ! Field: TestVectorInsertByte.rb 06e decode_heap_oop R29, R29 #@decodeHeapOop 070 insertI_index V1, V1, R28, #32 # KILL R7, V2 088 lwu R28, [R29, #12] # range, #@loadRange 08c NullCheck R29 ### MaskAll masked On SVE, the `shuffleTest()` case in `Int64VectorTests.java` emits vmaskAllI_masked: the function `vector_needs_partial_operations` decides whether the masked vmaskAllI node is emitted. RISC-V uses vsetvl to set the vector element length, so we do not need partial operations. However, we can still use `vector_needs_partial_operations` to cover vmaskAllI_masked.
Apply patch: diff --git a/src/hotspot/cpu/riscv/riscv.ad b/src/hotspot/cpu/riscv/riscv.ad index 6c5ceb9c359..b4ef13768fc 100644 --- a/src/hotspot/cpu/riscv/riscv.ad +++ b/src/hotspot/cpu/riscv/riscv.ad @@ -1968,7 +1968,19 @@ const bool Matcher::match_rule_supported_vector_masked(int opcode, int vlen, Bas } const bool Matcher::vector_needs_partial_operations(Node* node, const TypeVect* vt) { - return false; + if (UseRVV == 0) { + return false; + } + switch(node->Opcode()) { + case Op_MaskAll: + return !node->in(1)->is_Con(); + default: + return false; + } } const bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) { Then the compilation log is as follows: 0c8 B7: # out( B13 B8 ) <- in( B12 B6 ) Freq: 0.999999 0c8 addi R7, R30, #16 # ptr, #@addP_reg_imm 0cc vmask_gen_imm V0, #2 0d4 vmaskAllI_masked V30, R31, V0 # KILL V1 0e4 spill V30 -> V0 # vmask spill size = 32 0e8 vstoremask V1, V0 # elem size is #4 byte[s] 0f4 storeV [R7], V1 # vector (rvv) [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc ## Testing: qemu with UseRVV: - [ ] Tier1 tests (release) - [ ] Tier2 tests (release) - [ ] Tier3 tests (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) - [x] test/hotspot/jtreg/compiler/c2/cr7192963/Test*Vect.java ------------- Commit messages: - Remove VectorTest - Merge remote-tracking branch 'upstream/master' into JDK-8307609 - Optimize vmask_gen_imm - Add VectorTest - FFix some vsetvli_helper location - Remove useless INSN and simplify gather load - Refactor match_rule_supported_vector - 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API Changes: https://git.openjdk.org/jdk/pull/13862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307609 Stats: 1591 lines in 6 files changed: 1425 ins; 107 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 
PR: https://git.openjdk.org/jdk/pull/13862 From thartmann at openjdk.org Fri May 12 14:14:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 14:14:47 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v3] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:33:10 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. >> >> To make reviewing the entire change easier, I've decided to split the work into several PRs. >> >> This first PR includes the following _semantic-preserving_ changes: >> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: >> - Updating the code (variables, method names etc.) accordingly. >> - Renaming "Skeleton Predicates" to "Assertion Predicates". >> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. >> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). >> - Change `class Predicates` -> `class ParsePredicates`. >> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). >> - Removing unused variables. >> - Removing unnecessary checks. 
>> - Code style fixes in touched code. >> >> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. >> >> The blog post can be found on my Github page at: >> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html >> >> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's review Great work in cleaning up / renaming this complex code, Christian. The changes look good to me. Also, nice blog post. It was very helpful to refresh my memory. src/hotspot/share/opto/loopPredicate.cpp line 46: > 44: * uncommon trap on the entry path to the loop. The old check inside the loop can be eliminated. If the condition of the > 45: * Hoisted Predicate fails at runtime, we'll execute the uncommon trap to avoid entering the loop which misses the check. > 46: * Loop Predication can currently remove array range check and loop invariant checks (such as null checks). Suggestion: * Loop Predication can currently remove array range checks and loop invariant checks (such as null checks). ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13864#pullrequestreview-1424564260 PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1192410184 From thartmann at openjdk.org Fri May 12 14:18:45 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 14:18:45 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils In-Reply-To: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: On Wed, 10 May 2023 18:26:40 GMT, Xue-Lei Andrew Fan wrote: > Hi, > > May I have this update reviewed? > > sprintf is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and in [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for the sprintf uses in the src/utils directory. > > Thanks, > Xuelei Looks good to me otherwise. src/utils/hsdis/binutils/hsdis-binutils.c line 2: > 1: /* > 2: 3* Copyright (c) 2008, 2023, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2008, 2023, Oracle and/or its affiliates. All rights reserved. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13915#pullrequestreview-1424602363 PR Review Comment: https://git.openjdk.org/jdk/pull/13915#discussion_r1192434582 From thartmann at openjdk.org Fri May 12 14:29:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 14:29:03 GMT Subject: RFR: 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 In-Reply-To: References: Message-ID: On Tue, 9 May 2023 13:02:45 GMT, Afshin Zafari wrote: > - The `finalize()` method is replaced with `cleanup()`. > - A new constructor is added to register the cleanup method. > - A static `Cleaner` is defined to have only one cleaner thread for all the 15000 instances. Otherwise, we get an `OutOfMemoryException` on cleaner thread creation. The fix looks good to me. Not sure though how much sense that test still makes after the AOT removal but I guess keeping it does not hurt. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13886#pullrequestreview-1424619342 From azafari at openjdk.org Fri May 12 14:29:06 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 12 May 2023 14:29:06 GMT Subject: RFR: 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 In-Reply-To: <5gQOMR8BxQ6Tc4eFb2901c5Ig3rIuqmNbZWY8uod2OA=.c09912db-5c46-4c67-a01f-a24871250578@github.com> References: <5gQOMR8BxQ6Tc4eFb2901c5Ig3rIuqmNbZWY8uod2OA=.c09912db-5c46-4c67-a01f-a24871250578@github.com> Message-ID: On Thu, 11 May 2023 12:34:02 GMT, Coleen Phillimore wrote: >> - The `finalize()` method is replaced with `cleanup()`. >> - A new constructor is added to register the cleanup method. >> - A static `Cleaner` is defined to have only one cleaner thread for all the 15000 instances. Otherwise, we get an `OutOfMemoryException` on cleaner thread creation. > > I think this looks good, and still tests what the original test failure was. 
Unless the bug was with the _return_register_finalizer bytecode, but I don't think that's the case. Thank you @coleenp and @TobiHartmann for your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13886#issuecomment-1545830843 From azafari at openjdk.org Fri May 12 14:29:06 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 12 May 2023 14:29:06 GMT Subject: Integrated: 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 In-Reply-To: References: Message-ID: On Tue, 9 May 2023 13:02:45 GMT, Afshin Zafari wrote: > - The `finalize()` method is replaced with `cleanup()`. > - A new constructor is added to register the cleanup method. > - A static `Cleaner` is defined to have only one cleaner thread for all the 15000 instances. Otherwise, we get an `OutOfMemoryException` on cleaner thread creation. This pull request has now been integrated. Changeset: 39dc40fe Author: Afshin Zafari URL: https://git.openjdk.org/jdk/commit/39dc40fed4e1af3e77355fa9f4abb0c72279a140 Stats: 13 lines in 1 file changed: 11 ins; 0 del; 2 mod 8305081: Remove finalize() from test/hotspot/jtreg/compiler/runtime/Test8168712 Reviewed-by: coleenp, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13886 From xuelei at openjdk.org Fri May 12 14:53:02 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 12 May 2023 14:53:02 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2] In-Reply-To: References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: On Fri, 12 May 2023 14:13:22 GMT, Tobias Hartmann wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> update typo in copyright statement > > src/utils/hsdis/binutils/hsdis-binutils.c line 2: > >> 1: /* >> 2: 3* Copyright (c) 2008, 2023, Oracle and/or its affiliates. All rights reserved. 
> > Suggestion: > > * Copyright (c) 2008, 2023, Oracle and/or its affiliates. All rights reserved. Oops. Thank you for the catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13915#discussion_r1192472311 From xuelei at openjdk.org Fri May 12 14:52:59 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 12 May 2023 14:52:59 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2] In-Reply-To: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: > Hi, > > May I have this update reviewed? > > sprintf is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and in [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for the sprintf uses in the src/utils directory.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: update typo in copyright statement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13915/files - new: https://git.openjdk.org/jdk/pull/13915/files/6a7d4e69..e585bbf0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13915&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13915&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13915.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13915/head:pull/13915 PR: https://git.openjdk.org/jdk/pull/13915 From xuelei at openjdk.org Fri May 12 14:55:53 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Fri, 12 May 2023 14:55:53 GMT Subject: Integrated: 8307855: update for deprecated sprintf for src/utils In-Reply-To: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: On Wed, 10 May 2023 18:26:40 GMT, Xue-Lei Andrew Fan wrote: > Hi, > > May I have this update reviewed? > > sprintf is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and in [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for the sprintf uses in the src/utils directory. > > Thanks, > Xuelei This pull request has now been integrated.
Changeset: 4b0f4213 Author: Xue-Lei Andrew Fan URL: https://git.openjdk.org/jdk/commit/4b0f4213a566c3c6d49c034ab6e022c93c4289b1 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod 8307855: update for deprecated sprintf for src/utils Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13915 From cslucas at openjdk.org Fri May 12 21:09:01 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 12 May 2023 21:09:01 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward.
It basically consists of the following: 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNodes; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge; 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge, which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failures. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR review 5: refactor on rematerialization & add tests.
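For readers unfamiliar with the term, the following Java sketch shows the kind of source shape that can lead to an allocation merge. It is only an illustration: the class `AllocationMergeSketch` and its methods are made up here, and whether C2 actually scalar-replaces such code depends on inlining, escape analysis results and VM flags. In `sum`, the local `p` is a merge (a Phi) of two allocations that do not escape, and the field loads use that merge as their base. `sumSplit` shows by hand what the code looks like once the loads have been split through the merge, so that no object is needed any more.

```java
final class AllocationMergeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // p merges two allocations; p.x and p.y are field loads based on the merge.
    static int sum(boolean cond, int v) {
        Point p;
        if (cond) {
            p = new Point(v, 1);
        } else {
            p = new Point(1, v);
        }
        return p.x + p.y;
    }

    // Hand-written equivalent after splitting the loads through the merge:
    // each branch supplies the field values directly and no Point is allocated.
    static int sumSplit(boolean cond, int v) {
        int x;
        int y;
        if (cond) {
            x = v;
            y = 1;
        } else {
            x = 1;
            y = v;
        }
        return x + y;
    }
}
```

Both methods return the same value for every input. The difference is only that the second form has no allocation that would need to be rematerialized at a safepoint.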
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/542c5ef1..68694126 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=11-12 Stats: 225 lines in 10 files changed: 98 ins; 97 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Fri May 12 21:09:04 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 12 May 2023 21:09:04 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <8pyn8ASJ6-PLoNIfI9FGvA6rfZXpc3Ud4hDWpesNlxg=.de6be879-e4cf-45a2-beca-00d7f3cd7429@github.com> On Tue, 9 May 2023 00:03:26 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 > > The new pass over deserialized debug info would adapt `ScopeDesc::objects()` (initialized by `decode_object_values(obj_decode_offset)` and accesses through `chunk->at(0)->scope()->objects()`) and produce 2 lists: > * new list of objects which enumerates all scalarized instances which needs to be rematerialized; > * complete set of objects referenced in the current scope (the purpose `chunk->at(0)->scope()->objects()` serves now). > > It should be performed before `rematerialize_objects`. > > By preprocessing I mean all the conditional checks before it is attempted to reallocate an `ObjectValue`. By the end of the new pass, it should be enough to just iterate over the new list of scalarized instances in `Deoptimization::realloc_objects`. And after `Deoptimization::realloc_objects` and `Deoptimization::reassign_fields` are over, debug info should be ready to go. @iwanowww - I pushed some changes to address your feedback about the rematerialization part. I added only two more tests for now, but I'm working on adding others. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1546298856 From sviswanathan at openjdk.org Fri May 12 21:52:53 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 12 May 2023 21:52:53 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Fri, 12 May 2023 07:38:42 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopopts.cpp line 4210: >> >>> 4208: if (use != phi && ctrl_or_self(use) == cl) { >>> 4209: DEBUG_ONLY( current->dump(-1); ) >>> 4210: assert(false, "reduction has use inside loop"); >> >> I have been wondering, it is right to bailout here from the optimization but why do we assert here? 
It is perfectly legal (if not very meaningful) to have a scalar use of the last unordered reduction within the loop. This will still auto vectorize as the reduction is to a scalar. e.g. a slight modification of the SumRed_Int.java still auto vectorizes and has a use of the last unordered reduction within the loop: >> public static int sumReductionImplement( >> int[] a, >> int[] b, >> int[] c, >> int total) { >> int sum = 0; >> for (int i = 0; i < a.length; i++) { >> total += (a[i] * b[i]) + (a[i] * c[i]) + (b[i] * c[i]); >> sum = total + i; >> } >> return total + sum; >> } >> Do you think this is a valid concern? > > I will add this as a regression test, and remove that assert. Thanks @sviswa7 for making me look at this more closely :) > > Still, I think it may be valuable to keep these two asserts - both indicate that something strange has happened: > > https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4210 > > https://github.com/openjdk/jdk/blob/31d977c21f7a2b62fb8123bc7967731aa961e373/src/hotspot/share/opto/loopopts.cpp#L4199 Sounds good to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1192820468 From kbarrett at openjdk.org Sun May 14 00:49:55 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 14 May 2023 00:49:55 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2] In-Reply-To: References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: On Fri, 12 May 2023 14:52:59 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14, and Microsoft Virtual Studio, because of security concerns. 
The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in the src/utils directory. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > update typo in copyright statement A couple of issues noticed after this PR has been integrated. src/utils/hsdis/binutils/hsdis-binutils.c line 222: > 220: const decode_func_stype decode_func_address = &decode_instructions; > 221: > 222: #define remaining_buflen(buf, bufsize, p) ((bufsize) - ((p) - (buf))) This shouldn't be a macro. Among other reasons, see the style guide. src/utils/hsdis/binutils/hsdis-binutils.c line 251: > 249: if (type) snprintf(p += strlen(p), remaining_buflen(buf, bufsize, p), " type='%s'", type); > 250: if (dsize) snprintf(p += strlen(p), remaining_buflen(buf, bufsize, p), " dsize='%d'", dsize); > 251: if (delays) snprintf(p += strlen(p), remaining_buflen(buf, bufsize, p), " delay='%d'", delays); What is the value for p that is used in the call to remaining_buflen? It is being assumed to be the value after the assignment by the first argument. However, according to the standard, it is unspecified, and the whole snprintf call invokes UB. This is because there aren't any sequence points between the update of p in the first argument and that reference. (C++17 changes this, but we aren't using C++17 yet.) 
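One way to avoid the sequencing problem described above is to advance `p` in its own statement before the call, and to compute the remaining length in a function rather than a macro. The following is only a minimal sketch under those assumptions; `remaining_buflen` as a function and `append_fmt` are hypothetical helpers written for illustration, not the code that was eventually integrated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: bytes left in buf starting at p, including room for
 * the terminating NUL. A function instead of a macro. */
static size_t remaining_buflen(const char* buf, size_t bufsize, const char* p) {
  return bufsize - (size_t)(p - buf);
}

/* Hypothetical helper: append formatted text at the end of the string that
 * p points into. The update of p is a separate statement, so it is fully
 * sequenced before p is read to compute the remaining length.
 * Returns the position where the appended text starts. */
static char* append_fmt(char* buf, size_t bufsize, char* p, const char* fmt, ...) {
  p += strlen(p);                                   /* step 1: move to the NUL */
  assert(p >= buf && (size_t)(p - buf) < bufsize);  /* at least the NUL fits */
  va_list ap;
  va_start(ap, fmt);
  /* step 2: p is no longer modified, so both uses below see the same value */
  vsnprintf(p, remaining_buflen(buf, bufsize, p), fmt, ap);
  va_end(ap);
  return p;
}
```

A caller would then write `p = append_fmt(buf, bufsize, p, " type='%s'", type);` and so on for each attribute. Output that does not fit is truncated by `vsnprintf` and the buffer stays NUL-terminated.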
------------- PR Review: https://git.openjdk.org/jdk/pull/13915#pullrequestreview-1425474314 PR Review Comment: https://git.openjdk.org/jdk/pull/13915#discussion_r1193056435 PR Review Comment: https://git.openjdk.org/jdk/pull/13915#discussion_r1193056453 From fyang at openjdk.org Mon May 15 02:02:54 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 15 May 2023 02:02:54 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API In-Reply-To: References: Message-ID: On Mon, 8 May 2023 11:04:09 GMT, Dingli Zhang wrote: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... Some initial comments from a cursory look. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1635: > 1633: > 1634: // Set dst to NaN if any NaN input. > 1635: void C2_MacroAssembler::minmax_fp_masked_v(VectorRegister dst_src1, VectorRegister src2, Better to break down `dst_src1` into two seperate operands, i.e., `dst` and `src1`. 
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1644:

> 1642: // Check vector elements of src1 and src2 for quiet and signaling NaN.
> 1643: vfclass_v(tmp1, dst_src1);
> 1644: vfclass_v(tmp2, src2);

As discussed offline, a better way to find NaN among the vector elements is the `vmfne` instruction, e.g. `vmfne.vv v0, va, va`. `vmfne` writes 1 to the destination element when the corresponding element of `va` is NaN.

src/hotspot/cpu/riscv/riscv_v.ad line 4134:

> 4132: __ vsetvli_helper(bt, Matcher::vector_length(this));
> 4133: __ vid_v(as_VectorRegister($v0$$reg));
> 4134: __ mv($tmp1$$Register, (int)($idx$$constant));

Suggestion: make `idx` a register input operand, which eliminates this `mv` instruction and possibly the reserved `tmp1` register as well.

-------------

Changes requested by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13862#pullrequestreview-1425664153
PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193257710
PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193261133
PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193256663

From duke at openjdk.org Mon May 15 03:05:53 2023
From: duke at openjdk.org (Chang Peng)
Date: Mon, 15 May 2023 03:05:53 GMT
Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon
Message-ID: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com>

At the Java level of the Vector API, a vector mask is represented as a boolean array with 0x00/0x01 values (8 bits per element), aka the in-memory format. When it is loaded into a vector register, e.g. on Neon, the in-memory format is converted to the in-register format, with a 0/-1 value in each lane (lane width aligned to its type), by the VectorLoadMask [1] operation, and converted back to the in-memory format by VectorStoreMask [2].
On Neon, a typical VectorStoreMask operation first narrows the given vector registers to the byte element type with the xtn insn [3], and then does a vector negate to convert each element to a 0x00/0x01 value.

For most vector mask operations, the input mask is in the in-register format, and a vector mask also stays in the in-register format all through the compilation. But for some operations like VectorMask.trueCount() [4], which counts the elements with a true value, the expected input mask is in the in-memory format. So a VectorStoreMask is generated to convert the mask from the in-register format to the in-memory format before those operations.

However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations do not influence the number of active lanes (value of 0x01) of the input.

This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations.

For example,

    var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
    m.not().trueCount();

will produce the following assembly on a Neon machine before this patch:

    ...
    mvn v16.16b, v16.16b // VectorMask.not()
    xtn v16.4h, v16.4s
    xtn v16.8b, v16.8h
    neg v16.8b, v16.8b // VectorStoreMask
    addv b17, v16.8b
    umov w0, v17.b[0] // VectorMask.trueCount()
    ...

After this patch:

    ...
    mvn v16.16b, v16.16b // VectorMask.not()
    addv s17, v16.4s
    smov x0, v17.b[0]
    neg x0, x0 // Optimized VectorMask.trueCount()
    ...

In this case, we can save two xtn insns.

Performance:

    Benchmark   Before             After                Unit
    testInt     723.822 ± 1.029    1182.375 ± 12.363    ops/ms
    testLong    632.154 ± 0.197    1382.74 ± 2.188      ops/ms
    testShort   788.665 ± 1.852    1152.38 ± 3.77       ops/ms

[1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740
[2]: https://github.com/openjdk/jdk/blob/f968da97a5a5c68c28ad29d13fdfbe3a4adf5ef7/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4841
[3]: https://developer.arm.com/documentation/dui0801/h/A64-SIMD-Vector-Instructions/XTN--XTN2--vector-
[4]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount()

-------------

Commit messages:
 - 8307795: AArch64: Optimize VectorMask.truecount() on Neon

Changes: https://git.openjdk.org/jdk/pull/13974/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8307795
Stats: 240 lines in 5 files changed: 240 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/13974.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13974/head:pull/13974

PR: https://git.openjdk.org/jdk/pull/13974

From thartmann at openjdk.org Mon May 15 05:17:55 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 15 May 2023 05:17:55 GMT
Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2]
In-Reply-To:
References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com>
Message-ID: <81TE6e3xzQ8gpHtab6tmrvRM8SYaE16lStVB14LKMMo=.0cc1991d-a723-4263-b51a-39422249e62e@github.com>

On Fri, 12 May 2023 14:52:59 GMT, Xue-Lei Andrew Fan wrote:

>> Hi,
>>
>> May I have this update reviewed?
>>
>> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues.
This is a break-down update for sprintf uses in the src/utils directory. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > update typo in copyright statement The change is also missing a second review. I'm backing it out for now with https://github.com/openjdk/jdk/pull/13975. Let's redo with [JDK-8308071](https://bugs.openjdk.org/browse/JDK-8308071). Thanks, Tobias ------------- PR Comment: https://git.openjdk.org/jdk/pull/13915#issuecomment-1547205741 From thartmann at openjdk.org Mon May 15 05:19:51 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 05:19:51 GMT Subject: RFR: 8308072: [BACKOUT] update for deprecated sprintf for src/utils Message-ID: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> Clean backout of https://github.com/openjdk/jdk/pull/13915 / [JDK-8307855](https://bugs.openjdk.org/browse/JDK-8307855) because @kimbarrett noticed some issues after integration. Also the change is non-trivial and therefore missing a second review. 
Thanks, Tobias ------------- Commit messages: - 8308072: [BACKOUT] update for deprecated sprintf for src/utils Changes: https://git.openjdk.org/jdk/pull/13975/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13975&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308072 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13975.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13975/head:pull/13975 PR: https://git.openjdk.org/jdk/pull/13975 From thartmann at openjdk.org Mon May 15 05:23:48 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 05:23:48 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v4] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Fri, 12 May 2023 09:50:46 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. 
>> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. >> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed interface_handling argument I'm seeing issues in higher tier testing. Investigating. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1547209856 From iris at openjdk.org Mon May 15 05:45:52 2023 From: iris at openjdk.org (Iris Clark) Date: Mon, 15 May 2023 05:45:52 GMT Subject: RFR: 8308072: [BACKOUT] update for deprecated sprintf for src/utils In-Reply-To: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> References: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> Message-ID: On Mon, 15 May 2023 05:13:00 GMT, Tobias Hartmann wrote: > Clean backout of https://github.com/openjdk/jdk/pull/13915 / [JDK-8307855](https://bugs.openjdk.org/browse/JDK-8307855) because @kimbarrett noticed some issues after integration. Also the change is non-trivial and therefore missing a second review. > > Thanks, > Tobias Verified this PR is a backout of the earlier change. ------------- Marked as reviewed by iris (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13975#pullrequestreview-1425795219 From thartmann at openjdk.org Mon May 15 05:45:53 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 05:45:53 GMT Subject: RFR: 8308072: [BACKOUT] update for deprecated sprintf for src/utils In-Reply-To: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> References: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> Message-ID: On Mon, 15 May 2023 05:13:00 GMT, Tobias Hartmann wrote: > Clean backout of https://github.com/openjdk/jdk/pull/13915 / [JDK-8307855](https://bugs.openjdk.org/browse/JDK-8307855) because @kimbarrett noticed some issues after integration. Also the change is non-trivial and therefore missing a second review. > > Thanks, > Tobias Thanks for the quick review, Iris! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13975#issuecomment-1547222559 From thartmann at openjdk.org Mon May 15 05:45:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 05:45:54 GMT Subject: Integrated: 8308072: [BACKOUT] update for deprecated sprintf for src/utils In-Reply-To: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> References: <__uRbegoIjzZg2m9xpCS17L4InOWaCw5aN0T1JTqojc=.5f485b78-f46c-40be-b930-38ec836b3181@github.com> Message-ID: <75WW5eQV2JZyNjQKcuMLvmbHIbp_NkotYWYCd8F4Sn8=.b76aed95-e051-436f-bdd5-d31163a65a6f@github.com> On Mon, 15 May 2023 05:13:00 GMT, Tobias Hartmann wrote: > Clean backout of https://github.com/openjdk/jdk/pull/13915 / [JDK-8307855](https://bugs.openjdk.org/browse/JDK-8307855) because @kimbarrett noticed some issues after integration. Also the change is non-trivial and therefore missing a second review. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 8d49ba9e Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8d49ba9e8d3095f850b3007b56488a0c0cf8ddff Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod 8308072: [BACKOUT] update for deprecated sprintf for src/utils Reviewed-by: iris ------------- PR: https://git.openjdk.org/jdk/pull/13975 From chagedorn at openjdk.org Mon May 15 06:09:47 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 May 2023 06:09:47 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v4] In-Reply-To: References: Message-ID: > This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). 
To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. > > To make reviewing the entire change easier, I've decided to split the work into several PRs. > > This first PR includes the following _semantic-preserving_ changes: > - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: > - Updating the code (variables, method names etc.) accordingly. > - Renaming "Skeleton Predicates" to "Assertion Predicates". > - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. > - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). > - Change `class Predicates` -> `class ParsePredicates`. > - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). > - Removing unused variables. > - Removing unnecessary checks. > - Code style fixes in touched code. > > Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. 
> > The blog post can be found on my Github page at: > https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html > > Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopPredicate.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13864/files - new: https://git.openjdk.org/jdk/pull/13864/files/97207be4..8f80a6e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13864/head:pull/13864 PR: https://git.openjdk.org/jdk/pull/13864 From chagedorn at openjdk.org Mon May 15 06:09:49 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 May 2023 06:09:49 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v3] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:33:10 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. 
>> >> To make reviewing the entire change easier, I've decided to split the work into several PRs. >> >> This first PR includes the following _semantic-preserving_ changes: >> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: >> - Updating the code (variables, method names etc.) accordingly. >> - Renaming "Skeleton Predicates" to "Assertion Predicates". >> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. >> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). >> - Change `class Predicates` -> `class ParsePredicates`. >> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). >> - Removing unused variables. >> - Removing unnecessary checks. >> - Code style fixes in touched code. >> >> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. >> >> The blog post can be found on my Github page at: >> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html >> >> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's review Thanks a lot Tobias for your careful review and your offline feedback about the blog! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13864#issuecomment-1547239917 From xuelei at openjdk.org Mon May 15 06:15:54 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 15 May 2023 06:15:54 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2] In-Reply-To: References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: <5kuqiwFH_hV37tEZsdZxYziXhlM1kv2XLngoBrbY-mI=.3fb2435a-1362-48e6-93d6-64422e20f244@github.com> On Fri, 12 May 2023 14:52:59 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> May I have this update reviewed? >> >> The sprintf is deprecated in Xcode 14, and Microsoft Virtual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in the src/utils directory. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > update typo in copyright statement > The change is also missing a second review. I'm backing it out for now with #13975. Let's redo with [JDK-8308071](https://bugs.openjdk.org/browse/JDK-8308071). > > Thanks, Tobias Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13915#issuecomment-1547248097 From xuelei at openjdk.org Mon May 15 06:15:56 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 15 May 2023 06:15:56 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2] In-Reply-To: References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: On Sun, 14 May 2023 00:43:51 GMT, Kim Barrett wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> update typo in copyright statement > > src/utils/hsdis/binutils/hsdis-binutils.c line 222: > >> 220: const decode_func_stype decode_func_address = &decode_instructions; >> 221: >> 222: #define remaining_buflen(buf, bufsize, p) ((bufsize) - ((p) - (buf))) > > This shouldn't be a macro. Among other reasons, see the style guide. May I have an reference to the style guide? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13915#discussion_r1193366077 From xuelei at openjdk.org Mon May 15 06:31:59 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Mon, 15 May 2023 06:31:59 GMT Subject: RFR: 8307855: update for deprecated sprintf for src/utils [v2] In-Reply-To: References: <_y76ehEcXfM8bS00DzlNsVoOL3MUIp83ReGFdxsFlDA=.f0fc3530-3e27-49b5-a4f7-4082f2d7d06c@github.com> Message-ID: On Sun, 14 May 2023 00:44:45 GMT, Kim Barrett wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> update typo in copyright statement > > src/utils/hsdis/binutils/hsdis-binutils.c line 251: > >> 249: if (type) snprintf(p += strlen(p), remaining_buflen(buf, bufsize, p), " type='%s'", type); >> 250: if (dsize) snprintf(p += strlen(p), remaining_buflen(buf, bufsize, p), " dsize='%d'", dsize); >> 251: if (delays) snprintf(p += strlen(p), remaining_buflen(buf, bufsize, p), " delay='%d'", delays); > > What is 
the value for p that is used in the call to remaining_buflen? > It is being assumed to be the value after the assignment by the first > argument. However, according to the standard, it is unspecified, and > the whole snprintf call invokes UB. This is because there aren't any > sequence points between the update of p in the first argument and that > reference. (C++17 changes this, but we aren't using C++17 yet.) For the 1st line (line 249), it is fine to use bufsize directly. The p in remaining_buflen is used to calculate the remaining length of the target buffer. To know the remaining buffer length, I need to know the pointer to the beginning buffer (the 'buf' parameter), the pointer to the beginning of the unused buffer (the 'p' parameter), and the total size of the buffer (the 'bufsize' parameter). The initial value of p is assigned to buf(p = buf), and then move forward by the used size ( p += strlen(p)). It's a good point to me about the update sequence of p, and I will make an update in a new PR. Thank you very much! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13915#discussion_r1193376513 From thartmann at openjdk.org Mon May 15 06:41:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 06:41:01 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v5] In-Reply-To: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: > [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). 
One thread can set `_is_loaded_computed` before setting `_is_loaded`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 > > while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 > > As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. > > Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 > > A loaded type can therefore be replaced by an unloaded type during GVN. > > In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with its unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). > > Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. > > The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. > > In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined.
I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. > > Thanks, > Tobias Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8303512 - Removed interface_handling argument - Refactoring - Eager computation to avoid racy update of remaining fields - Re-ordering of _computed fields initialization - Reverted unrelated change - 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13868/files - new: https://git.openjdk.org/jdk/pull/13868/files/d5781d8a..fbe7578f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13868&range=03-04 Stats: 101511 lines in 1647 files changed: 81705 ins; 7930 del; 11876 mod Patch: https://git.openjdk.org/jdk/pull/13868.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13868/head:pull/13868 PR: https://git.openjdk.org/jdk/pull/13868 From dzhang at openjdk.org Mon May 15 07:10:00 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 15 May 2023 07:10:00 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v2] In-Reply-To: References: Message-ID: <1Wn7SkV0tLChp1w4oatKzJW9aqfS8ODnFFAMr7er6O4=.16bc3d64-cf7c-4494-9621-2b780e9158ba@github.com> > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. 
> > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. 
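The remark about synthesizing the inverse of compress from iota and a masked gather can be illustrated with a scalar model in Java. This is only a sketch of the semantics, not the RVV code in this PR: the class `ExpandSketch` and its methods are hypothetical names, and zeroing the unselected lanes is an assumption taken from the Vector API description of `expand`.

```java
// Scalar model of "expand": source lanes are consumed in order and written
// only to the positions where the mask is set. All names are hypothetical.
final class ExpandSketch {
    // Models an iota-style prefix count: idx[i] is the number of set mask
    // bits strictly before position i.
    static int[] iota(boolean[] mask) {
        int[] idx = new int[mask.length];
        int count = 0;
        for (int i = 0; i < mask.length; i++) {
            idx[i] = count;
            if (mask[i]) {
                count++;
            }
        }
        return idx;
    }

    // Models a masked gather: each active lane i reads src[idx[i]];
    // inactive lanes stay zero (assumption, see the lead-in).
    static int[] expand(int[] src, boolean[] mask) {
        int[] idx = iota(mask);
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {
                dst[i] = src[idx[i]];
            }
        }
        return dst;
    }
}
```

For instance, expanding `{10, 20, 30, 40}` under the mask `{false, true, false, true}` places the first two source lanes at positions 1 and 3.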
> > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: - Merge master and resolve conflict - Optimize call point of vfclass and adjust the parameters of c2 instruct - Remove VectorTest - Merge remote-tracking branch 'upstream/master' into JDK-8307609 - Optimize vmask_gen_imm - Add VectorTest - FFix some vsetvli_helper location - Remove useless INSN and simplify gather load - Refactor match_rule_supported_vector - 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API ------------- Changes: https://git.openjdk.org/jdk/pull/13862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=01 Stats: 1686 lines in 6 files changed: 1448 ins; 141 del; 97 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From dzhang at openjdk.org Mon May 15 07:10:03 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 15 May 2023 07:10:03 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 01:50:57 GMT, Fei Yang wrote: >> Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Merge master and resolve conflict >> - Optimize call point of vfclass and adjust the parameters of c2 instruct >> - Remove VectorTest >> - Merge remote-tracking branch 'upstream/master' into JDK-8307609 >> - Optimize vmask_gen_imm >> - Add VectorTest >> - FFix some vsetvli_helper location >> - Remove useless INSN and simplify gather load >> - Refactor match_rule_supported_vector >> - 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1635: > >> 1633: >> 1634: // Set dst to NaN if any NaN input. 
>> 1635: void C2_MacroAssembler::minmax_fp_masked_v(VectorRegister dst_src1, VectorRegister src2, > > Better to break down `dst_src1` into two separate operands, i.e., `dst` and `src1`. Fixed. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1644: > >> 1642: // Check vector elements of src1 and src2 for quiet and signaling NaN. >> 1643: vfclass_v(tmp1, dst_src1); >> 1644: vfclass_v(tmp2, src2); > > As discussed offline, a better way for finding NaN from the vector elements is with the `vmfne` instruction, like: `vmfne.vv v0, va, va`. vmfne writes 1 to the destination element when the corresponding element of `va` is NaN. Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 4134: > >> 4132: __ vsetvli_helper(bt, Matcher::vector_length(this)); >> 4133: __ vid_v(as_VectorRegister($v0$$reg)); >> 4134: __ mv($tmp1$$Register, (int)($idx$$constant)); > > Suggestion: make `idx` a register input operand and eliminate this `mv` instruction and maybe the reserved `tmp1` register. Thanks for the review! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193407301 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193407402 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193407151 From roland at openjdk.org Mon May 15 07:33:51 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 15 May 2023 07:33:51 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v4] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 06:09:47 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)).
To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. >> >> To make reviewing the entire change easier, I've decided to split the work into several PRs. >> >> This first PR includes the following _semantic-preserving_ changes: >> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: >> - Updating the code (variables, method names etc.) accordingly. >> - Renaming "Skeleton Predicates" to "Assertion Predicates". >> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. >> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). >> - Change `class Predicates` -> `class ParsePredicates`. >> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). >> - Removing unused variables. >> - Removing unnecessary checks. >> - Code style fixes in touched code. >> >> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. 
>> >> The blog post can be found on my Github page at: >> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html >> >> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Tobias Hartmann Looks good to me. Great to see this move forward. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13864#pullrequestreview-1425928290 From epeter at openjdk.org Mon May 15 07:38:48 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 07:38:48 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> References: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> Message-ID: On Thu, 11 May 2023 08:23:46 GMT, Fei Gao wrote: >> @fg1417 thanks for the suggestion about running with the flags over all jtreg. I'll do that now... > >> The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`. >> >> But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here. >> >> How would you improve my comments? > > Thanks for your clarification. > > Your comment is quite clear already. 
Maybe just highlight the mismatch between `VectorMaskCmp` and `bol_test` here, like: > > // > // But with these two cases, which `VectorMaskCmp` interprets as ordered, > // we must convert the unordered into an ordered comparison: > // BoolTest::lt: Case -1 -> LT_U > // BoolTest::le: Case -1, 0 -> LE_U > // @fg1417 Are you ok with how I worded it now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1193439482 From dzhang at openjdk.org Mon May 15 07:42:08 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 15 May 2023 07:42:08 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v3] In-Reply-To: References: Message-ID: <9-UTXovi11YVdXBKzXYBeta47ztEfkm6NpnOxeZOnlg=.55d971fc-d089-4b4f-9fa9-1f0f7b61daa4@github.com> > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Remove debug warning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/4a15c29a..c4351ee1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From rcastanedalo at openjdk.org Mon May 15 07:45:45 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 May 2023 07:45:45 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom In-Reply-To: References: Message-ID: <8LCGAcCDvTyoO4Lo64xdn420EiJri1lTf00_CI8SReY=.c6d486ab-15f8-4643-9926-e02919b1c70d@github.com> On Fri, 12 May 2023 09:09:06 GMT, Emanuel Peter wrote: > This is the second step in the `VerifyLoopOptimizations` revival. > > Last step: > [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure > See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 > > Next step: > [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > ------- > > There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer.
> > **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: > We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: > https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 > > So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. Hi Emanuel, thanks for your useful work in resurrecting `VerifyLoopOptimizations`! I think the issue in `loopPredicate.cpp` should be reported (as a bug) and addressed separately, ideally (if feasible) with a corresponding test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13951#issuecomment-1547347108 From epeter at openjdk.org Mon May 15 07:46:07 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 07:46:07 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v6] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation    M-N-2   M-N-3     M-2     M-3     P-2     P-3 | note |
> -------------------------------------------------------------------
> int add       2063    2085     660     530     415     283 |      |
> int mul       2272    2257    1189     733     908     439 |      |
> int min       2527    2520    2516    2579    2585    2542 |  1   |
> int max       2548    2525    2551    2516    2515    2517 |  1   |
> int and       2410    2414     602     480     353     263 |      |
> int or        2149    2151     597     498     354     262 |      |
> int xor       2059    2062     605     476     364     263 |      |
> long add      1776    1790    2000    1000    1683     591 |  2   |
> long mul      2135    2199    2185    2001    2176    1307 |  2   |
> long min      1439    1424    1421    1420    1430    1427 |  3   |
> long max      2299    2287    2303    2305    1433    1425 |  3   |
> long and      1657    1667    2015    1003    1679     568 |  4   |
> long or       1776    1783    2032    1009    1680     569 |  4   |
> long xor      1834    1783    2012    1024    1679     570 |  4   |
> float add     2779    2644    2633    2648    2632    2639 |  5   |
> float mul     2779    2871    2810    2776    2732    2791 |  5   |
> float min     2294    2620    1725    1286     872     672 |      |
> float max     2371    2519    1697    1265     841     468 |      |
> double add    2634    2636    2635    2650    2635    2648 |  5   |
> double mul    3053    2955    2881    3030    2979    2927 |  5   |
> double min    2364    2400    2439    2399    2486    2398 |  6   |
> double max    2488    2459    2501    2451    2493    2498 |  6   |
>
> Legen...
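The idea of moving a reorderable reduction out of the loop can be modelled in plain Java. The following is a scalar sketch under stated assumptions, not the C2 transformation itself: `UnorderedReductionSketch`, its methods, and the lane count `LANES` are hypothetical names chosen for illustration. The inner `l` loops stand in for one vector operation.

```java
// Scalar model of an int add reduction, before and after moving the
// horizontal reduction out of the loop. All names are hypothetical.
final class UnorderedReductionSketch {
    static final int LANES = 4; // assumed vector width, for illustration only

    // Before: each vector iteration reduces its chunk to a scalar right away.
    static int reduceInsideLoop(int[] a) {
        int sum = 0;
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            int chunk = 0;
            for (int l = 0; l < LANES; l++) {
                chunk += a[i + l]; // models a horizontal reduction per iteration
            }
            sum += chunk;
        }
        for (; i < a.length; i++) {
            sum += a[i]; // scalar tail
        }
        return sum;
    }

    // After: the loop only does lane-wise adds; one reduction follows the loop.
    static int reduceAfterLoop(int[] a) {
        int[] acc = new int[LANES];
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            for (int l = 0; l < LANES; l++) {
                acc[l] += a[i + l]; // models a lane-wise vector add
            }
        }
        int sum = 0;
        for (int l = 0; l < LANES; l++) {
            sum += acc[l]; // single horizontal reduction after the loop
        }
        for (; i < a.length; i++) {
            sum += a[i]; // scalar tail
        }
        return sum;
    }
}
```

Both variants return the same value for `int` because wrapping integer addition is associative and commutative. The same reordering is not value-preserving for `float/double add/mul`, which is why those reductions are excluded above.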
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Added Matcher::match_rule_supported_vector check, removed bad assert, added test for it ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/31d977c2..0a72f4c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=04-05 Stats: 197 lines in 3 files changed: 189 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From fgao at openjdk.org Mon May 15 07:49:49 2023 From: fgao at openjdk.org (Fei Gao) Date: Mon, 15 May 2023 07:49:49 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> References: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> Message-ID: On Thu, 11 May 2023 08:23:46 GMT, Fei Gao wrote: >> @fg1417 thanks for the suggestion about running with the flags over all jtreg. I'll do that now... > >> The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`. >> >> But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here. >> >> How would you improve my comments? > > Thanks for your clarification. > > Your comment is quite clear already. 
Maybe just highlight the mismatch between `VectorMaskCmp` and `bol_test` here, like: > > // > // But with these two cases, which `VectorMaskCmp` interprets as ordered, > // we must convert the unordered into an ordered comparison: > // BoolTest::lt: Case -1 -> LT_U > // BoolTest::le: Case -1, 0 -> LE_U > // > @fg1417 Are you ok with how I worded it now? Oh, yes. Clear enough! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1193454363 From epeter at openjdk.org Mon May 15 08:14:46 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 08:14:46 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v4] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 06:09:47 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. >> >> To make reviewing the entire change easier, I've decided to split the work into several PRs. >> >> This first PR includes the following _semantic-preserving_ changes: >> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: >> - Updating the code (variables, method names etc.) accordingly. >> - Renaming "Skeleton Predicates" to "Assertion Predicates". >> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. 
>> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). >> - Change `class Predicates` -> `class ParsePredicates`. >> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). >> - Removing unused variables. >> - Removing unnecessary checks. >> - Code style fixes in touched code. >> >> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. >> >> The blog post can be found on my Github page at: >> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html >> >> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Tobias Hartmann src/hotspot/share/opto/loopPredicate.cpp line 150: > 148: * The Initialized Assertion Predicates are always true because we will > 149: * never enter the main loop because of the changed pre- and main-loop > 150: * exit conditions. This still does not sound quite right. We will never enter the main loop? Sounds like the main-loop is useless in all cases.
Suggestion: The Initialized Assertion Predicates are always true: they are true when we enter the main loop (because we adjusted the pre-loop exit condition), they are true in the last iteration (because we adjust the main-loop exit condition), and they are true in all iterations in the middle by implication. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1193483805 From epeter at openjdk.org Mon May 15 08:25:44 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 08:25:44 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom In-Reply-To: <8LCGAcCDvTyoO4Lo64xdn420EiJri1lTf00_CI8SReY=.c6d486ab-15f8-4643-9926-e02919b1c70d@github.com> References: <8LCGAcCDvTyoO4Lo64xdn420EiJri1lTf00_CI8SReY=.c6d486ab-15f8-4643-9926-e02919b1c70d@github.com> Message-ID: On Mon, 15 May 2023 07:43:27 GMT, Roberto Castañeda Lozano wrote: >> This is the second step in the `VerifyLoopOptimizations` revival. >> >> Last step: >> [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure >> See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 >> >> Next step: >> [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop >> >> ------- >> >> There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer.
>> > >> **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: >> We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: >> https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 >> >> So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. > > Hi Emanuel, thanks for your useful work in resurrecting `VerifyLoopOptimizations`! > I think the issue in `loopPredicate.cpp` should be reported (as a bug) and addressed separately, ideally (if feasible) with a corresponding test case. @robcasloz I think this is very difficult, if it is even possible at all. The idom info may be wrong, but I'm not sure if it ever leads to a real failure. Honestly, there are too many idom / ctrl / loop "bugs" that I will encounter with this verification work. If we are worried about backports, we can split the fixes from the verification. But finding regression tests that actually manifest in an assert or wrong execution, that will be an immense effort. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13951#issuecomment-1547408531 From asotona at openjdk.org Mon May 15 08:47:02 2023 From: asotona at openjdk.org (Adam Sotona) Date: Mon, 15 May 2023 08:47:02 GMT Subject: RFR: 8307326: Package jdk.internal.classfile.java.lang.constant become obsolete Message-ID: Package `jdk.internal.classfile.java.lang.constant` containing `ModuleDesc` and `PackageDesc` became obsolete after [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729). All references to `jdk.internal.classfile.java.lang.constant.ModuleDesc` and `jdk.internal.classfile.java.lang.constant.PackageDesc` across all JDK sources, tests and JMH benchmarks are replaced with `java.lang.constant.ModuleDesc` and `java.lang.constant.PackageDesc`.
`jdk.internal.classfile.java.lang.constant` package export hooks are removed from java.base module-info, make files and test headers. The contents of the `jdk.internal.classfile.java.lang.constant` package and the related tests under `jdk.classfile` are deleted. Method references renamed in [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729) are fixed: - `PackageDesc::packageName` to `PackageDesc::name` - `PackageDesc::packageInternalName` to `PackageDesc::internalName` - `ModuleDesc::moduleName` to `ModuleDesc::name`. Please review this pull request. Thanks, Adam ------------- Commit messages: - 8307326: Package jdk.internal.classfile.java.lang.constant become obsolete Changes: https://git.openjdk.org/jdk/pull/13979/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13979&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307326 Stats: 503 lines in 46 files changed: 0 ins; 446 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/13979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13979/head:pull/13979 PR: https://git.openjdk.org/jdk/pull/13979 From rcastanedalo at openjdk.org Mon May 15 09:00:44 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 May 2023 09:00:44 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom In-Reply-To: References: <8LCGAcCDvTyoO4Lo64xdn420EiJri1lTf00_CI8SReY=.c6d486ab-15f8-4643-9926-e02919b1c70d@github.com> Message-ID: <6QVhlQbXus5M_VPGB-TeOEHAOT_piUfKMz8ji_0ul28=.a13550f6-a47b-4287-9e6b-f143ee2506e3@github.com> On Mon, 15 May 2023 08:23:07 GMT, Emanuel Peter wrote: > If we are worried about backports, we can split the fixes from the verification. I think this is a good idea, yes. > But finding regression tests that actually manifest in an assert or wrong execution, that will be an immense effort. Fair enough. A low-effort, "catch-all" idea would be to add a test with `-Xcomp -XX:+VerifyLoopOptimizations`, similar to `TestVerifyIterativeGVN.java`.
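A minimal sketch of what such a catch-all test could look like, assuming it is modelled on the shape of `TestVerifyIterativeGVN.java`. The class name `TestVerifyLoopOptimizationsSketch`, the `@requires` condition, and the loop body are hypothetical; only the two VM flags come from the message above.

```java
/*
 * @test
 * @summary Hypothetical smoke test: compile everything with loop verification on.
 * @requires vm.debug == true & vm.flavor == "server"
 * @run main/othervm -Xcomp -XX:+VerifyLoopOptimizations TestVerifyLoopOptimizationsSketch
 */
public class TestVerifyLoopOptimizationsSketch {
    // A trivial counted loop, so that the test method itself also goes
    // through the loop optimizations.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int result = sumTo(10);
        if (result != 45) {
            throw new RuntimeException("unexpected sum: " + result);
        }
        System.out.println("Hello world");
    }
}
```

With `-Xcomp`, the JDK classes loaded during startup are compiled as well, so the verification would run over far more code than the test body itself; that is what makes the test a catch-all.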
------------- PR Comment: https://git.openjdk.org/jdk/pull/13951#issuecomment-1547455460 From aph at openjdk.org Mon May 15 09:00:50 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 15 May 2023 09:00:50 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon In-Reply-To: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> On Mon, 15 May 2023 02:58:46 GMT, Chang Peng wrote: > In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. > > For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations. > > However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input. 
> > This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations. > > For example, > > > var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0); > m.not().trueCount(); > > > will produce the following assembly on a Neon machine before this patch: > > > ... > mvn v16.16b, v16.16b // VectorMask.not() > xtn v16.4h, v16.4s > xtn v16.8b, v16.8h > neg v16.8b, v16.8b // VectorStoreMask > addv b17, v16.8b > umov w0, v17.b[0] // VectorMask.trueCount() > ... > > > After this patch: > > > ... > mvn v16.16b, v16.16b // VectorMask.not() > addv s17, v16.4s > smov x0, v17.b[0] > neg x0, x0 // Optimized VectorMask.trueCount() > ... > > > In this case, we can save two xtn insns. > > Performance: > > Benchmark Before After Unit > testInt 723.822 ± 1.029 1182.375 ± 12.363 ops/ms > testLong 632.154 ± 0.197 1382.74 ± 2.188 ops/ms > testShort 788.665 ± 1.852 1152.38 ± 3.77 ops/ms > > [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740 > [2]: https://github.com/openjdk/jdk/b... That makes sense. Is it likely that there are more of these combined operations on vector masks that could be matched? If so, it might make sense to do the job earlier, in the C2 optimizer. test/micro/org/openjdk/bench/jdk/incubator/vector/StoreMaskTrueCount.java line 80: > 78: m = m.not(); > 79: res += m.trueCount(); > 80: } This looks like it might be removed by loop opts. I think you might need a blackhole somewhere. ------------- Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13974#pullrequestreview-1426088017 PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1193540231 From duke at openjdk.org Mon May 15 09:17:46 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 15 May 2023 09:17:46 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon In-Reply-To: <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> Message-ID: <-m588oeb-3gdCctnn2qYLjqrICZREj2URSICtml_PMA=.3de2fd7e-5570-44c4-853f-028ee7af1f9e@github.com> On Mon, 15 May 2023 08:56:37 GMT, Andrew Haley wrote: > That makes sense. Is it likely that there are more of these combined operations on vector masks that could be matched? If so, it might make sense to do the job earlier, in the C2 optimizer. Thanks for your review. I have tried to optimize `VectorMask.firstTrue()` [1] and `VectorMask.lastTrue()` [2] in the same way as this patch, but these two operations are strongly correlated with xtn, so we cannot simply remove it. I didn't find a way to optimize these two operations; they are already highly optimized in the C2 backend and there are no extra instructions.
[1]: https://github.com/openjdk/jdk/blob/8d49ba9e8d3095f850b3007b56488a0c0cf8ddff/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5517 [2]: https://github.com/openjdk/jdk/blob/8d49ba9e8d3095f850b3007b56488a0c0cf8ddff/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5624 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13974#issuecomment-1547484838 From epeter at openjdk.org Mon May 15 09:24:43 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 09:24:43 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v2] In-Reply-To: References: Message-ID: <-Gl1epCb4OmOa_GkewuYxYByzo8mOvvwojGRGLASH3w=.3922230e-890d-49ae-a455-c30e77f0179f@github.com> > This is the second step in the `VerifyLoopOptimizations` revival. > > Last step: > [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure > See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 > > Bug fixing for this step: > [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate > (https://github.com/openjdk/jdk/pull/13980) > > Next step: > [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: remove bug fix, is fixed in JDK-8308084 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13951/files - new: https://git.openjdk.org/jdk/pull/13951/files/1e1abc4c..56f66361 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13951&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13951&range=00-01 Stats: 15 lines in 1 file changed: 7 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13951/head:pull/13951 PR: https://git.openjdk.org/jdk/pull/13951 From epeter at openjdk.org Mon May 
15 09:26:58 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 09:26:58 GMT Subject: RFR: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate Message-ID: Bug fixing to make the verification pass for https://github.com/openjdk/jdk/pull/13951 [JDK-8305073](https://bugs.openjdk.org/browse/JDK-8305073). There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. 
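The ordering problem described above can be illustrated with a toy dominator tree. This is only a sketch under stated assumptions: `IdomSketch`, its array-based `idom` representation and the node numbering are hypothetical and much simpler than HotSpot's `dom_lca_internal`; it only shows why an idom computed before a CFG change can be stale afterwards.

```java
final class IdomSketch {
    // idom[n] is the immediate dominator of node n; the root is its own idom.
    static int depth(int[] idom, int n) {
        int d = 0;
        while (idom[n] != n) {
            n = idom[n];
            d++;
        }
        return d;
    }

    // Lowest common ancestor of a and b in the dominator tree.
    static int domLca(int[] idom, int a, int b) {
        int da = depth(idom, a);
        int db = depth(idom, b);
        while (da > db) { a = idom[a]; da--; }
        while (db > da) { b = idom[b]; db--; }
        while (a != b) { a = idom[a]; b = idom[b]; }
        return a;
    }

    // The idom of a region is the dominator LCA of all its predecessors.
    static int idomOfRegion(int[] idom, int[] preds) {
        int lca = preds[0];
        for (int i = 1; i < preds.length; i++) {
            lca = domLca(idom, lca, preds[i]);
        }
        return lca;
    }

    public static void main(String[] args) {
        // Nodes: 0 = root, 1 = A (idom 0), 2 = B (idom 1), 3 = C (idom 1),
        // 4 = a new control node hanging directly below the root.
        int[] idom = {0, 0, 1, 1, 0};
        int tooEarly = idomOfRegion(idom, new int[] {2, 3}); // before the CFG change
        int afterChange = idomOfRegion(idom, new int[] {2, 4}); // pred 3 replaced by 4
        System.out.println("early = " + tooEarly + ", after change = " + afterChange);
    }
}
```

In this toy graph the early result is node 1, while the result after the predecessor swap is the root, which is why the computation has to happen after the last CFG modification.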
------------- Commit messages: - 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate Changes: https://git.openjdk.org/jdk/pull/13980/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13980&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308084 Stats: 15 lines in 1 file changed: 8 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13980.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13980/head:pull/13980 PR: https://git.openjdk.org/jdk/pull/13980 From duke at openjdk.org Mon May 15 09:33:49 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 15 May 2023 09:33:49 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 02:01:20 GMT, Chang Peng wrote: > To avoid dead code elimination, a use-point laneIsSet() is added in each benchmark method in MaskFromLongBenchmark.java. > > However, currently laneIsSet() [1] is implemented by toLong(). So it may generate a fromLong-toLong pair [2], making this benchmark to be noneffective after inlining laneIsSet() into the outer method. The assembly of maskFromLong_byte128 benchmark on SVE2 is shown in [3]. We cannot see the bdep instruction used by fromLong on AArch64 [4]. So, in this case, we cannot measure fromLong()'s performance by using this benchmark. > > This patch uses trueCount() [5] instead of toLong() to measure the fromLong()'s performance effectively. After this patch, we can see the bdep instruction in the hot loop [6] of maskFromLong_byte128 benchmark. > > Since using Blackhole to consume VectorMask will generate a heavy vector box, we don't use Blackhole to fix this benchmark. 
> > [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70 > [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736 > [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa > [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099 > [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount() > [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d @nick-arm Could you please help to review this patch? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1547507698 From epeter at openjdk.org Mon May 15 09:38:52 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 09:38:52 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v3] In-Reply-To: References: Message-ID: > This is the second step in the `VerifyLoopOptimizations` revival. > > Last step: > [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure > See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 > > Bug fixing for this step: > [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate > (https://github.com/openjdk/jdk/pull/13980) > > Next step: > [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > I added `TestVerifyLoopOptimizations.java` per @robcasloz 's request. It works just like `TestVerifyIterativeGVN.java`, with a simple `-Xcomp -XX:+VerifyLoopOptimizations` on a basically empty test. 
It fails until this patch is integrated: https://github.com/openjdk/jdk/pull/13980 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add TestVerifyLoopOptimizations.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13951/files - new: https://git.openjdk.org/jdk/pull/13951/files/56f66361..3963e133 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13951&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13951&range=01-02 Stats: 37 lines in 1 file changed: 37 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13951/head:pull/13951 PR: https://git.openjdk.org/jdk/pull/13951 From duke at openjdk.org Mon May 15 10:06:47 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 15 May 2023 10:06:47 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon In-Reply-To: <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> Message-ID: <5NfBPdiTQS9KBSWSgJSjfwW_IT7UdKwi__REZAvtxo4=.35641a45-6e4d-4b8a-865f-76784e7cc173@github.com> On Mon, 15 May 2023 08:57:30 GMT, Andrew Haley wrote: > This looks like it might be removed by loop opts. I think you might need a blackhole somewhere. `m` is updated in every iteration of this loop, so `m` is not actually loop-invariant. I can see the assembly code of this loop with JMH perfasm.
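For reference, the loop shape quoted from the benchmark (`m = m.not(); res += m.trueCount();`) can be modeled in plain Java without the Vector API. This is a sketch only: `MaskFlipSketch` and the `boolean[]` stand-in for a `VectorMask` are hypothetical, and the model says nothing about what C2 actually does with the real benchmark. It shows that the mask alternates between just two values, so for an even trip count the sum has a closed form.

```java
final class MaskFlipSketch {
    // Number of set lanes in the modeled mask.
    static int trueCount(boolean[] m) {
        int n = 0;
        for (boolean lane : m) {
            if (lane) n++;
        }
        return n;
    }

    // Lane-wise negation, returning a new modeled mask.
    static boolean[] not(boolean[] m) {
        boolean[] r = new boolean[m.length];
        for (int i = 0; i < m.length; i++) {
            r[i] = !m[i];
        }
        return r;
    }

    // Same shape as the quoted loop: flip the mask, then count, 'length' times.
    static long flipEveryIteration(boolean[] m, int length) {
        long res = 0;
        for (int i = 0; i < length; i++) {
            m = not(m);
            res += trueCount(m);
        }
        return res;
    }

    // For an even 'length', half of the iterations see not(m) and half see m.
    static long closedForm(boolean[] m, int length) {
        return (long) (length / 2) * (trueCount(m) + trueCount(not(m)));
    }

    public static void main(String[] args) {
        boolean[] m = {true, false, false, true, true, false, false, false};
        System.out.println(flipEveryIteration(m, 1024) == closedForm(m, 1024));
    }
}
```

Whether a compiler performs this rewrite today is a separate question; consuming each per-iteration result would remove the dependence on that question.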
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1193618375 From aph at openjdk.org Mon May 15 11:01:44 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 15 May 2023 11:01:44 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon In-Reply-To: <5NfBPdiTQS9KBSWSgJSjfwW_IT7UdKwi__REZAvtxo4=.35641a45-6e4d-4b8a-865f-76784e7cc173@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> <5NfBPdiTQS9KBSWSgJSjfwW_IT7UdKwi__REZAvtxo4=.35641a45-6e4d-4b8a-865f-76784e7cc173@github.com> Message-ID: On Mon, 15 May 2023 10:04:22 GMT, Chang Peng wrote: > > This looks like it might be removed by loop opts. I think you might need a blackhole somewhere. > > `m` will be updated in every iteration of this loop, so `m` is not a loop-invariants actually. I can see the assembly code of this loop by using JMH perfasm. Isn't it? Looks to me like all it does is flip `m` each time. Whether or not this code is optimized today isn't relevant. So it's the same as for (int i = 0; i < LENGTH/2; i++) { res += m.trueCount(); } m = m.not(); for (int i = 0; i < LENGTH/2; i++) { res += m.trueCount(); } ... which is trivially optimizable, no? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1193674595 From epeter at openjdk.org Mon May 15 11:05:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 11:05:06 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v7] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). 
All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`. > > > operation M-N-2 M-N-3 M-2 M-3 P-2 P-3 | note | > --------------------------------------------------------------- > int add 2063 2085 660 530 415 283 | | > int mul 2272 2257 1189 733 908 439 | | > int min 2527 2520 2516 2579 2585 2542 | 1 | > int max 2548 2525 2551 2516 2515 2517 | 1 | > int and 2410 2414 602 480 353 263 | | > int or 2149 2151 597 498 354 262 | | > int xor 2059 2062 605 476 364 263 | | > long add 1776 1790 2000 1000 1683 591 | 2 | > long mul 2135 2199 2185 2001 2176 1307 | 2 | > long min 1439 1424 1421 1420 1430 1427 | 3 | > long max 2299 2287 2303 2305 1433 1425 | 3 | > long and 1657 1667 2015 1003 1679 568 | 4 | > long or 1776 1783 2032 1009 1680 569 | 4 | > long xor 1834 1783 2012 1024 1679 570 | 4 | > float add 2779 2644 2633 2648 2632 2639 | 5 | > float mul 2779 2871 2810 2776 2732 2791 | 5 | > float min 2294 2620 1725 1286 872 672 | | > float max 2371 2519 1697 1265 841 468 | | > double add 2634 2636 2635 2650 2635 2648 | 5 | > double mul 3053 2955 2881 3030 2979 2927 | 5 | > double min 2364 2400 2439 2399 2486 2398 | 6 | > double max 2488 2459 2501 2451 2493 2498 | 6 | > > Legen...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: whitespace fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/0a72f4c4..9291fb31 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From xlinzheng at openjdk.org Mon May 15 11:05:59 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Mon, 15 May 2023 11:05:59 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand Message-ID: The `iRegIHeapbase()` matching operand has no usage on both AArch64 and RISC-V platforms after [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449) and [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667), respectively. As the following-up action discussed in the code review process of [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667) (#13577), this is a small cleanup for the `iRegIHeapbase()` matching operand. Passed fastdebug/release build on both AArch64/RISC-V platforms. 
Thanks, Xiaolin ------------- Commit messages: - Remove unused iRegIHeapbase() matching operand Changes: https://git.openjdk.org/jdk/pull/13983/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13983&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308091 Stats: 21 lines in 2 files changed: 0 ins; 21 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13983.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13983/head:pull/13983 PR: https://git.openjdk.org/jdk/pull/13983 From thartmann at openjdk.org Mon May 15 11:09:56 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 11:09:56 GMT Subject: RFR: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet [v5] In-Reply-To: References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Mon, 15 May 2023 06:41:01 GMT, Tobias Hartmann wrote: >> [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 >> >> while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 >> >> As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. 
>> >> Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: >> >> https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 >> >> A loaded type can therefore be replaced by an unloaded type during GVN. >> >> In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). >> >> Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. >> >> The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. >> >> In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8303512 > - Removed interface_handling argument > - Refactoring > - Eager computation to avoid racy update of remaining fields > - Re-ordering of _computed fields initialization > - Reverted unrelated change > - 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet False alarm. I'm integrating this now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13868#issuecomment-1547648664 From thartmann at openjdk.org Mon May 15 11:09:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 11:09:58 GMT Subject: Integrated: 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet In-Reply-To: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> References: <3XcFoQeThVfg2qDtbAGD4T0KLUGgo1YIm4SYwh6Q4Mk=.93c6ae9b-fbb6-4e9b-982e-f2cdeb0ab7dc@github.com> Message-ID: On Mon, 8 May 2023 14:52:21 GMT, Tobias Hartmann wrote: > [JDK-8297933](https://bugs.openjdk.org/browse/JDK-8297933) introduced a race condition among compiler threads computing `TypePtr::InterfaceSet::_is_loaded` for shared types (for example, for [TypeInstPtr::NOTNULL](https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L545)). One thread can set `_is_loaded_computed` before setting `_is_loaded`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3473-L3482 > > while another thread can already access `is_loaded()` and wrongly observe `_is_loaded = false`: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3464-L3468 > > As a result, the klass of a `TypePtr` can be loaded while the interfaces it implements appear to be not loaded. 
> > Another problem is that `TypePtr::InterfaceSet::eq` does not take the `_is_loaded` / `_is_loaded_computed` fields into account: > > https://github.com/openjdk/jdk/blob/5726d31e56530bbe7dee61ae04b126e20cb3611d/src/hotspot/share/opto/type.cpp#L3290-L3301 > > A loaded type can therefore be replaced by an unloaded type during GVN. > > In the case of the failure reported by this bug, `LibraryCallKit::inline_native_hashcode` first null checks the receiver and updates the type. Due to the issues described above, the null-free type is GVN'ed with it's unloaded counterpart and propagated to another, redundant null check emitted by `LibraryCallKit::generate_method_call` (I filed [JDK-8307625](https://bugs.openjdk.org/browse/JDK-8307625) to remove it). Since the type is now unloaded, an uncommon trap is emitted and parsing is stopped(). We crash when trying to de-reference `GraphKit::_map->_jvms` which is NULL (in debug we would hit an assert). > > Another failure mode is reported by [JDK-8305339](https://bugs.openjdk.org/browse/JDK-8305339). Both failures are extremely intermittent and I was never able to reproduce them. > > The fix I propose is to completely remove the `_is_loaded` logic because if a klass is loaded, the interface that it implements should be loaded as well. I added an assert to verify this. Higher tier testing is still running. > > In addition, I noticed some `#ifdef DEBUG` checks where `DEBUG` is never defined. I replaced them by `#ifdef ASSERT` and fixed the verification code that failed to build. > > Thanks, > Tobias This pull request has now been integrated. 
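The store ordering behind the race can be sketched deterministically in Java. This is a hypothetical model, not HotSpot code: `LazyLoadedSketch`, its two fields and the `Runnable` that stands in for a second compiler thread are assumptions made for illustration. The sketch only shows the window between the two stores; the integrated fix removed the `_is_loaded` logic altogether rather than reordering the stores.

```java
final class LazyLoadedSketch {
    private boolean computed; // stands in for _is_loaded_computed
    private boolean loaded;   // stands in for _is_loaded

    // Racy order of stores: the "computed" flag is published before the value.
    // 'otherThread' runs between the two stores to make the window
    // deterministic; in the real race it would be a concurrent compiler thread.
    void computeRacy(boolean actual, Runnable otherThread) {
        computed = true;
        otherThread.run();
        loaded = actual;
    }

    // Returns true if a reader inside the window sees "computed, but not
    // loaded" although the value being computed is 'true'.
    static boolean staleObservationPossible() {
        LazyLoadedSketch s = new LazyLoadedSketch();
        boolean[] sawStale = new boolean[1];
        s.computeRacy(true, () -> sawStale[0] = s.computed && !s.loaded);
        return sawStale[0];
    }

    public static void main(String[] args) {
        System.out.println("stale read possible: " + staleObservationPossible());
    }
}
```

A reader that trusts the value as soon as the flag is set can therefore act on `false` for a type whose interfaces are in fact loaded, which matches the failure description above.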
Changeset: ad348a8c Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/ad348a8cec50561d3e295b6289772530f541c6b1 Stats: 117 lines in 3 files changed: 38 ins; 46 del; 33 mod 8303512: Race condition when computing is_loaded property of TypePtr::InterfaceSet Reviewed-by: roland, qamai, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13868 From epeter at openjdk.org Mon May 15 11:14:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 May 2023 11:14:49 GMT Subject: RFR: 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java [v2] In-Reply-To: References: Message-ID: > **The Problem** > > During CCP, we get to a state like this: > > x (int:1) Phi (int:4) > | | > | +-----+ > | | > LShiftI (int:16) > | > CastII (top) ConI (int:3) > | | > +----+ +---------+ > | | > AndI > > > We call `AndINode::Value` during CCP, and in `MulNode::AndIL_shift_and_mask_is_always_zero` we `uncast` both inputs, which leaves us with `LShiftI` and `ConI` as the "true" inputs. They both have non-top types, and so we determine that this `AndI-LShiftI` combination always leads to `zero`: The `Phi` has a constant type (`int:4`). So this leaves the lowest 4 bits zero after the `LShiftI`. Then and-ing that with `int:3` means we extract only the two lowest bits, which are zero. So the result is provably always zero - that is the idea.
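The arithmetic behind the always-zero claim can be checked with a small Java sketch. It is a simplified model and not the HotSpot implementation: `ShiftMaskSketch` and `alwaysZero` are hypothetical names, and the check works on concrete `int` constants rather than on C2 types.

```java
final class ShiftMaskSketch {
    // True when (x << shift) & mask is zero for every int x, i.e. when all
    // set bits of 'mask' lie strictly below bit position 'shift'.
    static boolean alwaysZero(int shift, int mask) {
        shift &= 31; // Java uses only the low 5 bits of an int shift count
        return (mask & (-1 << shift)) == 0;
    }

    public static void main(String[] args) {
        // Shift by the constant 4, mask with 3 (binary 11): always zero.
        System.out.println(alwaysZero(4, 3));
        // Once the shift count may be as small as 1, bit 1 of the mask can
        // be hit, so the result is no longer provably zero.
        System.out.println(alwaysZero(1, 3));
    }
}
```

With the shift count fixed at 4, the two bits selected by the mask 3 are always cleared by the shift. If the shift count can be smaller than 2, that reasoning no longer applies.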
> > Then, we have some type updates (here of `x`, `Phi` and `LShiftI`), and the graph looks like this: > > x (int) Phi (int:0..4) > | | > | +-----+ > | | > LShiftI (int) > | > CastII (top) ConI (int:3) > | | > +----+ +---------+ > | | > AndI > > > This leads to `shift2` failing to have a constant type: > https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L1964-L1967 > > And with that, we fall back to `MulNode::Value`: > https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L559-L566 > > In `MulNode::Value` we detect that the `CastII` has type `top`, and return `top` for `AndI`. > > CCP expects the types to become wider over time, so going from `int:0` to `top` is the wrong direction. > > **Solution** > > The problem is the `CastII` still being `top`, which seems to be very rare. But the new regression test `TestShiftCastAndNotification.java` seems to create exactly that case, in combination with `-XX:+StressCCP`. > > We should guard against `top` in one of the `AndI` inputs inside `MulNode::AndIL_shift_and_mask_is_always_zero`. This will prevent it from detecting the zero-case until `MulNode::Value` gets a chance to compute a non-top type. > > **Argument for Solution** > > Is there still a threat from `MulNode::AndIL_shift_and_mask_is_always_zero` computing a zero first, and `MulNode::Value` computing a type that does not include zero afterward? > As types only widen during CCP, having a zero first means that all inputs now are non-top - in fact they are all `T_INT`. Since types only widen in the input...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactor with @chhagedorn's suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13908/files - new: https://git.openjdk.org/jdk/pull/13908/files/cf17d4df..4963faea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13908&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13908&range=00-01 Stats: 8 lines in 1 file changed: 3 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13908.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13908/head:pull/13908 PR: https://git.openjdk.org/jdk/pull/13908 From chagedorn at openjdk.org Mon May 15 11:20:49 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 May 2023 11:20:49 GMT Subject: RFR: 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java [v2] In-Reply-To: References: Message-ID: <-gNauO0wYVhnu4p1bvZVjqMkXb3XppyehYD2qfdO5Gw=.c053171f-6c86-487d-973b-c24fe0ce4607@github.com> On Mon, 15 May 2023 11:14:49 GMT, Emanuel Peter wrote: >> **The Problem** >> >> During CCP, we get to a state like this: >> >> x (int:1) Phi (int:4) >> | | >> | +-----+ >> | | >> LShiftI (int:16) >> | >> CastII (top) ConI (int:3) >> | | >> +----+ +---------+ >> | | >> AndI >> >> >> We call `AndINode::Value` during CCP, and in `MulNode::AndIL_shift_and_mask_is_always_zero` we `uncast` both inputs, which leaves us with `LShiftI` and `ConI` as the "true" inputs. They both have non-top types, and so we determine that this `AndI-LShiftI` combination always leads to `zero`: The `Phi` has a constant type (`int:4`). So this leaves the lowest 4 bits zero after the `LShiftI`. Then and-ing that with `int:3` means we extract only the two lowest bits, which are zero. So the result is provably always zero - that is the idea.
>> >> Then, we have some type updates (here of `x`, `Phi` and `LShiftI`), and the graph looks like this: >> >> x (int) Phi (int:0..4) >> | | >> | +-----+ >> | | >> LShiftI (int) >> | >> CastII (top) ConI (int:3) >> | | >> +----+ +---------+ >> | | >> AndI >> >> >> This leads to `shift2` failing to have a constant type: >> https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L1964-L1967 >> >> And with that, we fall back to `MulNode::Value`: >> https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L559-L566 >> >> In `MulNode::Value` we detect that the `CastII` has type `top`, and return `top` for `AndI`. >> >> CCP expects the types to become wider over time, so going from `int:0` to `top` is the wrong direction. >> >> **Solution** >> >> The problem is the `CastII` still being `top`, which seems to be very rare. But the new regression test `TestShiftCastAndNotification.java` seems to create exactly that case, in combination with `-XX:+StressCCP`. >> >> We should guard against `top` in one of the `AndI` inputs inside `MulNode::AndIL_shift_and_mask_is_always_zero`. This will prevent it from detecting the zero-case until `MulNode::Value` gets a chance to compute a non-top type. >> >> **Argument for Solution** >> >> Is there still a threat from `MulNode::AndIL_shift_and_mask_is_always_zero` computing a zero first, and `MulNode::Value` computing a type that does not include zero afterward? >> As types only widen during CCP, having a zero first means that all inputs now are non-top - in fact th... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > refactor with @chhagedorn's suggestions That looks good to me! As we've discussed offline, I'm also afraid that there are more such cases where we do not handle `top` correctly during CCP.
Might be worth to further investigate at some point. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13908#pullrequestreview-1426328958 From stuefe at openjdk.org Mon May 15 11:51:51 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 15 May 2023 11:51:51 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port In-Reply-To: References: Message-ID: On Mon, 15 May 2023 08:25:19 GMT, Amit Kumar wrote: > This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. > > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 107: > 105: Unimplemented(); > 106: } else if (LockingMode == LM_LEGACY) { > 107: NearLabel done; I don't understand: should "consistent handling of UseHeavyMonitors" not lead to LM_MONITOR be handled here? E.g. by directly jumping into the slow case, without bothering with stack locking? src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3194: > 3192: // Set NE to indicate 'failure' -> take slow-path > 3193: z_ltgr(oop, oop); > 3194: } Dont you need a break to done here? src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3270: > 3268: // Set NE to indicate 'failure' -> take slow-path > 3269: z_ltgr(oop, oop); > 3270: } break to done? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193707581 PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193718060 PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193720538 From mdoerr at openjdk.org Mon May 15 12:56:48 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 15 May 2023 12:56:48 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port In-Reply-To: References: Message-ID: On Mon, 15 May 2023 08:25:19 GMT, Amit Kumar wrote: > This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. > > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. Please check my findings. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3193: > 3191: } else { > 3192: // Set NE to indicate 'failure' -> take slow-path > 3193: z_ltgr(oop, oop); I think `z_bru(done);` is missing, here. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3269: > 3267: } else { > 3268: // Set NE to indicate 'failure' -> take slow-path > 3269: z_ltgr(oop, oop); Same, here. ------------- PR Review: https://git.openjdk.org/jdk/pull/13978#pullrequestreview-1426489612 PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193791851 PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193792804 From mdoerr at openjdk.org Mon May 15 12:56:52 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 15 May 2023 12:56:52 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:43:34 GMT, Thomas Stuefe wrote: >> This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. 
>> >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3194: > >> 3192: // Set NE to indicate 'failure' -> take slow-path >> 3193: z_ltgr(oop, oop); >> 3194: } > > Dont you need a break to done here? Ah, I found this, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193795269 From mdoerr at openjdk.org Mon May 15 13:01:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 15 May 2023 13:01:47 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:33:08 GMT, Thomas Stuefe wrote: >> This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. >> >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. > > src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 107: > >> 105: Unimplemented(); >> 106: } else if (LockingMode == LM_LEGACY) { >> 107: NearLabel done; > > I don't understand: should "consistent handling of UseHeavyMonitors" not lead to LM_MONITOR be handled here? E.g. by directly jumping into the slow case, without bothering with stack locking? I was confused about this, too. But, `LM_MONITOR` is already checked in `LIR_Assembler::emit_lock` and `LIR_Assembler::emit_unwind_handler()`, so we don't reach here. x86 and aarch64 have a similar implementation. I think we could at least assert that LockingMode != LM_MONITOR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1193801284 From erikj at openjdk.org Mon May 15 13:20:47 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 15 May 2023 13:20:47 GMT Subject: RFR: 8307326: Package jdk.internal.classfile.java.lang.constant become obsolete In-Reply-To: References: Message-ID: On Mon, 15 May 2023 08:38:54 GMT, Adam Sotona wrote: > Package `jdk.internal.classfile.java.lang.constant` containing `ModuleDesc` and `PackageDesc` become obsolete after [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729). > All references to `jdk.internal.classfile.java.lang.constant.ModuleDesc` and `jdk.internal.classfile.java.lang.constant.PackageDesc` across all JDK sources, tests and JMH benchmarks are replaced with `java.lang.constant.ModuleDesc` and `java.lang.constant.PackageDesc`. > `jdk.internal.classfile.java.lang.constant` package export hooks are removed from java.base module-info, make files and test headers. > Content of `jdk.internal.classfile.java.lang.constant` package and related tests under `jdk.classfile` are deleted. > Method references renamed in [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729) are fixed: > - `PackageDesc::packageName` to `PackageDesc::name` > - `PackageDesc::packageInternalName` to `PackageDesc::internalName` > - `ModuleDesc::moduleName` to `ModuleDesc::name`. > > Please review this pull request. > > Thanks, > Adam Build changes look ok. ------------- Marked as reviewed by erikj (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13979#pullrequestreview-1426543240 From fjiang at openjdk.org Mon May 15 13:31:52 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 15 May 2023 13:31:52 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v3] In-Reply-To: <9-UTXovi11YVdXBKzXYBeta47ztEfkm6NpnOxeZOnlg=.55d971fc-d089-4b4f-9fa9-1f0f7b61daa4@github.com> References: <9-UTXovi11YVdXBKzXYBeta47ztEfkm6NpnOxeZOnlg=.55d971fc-d089-4b4f-9fa9-1f0f7b61daa4@github.com> Message-ID: On Mon, 15 May 2023 07:42:08 GMT, Dingli Zhang wrote: >> Hi all, >> >> We have added support for Extract, Compress, Expand and other nodes for Vector >> API. It was implemented by referring to RVV v1.0 [1]. Please take a look and >> have some reviews. Thanks a lot. >> >> In this PR, we will support these new nodes: >> >> CompressM/CompressV/ExpandV >> LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> Extract >> VectorLongToMask/VectorMaskToLong >> PopulateIndex >> VectorLongToMask/VectorMaskToLong >> VectorMaskTrueCount/VectorMaskFirstTrue >> VectorInsert >> >> >> At the same time, we refactored methods such as >> `match_rule_supported_vector_mask`. All implemented vector nodes support mask >> operations by default now, so we also added mask nodes for all implemented >> nodes. >> >> By the way, we will implement the VectorTest node in the next PR. >> >> We can use the tests under `test/jdk/jdk/incubator/vector` to print the >> compilation log for most of the new nodes. 
And we can use the following >> command to print the compilation log of a jtreg test case: >> >> >> $ jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=log_name.log \ >> -jdk:build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:build/linux-x86_64-server-release/images/jdk \ >> >> >> >> >> >> ### CompressM/CompressV/ExpandV >> >> There is no inverse vdecompress provided in RVV, as this operation can be >> readily synthesized using iota and a masked vrgather in `ExpandV`. >> >> We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit >> these nodes and the compilation log is as follows: >> >> >> ## CompressM >> 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm >> 2ae mcompress V0, V30 # KILL R30 >> 2c2 vstoremask V2, V0 >> 2ce storeV [R7], V2 # vector (rvv) >> 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## CompressV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vcompress V1, V2, V0 >> 0fe storeV [R7], V1 # vector (rvv) >> 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## ExpandV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vexpand V3, V2, V0 >> 102 storeV [R7], V3 # vector (rvv) >> 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> >> >> ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> >> We use the vs... > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Remove debug warning Overall looks good, with some comments. src/hotspot/cpu/riscv/riscv_v.ad line 2895: > 2893: // vector sqrt - predicated > 2894: > 2895: instruct vsqrt_masked(vReg dst_src, vRegMask_V0 v0) %{ Maybe it's better to split predicated version into `vsqrtF_masked` and `vsqrtD_masked` just like vector sqrt did. 
src/hotspot/cpu/riscv/riscv_v.ad line 3364: > 3362: %} > 3363: > 3364: instruct vmaskAllI_masked(vRegMask dst, iRegI src, vRegMask_V0 v0, vReg tmp) %{ `iRegI` -> `iRegIorL2I` src/hotspot/cpu/riscv/riscv_v.ad line 4049: > 4047: // ------------------------------ Populate Index to a Vector ------------------- > 4048: > 4049: instruct populateindex(vReg dst, iRegI src1, iRegI src2, vReg tmp1) %{ `iRegI` -> `iRegIorL2I` src/hotspot/cpu/riscv/riscv_v.ad line 4068: > 4066: // BYTE, SHORT, INT > 4067: > 4068: instruct insertI_index_lt32(vReg dst, vReg src, iRegI val, immI idx, vRegMask_V0 v0) %{ `iRegI` -> `iRegIorL2I` src/hotspot/cpu/riscv/riscv_v.ad line 4087: > 4085: %} > 4086: > 4087: instruct insertI_index(vReg dst, vReg src, iRegI val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ `iRegI` -> `iRegIorL2I` src/hotspot/cpu/riscv/riscv_v.ad line 4125: > 4123: %} > 4124: > 4125: instruct insertL_index(vReg dst, vReg src, iRegL val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ Can the reg type of `idx` be `iRegIorL2I`? src/hotspot/cpu/riscv/riscv_v.ad line 4160: > 4158: %} > 4159: > 4160: instruct insertF_index(vReg dst, vReg src, fRegF val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ Same here for reg type of `idx`. src/hotspot/cpu/riscv/riscv_v.ad line 4194: > 4192: %} > 4193: > 4194: instruct insertD_index(vReg dst, vReg src, fRegD val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ Same here for reg type of `idx`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13862#pullrequestreview-1426532344 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193818885 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193822075 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193830583 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193830974 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193831496 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193834535 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193835527 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1193837758 From thartmann at openjdk.org Mon May 15 13:41:56 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 13:41:56 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> References: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> Message-ID: On Thu, 11 May 2023 08:05:49 GMT, Roland Westrelin wrote: >> pre/main/post loops are created for an inner loop of a loop nest but >> assert predicates cause the main and post loops to be removed. The >> OpaqueZeroTripGuard nodes for the loops are not removed: there's no >> logic to trigger removal of the opaque nodes once the loops are no >> longer there. With the inner loops gone, the outer loop becomes >> candidate for optimizations and is unrolled which causes the zero trip >> guards of the now removed loops to be duplicated and the opaque nodes >> to have more than one use. >> >> The fix I propose is, using logic similar to >> `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop >> opts if every OpaqueZeroTripGuard node guards a loop and if not, >> remove it. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review I see the following failure with `TestMissingMulLOptimization` from JDK-8299546 and `-XX:StressLongCountedLoop=2000000`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (c:\sb\prod\1684151031\workspace\open\src\hotspot\share\opto\loopnode.cpp:4157), pid=5368, tid=836 # Error: assert(loop == nullptr) failed Current CompileTask: C2: 267 15 b 4 compiler.ccp.TestMissingMulLOptimization::test (101 bytes) Stack: [0x0000002f4f600000,0x0000002f4f700000] Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [jvm.dll+0xc53091] os::win32::platform_print_native_stack+0xf1 (os_windows_x86.cpp:236) V [jvm.dll+0xee2a99] VMError::report+0x1019 (vmError.cpp:815) V [jvm.dll+0xee4775] VMError::report_and_die+0x645 (vmError.cpp:1596) V [jvm.dll+0xee4e84] VMError::report_and_die+0x64 (vmError.cpp:1361) V [jvm.dll+0x55053b] report_vm_error+0x5b (debug.cpp:191) V [jvm.dll+0xadcab2] PhaseIdealLoop::eliminate_useless_zero_trip_guard+0x2f2 (loopnode.cpp:4157) V [jvm.dll+0xad0fb1] PhaseIdealLoop::build_and_optimize+0x971 (loopnode.cpp:4455) V [jvm.dll+0x4ebc51] Compile::optimize_loops+0x1d1 (compile.cpp:2155) V [jvm.dll+0x4de2e8] Compile::Optimize+0xef8 (compile.cpp:2391) V [jvm.dll+0x4db378] Compile::Compile+0x1458 (compile.cpp:840) V [jvm.dll+0x3f05ba] C2Compiler::compile_method+0x11a (c2compiler.cpp:121) V [jvm.dll+0x4f6a81] CompileBroker::invoke_compiler_on_method+0x881 (compileBroker.cpp:2268) V [jvm.dll+0x4f3ea6] CompileBroker::compiler_thread_loop+0x396 (compileBroker.cpp:1945) V [jvm.dll+0x7f2ff9] JavaThread::thread_main_inner+0x279 (javaThread.cpp:720) V [jvm.dll+0xe5434d] Thread::call_run+0x1cd (thread.cpp:222) V [jvm.dll+0xc519c2] os::win32::thread_native_entry+0xa2 (os_windows.cpp:551) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1547878541 From epeter at 
openjdk.org Mon May 15 14:01:50 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 15 May 2023 14:01:50 GMT
Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2]
In-Reply-To:
References:
Message-ID: <5nzcFX-4yj9dBlIJTeraPAyEPKaSvEtoiWU1f2GJhSE=.9a11059f-29d1-4309-9185-916c2e52ad35@github.com>

On Thu, 11 May 2023 08:45:45 GMT, Emanuel Peter wrote:

>> **Bug**
>> In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` cases (this leads to an assert). And the `lt/le/gt/ge` cases did not all handle `NaN`s correctly (ordered vs unordered comparison, which leads to wrong results).
>>
>> The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973)
>> On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309
>>
>> The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant that we have set during matching:
>> https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394
>>
>> The wrong results with `NaN` are because of a bug in `x`:
>> https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106
>> The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078).
>>
>> **Solution**
>> @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`.
So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions. >> >> This has a few benefits: >> - `VectorMaskCmp + VectorBlend` is more powerful: >> - `CMoveVF/D` required the same inputs to the compare than to the move itself. >> - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize. >> - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`). >> - We need less code (I completely removed all code for `CMoveVF/D`). >> >> I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`. >> >> As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Improved comment on request of @fg1417 @jatin-bhateja or @sviswa7 would you mind reviewing this too? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1547909563 From sviswanathan at openjdk.org Mon May 15 16:36:59 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 15 May 2023 16:36:59 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v7] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:05:06 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. 
`float/double add/mul` do not allow for reordering of operations.
>>
>> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`.
>>
>> **Performance results**
>> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`.
>>
>> I disabled `turbo-boost`.
>> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`.
>> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation    M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add       2063   2085    660    530    415    283 |      |
>> int mul       2272   2257   1189    733    908    439 |      |
>> int min       2527   2520   2516   2579   2585   2542 |  1   |
>> int max       2548   2525   2551   2516   2515   2517 |  1   |
>> int and       2410   2414    602    480    353    263 |      |
>> int or        2149   2151    597    498    354    262 |      |
>> int xor       2059   2062    605    476    364    263 |      |
>> long add      1776   1790   2000   1000   1683    591 |  2   |
>> long mul      2135   2199   2185   2001   2176   1307 |  2   |
>> long min      1439   1424   1421   1420   1430   1427 |  3   |
>> long max      2299   2287   2303   2305   1433   1425 |  3   |
>> long and      1657   1667   2015   1003   1679    568 |  4   |
>> long or       1776   1783   2032   1009   1680    569 |  4   |
>> long xor      1834   1783   2012   1024   1679    570 |  4   |
>> float add     2779   2644   2633   2648   2632   2639 |  5   |
>> float mul     2779   2871   2810   2776   2732   2791 |  5   |
>> float min     2294   2620   1725   1286    872    672 |      |
>> float max     2371   2519   1697   1265    841    468 |      |
>> double add    2634   2636   2635   2650   2635   2648 |  5   |
>> double mul    3053   2955   2881   3030   2979   2927 |  5   |
>> double min    2364   2400   2439   2399   2486   2398 |  6   |
>> double max    2488   2459   2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   whitespace fix

Marked as reviewed by sviswanathan (Reviewer).
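The quoted description separates reductions that may be reordered (integer `add/mul/and/or/xor/min/max`, floating-point `min/max`) from `float/double add/mul`, which may not. A minimal Java sketch of why, assuming a hypothetical four-lane accumulation that stands in for a vector accumulator reduced after the loop (the class and method names are made up and are not part of the PR):

```java
// Illustrative only: compares in-order summation with a four-lane summation
// that is combined once after the loop. All names are hypothetical.
final class ReductionOrderSketch {
    // Strict left-to-right float sum, the order Java semantics require.
    static float sumInOrder(float[] a) {
        float s = 0f;
        for (float v : a) s += v;
        return s;
    }

    // Four independent partial sums, combined after the loop.
    static float sumFourLanes(float[] a) {
        float l0 = 0f, l1 = 0f, l2 = 0f, l3 = 0f;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            l0 += a[i];
            l1 += a[i + 1];
            l2 += a[i + 2];
            l3 += a[i + 3];
        }
        float s = (l0 + l1) + (l2 + l3);
        for (; i < a.length; i++) s += a[i]; // scalar tail
        return s;
    }

    // The same two strategies for int, where addition wraps modulo 2^32.
    static int sumInOrder(int[] a) {
        int s = 0;
        for (int v : a) s += v;
        return s;
    }

    static int sumFourLanes(int[] a) {
        int l0 = 0, l1 = 0, l2 = 0, l3 = 0;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            l0 += a[i];
            l1 += a[i + 1];
            l2 += a[i + 2];
            l3 += a[i + 3];
        }
        int s = (l0 + l1) + (l2 + l3);
        for (; i < a.length; i++) s += a[i]; // scalar tail
        return s;
    }

    public static void main(String[] args) {
        float[] f = {1e8f, 1f, -1e8f, 1f};
        System.out.println(sumInOrder(f) + " vs " + sumFourLanes(f));
        int[] n = {Integer.MAX_VALUE, 1, -5, 7};
        System.out.println(sumInOrder(n) + " vs " + sumFourLanes(n));
    }
}
```

For the `float` input the two orders give different results because `1e8f + 1f` rounds back to `1e8f`, so the reordered sum loses both small addends. For `int` the results agree even when the sum overflows, since wrapping addition is associative and commutative.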
-------------

PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1426942951

From xuelei at openjdk.org Mon May 15 20:11:46 2023
From: xuelei at openjdk.org (Xue-Lei Andrew Fan)
Date: Mon, 15 May 2023 20:11:46 GMT
Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils
Message-ID:

Hi,

This is a redo of JDK-8307855, where issues were found after integration.

The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in the src/utils directory.

Thanks,
Xuelei

-------------

Commit messages:
 - 8308071: [REDO] update for deprecated sprintf for src/utils

Changes: https://git.openjdk.org/jdk/pull/13995/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8308071
  Stats: 22 lines in 1 file changed: 17 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13995.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13995/head:pull/13995

PR: https://git.openjdk.org/jdk/pull/13995

From mikael at openjdk.org Mon May 15 21:50:46 2023
From: mikael at openjdk.org (Mikael Vidstedt)
Date: Mon, 15 May 2023 21:50:46 GMT
Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils
In-Reply-To:
References:
Message-ID:

On Mon, 15 May 2023 18:46:00 GMT, Xue-Lei Andrew Fan wrote:

> Hi,
>
> This is a redo of JDK-8307855, where issues were found after integration.
>
> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns.
The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for building failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues . This is a break-down update for sprintf uses in the src/utils directory. > > Thanks, > Xuelei src/utils/hsdis/binutils/hsdis-binutils.c line 248: > 246: size_t used_size = strlen(close); > 247: char* p = buf + used_size; > 248: bufsize -= used_size; May not happen in practice, but if `used_size` is larger than `bufsize` this will wrap to a very large value. Perhaps the `strcpy` above should also be an `snprintf`, and the return value handled the same way as for the subsequent `snprintf` calls? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1194394358 From fyang at openjdk.org Tue May 16 01:42:44 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 16 May 2023 01:42:44 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand In-Reply-To: References: Message-ID: On Mon, 15 May 2023 10:56:26 GMT, Xiaolin Zheng wrote: > The `iRegIHeapbase()` matching operand has no usage on both AArch64 and RISC-V platforms after [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449) and [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667), respectively. As the following-up action discussed in the code review process of [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667) (#13577), this is a small cleanup for the `iRegIHeapbase()` matching operand. > > Passed fastdebug/release build on both AArch64/RISC-V platforms. > > Thanks, > Xiaolin Maybe we should remove `reg_class heapbase_reg` at the same time? 
------------- PR Review: https://git.openjdk.org/jdk/pull/13983#pullrequestreview-1427572557 From amitkumar at openjdk.org Tue May 16 02:49:34 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 02:49:34 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: > This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. > > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from @TheRealMDoerr and @tstuefe ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13978/files - new: https://git.openjdk.org/jdk/pull/13978/files/f5af1367..c06afdb9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13978&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13978&range=00-01 Stats: 4 lines in 2 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13978.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13978/head:pull/13978 PR: https://git.openjdk.org/jdk/pull/13978 From amitkumar at openjdk.org Tue May 16 02:52:48 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 02:52:48 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: <4RV3EudnGWJMhDRZBZNfcJvMxSh-MOjE_p-zL3VUibo=.06d229f7-cf39-4be4-9206-694febbb3937@github.com> On Mon, 15 May 2023 11:46:04 GMT, Thomas Stuefe wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from @TheRealMDoerr and @tstuefe > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3270: > >> 3268: // Set NE to indicate 'failure' -> take slow-path >> 3269: z_ltgr(oop, 
oop); >> 3270: } > > break to done? Thomas, Please check the update version, I've fixed this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1194544645 From amitkumar at openjdk.org Tue May 16 02:52:46 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 02:52:46 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 12:51:13 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from @TheRealMDoerr and @tstuefe > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3193: > >> 3191: } else { >> 3192: // Set NE to indicate 'failure' -> take slow-path >> 3193: z_ltgr(oop, oop); > > I think `z_bru(done);` is missing, here. Please check updated version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13978#discussion_r1194544297 From amitkumar at openjdk.org Tue May 16 04:28:01 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 04:28:01 GMT Subject: RFR: 8308143: [ppc] remove constant and add wrap-up for successor Message-ID: This is cosmetic change, which adds simple range check logic and changes -1 to NOREG_ENCODING. I liked the changes for s390x so aligning PPC with the same. 
------------- Commit messages: - changes -1 to NOREG_ENCODING and wraps successor() Changes: https://git.openjdk.org/jdk/pull/13997/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13997&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308143 Stats: 18 lines in 1 file changed: 4 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13997/head:pull/13997 PR: https://git.openjdk.org/jdk/pull/13997 From amitkumar at openjdk.org Tue May 16 04:55:44 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 04:55:44 GMT Subject: RFR: 8308143: [ppc] remove constant and add wrap-up for successor [v2] In-Reply-To: References: Message-ID: <6O8zCOhvrk-S522G198jJZh9PbrimEnk34K2KQIN_1Q=.191b2e73-9d09-40b8-8166-13f468c83081@github.com> > This is cosmetic change, which adds simple range check logic and changes -1 to NOREG_ENCODING. I liked the changes for s390x so aligning PPC with the same. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: fix incorrect brackets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13997/files - new: https://git.openjdk.org/jdk/pull/13997/files/582d8a24..0c8c36fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13997&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13997&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13997.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13997/head:pull/13997 PR: https://git.openjdk.org/jdk/pull/13997 From epeter at openjdk.org Tue May 16 04:56:32 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 May 2023 04:56:32 GMT Subject: RFR: 8260943: C2 SuperWord: Revisit vectorization optimization added by 8076284 Message-ID: I suggest we remove this dead `_do_vector_loop_experimental` code. 
@vnkozlov disabled it 2.5 years ago [JDK-8251994](https://bugs.openjdk.org/browse/JDK-8251994) https://github.com/openjdk/jdk/commit/a7fa1b70f212566e95068936841b6e9702eccaed. His [analysis](https://bugs.openjdk.org/browse/JDK-8251994?focusedCommentId=14364507&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14364507).

His conclusion back then:

Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. Even if pack_parallel() method is able created packs they are all removed by filter_packs() method. And additionally the above cases are vectorized without hoisting loads and pack_parallel - I verified it. That code is useless now and I will put it under flag to not run it. It needs more work to be useful. I reluctant to remove the code because may be in a future we will have time to invest into it.

He disabled it by replacing many occurrences of `_do_vector_loop` with `_do_vector_loop_experimental = false`.

I don't believe anybody wants to fix this code any time soon. Current `SuperWord` can do almost everything that this code promises. If we really want to have parallel iterations for the Stream API, then we should do this in the dependency graph directly, by removing the inter-iteration edges.

If you care, you can read my arguments below. I am also using this opportunity to think back: what were the motivations for this code? And I am thinking forward: what could we do to improve our `SuperWord` algorithm?

**Testing**

Up to tier5 and stress testing, with and without `-XX:CompileCommand=option,path.to.Class::method,Vectorize`.

**Running...**

-----------

**Background**

"Seeding" is crucial: The SLP algorithm (Superword Level Parallelism) relies on good detection of parallel instructions that can be packed.
This is usually done with "seeding": one finds loads or stores that can be packed - preferably they are adjacent so that we can use a vectorized load or store (alternatively gather and scatter can be used for strided or random accesses). After this "seeding", the vectorization is extended to non-seed operations (usually greedily).

In `C2`'s `SuperWord` algorithm, we have two approaches for this "seeding":

1. Normally, we simply try to find adjacent loads and stores for the same `base` (array). In addition, we require load/store packs to be aligned to each other in the same memory slice (this seems unnecessary and I plan to lift that restriction with [JDK-8303113](https://bugs.openjdk.org/browse/JDK-8303113)).
2. With `-XX:CompileCommand=option,path.to.Class::method,Vectorize` we additionally require that the packed nodes are unroll-clones of the same original node, when the main-loop was still a single-iteration loop. There is no alignment constraint for the packs.

Let us for now ignore the alignment restrictions, as we may soon remove those anyway. Let us focus on the "seeding": should we try to only pack operations that are unroll-clones of the same node?

Example 1 (Yes, only pack unroll-clones of same single-iteration node):

    // single-iteration loop:
    for (int i = 0; i < N; i++) {
        b[i] = a[i] + a[i+1];
    }
    // unrolled to:
    for (int i = 0; i < N; i+=4) {
        b[i+0] = a[i+0] + a[i+1]; // i+0
        b[i+1] = a[i+1] + a[i+2]; // i+1
        b[i+2] = a[i+2] + a[i+3]; // i+2
        b[i+3] = a[i+3] + a[i+4]; // i+3
    }

Here, this helps us to separate out the left and right arguments to the addition, because we have, for example, two `a[i+1]` loads. If we pack `a[i+0]` with the wrong one of those, then we will end up rejecting the pack-set, since the packing is not feasible (and certainly very suboptimal).
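The intended packing for Example 1 can be sketched in plain Java. This is only an illustration of the idea, not C2 code: the class name `SeedPackingSketch`, the helper names and the fixed pack width of 4 are assumptions made for this sketch. A "pack" is modelled as four adjacent array elements; the left operands `a[i+0..i+3]` and the right operands `a[i+1..i+4]` are loaded as two separate packs, and the result is compared against the scalar loop.

```java
import java.util.Arrays;

public class SeedPackingSketch {
    static final int WIDTH = 4; // assumed pack size, matching the 4x unrolled loop

    // Scalar reference: b[i] = a[i] + a[i+1]
    static int[] scalar(int[] a) {
        int[] b = new int[a.length - 1];
        for (int i = 0; i < b.length; i++) {
            b[i] = a[i] + a[i + 1];
        }
        return b;
    }

    // Models a vector load: WIDTH adjacent elements starting at 'start'.
    static int[] loadPack(int[] a, int start) {
        return Arrays.copyOfRange(a, start, start + WIDTH);
    }

    // Packed version: one pack per operand position, then a lane-wise add.
    static int[] packed(int[] a) {
        int[] b = new int[a.length - 1];
        int i = 0;
        for (; i + WIDTH <= b.length; i += WIDTH) {
            int[] left = loadPack(a, i);      // a[i+0], a[i+1], a[i+2], a[i+3]
            int[] right = loadPack(a, i + 1); // a[i+1], a[i+2], a[i+3], a[i+4]
            for (int lane = 0; lane < WIDTH; lane++) {
                b[i + lane] = left[lane] + right[lane];
            }
        }
        for (; i < b.length; i++) { // scalar tail for the remaining iterations
            b[i] = a[i] + a[i + 1];
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(Arrays.equals(scalar(a), packed(a)));
    }
}
```

The two `a[i+1]` loads of the unrolled loop end up in different packs here (lane 1 of `left`, lane 0 of `right`), which is the separation that the unroll-clone information is meant to provide.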
Example 2 (No, just try to find adjacent loads and stores directly):

    // single-iteration loop:
    for (int i = 0; i < N; i++) {
        b[i] = a[i] * 11;
        b[i+1] = a[i+1] * 11; // hand-unrolled, or just inherent parallelism inside the loop
    }

Here, `b[i]` and `b[i+1]` and their unroll-clones will be cloned from different original nodes. That would prevent us from packing them. We should instead just directly pack adjacent memops, no matter where they come from.

Currently, only one or the other strategy is used, as determined by the `_do_vector_loop` (Option Vectorize) flag. It would be nice to try both approaches, and pick the better of the two. To make this happen, we need the following changes:

1. Refactor to enable multiple pack-sets.
2. Introduce a cost-model. Either evaluate the cost of the scalar loop, and the cost of the vectorized loop (for each pack-set), or just compute the "speedup" of a pack-set versus the scalar loop. This allows us to decide if and which pack-set to use for the vectorization.

Side-note: we will probably already require such a cost-model for this task: [JDK-8307516](https://bugs.openjdk.org/browse/JDK-8307516) C2 SuperWord: reconsider Reduction heuristic for UnorderedReduction

-----------

**What was this dead code supposed to do?**

[JDK-8076284](https://bugs.openjdk.org/browse/JDK-8076284) "Improve vectorization of parallel streams"

The goal was to make parallel streams faster, by vectorizing cases that at the time were not vectorized otherwise. (see [review mails](https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017631.html))

These were the use-cases presented as the motivation for that RFE.
They are both "load-forward" cases:

Parallel stream example:

    Stream.forEach( id -> c[id] = c[id] + c[id+1] );

Regular loop example:

    void computeCall(double [] call, double puByDf, double pdByDf) {
        for(int i = timeStep; i > 0; i--) {
            for(int j = 0; j <= i - 1; j++) {
                call[j] = puByDf * call[j + 1] + pdByDf * call[j];
            }
        }
    }

The flag `-XX:CompileCommand=option,path.to.Class::method,Vectorize` was introduced, and always enabled it for the `Stream` API.

Both of these cases are what I call "load-forward": they read from array positions that later iterations will store to.

Note: in a "load-forward" case it is safe to first execute all loads, and then execute all stores afterwards. Therefore, this is a case that we should be able to vectorize safely, no matter if we have a `parallel stream` (see discussion below) or just a regular loop.

Note: before [JDK-8076284](https://bugs.openjdk.org/browse/JDK-8076284), the mentioned examples would not vectorize for two reasons:

1. The left and right operands can get confused, and be packed the wrong way. When we are looking for an adjacent memop for `c[id]`, we have two `c[id+1]` loads to pick from, and only one will lead to vectorization.
2. The memops are not aligned (offset by 1: `c[id] + c[id+1]`) - this is a limitation that we can lift separately in [JDK-8303113](https://bugs.openjdk.org/browse/JDK-8303113).

However, there were a few issues with this RFE, especially with `pack_parallel`.

@vnkozlov decided to disable the code here in question: It is about half of the code that the RFE had added. And it turns out that that was not needed to vectorize examples that were presented as motivation for the RFE (see above).

My best explanation why it is not needed: "seeding" is crucial. It is actually only important to get the "seeding" right to vectorize - all other operations are vectorized because of `extend_packlist`. So if we get `find_adjacent_refs` right, there is no need for `pack_parallel`.
The second half of the RFE is still required. During unrolling, we maintain a `CloneMap`. It remembers what was the original node in the single-iteration loop that a node was unroll-cloned from. It also remembers from which unroll-iteration a node is (though this information seems only required by the dead half of that RFE). This information is then used inside `find_adjacent_refs`:

https://github.com/openjdk/jdk/blob/ad0e5a99ca1ad9dd04105f502985735a3536c3f4/src/hotspot/share/opto/superword.cpp#L763-L764
https://github.com/openjdk/jdk/blob/ad0e5a99ca1ad9dd04105f502985735a3536c3f4/src/hotspot/share/opto/superword.cpp#L789

If `_do_vector_loop` is on, we only pack memops from the same original node. As explained in the "Background" section above, this can either be helpful or unhelpful, depending on the loop.

Recently, I also fixed a bug that touched code with `_do_vector_loop` (https://github.com/openjdk/jdk/pull/12350). This fix brought up a few things:

- We can vectorize "load-forward" (currently enabled with `-XX:CompileCommand=option,path.to.Class::method,Vectorize`).
- We should not vectorize "store-forward", as it leads to cyclic dependencies between the "forward-store" and the load that loads that value later - we should not break that dependency.
- The `Streams` come in two varieties: `sequential` streams must be executed iteration after iteration, like a normal for loop. `parallel` streams allow for parallel execution of the iterations - the user must ensure that there are no dependencies between the iterations, as they could in principle be executed completely out of order. But the way we currently set `_do_vector_loop` does not distinguish between them, as both cases go down to `vmIntrinsics::_forEachRemaining`. Since we cannot separate out the `parallel` case, we should not allow breaking dependencies in `SuperWord`.
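The claim that "load-forward" is safe can be checked with a small Java sketch. It is not C2 code and only models vectorization by hand: the class name `LoadForwardSketch`, the method names and the batch width of 4 are assumptions of this sketch. For `c[i] = c[i] + c[i+1]`, it executes four iterations at a time with all loads first and all stores afterwards, and compares the result with the plain scalar loop.

```java
import java.util.Arrays;

public class LoadForwardSketch {
    static final int WIDTH = 4; // assumed number of iterations executed "at once"

    // Scalar reference: iteration i reads c[i+1], which iteration i+1 overwrites later.
    static int[] scalar(int[] input) {
        int[] c = input.clone();
        for (int i = 0; i < c.length - 1; i++) {
            c[i] = c[i] + c[i + 1];
        }
        return c;
    }

    // Batched: for WIDTH iterations, do all loads and adds first, then all stores.
    static int[] loadsThenStores(int[] input) {
        int[] c = input.clone();
        int n = c.length - 1;
        int i = 0;
        for (; i + WIDTH <= n; i += WIDTH) {
            int[] sum = new int[WIDTH];
            for (int lane = 0; lane < WIDTH; lane++) {
                sum[lane] = c[i + lane] + c[i + lane + 1]; // loads only
            }
            for (int lane = 0; lane < WIDTH; lane++) {
                c[i + lane] = sum[lane];                   // stores only
            }
        }
        for (; i < n; i++) { // scalar tail
            c[i] = c[i] + c[i + 1];
        }
        return c;
    }

    public static void main(String[] args) {
        int[] c = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        System.out.println(Arrays.equals(scalar(c), loadsThenStores(c)));
    }
}
```

Every load in a batch reads a position that no earlier iteration has written, so reordering the loads before the stores does not change what they observe. The same reordering is not valid for a "store-forward" loop such as `a[i+1] = a[i]`.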
**How did this dead code work?**

Before I delete it, I want to understand the idea behind it, what it was supposed to be able to do, and why it did not quite deliver.

First, we ran the normal `SuperWord::SLP_extract` code, including these steps:

    construct_bb();
    dependence_graph();

This basically builds the memops dependency graph. At this point, the dead "experimental" code did this:

    mark_generations();
    // verify some conditions
    hoist_loads_in_graph();
    // reconstruct dependency graph

This code relies on the information of the `CloneMap`. This map is constructed during loop unrolling, and remembers which "generation/iteration" a node belongs to. If we have a 4x unrolled loop, we expect to have 4 such generations, all with the same nodes. So each node can be assigned to one of the generations 1, 2, 3 or 4.

Let's look a bit deeper into `hoist_loads_in_graph`: for each memory-slice, it tries to "hoist" all loads that are not from the first iteration (iteration 2+). Those loads probably have a memory state that depends on a store from the previous iteration. Some conditions are checked (I have to guess a bit, but I think it basically wants to check that in an iteration we first do all loads, and only then a store - probably an ok limitation for most cases), and if they pass, the load is moved up to the phi of the memory slice. Hence, the hoisted loads do not depend on any of the stores in the memory slice.

One thing that is not among the conditions: checking if any of the stores that used to be before the load refer to the same array position. This violates the store-load dependency. Note: the assumption was that this is ok, because we can execute the iterations in parallel, so we do not have to preserve this store-load dependency.

This "hoisting" ruins the dependency-graph, so we rebuild it. We continue with the regular code:

    compute_vector_element_type();
    find_adjacent_refs();
    extend_packlist();

This code basically tries to extract the packs.
Normally, `find_adjacent_refs` would fail, if the memory accesses do not align, so "store-forward" or "load-forward" cases would be rejected (for exceptions see https://github.com/openjdk/jdk/pull/12350). But with the `_do_vector_loop` flag on, we simply ignore the aliasing checks, and pack anyway. The only thing that this still requires is that neighbors in the packs are `independent`; that seems to be baked into the logic. Hence, this case would not vectorize in any case:

    for (int i = 0; i < N; i++) {
        a[i+1] = a[i]; // iteration i+1 loads from the store of iteration i -> neighbors not independent.
    }

If at this point there are no packs in the packset, we would fail to vectorize. At this point, the dead "experimental" code would call `pack_parallel`, to attempt a packing in a different way. It iterates over all nodes in the first "generation/iteration", and finds the nodes of the other iterations, and packs them. No independence-checks whatsoever. There are probably a few other checks that are also not done (e.g. guarding against strided access, where the loads would not actually be adjacent). Further, it has some arbitrary limitations: it only packs `Load`, `Store`, `Add`, and `Mul` nodes, and curiously only packs with at most 4 elements.

**What could the dead code do, if we put a lot of effort into it to fix it?**

The original idea was to speed up `parallel` streams - or any method where the user can guarantee that the loops have independent iterations. But most of the desired cases, we could vectorize with the regular `SuperWord` code already.

These are cases that we do not vectorize:

    for (int i = 0; i < N; i++) {
        a[i+1] = a[i]; // cyclic dependencies
    }

    for (int i = 0; i < N; i++) {
        a[i+inv] = a[i]; // loop invariant inv
    }

In the second example, the store has to happen before any load of a subsequent iteration - we do not know which position the store went to, and if it may be aliasing into the position of a later load.
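Why the unknown invariant matters can be shown with a hand-written Java model. It is a sketch under stated assumptions, not C2 behaviour: the class name `InvariantOffsetSketch`, the method names, the batch width of 4 and the 32-element test array are all made up for this illustration. For `a[i+inv] = a[i]`, it compares the scalar loop with a version that loads four elements and then stores four elements, for a range of values of `inv`.

```java
import java.util.Arrays;

public class InvariantOffsetSketch {
    static final int WIDTH = 4; // assumed number of iterations executed "at once"

    // Scalar reference: a[i+inv] = a[i] for every i where both indices are in bounds.
    static int[] scalar(int[] input, int inv) {
        int[] a = input.clone();
        int lo = Math.max(0, -inv);
        int hi = Math.min(a.length, a.length - inv);
        for (int i = lo; i < hi; i++) {
            a[i + inv] = a[i];
        }
        return a;
    }

    // Batched: load WIDTH elements, then store WIDTH elements (models a vector copy).
    static int[] loadsThenStores(int[] input, int inv) {
        int[] a = input.clone();
        int lo = Math.max(0, -inv);
        int hi = Math.min(a.length, a.length - inv);
        int i = lo;
        for (; i + WIDTH <= hi; i += WIDTH) {
            int[] pack = Arrays.copyOfRange(a, i, i + WIDTH); // all loads first
            System.arraycopy(pack, 0, a, i + inv, WIDTH);     // then all stores
        }
        for (; i < hi; i++) { // scalar tail
            a[i + inv] = a[i];
        }
        return a;
    }

    // True if the batched execution gives the same array as the scalar loop.
    static boolean matchesScalar(int inv) {
        int[] input = new int[32];
        for (int k = 0; k < input.length; k++) {
            input[k] = k; // distinct values, so any reordering effect is visible
        }
        return Arrays.equals(scalar(input, inv), loadsThenStores(input, inv));
    }

    public static void main(String[] args) {
        for (int inv = -4; inv <= 6; inv++) {
            System.out.println("inv=" + inv + " matches scalar: " + matchesScalar(inv));
        }
    }
}
```

With this input, only `inv` = 1, 2 and 3 produce a mismatch: in those cases a store of one iteration feeds a load of a later iteration inside the same batch. For `inv <= 0` or `inv >= 4` the batched version agrees with the scalar loop, but the value of `inv` is only known at runtime, so a compiler would need a runtime check or a user guarantee to pack such a loop.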
As far as I can see, these are all cases where the user really has to know that he wants to circumvent the aliasing/independence checks for the iterations. And if vectorization is that important for such edge-cases, one may also use the `Vector API`.

**Could we still implement parallel iterations without the dead code?**

We would want to respect the dependencies within an iteration, and only ignore the dependencies between iterations. As long as we keep the `CloneMap`, we have this information. Though the question is if this information is 100% reliable. Using it as a "seeding" suggestion, and then validating all dependencies is one thing. But taking the information to drop away dependencies is a bit more dangerous.

**Is having a CloneMap really a good idea?**

In other compilers (LLVM, gcc, etc) vectorization is usually done with two approaches:

- loop-vectorizers: they take a non-unrolled scalar loop, and widen the scalar operations to vector operations.
- `SuperWord`: can be applied to any basic block (not just loops), and extracts parallel computations. It can also be applied to unrolled loops (which is what we do).

This two-step approach has the following advantages:

- The non-unrolled loop is simpler to analyze, and the analysis is less fragile as it does not have to find parallel instructions.
- The non-unrolled loop is much smaller. We currently have an unroll-limit of `50` or `60` nodes. This means that only very small loop bodies will ever be unrolled and vectorized fully. Large loops will not be unrolled enough, and that means we do not fill the vectors fully, but maybe only half or a quarter (example: loop with byte `a[i] = 11 * b[i]` on a 512bit machine would require unroll-factor 64 to fully vectorize - the node limit is hit very quickly).
- CFG vectorization (if-conversion) would be simpler to implement, if we decide to do that at some point.

I am not sure why we decided to pick this over a loop-vectorizer, to be honest.
Maybe it is because during the unrolling and other loop-opts we remove some things from the loop that would prevent vectorization? But what would that be? Are range-checks and null-checks not already removed before unrolling?

Back to `CloneMap`: it basically implements a loop-vectorizer, but with the sad side-effect that we first need to speculatively unroll the loop and hope that it will vectorize enough. Or we just do not unroll enough to vectorize as much as we could.

What a `SuperWord` loop vectorization can do, that most likely is not done by a mere loop-vectorizer:

    for (int i = 0; i < N; i+=2) {
        a[i+0] = b[i+0] + 1;
        a[i+1] = b[i+1] + 1;
    }

A hand-unrolled loop! But anyway, the use cases for that are probably very few. This can still be done by a SuperWord (SLP) algorithm.

**Conclusion**

The dead code is buggy. And we could implement what it does in other ways: we can implement parallel iterations by using `CloneMap` to drop away memop-edges in the dependency graph which go between different unroll-iterations, and keep the ones inside an unroll-iteration.

**Future Work**

1. [JDK-8303113](https://bugs.openjdk.org/browse/JDK-8303113) we should try to unify the `_do_vector_loop` code such that `C2` tries the "seeding" with and without the info from the `CloneMap`. That would allow us to remove the flag and get the best from both worlds.
2. For that, we need a cost-model, so that we can compare different vectorization strategies (pack-sets).
3. We could consider implementing parallel iterations properly. But that would require us to distinguish parallel and sequential streams. We can use the info from the `CloneMap` to distinguish between intra- and inter-iteration dependency edges.
------------- Commit messages: - Added CloneMap back in, we still need it for CompileCommand Option Vectorize - 8260943: Revisit vectorization optimization added by 8076284 Changes: https://git.openjdk.org/jdk/pull/13930/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13930&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8260943 Stats: 494 lines in 2 files changed: 0 ins; 493 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13930.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13930/head:pull/13930 PR: https://git.openjdk.org/jdk/pull/13930 From amitkumar at openjdk.org Tue May 16 05:13:43 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 05:13:43 GMT Subject: RFR: 8308143: [ppc] remove constant and add wrap-up for successor [v2] In-Reply-To: <6O8zCOhvrk-S522G198jJZh9PbrimEnk34K2KQIN_1Q=.191b2e73-9d09-40b8-8166-13f468c83081@github.com> References: <6O8zCOhvrk-S522G198jJZh9PbrimEnk34K2KQIN_1Q=.191b2e73-9d09-40b8-8166-13f468c83081@github.com> Message-ID: On Tue, 16 May 2023 04:55:44 GMT, Amit Kumar wrote: >> This is cosmetic change, which adds simple range check logic and changes -1 to NOREG_ENCODING. I liked the changes for s390x so aligning PPC with the same. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > fix incorrect brackets Hi @TheRealMDoerr, please review it. 
I'm okay to withdraw the changes as well, Please let me know :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13997#issuecomment-1549000466 From epeter at openjdk.org Tue May 16 06:12:01 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 May 2023 06:12:01 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: On Fri, 12 May 2023 00:44:09 GMT, Sandhya Viswanathan wrote: >> @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. > > @eme64 Very nice and clean work. Thanks a lot for taking this up. @sviswa7 thanks for the review! @jatin-bhateja @fg1417 @vnkozlov can I have at least one more re-review after the last changes, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1549047580 From stuefe at openjdk.org Tue May 16 07:09:45 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 16 May 2023 07:09:45 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 02:49:34 GMT, Amit Kumar wrote: >> This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. >> >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from @TheRealMDoerr and @tstuefe LGTM ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13978#pullrequestreview-1427868830 From amitkumar at openjdk.org Tue May 16 07:27:50 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 07:27:50 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 07:07:07 GMT, Thomas Stuefe wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from @TheRealMDoerr and @tstuefe > > LGTM Thanks @tstuefe for Review. @RealLucy would you like to review it as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13978#issuecomment-1549139493 From dzhang at openjdk.org Tue May 16 07:35:55 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 16 May 2023 07:35:55 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v3] In-Reply-To: References: <9-UTXovi11YVdXBKzXYBeta47ztEfkm6NpnOxeZOnlg=.55d971fc-d089-4b4f-9fa9-1f0f7b61daa4@github.com> Message-ID: On Mon, 15 May 2023 13:13:00 GMT, Feilong Jiang wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove debug warning > > src/hotspot/cpu/riscv/riscv_v.ad line 2895: > >> 2893: // vector sqrt - predicated >> 2894: >> 2895: instruct vsqrt_masked(vReg dst_src, vRegMask_V0 v0) %{ > > Maybe it's better to split predicated version into `vsqrtF_masked` and `vsqrtD_masked` just like vector sqrt did. Thanks for the review! Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 3364: > >> 3362: %} >> 3363: >> 3364: instruct vmaskAllI_masked(vRegMask dst, iRegI src, vRegMask_V0 v0, vReg tmp) %{ > > `iRegI` -> `iRegIorL2I` Fixed. 
> src/hotspot/cpu/riscv/riscv_v.ad line 4049: > >> 4047: // ------------------------------ Populate Index to a Vector ------------------- >> 4048: >> 4049: instruct populateindex(vReg dst, iRegI src1, iRegI src2, vReg tmp1) %{ > > `iRegI` -> `iRegIorL2I` Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 4068: > >> 4066: // BYTE, SHORT, INT >> 4067: >> 4068: instruct insertI_index_lt32(vReg dst, vReg src, iRegI val, immI idx, vRegMask_V0 v0) %{ > > `iRegI` -> `iRegIorL2I` Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 4087: > >> 4085: %} >> 4086: >> 4087: instruct insertI_index(vReg dst, vReg src, iRegI val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ > > `iRegI` -> `iRegIorL2I` Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 4125: > >> 4123: %} >> 4124: >> 4125: instruct insertL_index(vReg dst, vReg src, iRegL val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ > > Can the reg type of `idx` be `iRegIorL2I`? Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 4160: > >> 4158: %} >> 4159: >> 4160: instruct insertF_index(vReg dst, vReg src, fRegF val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ > > Same here for reg type of `idx`. Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 4194: > >> 4192: %} >> 4193: >> 4194: instruct insertD_index(vReg dst, vReg src, fRegD val, iRegI idx, vReg tmp1, vRegMask_V0 v0) %{ > > Same here for reg type of `idx`. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734501 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734567 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734759 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734819 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734880 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734926 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194734975 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1194735117 From dzhang at openjdk.org Tue May 16 07:35:46 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 16 May 2023 07:35:46 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v4] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Change some iRegI to iRegIorL2I and small refactoring of minmax_fp_masked_v ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/c4351ee1..486ebe7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=02-03 Stats: 63 lines in 3 files changed: 14 ins; 12 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From xlinzheng at openjdk.org Tue May 16 08:35:45 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 16 May 2023 08:35:45 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 01:39:52 GMT, Fei Yang wrote: > Maybe we should remove `reg_class heapbase_reg` at the same time? Thanks, that's reasonable. Changes passed AArch64 hotspot tier1~4 (fastdebug). RISC-V hotspot tier1 is running. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13983#issuecomment-1549229759 From xlinzheng at openjdk.org Tue May 16 08:35:45 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Tue, 16 May 2023 08:35:45 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand [v2] In-Reply-To: References: Message-ID: > The `iRegIHeapbase()` matching operand has no usage on both AArch64 and RISC-V platforms after [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449) and [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667), respectively. As the following-up action discussed in the code review process of [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667) (#13577), this is a small cleanup for the `iRegIHeapbase()` matching operand. 
> > Passed fastdebug/release build on both AArch64/RISC-V platforms. > > Thanks, > Xiaolin Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: Further cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13983/files - new: https://git.openjdk.org/jdk/pull/13983/files/41d36c4e..c26b96c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13983&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13983&range=00-01 Stats: 18 lines in 2 files changed: 0 ins; 10 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13983.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13983/head:pull/13983 PR: https://git.openjdk.org/jdk/pull/13983 From dnsimon at openjdk.org Tue May 16 09:09:03 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 09:09:03 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err Message-ID: When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): JVMCI Events (11 events): ... 
Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError
Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147)
Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829)
Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943)

It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation:

COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError]

[1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe

------------- Depends on: https://git.openjdk.org/jdk/pull/13905 Commit messages: - send JVMCI exception info to hs-err log and/or tty - remove unused callToString method Changes: https://git.openjdk.org/jdk/pull/14000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308151 Stats: 398 lines in 13 files changed: 335 ins; 31 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/14000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14000/head:pull/14000 PR: https://git.openjdk.org/jdk/pull/14000 From dnsimon at openjdk.org Tue May 16 09:09:05 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 09:09:05 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err In-Reply-To: References: Message-ID: On Tue, 16 May 2023 08:02:11 GMT, Doug Simon wrote: > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty.
It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... > Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also be used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe src/hotspot/share/jvmci/jvmciRuntime.cpp line 2047: > 2045: (jlong) compile_state, compile_state->task()->compile_id()); > 2046: #ifdef ASSERT > 2047: if (JVMCIENV->has_pending_exception() && JVMCICompileMethodExceptionIsFatal) { It's a shame to introduce a VM flag (i.e., `JVMCICompileMethodExceptionIsFatal`) for a test case but I don't know of any other way to do this. As far as I know, system properties cannot be accessed here. Maybe using an environment variable is better than a VM flag? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1194855033 From chagedorn at openjdk.org Tue May 16 09:17:48 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 09:17:48 GMT Subject: RFR: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:12:42 GMT, Emanuel Peter wrote: > Bug fixing to make the verification pass for https://github.com/openjdk/jdk/pull/13951 [JDK-8305073](https://bugs.openjdk.org/browse/JDK-8305073). > > There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. > > **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: > We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: > https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 > > So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13980#pullrequestreview-1428115385 From chagedorn at openjdk.org Tue May 16 09:19:52 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 09:19:52 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:38:52 GMT, Emanuel Peter wrote: >> This is the second step in the `VerifyLoopOptimizations` revival. 
>> >> Last step: >> [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure >> See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 >> >> Bug fixing for this step: >> [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate >> (https://github.com/openjdk/jdk/pull/13980) >> >> Next step: >> [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop >> >> I added `TestVerifyLoopOptimizations.java` per @robcasloz 's request. It works just like `TestVerifyIterativeGVN.java`, with a simple `-Xcomp -XX:+VerifyLoopOptimizations` on a basically empty test. It fails until this patch is integrated: https://github.com/openjdk/jdk/pull/13980 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add TestVerifyLoopOptimizations.java Looks good! I also think it's a good idea to go with a Hello World like test. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13951#pullrequestreview-1428119103 From mdoerr at openjdk.org Tue May 16 09:38:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 16 May 2023 09:38:47 GMT Subject: RFR: 8308143: [ppc] remove constant and add wrap-up for successor [v2] In-Reply-To: <6O8zCOhvrk-S522G198jJZh9PbrimEnk34K2KQIN_1Q=.191b2e73-9d09-40b8-8166-13f468c83081@github.com> References: <6O8zCOhvrk-S522G198jJZh9PbrimEnk34K2KQIN_1Q=.191b2e73-9d09-40b8-8166-13f468c83081@github.com> Message-ID: On Tue, 16 May 2023 04:55:44 GMT, Amit Kumar wrote: >> This is cosmetic change, which adds simple range check logic and changes -1 to NOREG_ENCODING. I liked the changes for s390x so aligning PPC with the same. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > fix incorrect brackets The PPC64 implementation follows the other platforms in this regard. Please keep it in sync with them. If you want to improve it, please change it for all platforms, not only PPC64. src/hotspot/cpu/ppc/register_ppc.hpp line 99: > 97: constexpr int encoding() const { assert(is_valid(), "invalid register"); return _encoding; } > 98: inline VMReg as_VMReg() const; > 99: Register successor() const { return Register((encoding() + 1) & (number_of_registers - 1)); } I don't think we need this. Only s390 has it, but I don't know the purpose of it. I don't think R0 should be the successor of R31. ------------- PR Review: https://git.openjdk.org/jdk/pull/13997#pullrequestreview-1428155442 PR Review Comment: https://git.openjdk.org/jdk/pull/13997#discussion_r1194892396 From amitkumar at openjdk.org Tue May 16 11:07:54 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 11:07:54 GMT Subject: RFR: 8308143: [ppc] remove constant and add wrap-up for successor [v2] In-Reply-To: References: <6O8zCOhvrk-S522G198jJZh9PbrimEnk34K2KQIN_1Q=.191b2e73-9d09-40b8-8166-13f468c83081@github.com> Message-ID: On Tue, 16 May 2023 09:35:45 GMT, Martin Doerr wrote: > The PPC64 implementation follows the other platforms in this regard. Please keep it in sync with them. Then it's probably better if I withdraw this PR. Thanks for the input. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13997#issuecomment-1549451681 From amitkumar at openjdk.org Tue May 16 11:07:56 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 11:07:56 GMT Subject: Withdrawn: 8308143: [ppc] remove constant and add wrap-up for successor In-Reply-To: References: Message-ID: On Tue, 16 May 2023 04:21:14 GMT, Amit Kumar wrote: > This is cosmetic change, which adds simple range check logic and changes -1 to NOREG_ENCODING.
I liked the changes for s390x so aligning PPC with the same. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13997 From dzhang at openjdk.org Tue May 16 12:24:52 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 16 May 2023 12:24:52 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v5] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Fix minmax_fp_masked_v ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/486ebe7f..c1d74b71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=03-04 Stats: 44 lines in 3 files changed: 11 ins; 3 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From dzhang at openjdk.org Tue May 16 12:44:36 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 16 May 2023 12:44:36 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v6] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/c1d74b71..2b09a4e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From lucy at openjdk.org Tue May 16 13:03:46 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 16 May 2023 13:03:46 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: <1X8D4pGAvfzXos-uCptiUtXVguT85lI9O1QMGoM_OX4=.46bfe451-a356-4e02-8cae-897445597d85@github.com> On Tue, 16 May 2023 02:49:34 GMT, Amit Kumar wrote: >> This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. >> >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from @TheRealMDoerr and @tstuefe LGTM. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13978#pullrequestreview-1428526204 From amitkumar at openjdk.org Tue May 16 13:11:45 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 13:11:45 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: <1X8D4pGAvfzXos-uCptiUtXVguT85lI9O1QMGoM_OX4=.46bfe451-a356-4e02-8cae-897445597d85@github.com> References: <1X8D4pGAvfzXos-uCptiUtXVguT85lI9O1QMGoM_OX4=.46bfe451-a356-4e02-8cae-897445597d85@github.com> Message-ID: On Tue, 16 May 2023 13:00:36 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from @TheRealMDoerr and @tstuefe > > LGTM. Thank you @RealLucy for reviewing it. @TheRealMDoerr would you like to take another look before we proceed with integration ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13978#issuecomment-1549639367 From chagedorn at openjdk.org Tue May 16 13:28:38 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 13:28:38 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v5] In-Reply-To: References: Message-ID: > This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. > > To make reviewing the entire change easier, I've decided to split the work into several PRs. 
> > This first PR includes the following _semantic-preserving_ changes: > - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: > - Updating the code (variables, method names etc.) accordingly. > - Renaming "Skeleton Predicates" to "Assertion Predicates". > - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. > - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). > - Change `class Predicates` -> `class ParsePredicates`. > - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). > - Removing unused variables. > - Removing unnecessary checks. > - Code style fixes in touched code. > > Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. > > The blog post can be found on my Github page at: > https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html > > Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Review Emanuel + Tobias ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13864/files - new: https://git.openjdk.org/jdk/pull/13864/files/8f80a6e8..b152b3e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13864&range=03-04 Stats: 29 lines in 1 file changed: 2 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/13864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13864/head:pull/13864 PR: https://git.openjdk.org/jdk/pull/13864 From epeter at openjdk.org Tue May 16 13:28:38 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 May 2023 13:28:38 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v5] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 13:24:45 GMT, Christian Hagedorn wrote: >> This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. >> >> To make reviewing the entire change easier, I've decided to split the work into several PRs. 
>> >> This first PR includes the following _semantic-preserving_ changes: >> - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: >> - Updating the code (variables, method names etc.) accordingly. >> - Renaming "Skeleton Predicates" to "Assertion Predicates". >> - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. >> - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). >> - Change `class Predicates` -> `class ParsePredicates`. >> - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). >> - Removing unused variables. >> - Removing unnecessary checks. >> - Code style fixes in touched code. >> >> Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. >> >> The blog post can be found on my Github page at: >> https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html >> >> Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Review Emanuel + Tobias @chhagedorn Thanks for the work, looking forward to the next steps! ? ------------- Marked as reviewed by epeter (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13864#pullrequestreview-1428576781 From chagedorn at openjdk.org Tue May 16 13:28:42 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 13:28:42 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates [v4] In-Reply-To: References: Message-ID: <_jWyR9pq1LBzNO7s-wp3Bls4hz6zEtEsUumNW4zYXng=.8134b976-312c-48c7-9246-3d3ef4fdc612@github.com> On Mon, 15 May 2023 08:11:34 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/loopPredicate.cpp >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/loopPredicate.cpp line 150: > >> 148: * The Initialized Assertion Predicates are always true because we will >> 149: * never enter the main loop because of the changed pre- and main-loop >> 150: * exit conditions. > > This does still not quite sound right. We will never enter the main loop? Sounds like the main-loop is useless in all cases. Suggestion: > > > The Initialized Assertion Predicates are always true: they are true when we enter the main loop > (because we adjusted the pre-loop exit condition), they are true in the last iteration (because we > adjust the main-loop exit condition), and they are true in all iterations in the middle by implication. Thanks for the suggestion, I've pushed an update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13864#discussion_r1195158362 From jsjolen at openjdk.org Tue May 16 13:30:38 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 13:30:38 GMT Subject: RFR: 8300081: Replace NULL with nullptr in share/asm/ Message-ID: Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/asm.
Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.

Here are some typical things to look out for:

1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.

An example of this:

```c++
// This function returns null
void* ret_null();
// This function returns true if *x == nullptr
bool is_nullptr(void** x);
```

Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. Thanks! ------------- Commit messages: - Fixes - Replace NULL with nullptr in share/asm/ Changes: https://git.openjdk.org/jdk/pull/14010/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14010&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8300081 Stats: 100 lines in 5 files changed: 0 ins; 0 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/14010.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14010/head:pull/14010 PR: https://git.openjdk.org/jdk/pull/14010 From jsjolen at openjdk.org Tue May 16 13:30:41 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 13:30:41 GMT Subject: RFR: 8300081: Replace NULL with nullptr in share/asm/ In-Reply-To: References: Message-ID: On Tue, 16 May 2023 12:10:34 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/asm. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong.
I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Only 3 comment issues src/hotspot/share/asm/codeBuffer.cpp line 712: > 710: buf_limit = (address)dest->relocation_end() - buf; > 711: } > 712: // if dest == null, this is just the sizing pass is null src/hotspot/share/asm/codeBuffer.cpp line 880: > 878: // Resizing must be allowed > 879: { > 880: if (blob() == nullptr) return; // caller must check for blob == null "check whether blob is null" src/hotspot/share/asm/codeBuffer.cpp line 971: > 969: void CodeBuffer::verify_section_allocation() { > 970: address tstart = _total_start; > 971: if (tstart == badAddress) return; // smashed by set_blob(null) nullptr ------------- PR Review: https://git.openjdk.org/jdk/pull/14010#pullrequestreview-1428434793 PR Review Comment: https://git.openjdk.org/jdk/pull/14010#discussion_r1195070386 PR Review Comment: https://git.openjdk.org/jdk/pull/14010#discussion_r1195070755 PR Review Comment: https://git.openjdk.org/jdk/pull/14010#discussion_r1195070958 From chagedorn at openjdk.org Tue May 16 13:33:59 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 13:33:59 GMT Subject: RFR: 8305634: Renaming predicates, simple cleanups, and adding summary about 
current predicates [v5] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 13:23:49 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Review Emanuel + Tobias > > @chhagedorn Thanks for the work, looking forward to the next steps! ? Thanks @eme64, @TobiHartmann, and @rwestrel for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13864#issuecomment-1549678457 From chagedorn at openjdk.org Tue May 16 13:34:00 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 13:34:00 GMT Subject: Integrated: 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates In-Reply-To: References: Message-ID: <7C3HATYBiKBbxS2KgwowhggNCj1LQCFL7xF3CqLpTqU=.49bd9f85-caa5-4a34-8091-4558c660ca42@github.com> On Mon, 8 May 2023 13:04:18 GMT, Christian Hagedorn wrote: > This is the first patch in a series of patches to fix the remaining issues with Assertion/Skeleton Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). To achieve this task, we've decided to cleanup the naming scheme of predicates, cleanup the code operating on predicates (matching predicates, skipping over predicates etc.), and finally redesign the Assertion/Skeleton Predicates. While the basic idea stays the same, we are using new nodes and unified, cleaned up classes and methods to do the job. This redesign simplifies and only even makes it possible to fix the remaining known issues in a clean way. > > To make reviewing the entire change easier, I've decided to split the work into several PRs. > > This first PR includes the following _semantic-preserving_ changes: > - Establishing a new naming scheme for all the predicates found in C2 which makes it easier to talk about the various kinds of predicates: > - Updating the code (variables, method names etc.) accordingly. 
> - Renaming "Skeleton Predicates" to "Assertion Predicates". > - Including a summary of all predicates found in C2 in `loopPredicate.cpp`. > - Capitalizing predicate names to better distinguish them in comments (e.g. "Parse Predicate" instead of "parse predicate"). > - Change `class Predicates` -> `class ParsePredicates`. > - Improving type information (e.g. using `IfProjNode` instead of `ProjNode`, using `ParsePredicateSuccessProj/ParsePredicateUncommonProj` typedefs for Parse Predicates etc.). > - Removing unused variables. > - Removing unnecessary checks. > - Code style fixes in touched code. > > Instead of giving more background information about Assertion Predicates and why we need them here, I've decided to write a _blog post_ dedicated to Assertion Predicates and Loop Predication. This provides an overview and introduction which, hopefully, makes reviewing the PRs related to Assertion Predicates easier - at least on a higher level. > > The blog post can be found on my Github page at: > https://chhagedorn.github.io/jdk/2023/05/05/assertion-predicates.html > > Thanks to @rwestrel, @eme64, and @TobiHartmann for your help with discussions, brainstormings, and pre-reviewing some of the changes here and in upcoming PRs. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 19c8c30d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/19c8c30d1cfe611945f1bf97018280ae6b48ee8b Stats: 730 lines in 19 files changed: 223 ins; 43 del; 464 mod 8305634: Renaming predicates, simple cleanups, and adding summary about current predicates Reviewed-by: epeter, thartmann, roland ------------- PR: https://git.openjdk.org/jdk/pull/13864 From coleenp at openjdk.org Tue May 16 14:31:52 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 14:31:52 GMT Subject: RFR: 8300081: Replace NULL with nullptr in share/asm/ In-Reply-To: References: Message-ID: On Tue, 16 May 2023 12:10:34 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/asm. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Looks good, also trivial. ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14010#pullrequestreview-1428727253 From mdoerr at openjdk.org Tue May 16 15:11:49 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 16 May 2023 15:11:49 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 02:49:34 GMT, Amit Kumar wrote: >> This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. >> >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from @TheRealMDoerr and @tstuefe Thanks for the update! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13978#pullrequestreview-1428819745 From amitkumar at openjdk.org Tue May 16 15:17:52 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 15:17:52 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 02:49:34 GMT, Amit Kumar wrote: >> This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. >> >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from @TheRealMDoerr and @tstuefe Thanks for reviews, ------------- PR Comment: https://git.openjdk.org/jdk/pull/13978#issuecomment-1549872949 From amitkumar at openjdk.org Tue May 16 15:28:54 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 15:28:54 GMT Subject: Integrated: 8278411: Implement UseHeavyMonitors consistently, s390 port In-Reply-To: References: Message-ID: On Mon, 15 May 2023 08:25:19 GMT, Amit Kumar wrote: > This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. > > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. This pull request has now been integrated. Changeset: 41ee125a Author: Amit Kumar Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/41ee125a0f6cf17c20d148bf2c06db1707e4d889 Stats: 101 lines in 5 files changed: 29 ins; 1 del; 71 mod 8278411: Implement UseHeavyMonitors consistently, s390 port Reviewed-by: mdoerr, stuefe, lucy ------------- PR: https://git.openjdk.org/jdk/pull/13978 From chagedorn at openjdk.org Tue May 16 15:34:59 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 May 2023 15:34:59 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes Message-ID: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. 
The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). Changes include: - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. - Cleanup of touched code (dead code, variable renaming, code style) - Added comments (e.g. for some special case in Loop Predication) For more background, have a look at the first PR: #13864 Thanks, Christian ------------- Commit messages: - 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes Changes: https://git.openjdk.org/jdk/pull/14017/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14017&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305635 Stats: 623 lines in 12 files changed: 295 ins; 195 del; 133 mod Patch: https://git.openjdk.org/jdk/pull/14017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14017/head:pull/14017 PR: https://git.openjdk.org/jdk/pull/14017 From jsjolen at openjdk.org Tue May 16 15:43:59 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 15:43:59 GMT Subject: RFR: 8300081: Replace NULL with nullptr in share/asm/ In-Reply-To: References: Message-ID: On Tue, 16 May 2023 12:10:34 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/asm.
Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Thanks Coleen, This passes tier1 so merging now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14010#issuecomment-1549915187 From jsjolen at openjdk.org Tue May 16 15:44:00 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 15:44:00 GMT Subject: Integrated: 8300081: Replace NULL with nullptr in share/asm/ In-Reply-To: References: Message-ID: On Tue, 16 May 2023 12:10:34 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/asm. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated. Changeset: 9d5bab11 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/9d5bab11f08a992803399f422d75b17f8607df72 Stats: 99 lines in 5 files changed: 0 ins; 0 del; 99 mod 8300081: Replace NULL with nullptr in share/asm/ Reviewed-by: coleenp ------------- PR: https://git.openjdk.org/jdk/pull/14010 From jsjolen at openjdk.org Tue May 16 16:04:50 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 16:04:50 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ Message-ID: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. Here are some typical things to look out for: 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
An example of this: ```c++ // This function returns null void* ret_null(); // This function returns true if *x == nullptr bool is_nullptr(void** x); Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. Thanks! ------------- Commit messages: - Fixes - Replace NULL with nullptr in share/c1/ Changes: https://git.openjdk.org/jdk/pull/14009/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14009&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8300086 Stats: 1546 lines in 33 files changed: 0 ins; 0 del; 1546 mod Patch: https://git.openjdk.org/jdk/pull/14009.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14009/head:pull/14009 PR: https://git.openjdk.org/jdk/pull/14009 From jsjolen at openjdk.org Tue May 16 16:05:00 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 16:05:00 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ In-Reply-To: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Tue, 16 May 2023 12:08:47 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Fairly few issues, very nice! Added my fixes, running tier1. src/hotspot/share/c1/c1_GraphBuilder.cpp line 3421: > 3419: > 3420: # ifdef ASSERT > 3421: //All blocks reachable from start_block have _end != null _end isn't null src/hotspot/share/c1/c1_GraphBuilder.cpp line 3429: > 3427: BlockBegin* current = to_go.pop(); > 3428: assert(current != nullptr, "Should not happen."); > 3429: assert(current->end() != nullptr, "All blocks reachable from start_block should have end() != null."); `end() != nullptr` src/hotspot/share/c1/c1_Instruction.hpp line 239: > 237: if (!(enabled) ) return false; \ > 238: class_name* _v = v->as_##class_name(); \ > 239: if (_v == nullptr ) return false; \ align src/hotspot/share/c1/c1_Instruction.hpp line 252: > 250: if (!(enabled) ) return false; \ > 251: class_name* _v = v->as_##class_name(); \ > 252: if (_v == nullptr ) return false; \ align src/hotspot/share/c1/c1_Instruction.hpp line 266: > 264: if (!(enabled) ) return false; \ > 265: class_name* _v = v->as_##class_name(); \ > 266: if (_v == nullptr ) return false; \ align src/hotspot/share/c1/c1_LIRGenerator.cpp line 390: > 388: assert(instr->subst() == instr, "shouldn't have missed substitution"); > 389: instr->visit(this); > 390: // assert(instr->use_count() > 0 || instr->as_Phi() != null, "leaf instruction must have a use"); nullptr src/hotspot/share/c1/c1_LinearScan.cpp line 3483: > 3481: assert(value->operand()->is_register() && value->operand()->is_virtual(), "value must have virtual operand"); > 3482: assert(value->operand()->vreg_number() == r, "register number must match"); > 3483: // TKR assert(value->as_Constant() == null || value->is_pinned(), "only pinned constants can be 
alive across block boundaries"); nullptr src/hotspot/share/c1/c1_Runtime1.cpp line 609: > 607: // If the stack guard pages are enabled, check whether there is a handler in > 608: // the current method. Otherwise (guard pages disabled), force an unwind and > 609: // skip the exception cache update (i.e., just leave continuation==null). as null src/hotspot/share/c1/c1_ValueMap.cpp line 157: > 155: ValueMapEntry* prev_entry = nullptr; \ > 156: for (ValueMapEntry* entry = entry_at(i); entry != nullptr; entry = entry->next()) { \ > 157: Value value = entry->value(); \ align src/hotspot/share/c1/c1_ValueMap.cpp line 165: > 163: \ > 164: if (prev_entry == nullptr) { \ > 165: _entries.at_put(i, entry->next()); \ align src/hotspot/share/c1/c1_ValueMap.cpp line 192: > 190: LoadField* lf = value->as_LoadField(); \ > 191: bool must_kill = lf != nullptr \ > 192: && lf->field()->holder() == field->holder() \ align src/hotspot/share/c1/c1_ValueStack.hpp line 256: > 254: index < temp_var && (value = state->local_at(index), true); \ > 255: index += (value == nullptr || value->type()->is_illegal() ? 
1 : value->type()->size())) \ > 256: if (value != nullptr) align src/hotspot/share/c1/c1_ValueStack.hpp line 329: > 327: for_each_local_value(cur_state, cur_index, value) { \ > 328: Phi* v_phi = value->as_Phi(); \ > 329: if (v_phi != nullptr && v_phi->block() == v_block) { \ align ------------- PR Review: https://git.openjdk.org/jdk/pull/14009#pullrequestreview-1428800775 PR Comment: https://git.openjdk.org/jdk/pull/14009#issuecomment-1549951636 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195305341 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195307287 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195309598 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195309688 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195309750 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195312850 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195315224 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195317014 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195317413 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195317550 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195317750 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195318364 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1195318504 From never at openjdk.org Tue May 16 16:51:45 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 16 May 2023 16:51:45 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err In-Reply-To: References: Message-ID: <8Wgr4DuTLQIMk2IgF9EV7H1D5kuzxzNNXoF-nJGFv8E=.55faf559-4da0-4c2b-9a06-b1ca000151cd@github.com> On Tue, 16 May 2023 08:02:11 GMT, Doug Simon wrote: > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI 
function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... > Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also be used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe Marked as reviewed by never (Reviewer). src/hotspot/share/jvmci/jvmciEnv.cpp line 342: > 340: } > 341: if (line >= max_lines) { > 342: JVMCI_event_1("[elided %d more stack trace lines]", line - max_lines); You could add this output to the last line instead of burning an extra line. 
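The suggestion above (fold the elision note into the last emitted line rather than spending a separate event on it) can be sketched roughly as follows. This is only an illustration under stated assumptions: `format_trace_events` is a hypothetical helper written for this note, it uses `std::string` rather than HotSpot's `JVMCI_event_1` macro, and it is not the code from the PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not HotSpot code): returns the event lines to log for
// a stack trace, keeping at most max_lines of the original lines.
std::vector<std::string> format_trace_events(const std::vector<std::string>& trace,
                                             std::size_t max_lines) {
  std::vector<std::string> events;
  const std::size_t kept = trace.size() < max_lines ? trace.size() : max_lines;
  for (std::size_t i = 0; i < kept; i++) {
    events.push_back(trace[i]);
  }
  const std::size_t elided = trace.size() - kept;
  if (elided > 0) {
    const std::string note =
        "[elided " + std::to_string(elided) + " more stack trace lines]";
    if (events.empty()) {
      // max_lines == 0: there is no kept line to append to.
      events.push_back(note);
    } else {
      // Append the note to the last kept line instead of adding a new event.
      events.back() += " " + note;
    }
  }
  return events;
}
```

With four trace lines and `max_lines == 2`, the second returned entry carries the note, so the log still uses only two events.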
------------- PR Review: https://git.openjdk.org/jdk/pull/14000#pullrequestreview-1428988011 PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1195429692 From never at openjdk.org Tue May 16 16:51:48 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 16 May 2023 16:51:48 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err In-Reply-To: References: Message-ID: <2ln_MtqHYewt77jwXAcxlMLqDL6eH3ytFRUKSq1FG-c=.a226b0b4-8ff9-412f-b4a8-aa4ce070d5c5@github.com> On Tue, 16 May 2023 09:05:25 GMT, Doug Simon wrote: >> When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). >> >> This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): >> >> JVMCI Events (11 events): >> ... 
>> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError >> Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) >> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) >> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) >> >> >> It is also be used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: >> >> COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] >> >> >> [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 2047: > >> 2045: (jlong) compile_state, compile_state->task()->compile_id()); >> 2046: #ifdef ASSERT >> 2047: if (JVMCIENV->has_pending_exception() && JVMCICompileMethodExceptionIsFatal) { > > It's a shame to introduce a VM flag (i.e., `JVMCICompileMethodExceptionIsFatal`) for a test case but I don't know of any other way to do this. As far as I know, system properties cannot be accessed here. Maybe using an environment variable is better than a VM flag? Why can't you use a JVMCI property for this? You get a chance to see them when copying them to Graal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1195435341 From xuelei at openjdk.org Tue May 16 16:54:47 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 16 May 2023 16:54:47 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v2] In-Reply-To: References: Message-ID: > Hi, > > This is a redo of JDK-8307855, where issues were found after integration. 
> > The `sprintf` function is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for the build failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in the src/utils directory. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: replace strcpy with snprintf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13995/files - new: https://git.openjdk.org/jdk/pull/13995/files/9ac95707..1f833d5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13995.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13995/head:pull/13995 PR: https://git.openjdk.org/jdk/pull/13995 From xuelei at openjdk.org Tue May 16 16:54:50 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 16 May 2023 16:54:50 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 21:47:19 GMT, Mikael Vidstedt wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> replace strcpy with snprintf > > src/utils/hsdis/binutils/hsdis-binutils.c line 248: > >> 246: size_t used_size = strlen(close); >> 247: char* p = buf + used_size; >> 248: bufsize -= used_size; > > May not happen in practice, but if `used_size` is larger than `bufsize` this will wrap to a very large value.
Perhaps the `strcpy` above should also be an `snprintf`, and the return value handled the same way as for the subsequent `snprintf` calls? I think it is safe as the `buf` size has been checked at around line 230. However, it may make the code easier to read if replacing `strcpy` with `snprintf`. The patch was updated accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1195441861 From kvn at openjdk.org Tue May 16 17:51:57 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 May 2023 17:51:57 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v7] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:05:06 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add      2063   2085    660    530    415    283 |      |
>> int mul      2272   2257   1189    733    908    439 |      |
>> int min      2527   2520   2516   2579   2585   2542 | 1    |
>> int max      2548   2525   2551   2516   2515   2517 | 1    |
>> int and      2410   2414    602    480    353    263 |      |
>> int or       2149   2151    597    498    354    262 |      |
>> int xor      2059   2062    605    476    364    263 |      |
>> long add     1776   1790   2000   1000   1683    591 | 2    |
>> long mul     2135   2199   2185   2001   2176   1307 | 2    |
>> long min     1439   1424   1421   1420   1430   1427 | 3    |
>> long max     2299   2287   2303   2305   1433   1425 | 3    |
>> long and     1657   1667   2015   1003   1679    568 | 4    |
>> long or      1776   1783   2032   1009   1680    569 | 4    |
>> long xor     1834   1783   2012   1024   1679    570 | 4    |
>> float add    2779   2644   2633   2648   2632   2639 | 5    |
>> float mul    2779   2871   2810   2776   2732   2791 | 5    |
>> float min    2294   2620   1725   1286    872    672 |      |
>> float max    2371   2519   1697   1265    841    468 |      |
>> double add   2634   2636   2635   2650   2635   2648 | 5    |
>> double mul   3053   2955   2881   3030   2979   2927 | 5    |
>> double min   2364   2400   2439   2399   2486   2398 | 6    |
>> double max   2488   2459   2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   whitespace fix

src/hotspot/share/opto/vectornode.hpp line 244:

> 242:
> 243: virtual VectorNode* make_normal_vector_op(Node* in1, Node* in2, const TypeVect* vt) = 0;
> 244: virtual bool make_normal_vector_op_implemented(const TypeVect* vt) = 0;

How about introducing `virtual int vect_Opcode()` (`norm_vect_Opcode()`) or something which returns normal vector opcode (`Op_AddVI` for `AddReductionVINode` for example).
Then you don't need these 2 functions to be virtual:

    virtual int vect_Opcode() const = 0;
    VectorNode* make_normal_vector_op(Node* in1, Node* in2, const TypeVect* vt) {
      return VectorNode::make(vect_Opcode(), in1, in2, vt);
    }
    bool make_normal_vector_op_implemented(const TypeVect* vt) {
      return Matcher::match_rule_supported_vector(vect_Opcode(), vt->length(), vt->element_basic_type());
    }

If we need that in more cases than in your changes, maybe have an even more general `scalar_Opcode()` (in the `VectorNode` class) and use `VectorNode::opcode(scalar_Opcode(), vt->element_basic_type())` to get the normal vector opcode. This may need more changes and testing - a separate RFE.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1195504471

From kvn at openjdk.org Tue May 16 17:51:57 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 16 May 2023 17:51:57 GMT
Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v7]
In-Reply-To:
References:
Message-ID:

On Tue, 16 May 2023 17:47:34 GMT, Vladimir Kozlov wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   whitespace fix
>
> src/hotspot/share/opto/vectornode.hpp line 244:
>
>> 242:
>> 243: virtual VectorNode* make_normal_vector_op(Node* in1, Node* in2, const TypeVect* vt) = 0;
>> 244: virtual bool make_normal_vector_op_implemented(const TypeVect* vt) = 0;
>
> How about introducing `virtual int vect_Opcode()` (`norm_vect_Opcode()`) or something which returns normal vector opcode (`Op_AddVI` for `AddReductionVINode` for example).
Then you don't need these 2 functions to be virtual:
>
>     virtual int vect_Opcode() const = 0;
>     VectorNode* make_normal_vector_op(Node* in1, Node* in2, const TypeVect* vt) {
>       return VectorNode::make(vect_Opcode(), in1, in2, vt);
>     }
>     bool make_normal_vector_op_implemented(const TypeVect* vt) {
>       return Matcher::match_rule_supported_vector(vect_Opcode(), vt->length(), vt->element_basic_type());
>     }
>
>
> If we need that in more cases than in your changes, maybe have an even more general `scalar_Opcode()` (in the `VectorNode` class) and use `VectorNode::opcode(scalar_Opcode(), vt->element_basic_type())` to get the normal vector opcode. This may need more changes and testing - a separate RFE.

I am also not sure about the need for `_op` in these function names.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1195505796

From jsjolen at openjdk.org Tue May 16 18:16:48 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 16 May 2023 18:16:48 GMT
Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/
In-Reply-To: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com>
References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com>
Message-ID:

On Tue, 16 May 2023 12:08:47 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs.
We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Tier1 passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14009#issuecomment-1550144280 From liach at openjdk.org Tue May 16 18:20:45 2023 From: liach at openjdk.org (Chen Liang) Date: Tue, 16 May 2023 18:20:45 GMT Subject: RFR: 8307326: Package jdk.internal.classfile.java.lang.constant become obsolete In-Reply-To: References: Message-ID: <5CjNjAQme8BTCA-TJzjjV8Zb7e-xYvuHvQENi7nUkrs=.5c407434-4d8f-42a7-aec6-37961ea9afe3@github.com> On Mon, 15 May 2023 08:38:54 GMT, Adam Sotona wrote: > Package `jdk.internal.classfile.java.lang.constant` containing `ModuleDesc` and `PackageDesc` become obsolete after [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729). > All references to `jdk.internal.classfile.java.lang.constant.ModuleDesc` and `jdk.internal.classfile.java.lang.constant.PackageDesc` across all JDK sources, tests and JMH benchmarks are replaced with `java.lang.constant.ModuleDesc` and `java.lang.constant.PackageDesc`. > `jdk.internal.classfile.java.lang.constant` package export hooks are removed from java.base module-info, make files and test headers. > Content of `jdk.internal.classfile.java.lang.constant` package and related tests under `jdk.classfile` are deleted. > Method references renamed in [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729) are fixed: > - `PackageDesc::packageName` to `PackageDesc::name` > - `PackageDesc::packageInternalName` to `PackageDesc::internalName` > - `ModuleDesc::moduleName` to `ModuleDesc::name`. > > Please review this pull request. 
> > Thanks, > Adam I think these are all the occurrences of jdk.internal.classfile.java.lang.constant package. ------------- Marked as reviewed by liach (Author). PR Review: https://git.openjdk.org/jdk/pull/13979#pullrequestreview-1429153817 From dnsimon at openjdk.org Tue May 16 21:26:47 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 21:26:47 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err In-Reply-To: <8Wgr4DuTLQIMk2IgF9EV7H1D5kuzxzNNXoF-nJGFv8E=.55faf559-4da0-4c2b-9a06-b1ca000151cd@github.com> References: <8Wgr4DuTLQIMk2IgF9EV7H1D5kuzxzNNXoF-nJGFv8E=.55faf559-4da0-4c2b-9a06-b1ca000151cd@github.com> Message-ID: On Tue, 16 May 2023 16:38:02 GMT, Tom Rodriguez wrote: >> When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). >> >> This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): >> >> JVMCI Events (11 events): >> ... 
>> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError >> Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) >> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) >> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) >> >> >> It is also be used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: >> >> COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] >> >> >> [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe > > src/hotspot/share/jvmci/jvmciEnv.cpp line 342: > >> 340: } >> 341: if (line >= max_lines) { >> 342: JVMCI_event_1("[elided %d more stack trace lines]", line - max_lines); > > You could add this output to the last line instead of burning an extra line. 
Indeed: Event: 0.237 Thread 0x000000011e019e10 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError Event: 0.237 Thread 0x000000011e019e10 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:161) [elided 2 more stack trace lines] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1195700413 From dnsimon at openjdk.org Tue May 16 21:26:52 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 21:26:52 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err In-Reply-To: <2ln_MtqHYewt77jwXAcxlMLqDL6eH3ytFRUKSq1FG-c=.a226b0b4-8ff9-412f-b4a8-aa4ce070d5c5@github.com> References: <2ln_MtqHYewt77jwXAcxlMLqDL6eH3ytFRUKSq1FG-c=.a226b0b4-8ff9-412f-b4a8-aa4ce070d5c5@github.com> Message-ID: On Tue, 16 May 2023 16:43:01 GMT, Tom Rodriguez wrote: >> src/hotspot/share/jvmci/jvmciRuntime.cpp line 2047: >> >>> 2045: (jlong) compile_state, compile_state->task()->compile_id()); >>> 2046: #ifdef ASSERT >>> 2047: if (JVMCIENV->has_pending_exception() && JVMCICompileMethodExceptionIsFatal) { >> >> It's a shame to introduce a VM flag (i.e., `JVMCICompileMethodExceptionIsFatal`) for a test case but I don't know of any other way to do this. As far as I know, system properties cannot be accessed here. Maybe using an environment variable is better than a VM flag? > > Why can't you use a JVMCI property for this? You get a chance to see them when copying them to Graal. The copying is only done when using libgraal. I'd like to have this test run in a JDK without libgraal. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1195702132 From dnsimon at openjdk.org Tue May 16 21:33:50 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 21:33:50 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v2] In-Reply-To: References: Message-ID: <0xLc58Wa85RKMBStJ-W3xO67Y5rvy24hu9Ggu-dMOMo=.570e69cd-7435-4cc0-ac48-01597e943512@github.com> > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... 
> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe Doug Simon has updated the pull request incrementally with one additional commit since the last revision: append elision comment to end of last stack trace line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14000/files - new: https://git.openjdk.org/jdk/pull/14000/files/b1854a6a..9efd3ea6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=00-01 Stats: 13 lines in 1 file changed: 9 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14000/head:pull/14000 PR: https://git.openjdk.org/jdk/pull/14000 From dnsimon at openjdk.org Tue May 16 22:04:55 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 22:04:55 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v3] In-Reply-To: References: <2ln_MtqHYewt77jwXAcxlMLqDL6eH3ytFRUKSq1FG-c=.a226b0b4-8ff9-412f-b4a8-aa4ce070d5c5@github.com> Message-ID: On Tue, 16 May 2023 21:24:24 GMT, Doug Simon wrote: >> Why can't you use a JVMCI
property for this? You get a chance to see them when copying them to Graal. > > The copying is only done when using libgraal. I'd like to have this test run in a JDK without libgraal. I see that I can simply use system properties after all: https://github.com/openjdk/jdk/pull/14000/commits/90f4346b3c8737fd0fee25a7ed0c32a1bd506c88 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1195727360 From dnsimon at openjdk.org Tue May 16 22:04:54 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 22:04:54 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v3] In-Reply-To: References: Message-ID: <9AgB36HtSjwiSZMH3rTs2FolM3LDZeTm0dR-lk3m1FE=.b2b773a7-879b-48e4-80d1-7132c1a2b256@github.com> > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... 
> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe Doug Simon has updated the pull request incrementally with one additional commit since the last revision: replace JVMCICompileMethodExceptionIsFatal VM flag with test.jvmci.compileMethodExceptionIsFatal system property ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14000/files - new: https://git.openjdk.org/jdk/pull/14000/files/9efd3ea6..90f4346b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=01-02 Stats: 17 lines in 4 files changed: 4 ins; 6 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14000/head:pull/14000 PR: https://git.openjdk.org/jdk/pull/14000 From fjiang at openjdk.org Wed May 17 01:45:50 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 17 May 2023 01:45:50 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v6] In-Reply-To: References: Message-ID:
<6QB1ModTlrzwc-GfwGJ0U6XgRPbhJsOawE_P9yHZnAM=.7fbce6b6-1978-4570-abef-7fd9c88902d7@github.com> On Tue, 16 May 2023 12:44:36 GMT, Dingli Zhang wrote: >> Hi all, >> >> We have added support for Extract, Compress, Expand and other nodes for Vector >> API. It was implemented by referring to RVV v1.0 [1]. Please take a look and >> have some reviews. Thanks a lot. >> >> In this PR, we will support these new nodes: >> >> CompressM/CompressV/ExpandV >> LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> Extract >> VectorLongToMask/VectorMaskToLong >> PopulateIndex >> VectorLongToMask/VectorMaskToLong >> VectorMaskTrueCount/VectorMaskFirstTrue >> VectorInsert >> >> >> At the same time, we refactored methods such as >> `match_rule_supported_vector_mask`. All implemented vector nodes support mask >> operations by default now, so we also added mask nodes for all implemented >> nodes. >> >> By the way, we will implement the VectorTest node in the next PR. >> >> We can use the tests under `test/jdk/jdk/incubator/vector` to print the >> compilation log for most of the new nodes. And we can use the following >> command to print the compilation log of a jtreg test case: >> >> >> $ jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=log_name.log \ >> -jdk:build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:build/linux-x86_64-server-release/images/jdk \ >> >> >> >> >> >> ### CompressM/CompressV/ExpandV >> >> There is no inverse vdecompress provided in RVV, as this operation can be >> readily synthesized using iota and a masked vrgather in `ExpandV`. 
>> >> We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit >> these nodes and the compilation log is as follows: >> >> >> ## CompressM >> 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm >> 2ae mcompress V0, V30 # KILL R30 >> 2c2 vstoremask V2, V0 >> 2ce storeV [R7], V2 # vector (rvv) >> 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## CompressV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vcompress V1, V2, V0 >> 0fe storeV [R7], V1 # vector (rvv) >> 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## ExpandV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vexpand V3, V2, V0 >> 102 storeV [R7], V3 # vector (rvv) >> 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> >> >> ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> >> We use the vs... > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace Looks good, thanks. ------------- Marked as reviewed by fjiang (Author). PR Review: https://git.openjdk.org/jdk/pull/13862#pullrequestreview-1429664708 From duke at openjdk.org Wed May 17 02:21:03 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 17 May 2023 02:21:03 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension Message-ID: This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. 
------------- Commit messages: - 8308192: Error in parsing replay file when staticfield is an array of single dimension Changes: https://git.openjdk.org/jdk/pull/14024/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14024&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308192 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14024.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14024/head:pull/14024 PR: https://git.openjdk.org/jdk/pull/14024 From fyang at openjdk.org Wed May 17 03:05:58 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 17 May 2023 03:05:58 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v6] In-Reply-To: References: Message-ID: <4tnPjoFb1uXfNtTyOPx6hOPJlaD_9BihINgiz4tddQ0=.aaacd56d-a7a8-4637-ad99-2e48395b6bef@github.com> On Tue, 16 May 2023 12:44:36 GMT, Dingli Zhang wrote: >> Hi all, >> >> We have added support for Extract, Compress, Expand and other nodes for Vector >> API. It was implemented by referring to RVV v1.0 [1]. Please take a look and >> have some reviews. Thanks a lot. >> >> In this PR, we will support these new nodes: >> >> CompressM/CompressV/ExpandV >> LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> Extract >> VectorLongToMask/VectorMaskToLong >> PopulateIndex >> VectorLongToMask/VectorMaskToLong >> VectorMaskTrueCount/VectorMaskFirstTrue >> VectorInsert >> >> >> At the same time, we refactored methods such as >> `match_rule_supported_vector_mask`. All implemented vector nodes support mask >> operations by default now, so we also added mask nodes for all implemented >> nodes. >> >> By the way, we will implement the VectorTest node in the next PR. >> >> We can use the tests under `test/jdk/jdk/incubator/vector` to print the >> compilation log for most of the new nodes. 
And we can use the following >> command to print the compilation log of a jtreg test case: >> >> >> $ jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=log_name.log \ >> -jdk:build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:build/linux-x86_64-server-release/images/jdk \ >> >> >> >> >> >> ### CompressM/CompressV/ExpandV >> >> There is no inverse vdecompress provided in RVV, as this operation can be >> readily synthesized using iota and a masked vrgather in `ExpandV`. >> >> We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit >> these nodes and the compilation log is as follows: >> >> >> ## CompressM >> 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm >> 2ae mcompress V0, V30 # KILL R30 >> 2c2 vstoremask V2, V0 >> 2ce storeV [R7], V2 # vector (rvv) >> 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## CompressV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vcompress V1, V2, V0 >> 0fe storeV [R7], V1 # vector (rvv) >> 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## ExpandV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vexpand V3, V2, V0 >> 102 storeV [R7], V3 # vector (rvv) >> 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> >> >> ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> >> We use the vs... > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace Thanks for the update. Would you mind a few more tweaks? 
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1647: > 1645: void C2_MacroAssembler::minmax_fp_masked_v(VectorRegister dst, VectorRegister src1, VectorRegister src2, > 1646: VectorRegister vmask, int vector_length, VectorRegister tmp1, > 1647: VectorRegister tmp2, bool is_double, bool is_min) { Suggestion: make `vector_length` the last parameter so that it will be more consistent in style with friend `C2_MacroAssembler::minmax_fp_v` src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1683: > 1681: > 1682: is_min ? vfredmin_vs(tmp1, src2, tmp2, vm) > 1683: : vfredmax_vs(tmp1, src2, tmp2, vm); Suggestion: put the result of reduction in `dst` with `vfmv_f_s(dst, tmp1)` here and save the `j(L_done_check)` at line 1695. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1717: > 1715: void C2_MacroAssembler::reduce_integral_v(Register dst, VectorRegister tmp, > 1716: Register src1, VectorRegister src2, > 1717: BasicType bt, int opc, int vector_length, VectorMask vm) { Suggested parameter order: dst, src1, src2, tmp, opc, bt, vector_length, vm ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13862#pullrequestreview-1429683625 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1195852453 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1195876189 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1195876780 From kbarrett at openjdk.org Wed May 17 03:29:44 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 03:29:44 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 16:54:47 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> This is a redo of JDK-8307855, where issues were found after integration. >> >> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns.
The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in the src/utils directory. >> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > replace strcpy with snprintf Changes requested by kbarrett (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13995#pullrequestreview-1429733422 From kbarrett at openjdk.org Wed May 17 03:29:46 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 03:29:46 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 16:49:27 GMT, Xue-Lei Andrew Fan wrote: >> src/utils/hsdis/binutils/hsdis-binutils.c line 248: >> >>> 246: size_t used_size = strlen(close); >>> 247: char* p = buf + used_size; >>> 248: bufsize -= used_size; >> >> May not happen in practice, but if `used_size` is larger than `bufsize` this will wrap to a very large value. Perhaps the `strcpy` above should also be an `snprintf`, and the return value handled the same way as for the subsequent `snprintf` calls? > > I think it is safe as the `buf` size has been checked at around line 230. However, replacing `strcpy` with `snprintf` may make the code easier to read. The patch was updated accordingly. This and all uses of snprintf in this change are incorrect. If the output is truncated, snprintf returns the number of characters that would have been written if there had been enough space. That is, the result may be larger than bufsize.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1195887441 From xuelei at openjdk.org Wed May 17 04:17:43 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 17 May 2023 04:17:43 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 03:26:45 GMT, Kim Barrett wrote: > This and all uses of snprintf in this change are incorrect. If the output is truncated, snprintf returns the number of characters that would have been written if there had been enough space. That is, the result may be larger than bufsize. The correctness of this change depends on the fact that the buffer has sufficient capacity, which has been checked at line 230. I agree that leaving the returned value unchecked is not a typical use of `snprintf`. I will make an update to check the returned value of `snprintf`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1195909903 From xuelei at openjdk.org Wed May 17 05:49:00 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 17 May 2023 05:49:00 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: References: Message-ID: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> > Hi, > > This is a redo of JDK-8307855, where issues were found after integration. > > The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in the src/utils directory.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: check returned value of snprintf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13995/files - new: https://git.openjdk.org/jdk/pull/13995/files/1f833d5e..dd6ddbc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=01-02 Stats: 14 lines in 1 file changed: 12 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13995.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13995/head:pull/13995 PR: https://git.openjdk.org/jdk/pull/13995 From dzhang at openjdk.org Wed May 17 06:12:07 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 17 May 2023 06:12:07 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v7] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Adjust some function in c2_MacroAssembler_riscv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/2b09a4e3..2274099a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=05-06 Stats: 104 lines in 3 files changed: 2 ins; 4 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From dzhang at openjdk.org Wed May 17 06:12:10 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 17 May 2023 06:12:10 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v6] In-Reply-To: <4tnPjoFb1uXfNtTyOPx6hOPJlaD_9BihINgiz4tddQ0=.aaacd56d-a7a8-4637-ad99-2e48395b6bef@github.com> References: <4tnPjoFb1uXfNtTyOPx6hOPJlaD_9BihINgiz4tddQ0=.aaacd56d-a7a8-4637-ad99-2e48395b6bef@github.com> Message-ID: On Wed, 17 May 2023 02:08:58 GMT, Fei Yang wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove trailing whitespace > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1647: > >> 1645: void C2_MacroAssembler::minmax_fp_masked_v(VectorRegister dst, VectorRegister src1, VectorRegister src2, >> 1646: VectorRegister vmask, int vector_length, VectorRegister tmp1, >> 1647: VectorRegister tmp2, bool is_double, bool is_min) { > > Suggestion: make `vector_length` the last parameter so that it will be more consistent in style with friend `C2_MacroAssembler::minmax_fp_v` Fixed. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1683: > >> 1681: >> 1682: is_min ? 
vfredmin_vs(tmp1, src2, tmp2, vm) >> 1683: : vfredmax_vs(tmp1, src2, tmp2, vm); > > Suggestion: put the result of reduction in `dst` with `vfmv_f_s(dst, tmp1)` here and save the `j(L_done_check)` at line 1695. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1195980293 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1195980433 From jbhateja at openjdk.org Wed May 17 06:13:53 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 May 2023 06:13:53 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v6] In-Reply-To: References: Message-ID: <1iQ3Xup2lCJnC629Y8VgjTKonFHxxWB8TcDou4g1Xp8=.36459fc2-911e-4551-8a7c-a65da3b49bf4@github.com> On Mon, 15 May 2023 07:46:07 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add        2063   2085    660    530    415    283 |      |
>> int mul        2272   2257   1189    733    908    439 |      |
>> int min        2527   2520   2516   2579   2585   2542 | 1    |
>> int max        2548   2525   2551   2516   2515   2517 | 1    |
>> int and        2410   2414    602    480    353    263 |      |
>> int or         2149   2151    597    498    354    262 |      |
>> int xor        2059   2062    605    476    364    263 |      |
>> long add       1776   1790   2000   1000   1683    591 | 2    |
>> long mul       2135   2199   2185   2001   2176   1307 | 2    |
>> long min       1439   1424   1421   1420   1430   1427 | 3    |
>> long max       2299   2287   2303   2305   1433   1425 | 3    |
>> long and       1657   1667   2015   1003   1679    568 | 4    |
>> long or        1776   1783   2032   1009   1680    569 | 4    |
>> long xor       1834   1783   2012   1024   1679    570 | 4    |
>> float add      2779   2644   2633   2648   2632   2639 | 5    |
>> float mul      2779   2871   2810   2776   2732   2791 | 5    |
>> float min      2294   2620   1725   1286    872    672 |      |
>> float max      2371   2519   1697   1265    841    468 |      |
>> double add     2634   2636   2635   2650   2635   2648 | 5    |
>> double mul     3053   2955   2881   3030   2979   2927 | 5    |
>> double min     2364   2400   2439   2399   2486   2398 | 6    |
>> double max     2488   2459   2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   Added Matcher::match_rule_supported_vector check, removed bad assert, added test for it

src/hotspot/share/opto/loopopts.cpp line 4191:

> 4189:     const Type* bt_t = Type::get_const_basic_type(bt);
> 4190:
> 4191:     if (!last_ur->make_normal_vector_op_implemented(vec_t)) {

Some naming comments, _make_ prefix is more suitable for IR creation routines.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1195977325 From jbhateja at openjdk.org Wed May 17 06:13:56 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 May 2023 06:13:56 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v7] In-Reply-To: References: Message-ID: <0eGoNIE_MHL8VFr591lUJGQpl6t11jk583oI9GJV6OI=.190410f2-ce5f-47ad-9821-f767515fdcd5@github.com> On Mon, 15 May 2023 11:05:06 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> ---------------------------------------------------------------
>> int add        2063   2085    660    530    415    283 |      |
>> int mul        2272   2257   1189    733    908    439 |      |
>> int min        2527   2520   2516   2579   2585   2542 | 1    |
>> int max        2548   2525   2551   2516   2515   2517 | 1    |
>> int and        2410   2414    602    480    353    263 |      |
>> int or         2149   2151    597    498    354    262 |      |
>> int xor        2059   2062    605    476    364    263 |      |
>> long add       1776   1790   2000   1000   1683    591 | 2    |
>> long mul       2135   2199   2185   2001   2176   1307 | 2    |
>> long min       1439   1424   1421   1420   1430   1427 | 3    |
>> long max       2299   2287   2303   2305   1433   1425 | 3    |
>> long and       1657   1667   2015   1003   1679    568 | 4    |
>> long or        1776   1783   2032   1009   1680    569 | 4    |
>> long xor       1834   1783   2012   1024   1679    570 | 4    |
>> float add      2779   2644   2633   2648   2632   2639 | 5    |
>> float mul      2779   2871   2810   2776   2732   2791 | 5    |
>> float min      2294   2620   1725   1286    872    672 |      |
>> float max      2371   2519   1697   1265    841    468 |      |
>> double add     2634   2636   2635   2650   2635   2648 | 5    |
>> double mul     3053   2955   2881   3030   2979   2927 | 5    |
>> double min     2364   2400   2439   2399   2486   2398 | 6    |
>> double max     2488   2459   2501 ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   whitespace fix

src/hotspot/share/opto/vectornode.hpp line 438:

> 436:   virtual bool make_normal_vector_op_implemented(const TypeVect* vt) {
> 437:     return Matcher::match_rule_supported_vector(Op_MulVI, vt->length(), vt->element_basic_type());
> 438:   }

I agree with Vladimir's comments, we can remove explicit calls from each reduction node class and introduce factory method _VectorNode::make_from_ropc_ in vectornode.cpp similar to _ReductionNode::make_from_vopc_ which accept reduction opcode and returns equivalent vector node.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1195969474 From never at openjdk.org Wed May 17 06:33:47 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 17 May 2023 06:33:47 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v3] In-Reply-To: References: <2ln_MtqHYewt77jwXAcxlMLqDL6eH3ytFRUKSq1FG-c=.a226b0b4-8ff9-412f-b4a8-aa4ce070d5c5@github.com> Message-ID: On Tue, 16 May 2023 22:00:22 GMT, Doug Simon wrote: >> The copying is only done when using libgraal. I'd like to have this test run in a JDK without libgraal. > > I see that I can simply use system properties after all: https://github.com/openjdk/jdk/pull/14000/commits/90f4346b3c8737fd0fee25a7ed0c32a1bd506c88 Yes that looks good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1196004966 From kbarrett at openjdk.org Wed May 17 08:58:49 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 08:58:49 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> References: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> Message-ID: On Wed, 17 May 2023 05:49:00 GMT, Xue-Lei Andrew Fan wrote: >> Hi, >> >> This is a redo of JDK-8307855, where issues were found after integration. >> >> `sprintf` is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for the build failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for the sprintf uses in the src/utils directory.
>> >> Thanks, >> Xuelei > > Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: > > check returned value of snprintf Changes requested by kbarrett (Reviewer). src/utils/hsdis/binutils/hsdis-binutils.c line 246: > 244: > 245: size_t used_size = snprintf(buf, bufsize, "%s", close); > 246: if ((used_size < 0) || (used_size >= bufsize)) { (used_size < 0) is tautologically false, since used_size is a size_t, so unsigned. I'm somewhat surprised this doesn't trigger a warning from some compiler. ------------- PR Review: https://git.openjdk.org/jdk/pull/13995#pullrequestreview-1430144188 PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1196161411 From kbarrett at openjdk.org Wed May 17 08:58:51 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 08:58:51 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: References: Message-ID: <6oTmCMbuV_t23TrGFTMv1BuKTBt3GzpY4W7iXfuuKFA=.22e67e0a-c0d8-4d5d-8e20-039ce3243e6c@github.com> On Wed, 17 May 2023 04:15:01 GMT, Xue-Lei Andrew Fan wrote: >> This and all uses of snprintf in this change are incorrect. If the output is truncated, snprintf returns the >> number of characters that would have been written if there had been enough space. That is, the result >> may be larger than bufsize. > >> This and all uses of snprintf in this change are incorrect. If the output is truncated, snprintf returns the number of characters that would have been written if there had been enough space. That is, the result may be larger than bufsize. > > The correctness of this change depends on the fact that the buffer has sufficient capacity, which has been checked at line 230. I agreed that this is not a typical use of `snprintf` that the returned value is not checked. I will make an update to check the returned value of `snprintf`. OK, I missed that. (The relevant code doesn't show up in the default github diff. 
I really ought to know better than to use that view for reviewing.) Even having been pointed to the code, I had to do some counting and such to convince myself that it was safe. A bit of commentary might save some time for the next reader. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1196170575 From dzhang at openjdk.org Wed May 17 08:59:55 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 17 May 2023 08:59:55 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v8] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 18 commits: - Add some comments - Merge remote-tracking branch 'upstream/master' into JDK-8307609 - Adjust some params order in c2_MacroAssembler_riscv - Adjust some function in c2_MacroAssembler_riscv - Remove trailing whitespace - Fix minmax_fp_masked_v - Change some iRegI to iRegIorL2I and small refactoring of minmax_fp_masked_v - Remove debug warning - Merge master and resolve conflict - Optimize call point of vfclass and adjust the parameters of c2 instruct - ... and 8 more: https://git.openjdk.org/jdk/compare/2f1c6548...1fc880e3 ------------- Changes: https://git.openjdk.org/jdk/pull/13862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=07 Stats: 1754 lines in 6 files changed: 1464 ins; 145 del; 145 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From thartmann at openjdk.org Wed May 17 09:50:45 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 09:50:45 GMT Subject: RFR: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:12:42 GMT, Emanuel Peter wrote: > Bug fixing to make the verification pass for https://github.com/openjdk/jdk/pull/13951 [JDK-8305073](https://bugs.openjdk.org/browse/JDK-8305073). > > There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. 
> > **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: > We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: > https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 > > So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13980#pullrequestreview-1430257805 From thartmann at openjdk.org Wed May 17 09:53:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 09:53:47 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:38:52 GMT, Emanuel Peter wrote: >> This is the second step in the `VerifyLoopOptimizations` revival. >> >> Last step: >> [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure >> See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 >> >> Bug fixing for this step: >> [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate >> (https://github.com/openjdk/jdk/pull/13980) >> >> Next step: >> [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop >> >> I added `TestVerifyLoopOptimizations.java` per @robcasloz 's request. It works just like `TestVerifyIterativeGVN.java`, with a simple `-Xcomp -XX:+VerifyLoopOptimizations` on a basically empty test. 
It fails until this patch is integrated: https://github.com/openjdk/jdk/pull/13980 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add TestVerifyLoopOptimizations.java Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13951#pullrequestreview-1430262267 From epeter at openjdk.org Wed May 17 10:19:53 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 10:19:53 GMT Subject: RFR: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate [v2] In-Reply-To: References: Message-ID: > Bug fixing to make the verification pass for https://github.com/openjdk/jdk/pull/13951 [JDK-8305073](https://bugs.openjdk.org/browse/JDK-8305073). > > There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. > > **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: > We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: > https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 > > So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - merge from master after Assertion Predicate renaming - 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate ------------- Changes: https://git.openjdk.org/jdk/pull/13980/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13980&range=01 Stats: 15 lines in 1 file changed: 8 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13980.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13980/head:pull/13980 PR: https://git.openjdk.org/jdk/pull/13980 From thartmann at openjdk.org Wed May 17 10:22:48 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 10:22:48 GMT Subject: RFR: 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:14:49 GMT, Emanuel Peter wrote: >> **The Problem** >> >> During CCP, we get to a state like that: >> >> x (int:1) Phi (int:4) >> | | >> | +-----+ >> | | >> LShiftI (int:16) >> | >> CastII (top) ConI (int:3) >> | | >> +----+ +---------+ >> | | >> AndI >> >> >> We call `AddINode::Value` during CCP, and in `MulNode::AndIL_shift_and_mask_is_always_zero` we `uncast` both inputs, which leaves us with `LShiftI` and `ConI` as the "true" inputs. They both have non-top types, and so we determine that this `AndI-LShiftI` combination always leads to `zero`: The `Phi` has a constant type (`int:4`). So this leaves the lowest 4 bits zero after the `LShiftI`. Then and-ing that with `int:3` means we extract the lowest 3 bits that are zero. So the result is provably always zero - that is the idea. 
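The always-zero argument above can be checked with a small Java sketch. This is only an illustration under stated assumptions: `ShiftMaskZero`, `shiftThenMask` and `maskBelowShift` are hypothetical names, not HotSpot code, and the sketch models the `AndI(LShiftI(x, 4), 3)` shape with plain `int` arithmetic rather than C2 types.

```java
// Illustrative only: ShiftMaskZero is a hypothetical name, not HotSpot code.
final class ShiftMaskZero {
    // Mirrors the AndI(LShiftI(x, 4), 3) shape from the graph above.
    static int shiftThenMask(int x) {
        return (x << 4) & 3;
    }

    // True when the mask selects only bits below the shift amount,
    // i.e. bits that a left shift by `shift` always leaves at zero.
    static boolean maskBelowShift(int mask, int shift) {
        return shift >= 0 && shift < 32 && (mask & (-1 << shift)) == 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            // Every sample is expected to print 0 as the masked result.
            System.out.println(x + " -> " + shiftThenMask(x));
        }
    }
}
```

The mask `3` only selects low bits that the shift by 4 has already cleared, so the result is 0 for any `x`. Once the shift amount is no longer a known constant, as in the updated graph, that argument is no longer available.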
>> >> Then, we have some type updates (here of `x` and `Phi` and `LShiftI`), and the graph looks like this: >> >> x (int) Phi (int:0..4) >> | | >> | +-----+ >> | | >> LShiftI (int) >> | >> CastII (top) ConI (int:3) >> | | >> +----+ +---------+ >> | | >> AndI >> >> >> This leads to `shift2` failing to have constant type: >> https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L1964-L1967 >> >> And with that, we fall back to `MulNode::Value`: >> https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L559-L566 >> >> In `MulNode::Value` we detect that the `CastII` has type `top`, and return `top` for `AndI`. >> >> CCP expects the types to become wider over time, so going from `int:0` to `top` is the wrong direction. >> >> **Solution** >> >> The problem is the `CastII` still being `top`, which seems to be very rare. But the new regression test `TestShiftCastAndNotification.java` seems to create exactly that case, in combination with `-XX:+StressCCP`. >> >> We should guard against `top` in one of the `AndI` inputs inside `MulNode::AndIL_shift_and_mask_is_always_zero`. This will prevent it from detecting the zero-case until `MulNode::Value` gets a chance to compute a non-top type. >> >> **Argument for Solution** >> >> Is there still a threat from `MulNode::AndIL_shift_and_mask_is_always_zero` computing a zero first, and `MulNode::Value` computing a type that does not include zero afterward? >> As types only widen during CCP, having a zero first means that all inputs now are non-top - in fact th... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > refactor with @chhagedorn's suggestions Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13908#pullrequestreview-1430313800 From epeter at openjdk.org Wed May 17 10:43:59 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 10:43:59 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v8] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation     M-N-2  M-N-3  M-2    M-3    P-2    P-3   | note |
> ---------------------------------------------------------------
> int add       2063   2085   660    530    415    283   |      |
> int mul       2272   2257   1189   733    908    439   |      |
> int min       2527   2520   2516   2579   2585   2542  |    1 |
> int max       2548   2525   2551   2516   2515   2517  |    1 |
> int and       2410   2414   602    480    353    263   |      |
> int or        2149   2151   597    498    354    262   |      |
> int xor       2059   2062   605    476    364    263   |      |
> long add      1776   1790   2000   1000   1683   591   |    2 |
> long mul      2135   2199   2185   2001   2176   1307  |    2 |
> long min      1439   1424   1421   1420   1430   1427  |    3 |
> long max      2299   2287   2303   2305   1433   1425  |    3 |
> long and      1657   1667   2015   1003   1679   568   |    4 |
> long or       1776   1783   2032   1009   1680   569   |    4 |
> long xor      1834   1783   2012   1024   1679   570   |    4 |
> float add     2779   2644   2633   2648   2632   2639  |    5 |
> float mul     2779   2871   2810   2776   2732   2791  |    5 |
> float min     2294   2620   1725   1286   872    672   |      |
> float max     2371   2519   1697   1265   841    468   |      |
> double add    2634   2636   2635   2650   2635   2648  |    5 |
> double mul    3053   2955   2881   3030   2979   2927  |    5 |
> double min    2364   2400   2439   2399   2486   2398  |    6 |
> double max    2488   2459   2501   2451   2493   2498  |    6 |
>
> Legen...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Address review suggestion from @vnkozlov and @jatin-bhateja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/9291fb31..e1af0966 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=06-07 Stats: 190 lines in 4 files changed: 107 ins; 70 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From epeter at openjdk.org Wed May 17 10:44:00 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 10:44:00 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> Message-ID: On Wed, 10 May 2023 18:17:43 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> use is_counted and is_innermost > > Looks good to me. @vnkozlov @jatin-bhateja I took the idea with `VectorNode::scalar_opcode`. It might be useful in the future, and it makes the code much simpler. Running testing again... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1551154336 From thartmann at openjdk.org Wed May 17 10:45:48 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 10:45:48 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ In-Reply-To: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Tue, 16 May 2023 12:08:47 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Looks good to me otherwise. Just wondering, can / will we put any safeguards in place once we have migrated all code to `nullptr` to make sure new changes don't re-introduce `NULL`?
src/hotspot/share/c1/c1_GraphBuilder.cpp line 2954: > 2952: case Bytecodes::_dreturn : method_return(dpop(), ignore_return); break; > 2953: case Bytecodes::_areturn : method_return(apop(), ignore_return); break; > 2954: case Bytecodes::_return : method_return(nullptr , ignore_return); break; Suggestion: case Bytecodes::_return : method_return(nullptr, ignore_return); break; src/hotspot/share/c1/c1_Instruction.hpp line 239: > 237: if (!(enabled) ) return false; \ > 238: class_name* _v = v->as_##class_name(); \ > 239: if (_v == nullptr ) return false; \ Suggestion: if (_v == nullptr) return false; \ src/hotspot/share/c1/c1_Instruction.hpp line 252: > 250: if (!(enabled) ) return false; \ > 251: class_name* _v = v->as_##class_name(); \ > 252: if (_v == nullptr ) return false; \ Suggestion: if (_v == nullptr) return false; \ src/hotspot/share/c1/c1_Instruction.hpp line 266: > 264: if (!(enabled) ) return false; \ > 265: class_name* _v = v->as_##class_name(); \ > 266: if (_v == nullptr ) return false; \ Suggestion: if (_v == nullptr) return false; \ ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14009#pullrequestreview-1430330221 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196283705 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196286525 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196286831 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196286969 From thartmann at openjdk.org Wed May 17 10:57:46 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 10:57:46 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes In-Reply-To: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: <72PMoe9zEaTFD1qd53UQPSfjuaB8pMSOoevExPZOUA0=.08a5a6a9-920a-4ccb-93dc-e5e01c194761@github.com> On Tue, 16 May 2023 15:27:26 GMT, Christian Hagedorn wrote: > This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). > > Changes include: > - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. > - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. 
They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. > - Cleanup of touched code (dead code, variable renaming, code style) > - Added comments (e.g. for some special case in Loop Predication) > > For more background, have a look at the first PR: #13864 > > Thanks, > Christian Changes requested by thartmann (Reviewer). src/hotspot/share/opto/cfgnode.hpp line 460: > 458: // Loop Parse Predicate, Profiled Loop Parse Predicate (both used by Loop Predication), and Loop Limit Check Parse > 459: // Predicate (used for integer overflow checks when creating a counted loop). > 460: // More information about predicates can be found at loopPredicate.cpp. Suggestion: // More information about predicates can be found in loopPredicate.cpp. src/hotspot/share/opto/cfgnode.hpp line 462: > 460: // More information about predicates can be found at loopPredicate.cpp. > 461: class ParsePredicateNode : public IfNode { > 462: Deoptimization::DeoptReason _deopt_reason; Don't we need to override `Node::hash()` and `Node::cmp()` here to account for the `_deopt_reason` field? src/hotspot/share/opto/node.hpp line 1026: > 1024: > 1025: // Is 'n' possibly a loop entry (i.e. a Parse Predicate projection)? 
> 1026: static bool is_maybe_loop_entry(Node* n) { Suggestion: static bool may_be_loop_entry(Node* n) { ------------- PR Review: https://git.openjdk.org/jdk/pull/14017#pullrequestreview-1430355381 PR Review Comment: https://git.openjdk.org/jdk/pull/14017#discussion_r1196301152 PR Review Comment: https://git.openjdk.org/jdk/pull/14017#discussion_r1196303119 PR Review Comment: https://git.openjdk.org/jdk/pull/14017#discussion_r1196299602 From epeter at openjdk.org Wed May 17 10:59:45 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 10:59:45 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:18:21 GMT, Roberto Castañeda Lozano wrote: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
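The two rewrites listed above can be sanity-checked on plain `int` values. The following is a minimal sketch, not the C2 implementation: `MaxRewriteCheck` and its methods are hypothetical names, and the equalities are only claimed here under the assumption that `x + c0` and `x + c1` do not overflow. How the actual `Ideal()` code guards against overflow is not shown in this message.

```java
// Illustrative only: MaxRewriteCheck is a hypothetical helper, not HotSpot code.
final class MaxRewriteCheck {
    // Rewrite 1: max(x + c0, max(x + c1, z)) versus max(x + MAX(c0, c1), z).
    static boolean twoLevelHolds(int x, int c0, int c1, int z) {
        int before = Math.max(x + c0, Math.max(x + c1, z));
        int after = Math.max(x + Math.max(c0, c1), z);
        return before == after;
    }

    // Rewrite 2: max(x + c0, x + c1) versus x + MAX(c0, c1).
    static boolean oneLevelHolds(int x, int c0, int c1) {
        return Math.max(x + c0, x + c1) == x + Math.max(c0, c1);
    }

    public static void main(String[] args) {
        // Small inputs only, so that x + c cannot overflow.
        for (int x = -1000; x <= 1000; x += 37) {
            if (!twoLevelHolds(x, 100, 150, 200) || !oneLevelHolds(x, 10, 11)) {
                throw new AssertionError("rewrite changed the result for x = " + x);
            }
        }
        System.out.println("both rewrites agree on all sampled inputs");
    }
}
```

With wrapping arithmetic the equality can fail: for `x = Integer.MAX_VALUE`, `c0 = 0`, `c1 = 1`, the sum `x + c1` wraps to `Integer.MIN_VALUE`, so `oneLevelHolds` returns false. That is why the no-overflow assumption is spelled out.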
> > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java line 82: > 80: for (int i = 0; i < a.length; i++) { > 81: a[i] = -i; > 82: b[i] = i; That means that `a[i] * b[i] == -i*i`, and get increasingly smaller. I think it would be better if this was a bit more random, and not biased to the maximum always being at the beginning and the minimum at the end. 
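One way to remove that bias might look like the sketch below. It is an assumption-laden illustration rather than the test's actual code: `RandomReductionInput`, `randomInts` and `maxOfProducts` are hypothetical names, the fixed seed is only there to keep runs reproducible, and the `a[i] * b[i]` reduction shape is taken from the comment above.

```java
import java.util.Random;

// Illustrative only: RandomReductionInput is a hypothetical helper, not the actual test.
final class RandomReductionInput {
    // Fills an array with values drawn from the whole int range.
    static int[] randomInts(int length, long seed) {
        Random random = new Random(seed);
        int[] values = new int[length];
        for (int i = 0; i < length; i++) {
            values[i] = random.nextInt();
        }
        return values;
    }

    // Scalar reference for a max reduction over a[i] * b[i].
    static int maxOfProducts(int[] a, int[] b) {
        int result = Integer.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.max(result, a[i] * b[i]);
        }
        return result;
    }
}
```

With inputs like these the position of the extreme value is no longer tied to the loop index, and a scalar reference such as `maxOfProducts` gives the vectorized result something to be compared against.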
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1196310369 From epeter at openjdk.org Wed May 17 10:59:46 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 10:59:46 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Wed, 17 May 2023 10:54:43 GMT, Emanuel Peter wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. >> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. 
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java line 82: > >> 80: for (int i = 0; i < a.length; i++) { >> 81: a[i] = -i; >> 82: b[i] = i; > > That means that `a[i] * b[i] == -i*i`, and get increasingly smaller. I think it would be better if this was a bit more random, and not biased to the maximum always being at the beginning and the minimum at the end. Plus, we should try to cover the whole int range, or at least as much as possible. One solution: just pick two random ints, and then add/subtract them before min/max. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1196312184 From thartmann at openjdk.org Wed May 17 11:00:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 11:00:47 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: On Wed, 17 May 2023 02:13:39 GMT, Ashutosh Mehra wrote: > This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. Looks reasonable to me but I'm wondering why we even emit the klass name, if it's not needed? Another review (@dean-long ?) would be good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14024#pullrequestreview-1430375858

From fjiang at openjdk.org Wed May 17 11:09:56 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Wed, 17 May 2023 11:09:56 GMT
Subject: RFR: 8308277: RISC-V: Improve vectorization of Match.sqrt() on float
Message-ID:

[JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) added `VSqrtF` and `SqrtF` nodes to support the vectorization of Math.sqrt() on floats. For the riscv port, however, the scalar version of `sqrtF` still uses the old match rule that converts Float to Double first. It can be simplified to just use `SqrtF`.

The old match rule also affects the vectorization of Math.sqrt() on floats. The current implementation converts float to double with `vcvtFtoD`, then does `vsqrtD`, and finally converts the result back to float with `vcvtDtoF`. If we use the new `SqrtF` match rule, only `vsqrtF` is emitted and both conversions disappear. Take the test (Sqrt.java) from [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) as an example, here is the output with `-XX:+PrintOptoAssembly` and `-XX:+UseRVV`:

before:

    19a loadV V1, [R13] # vector (rvv)
    1a2 vcvtFtoD V2, V1
    1ae vfsqrt.v V1, V2 #@vsqrtD
    1b6 vcvtDtoF V1, V1
    1c2 storeV [R14], V1 # vector (rvv)

after:

    1be loadV V1, [R12] # vector (rvv)
    1c6 vfsqrt.v V1, V1 #@vsqrtF
    1ce addi R12, R29, #144 # ptr, #@addP_reg_imm
    1d2 storeV [R12], V1 # vector (rvv)

-------------

Commit messages:
 - RISC-V: Math.sqrt() does not vectroized on floats

Changes: https://git.openjdk.org/jdk/pull/14029/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14029&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308277
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14029.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14029/head:pull/14029

PR: https://git.openjdk.org/jdk/pull/14029

From thartmann at openjdk.org Wed May 17 11:15:46 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 17 May
2023 11:15:46 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 17:41:32 GMT, Jasmine Karthikeyan wrote: >> This patch fixes a minor bug in adlc where the wrong resource names are printed when the flag TraceOptoOutput is enabled to debug instruction scheduling. >> As an example, the output: >> >> *** Bundle: 1 instr, resources: D0 BR >> 126 salI_rReg_imm === _ 240 |271 [[ 127 125 ]] #5/0x00000005 >> >> states that the bundle is using resources D0 and BR, but the second resource used is actually ALU0. >> >> The issue arises because `pipeline->_rescount` is only incremented for discrete resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1612), that is, resources specified without `=`. However, names are appended to the list for *all* resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1652), so using `_rescount` to index the names causes it to go out of sync. The fix is found in [output_h.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/output_h.cpp#L2231), where it uses the iterator to go through all the resources and use only the ones that are discrete. I applied that fix to this case, and also fixed the other instances of this bug. Reviews on this fix would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright years Looks reasonable to me but I'm not an expert in this code. Another review would be good. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13403#pullrequestreview-1430402365 From thartmann at openjdk.org Wed May 17 11:44:46 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 11:44:46 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Sat, 29 Apr 2023 02:19:23 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds support for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add benchmark Great work. I'm just wondering if the extra complexity is justified for optimizing only the floating point conversions. Do you plan to use this for other optimizations? ------------- PR Review: https://git.openjdk.org/jdk/pull/13602#pullrequestreview-1430462306 From chagedorn at openjdk.org Wed May 17 12:04:36 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 May 2023 12:04:36 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: > This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work.
The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). > > Changes include: > - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. > - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. > - Cleanup of touched code (dead code, variable renaming, code style) > - Added comments (e.g. for some special case in Loop Predication) > > For more background, have a look at the first PR: #13864 > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Tobias' review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14017/files - new: https://git.openjdk.org/jdk/pull/14017/files/262d6112..2b4244e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14017&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14017&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14017/head:pull/14017 PR: https://git.openjdk.org/jdk/pull/14017 From chagedorn at openjdk.org Wed May 17 12:04:39 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 May 2023 12:04:39 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: 
<72PMoe9zEaTFD1qd53UQPSfjuaB8pMSOoevExPZOUA0=.08a5a6a9-920a-4ccb-93dc-e5e01c194761@github.com> References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> <72PMoe9zEaTFD1qd53UQPSfjuaB8pMSOoevExPZOUA0=.08a5a6a9-920a-4ccb-93dc-e5e01c194761@github.com> Message-ID: <899gzp8WOvffyax3vQgrj9NaOmb1pFpUW615zJYt_yM=.0498e51c-899b-47fe-b23f-f5379050653c@github.com> On Wed, 17 May 2023 10:48:24 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Tobias' review > > src/hotspot/share/opto/cfgnode.hpp line 462: > >> 460: // More information about predicates can be found at loopPredicate.cpp. >> 461: class ParsePredicateNode : public IfNode { >> 462: Deoptimization::DeoptReason _deopt_reason; > > Don't we need to override `Node::hash()` and `Node::cmp()` here to account for the `_deopt_reason` field? Since it is a CFG node, the hash should always be different due to a different control input. If two `If` nodes have the same control input after an optimization, the graph is broken. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14017#discussion_r1196397089 From thartmann at openjdk.org Wed May 17 12:39:45 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 12:39:45 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: On Wed, 17 May 2023 12:04:36 GMT, Christian Hagedorn wrote: >> This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. 
The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). >> >> Changes include: >> - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. >> - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. >> - Cleanup of touched code (dead code, variable renaming, code style) >> - Added comments (e.g. for some special case in Loop Predication) >> >> For more background, have a look at the first PR: #13864 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Tobias' review Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14017#pullrequestreview-1430579773 From thartmann at openjdk.org Wed May 17 12:39:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 12:39:47 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: <899gzp8WOvffyax3vQgrj9NaOmb1pFpUW615zJYt_yM=.0498e51c-899b-47fe-b23f-f5379050653c@github.com> References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> <72PMoe9zEaTFD1qd53UQPSfjuaB8pMSOoevExPZOUA0=.08a5a6a9-920a-4ccb-93dc-e5e01c194761@github.com> <899gzp8WOvffyax3vQgrj9NaOmb1pFpUW615zJYt_yM=.0498e51c-899b-47fe-b23f-f5379050653c@github.com> Message-ID: On Wed, 17 May 2023 12:00:09 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/cfgnode.hpp line 462: >> >>> 460: // More information about predicates can be found at loopPredicate.cpp. >>> 461: class ParsePredicateNode : public IfNode { >>> 462: Deoptimization::DeoptReason _deopt_reason; >> >> Don't we need to override `Node::hash()` and `Node::cmp()` here to account for the `_deopt_reason` field? > > Since it is a CFG node, the hash should always be different due to a different control input. If two `If` nodes have the same control input after an optimization, the graph is broken. Right, I missed that. Looks good then! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14017#discussion_r1196446214 From asotona at openjdk.org Wed May 17 12:48:53 2023 From: asotona at openjdk.org (Adam Sotona) Date: Wed, 17 May 2023 12:48:53 GMT Subject: Integrated: 8307326: Package jdk.internal.classfile.java.lang.constant become obsolete In-Reply-To: References: Message-ID: On Mon, 15 May 2023 08:38:54 GMT, Adam Sotona wrote: > Package `jdk.internal.classfile.java.lang.constant` containing `ModuleDesc` and `PackageDesc` become obsolete after [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729). > All references to `jdk.internal.classfile.java.lang.constant.ModuleDesc` and `jdk.internal.classfile.java.lang.constant.PackageDesc` across all JDK sources, tests and JMH benchmarks are replaced with `java.lang.constant.ModuleDesc` and `java.lang.constant.PackageDesc`. > `jdk.internal.classfile.java.lang.constant` package export hooks are removed from java.base module-info, make files and test headers. > Content of `jdk.internal.classfile.java.lang.constant` package and related tests under `jdk.classfile` are deleted. > Method references renamed in [JDK-8306729](https://bugs.openjdk.org/browse/JDK-8306729) are fixed: > - `PackageDesc::packageName` to `PackageDesc::name` > - `PackageDesc::packageInternalName` to `PackageDesc::internalName` > - `ModuleDesc::moduleName` to `ModuleDesc::name`. > > Please review this pull request. > > Thanks, > Adam This pull request has now been integrated. 
Changeset: 5763be72 Author: Adam Sotona URL: https://git.openjdk.org/jdk/commit/5763be726700be322de3bbaf345d80e11936b472 Stats: 503 lines in 46 files changed: 0 ins; 446 del; 57 mod 8307326: Package jdk.internal.classfile.java.lang.constant become obsolete Reviewed-by: erikj, liach ------------- PR: https://git.openjdk.org/jdk/pull/13979 From chagedorn at openjdk.org Wed May 17 12:49:45 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 May 2023 12:49:45 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: On Wed, 17 May 2023 12:04:36 GMT, Christian Hagedorn wrote: >> This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). >> >> Changes include: >> - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. >> - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. >> - Cleanup of touched code (dead code, variable renaming, code style) >> - Added comments (e.g. 
for some special case in Loop Predication) >> >> For more background, have a look at the first PR: #13864 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Tobias' review Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14017#issuecomment-1551329381 From fyang at openjdk.org Wed May 17 12:55:51 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 17 May 2023 12:55:51 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v8] In-Reply-To: References: Message-ID: <4x9xG70rreKenuWX5xxv2j7WlkoI4b-GgcB8l0AGnJA=.91c78fb7-ba28-49c7-ac20-ed1501f9fb9f@github.com> On Wed, 17 May 2023 08:59:55 GMT, Dingli Zhang wrote: >> Hi all, >> >> We have added support for Extract, Compress, Expand and other nodes for Vector >> API. It was implemented by referring to RVV v1.0 [1]. Please take a look and >> have some reviews. Thanks a lot. >> >> In this PR, we will support these new nodes: >> >> CompressM/CompressV/ExpandV >> LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> Extract >> VectorLongToMask/VectorMaskToLong >> PopulateIndex >> VectorLongToMask/VectorMaskToLong >> VectorMaskTrueCount/VectorMaskFirstTrue >> VectorInsert >> >> >> At the same time, we refactored methods such as >> `match_rule_supported_vector_mask`. All implemented vector nodes support mask >> operations by default now, so we also added mask nodes for all implemented >> nodes. >> >> By the way, we will implement the VectorTest node in the next PR. >> >> We can use the tests under `test/jdk/jdk/incubator/vector` to print the >> compilation log for most of the new nodes. 
And we can use the following >> command to print the compilation log of a jtreg test case: >> >> >> $ jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=log_name.log \ >> -jdk:build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:build/linux-x86_64-server-release/images/jdk \ >> >> >> >> >> >> ### CompressM/CompressV/ExpandV >> >> There is no inverse vdecompress provided in RVV, as this operation can be >> readily synthesized using iota and a masked vrgather in `ExpandV`. >> >> We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit >> these nodes and the compilation log is as follows: >> >> >> ## CompressM >> 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm >> 2ae mcompress V0, V30 # KILL R30 >> 2c2 vstoremask V2, V0 >> 2ce storeV [R7], V2 # vector (rvv) >> 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## CompressV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vcompress V1, V2, V0 >> 0fe storeV [R7], V1 # vector (rvv) >> 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## ExpandV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vexpand V3, V2, V0 >> 102 storeV [R7], V3 # vector (rvv) >> 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> >> >> ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> >> We use the vs... > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 18 commits: > > - Add some comments > - Merge remote-tracking branch 'upstream/master' into JDK-8307609 > - Adjust some params order in c2_MacroAssembler_riscv > - Adjust some function in c2_MacroAssembler_riscv > - Remove trailing whitespace > - Fix minmax_fp_masked_v > - Change some iRegI to iRegIorL2I and small refactoring of minmax_fp_masked_v > - Remove debug warning > - Merge master and resolve conflict > - Optimize call point of vfclass and adjust the parameters of c2 instruct > - ... and 8 more: https://git.openjdk.org/jdk/compare/2f1c6548...1fc880e3 src/hotspot/cpu/riscv/riscv_v.ad line 4263: > 4261: __ vsetvli_helper(bt, Matcher::vector_length(this, $src)); > 4262: __ vmsbf_m(as_VectorRegister($tmp$$reg), as_VectorRegister($src$$reg), Assembler::v0_t); > 4263: __ vcpop_m($dst$$Register, as_VectorRegister($tmp$$reg)); Shouldn't this be: `__ vcpop_m($dst$$Register, as_VectorRegister($tmp$$reg), Assembler::v0_t);`? And did we miss `VectorMaskLastTrue`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1196468562 From fyang at openjdk.org Wed May 17 12:58:42 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 17 May 2023 12:58:42 GMT Subject: RFR: 8308277: RISC-V: Improve vectorization of Match.sqrt() on floats In-Reply-To: References: Message-ID: On Wed, 17 May 2023 11:00:07 GMT, Feilong Jiang wrote: > [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) added `VSqrtF` and `SqrtF` nodes to support the vectorization of Math.sqrt() on floats. For the riscv port, however, the scalar version of `sqrtF` still uses the old match rule that converts Float to Double first. It can be simplified to just use `SqrtF`. > > The old match rule also affects the vectorization of Math.sqrt() on floats. The current implementation converts float to double with `vcvtFtoD`, then does `vsqrtD`, and finally converts the result back to float with `vcvtDtoF`.
If we use the new `SqrtF` match rule, it will only use `vsqrtF` to do the conversion. Take the test (Sqrt.java) from [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) as an example, here is the output with `-XX:+PrintOptoAssembly` and `-XX:+UseRVV`: > > before: > > > 19a loadV V1, [R13] # vector (rvv) > 1a2 vcvtFtoD V2, V1 > 1ae vfsqrt.v V1, V2 #@vsqrtD > 1b6 vcvtDtoF V1, V1 > 1c2 storeV [R14], V1 # vector (rvv) > > > after: > > 1be loadV V1, [R12] # vector (rvv) > 1c6 vfsqrt.v V1, V1 #@vsqrtF > 1ce addi R12, R29, #144 # ptr, #@addP_reg_imm > 1d2 storeV [R12], V1 # vector (rvv) > > > Testing: > - [ ] tier1 tests on Unmatched board without `-XX:+UseRVV` (release build) > - [ ] hotspot_tier1/jdk_tier1 on QEMU with `-XX:+UseRVV` (release build) Looks reasonable. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14029#pullrequestreview-1430621322 From duke at openjdk.org Wed May 17 13:06:51 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 17 May 2023 13:06:51 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: <9lAPWldwKaikAGlSOZgmHLrINPteufkrOtOeFnoecYY=.ce42defb-70fd-4fc9-ba27-a53b07dd1133@github.com> On Wed, 17 May 2023 10:57:29 GMT, Tobias Hartmann wrote: > why we even emit the klass name, if it's not needed? The klass name seems to be used when parsing a multi-dimensional array. The code that emits `staticfield` command for the arrays does not differentiate between a single and multi-dimensional array and just emits the klass name. I believe even for multi-dimensional array we don't need the klass name; the field signature can be used instead. But I guess to be consistent with how the command is generated for other types, we can continue with this approach. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14024#issuecomment-1551355377 From epeter at openjdk.org Wed May 17 13:12:34 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 13:12:34 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation     M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
> ---------------------------------------------------------------
> int add        2063   2085    660    530    415    283 |      |
> int mul        2272   2257   1189    733    908    439 |      |
> int min        2527   2520   2516   2579   2585   2542 |    1 |
> int max        2548   2525   2551   2516   2515   2517 |    1 |
> int and        2410   2414    602    480    353    263 |      |
> int or         2149   2151    597    498    354    262 |      |
> int xor        2059   2062    605    476    364    263 |      |
> long add       1776   1790   2000   1000   1683    591 |    2 |
> long mul       2135   2199   2185   2001   2176   1307 |    2 |
> long min       1439   1424   1421   1420   1430   1427 |    3 |
> long max       2299   2287   2303   2305   1433   1425 |    3 |
> long and       1657   1667   2015   1003   1679    568 |    4 |
> long or        1776   1783   2032   1009   1680    569 |    4 |
> long xor       1834   1783   2012   1024   1679    570 |    4 |
> float add      2779   2644   2633   2648   2632   2639 |    5 |
> float mul      2779   2871   2810   2776   2732   2791 |    5 |
> float min      2294   2620   1725   1286    872    672 |      |
> float max      2371   2519   1697   1265    841    468 |      |
> double add     2634   2636   2635   2650   2635   2648 |    5 |
> double mul     3053   2955   2881   3030   2979   2927 |    5 |
> double min     2364   2400   2439   2399   2486   2398 |    6 |
> double max     2488   2459   2501   2451   2493   2498 |    6 |
>
> Legen...
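As a rough illustration of what "reduction after the loop" means for a reorderable operation such as `int add`, here is a minimal scalar sketch. It is not HotSpot code: the 4-lane "vector" is modelled with `std::array`, and `kLanes`, `reduce_add`, `sum_reduce_in_loop` and `sum_reduce_after_loop` are hypothetical names chosen for this example. Unsigned 32-bit arithmetic is used so that wraparound matches Java `int` addition without undefined behaviour.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical lane count standing in for the hardware vector width.
constexpr std::size_t kLanes = 4;
using IntVec = std::array<std::uint32_t, kLanes>;

// Horizontal reduction: folds all lanes of one vector into a scalar.
std::uint32_t reduce_add(const IntVec& v) {
  std::uint32_t s = 0;
  for (std::uint32_t lane : v) s += lane;
  return s;
}

// Shape before the optimization: one horizontal reduction per vector iteration.
std::uint32_t sum_reduce_in_loop(const std::vector<std::uint32_t>& a) {
  std::uint32_t sum = 0;
  std::size_t i = 0;
  for (; i + kLanes <= a.size(); i += kLanes) {
    IntVec v{};
    for (std::size_t l = 0; l < kLanes; l++) v[l] = a[i + l];
    sum += reduce_add(v);  // reduction inside the loop
  }
  for (; i < a.size(); i++) sum += a[i];  // scalar tail
  return sum;
}

// Shape after the optimization: lane-wise accumulation in the loop,
// a single horizontal reduction once the loop has finished.
std::uint32_t sum_reduce_after_loop(const std::vector<std::uint32_t>& a) {
  IntVec acc{};  // all lanes start at the identity element 0
  std::size_t i = 0;
  for (; i + kLanes <= a.size(); i += kLanes) {
    for (std::size_t l = 0; l < kLanes; l++) acc[l] += a[i + l];
  }
  std::uint32_t sum = reduce_add(acc);  // reduction after the loop
  for (; i < a.size(); i++) sum += a[i];  // scalar tail
  return sum;
}
```

Both functions return the same value because integer addition is associative and commutative, which is the property the description relies on for the reorderable reductions. The same regrouping would change rounding for `float/double add/mul`, which is why those are excluded above.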
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: added missing float/double cases to VectorNode::scalar_opcode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13056/files - new: https://git.openjdk.org/jdk/pull/13056/files/e1af0966..e3d99c95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13056&range=07-08 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13056.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13056/head:pull/13056 PR: https://git.openjdk.org/jdk/pull/13056 From epeter at openjdk.org Wed May 17 13:23:47 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 13:23:47 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:18:21 GMT, Roberto Castañeda Lozano wrote: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
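The two rewrites listed above rest on the identity `max(x + c0, x + c1) == x + max(c0, c1)`, which holds as long as neither addition overflows. The sketch below only checks that arithmetic; it says nothing about how `Ideal()` is implemented. The function names are hypothetical, and 64-bit integers are an assumption made here to keep the example free of overflow. With wrapping 32-bit arithmetic the identity can fail (for example `x = INT_MAX`, `c0 = 0`, `c1 = 1`), so an implementation on `int` has to rule that case out.

```cpp
#include <algorithm>
#include <cstdint>

// Rewrite 2, left-hand side: max(x + c0, x + c1).
std::int64_t max_of_sums(std::int64_t x, std::int64_t c0, std::int64_t c1) {
  return std::max(x + c0, x + c1);
}

// Rewrite 2, right-hand side: x + MAX(c0, c1).
// In a compiler, MAX(c0, c1) would be folded to a constant at compile time.
std::int64_t sum_of_max(std::int64_t x, std::int64_t c0, std::int64_t c1) {
  return x + std::max(c0, c1);
}

// Rewrite 1, left-hand side: max(x + c0, max(x + c1, z)).
std::int64_t nested_max(std::int64_t x, std::int64_t c0, std::int64_t c1,
                        std::int64_t z) {
  return std::max(x + c0, std::max(x + c1, z));
}

// Rewrite 1, right-hand side: max(x + MAX(c0, c1), z).
std::int64_t folded_max(std::int64_t x, std::int64_t c0, std::int64_t c1,
                        std::int64_t z) {
  return std::max(x + std::max(c0, c1), z);
}
```

Swapping `c0` and `c1`, or the operands of either `max`, leaves both sides unchanged, which is why the permutations can be handled directly instead of canonicalizing the graph first.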
> > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. src/hotspot/share/opto/addnode.cpp line 1103: > 1101: } > 1102: return n; > 1103: } This was confusing to read. Why not make it a `as_add_constant`, and explicitly always set the `con`: case AddI(x, top): return case AddI(x, int_con): return default: return That would make it easier to argue what the value of `con` is after the call - it does certainly not depend on what it was before. 
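One possible reading of this suggestion, sketched with a deliberately tiny node model rather than HotSpot's real `Node` classes: the helper returns both outputs in a small struct and sets the constant on every path. Everything here (`Kind`, `Node`, `AddConstant`, and the choice of returning `{n, 0}` in the default case) is an assumption made for illustration; the review comment does not spell out the return values.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified stand-in for C2 nodes.
enum class Kind { Var, ConI, Top, AddI };

struct Node {
  Kind kind;
  std::int32_t value;  // only meaningful for ConI
  const Node* in1;     // first input, only set for AddI
  const Node* in2;     // second input, only set for AddI
};

// Explicit "pair" result: both outputs are always written.
struct AddConstant {
  const Node* add_var;   // nullptr signals "bail out"
  std::int32_t add_con;  // constant part, 0 if there is none
};

// Decomposes n into add_var + add_con.
AddConstant as_add_constant(const Node* n) {
  if (n->kind == Kind::AddI) {
    // case AddI(x, top): bail out, the caller checks for nullptr.
    if (n->in2->kind == Kind::Top) return {nullptr, 0};
    // case AddI(x, int_con): split into variable and constant.
    if (n->in2->kind == Kind::ConI) return {n->in1, n->in2->value};
  }
  // default: treat n as n + 0 (an assumption of this sketch).
  return {n, 0};
}
```

With this shape the value of the constant after the call never depends on what the caller stored there before, which is the property the comment asks for.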
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1196508793 From epeter at openjdk.org Wed May 17 13:23:48 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 13:23:48 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Wed, 17 May 2023 13:19:19 GMT, Emanuel Peter wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. >> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. 
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > src/hotspot/share/opto/addnode.cpp line 1103: > >> 1101: } >> 1102: return n; >> 1103: } > > This was confusing to read. > > Why not make it a `as_add_constant`, and explicitly always set the `con`: > > case AddI(x, top): return > case AddI(x, int_con): return > default: return > > That would make it easier to argue what the value of `con` is after the call - it does certainly not depend on what it was before. Add a comment that explains that for `top` we will bail out - for that we can check `nullptr`. In the other cases, we know that `n == AddI(x, int_con)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1196514850 From epeter at openjdk.org Wed May 17 13:27:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 13:27:49 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Wed, 17 May 2023 13:21:27 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/addnode.cpp line 1103: >> >>> 1101: } >>> 1102: return n; >>> 1103: } >> >> This was confusing to read. >> >> Why not make it a `as_add_constant`, and explicitly always set the `con`: >> >> case AddI(x, top): return >> case AddI(x, int_con): return >> default: return >> >> That would make it easier to argue what the value of `con` is after the call - it does certainly not depend on what it was before. 
> > Add a comment that explains that for `top` we will bail out - for that we can check `nullptr`. > In the other cases, we know that `n == AddI(x, int_con)`. You could also consider having a custom "pair" class, so that the "second-output" is more explicit. But maybe just more useful / explicit variable naming would do the trick. Maybe like `add_var` and `add_con`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1196519924 From jsjolen at openjdk.org Wed May 17 13:41:02 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 17 May 2023 13:41:02 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ In-Reply-To: References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Wed, 17 May 2023 10:42:56 GMT, Tobias Hartmann wrote: > Looks good to me otherwise. > > Just wondering, can / will we put any safeguards in place once we migrated all code to `nullptr` to make sure new changes don't re-introduce `NULL`? I hope it's possible to add it to jcheck, but I haven't asked around. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14009#issuecomment-1551414936 From chagedorn at openjdk.org Wed May 17 13:41:01 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 May 2023 13:41:01 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ In-Reply-To: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Tue, 16 May 2023 12:08:47 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. 
> > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! I only have some minor code style things. Otherwise, looks good to me, too! src/hotspot/share/c1/c1_GraphBuilder.cpp line 3421: > 3419: > 3420: # ifdef ASSERT > 3421: //All blocks reachable from start_block have _end isn't null Suggestion: // For all blocks reachable from start_block: _end must be non-null src/hotspot/share/c1/c1_Instruction.cpp line 103: > 101: state_before()->values_do(f); > 102: } > 103: if (exception_state() != nullptr){ Suggestion: if (exception_state() != nullptr) { src/hotspot/share/c1/c1_LIR.cpp line 513: > 511: if (opBranch->_opr2->is_valid()) do_input(opBranch->_opr2); > 512: > 513: if (opBranch->_info != nullptr) do_info(opBranch->_info); Maybe keep previous alignment: Suggestion: if (opBranch->_info != nullptr) do_info(opBranch->_info); src/hotspot/share/c1/c1_LIR.cpp line 515: > 513: if (opBranch->_info != nullptr) do_info(opBranch->_info); > 514: assert(opBranch->_result->is_illegal(), "not used"); > 515: if (opBranch->_stub != nullptr) opBranch->stub()->visit(this); Maybe keep previous alignment: Suggestion: if (opBranch->_stub != nullptr) opBranch->stub()->visit(this); src/hotspot/share/c1/c1_LIR.cpp line 2074: > 2072: void LIR_OpProfileType::print_instr(outputStream* out) const { > 2073: 
out->print("exact = "); > 2074: if (exact_klass() == nullptr) { Suggestion: if (exact_klass() == nullptr) { src/hotspot/share/c1/c1_LinearScan.cpp line 2983: > 2981: for (int j = 0; j < num_inst; j++) { > 2982: LIR_Op* op = instructions->at(j); > 2983: if (op == nullptr) { // this can happen when spill-moves are removed in eliminate_spill_moves Suggestion: if (op == nullptr) { // this can happen when spill-moves are removed in eliminate_spill_moves src/hotspot/share/c1/c1_Optimizer.cpp line 319: > 317: } else { > 318: Constant* x_const = x->as_Constant(); > 319: if (x_const != nullptr) { // x and y are constants Suggestion: if (x_const != nullptr) { // x and y are constants src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 137: > 135: bool has_lower = true; > 136: assert(phi, "Phi must not be null"); > 137: Bound *bound = nullptr; Suggestion: Bound* bound = nullptr; src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 368: > 366: assert(loop_header, "Loop header must not be null!"); > 367: if (!instruction) return true; > 368: for (BlockBegin *d = loop_header->dominator(); d != nullptr; d = d->dominator()) { Suggestion: for (BlockBegin* d = loop_header->dominator(); d != nullptr; d = d->dominator()) { src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 386: > 384: assert(_bounds.at(v->id()), "Now Stack must exist"); > 385: } > 386: Bound *top = nullptr; Suggestion: Bound* top = nullptr; src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 954: > 952: cur_value = nullptr; > 953: } > 954: Bound *new_index_bound = new Bound(0, nullptr, cur_constant, cur_value); Suggestion: Bound* new_index_bound = new Bound(0, nullptr, cur_constant, cur_value); src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 1102: > 1100: for (int i=0; inumber_of_sux(); i++) { > 1101: BlockBegin *sux = block->sux_at(i); > 1102: BlockBegin *pred = nullptr; Suggestion: BlockBegin* pred = nullptr; src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 1224: > 1222: > 1223: // Try to 
reach Block end beginning in Block start and not using Block dont_use > 1224: bool RangeCheckEliminator::Verification::can_reach(BlockBegin *start, BlockBegin *end, BlockBegin *dont_use /* = nullptr */) { Suggestion: bool RangeCheckEliminator::Verification::can_reach(BlockBegin* start, BlockBegin* end, BlockBegin* dont_use /* = nullptr */) { src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 1522: > 1520: void RangeCheckEliminator::Bound::add_assertion(Instruction *instruction, Instruction *position, int i, Value instr, Instruction::Condition cond) { > 1521: Instruction *result = position; > 1522: Instruction *compare_with = nullptr; Suggestion: Instruction* compare_with = nullptr; src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 1533: > 1531: result = instruction_before; > 1532: // Load constant only if needed > 1533: Constant *constant = nullptr; Suggestion: Constant* constant = nullptr; src/hotspot/share/c1/c1_RangeCheckElimination.cpp line 1557: > 1555: // Add operation only if necessary > 1556: if (constant) { > 1557: ArithmeticOp *ao = new ArithmeticOp(Bytecodes::_iadd, constant, op, nullptr); Suggestion: ArithmeticOp* ao = new ArithmeticOp(Bytecodes::_iadd, constant, op, nullptr); src/hotspot/share/c1/c1_Runtime1.cpp line 960: > 958: Klass* load_klass = nullptr; // klass needed by load_klass_patching code > 959: Handle mirror(current, nullptr); // oop needed by load_mirror_patching code > 960: Handle appendix(current, nullptr); // oop needed by appendix_patching code Strange alignment Suggestion: Handle mirror(current, nullptr); // oop needed by load_mirror_patching code Handle appendix(current, nullptr); // oop needed by appendix_patching code src/hotspot/share/c1/c1_ValueMap.hpp line 246: > 244: ValueMap* current_map() { return _current_map; } > 245: ValueMap* value_map_of(BlockBegin* block) { return _value_maps.at(block->linear_scan_number()); } > 246: void set_value_map_of(BlockBegin* block, ValueMap* map) { assert(value_map_of(block) == 
nullptr, ""); _value_maps.at_put(block->linear_scan_number(), map); } Suggestion: void set_value_map_of(BlockBegin* block, ValueMap* map) { assert(value_map_of(block) == nullptr, ""); _value_maps.at_put(block->linear_scan_number(), map); } ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14009#pullrequestreview-1430621110 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196473301 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196476625 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196529402 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196529703 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196492675 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196507880 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196515212 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196516467 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196516995 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196517209 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196517785 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196518124 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196518753 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196519435 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196519874 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196520284 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196522421 PR Review Comment: https://git.openjdk.org/jdk/pull/14009#discussion_r1196524998 From thartmann at openjdk.org Wed May 17 13:49:46 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 13:49:46 GMT Subject: RFR: 8308192: 
Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: On Wed, 17 May 2023 02:13:39 GMT, Ashutosh Mehra wrote: > This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. Okay, thanks for the details. That sounds reasonable to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14024#issuecomment-1551430497 From thartmann at openjdk.org Wed May 17 13:50:49 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 13:50:49 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ In-Reply-To: References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Wed, 17 May 2023 13:37:09 GMT, Johan Sjölen wrote: > I hope it's possible to add it to jcheck, but I haven't asked around. That would be good, otherwise new `NULLs` will slip through reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14009#issuecomment-1551433405 From epeter at openjdk.org Wed May 17 14:18:51 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 14:18:51 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:18:21 GMT, Roberto Castañeda Lozano wrote: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. 
The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. > > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. src/hotspot/share/opto/addnode.cpp line 1132: > 1130: // four possible permutations given by opcode's commutativity) into > 1131: // opcode(x + opcode(x_off, y_off), z), where opcode is either MinI or MaxI, > 1132: // if x == y and the additions can't overflow. 
Ok, effectively we have 5, not just 4 cases here:

    opcode(x + x_off, opcode(y + y_off, z))
    opcode(x + x_off, opcode(z, y + y_off))
    opcode(opcode(y + y_off, z), x + x_off)
    opcode(opcode(z, y + y_off), x + x_off)
    opcode(x + x_off, y + y_off)

I find the nested for-loop quite confusing. Maybe packing the inner stuff into a separate function could work?

    // Check for opcode(x + x_con, y + y_con), no z
    if (in(1)->Opcode() == Op_AddI && in(2)->Opcode() == Op_AddI) {
      Node* ret = try_fold(opcode, in(1), in(2), nullptr);
      if (ret != nullptr) {
        return ret;
      }
    }
    // Check for these 4 cases, equivalent to opcode3(addx, addy, z)
    // opcode(x + x_con, opcode(y + y_con, z))
    // opcode(x + x_con, opcode(z, y + y_con))
    // opcode(opcode(y + y_con, z), x + x_con)
    // opcode(opcode(z, y + y_con), x + x_con)
    for (uint i = 1; i <= 2; i++) {
      Node* addx = in(i);
      Node* other = in(i == 1 ? 2 : 1); // or just "3 - i"
      if (addx->Opcode() != Op_AddI || other->Opcode() != opcode) {
        continue;
      }
      for (uint j = 1; j <= 2; j++) {
        Node* addy = other->in(j);
        Node* z = other->in(j == 1 ? 2 : 1);
        if (addy->Opcode() != Op_AddI) {
          continue;
        }
        // We have opcode3(addx, addy, z)
        Node* ret = try_fold(opcode, addx, addy, z);
        if (ret != nullptr) {
          return ret;
        }
      }
    }

Where we have

    Node* try_fold(int opcode, Node* addx, Node* addy, Node* z = nullptr) {
      jint addx_con = 0;
      jint addy_con = 0;
      Node* addx_var = as_add_constant(addx, &addx_con);
      Node* addy_var = as_add_constant(addy, &addy_con);
      if (addx_var == nullptr || addy_var == nullptr) {
        // found a top
        return nullptr;
      }
      // could even check addx_var != addy_var, then we don't have to do that inside...
      Node* folded = extract_addition(phase, addx_var, addx_con, addy_var, addy_con, opcode);
      if (z != nullptr) {
        folded = opcode(folded, z);
      }
      return folded;
    }

Maybe this does a few more calls to `as_add_constant` than strictly necessary, but it is a bit easier to understand, right?
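For illustration only, here is a minimal standalone sketch of the five-case fold discussed in this thread, restricted to the MaxI-like case. It is not HotSpot code: `Expr`, `var`, `add_con`, `max_of`, `try_fold`, `ideal_max` and `eval` are hypothetical names for a toy expression tree, "same x" is approximated by pointer identity, and the overflow check that the real transformation needs (it is only valid if the additions can't overflow) is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical mini-IR for illustration; these are not HotSpot's Node classes.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  enum Kind { Var, AddCon, Max };
  Kind kind;
  int value;     // Var: index into the environment; AddCon: the constant
  ExprPtr a, b;  // AddCon: a is the variable part; Max: a and b are the inputs
};

ExprPtr var(int index) {
  return std::make_shared<Expr>(Expr{Expr::Var, index, nullptr, nullptr});
}
ExprPtr add_con(ExprPtr x, int c) {
  return std::make_shared<Expr>(Expr{Expr::AddCon, c, std::move(x), nullptr});
}
ExprPtr max_of(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::Max, 0, std::move(a), std::move(b)});
}

// Folds max(x + cx, y + cy) when x and y are the same node.
// With z: returns max(x + MAX(cx, cy), z). Without z: returns x + MAX(cx, cy).
// Returns nullptr when the pattern does not match.
// Assumption: overflow is ignored here; a real implementation must rule it out.
ExprPtr try_fold(const ExprPtr& addx, const ExprPtr& addy, const ExprPtr& z) {
  if (addx->kind != Expr::AddCon || addy->kind != Expr::AddCon) return nullptr;
  if (addx->a != addy->a) return nullptr;  // x and y must be the same node
  ExprPtr folded = add_con(addx->a, std::max(addx->value, addy->value));
  return z ? max_of(folded, z) : folded;
}

// Tries the one-level case first, then the four two-level permutations.
ExprPtr ideal_max(const ExprPtr& n) {
  if (n->kind != Expr::Max) return nullptr;
  if (ExprPtr r = try_fold(n->a, n->b, nullptr)) return r;
  const ExprPtr in[2] = {n->a, n->b};
  for (int i = 0; i < 2; i++) {
    const ExprPtr& addx = in[i];
    const ExprPtr& other = in[1 - i];
    if (addx->kind != Expr::AddCon || other->kind != Expr::Max) continue;
    const ExprPtr inner[2] = {other->a, other->b};
    for (int j = 0; j < 2; j++) {
      if (ExprPtr r = try_fold(addx, inner[j], inner[1 - j])) return r;
    }
  }
  return nullptr;
}

// Evaluates an expression; env[i] is the value of variable i.
int eval(const ExprPtr& n, const int* env) {
  switch (n->kind) {
    case Expr::Var:    return env[n->value];
    case Expr::AddCon: return eval(n->a, env) + n->value;
    case Expr::Max:    return std::max(eval(n->a, env), eval(n->b, env));
  }
  return 0;
}
```

With `x = var(0)` and `z = var(1)`, calling `ideal_max` on `max_of(max_of(z, add_con(x, 100)), add_con(x, 150))` yields a `Max` whose first input is `x + 150`, and `eval` gives the same result before and after the fold for small inputs. Using index pairs `in[i]` / `in[1 - i]` keeps the "other input" selection in one place, which is the readability point the suggestion above is after.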
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1196592919 From qamai at openjdk.org Wed May 17 14:23:48 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 17 May 2023 14:23:48 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: <191LKe9vO1H28bQAJ4BiYKNuFkEE1gZIiAub-wlrEdo=.c2fe3545-041b-4b0c-8980-1aa55b55485a@github.com> On Wed, 17 May 2023 11:42:02 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add benchmark > > Great work. I'm just wondering if the extra complexity is justified for optimizing only the floating point conversions. Do you plan to use this for other optimizations? @TobiHartmann Thanks for taking a look. I think this can be used for the vectorized version of these nodes, as well as the max and min nodes for floating point numbers. I also see that compact headers use out-of-line code for the slow path of `LoadNKlass`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1551489783 From epeter at openjdk.org Wed May 17 14:33:44 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 May 2023 14:33:44 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:18:21 GMT, Roberto Castañeda Lozano wrote: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. 
This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. > > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Nice work with the tests, it's good to have some specific IR tests there! 
I hope we can also generalize this for `MaxL/MinL` (once we do this [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513)) - I think that is now also going to be easier with your refactoring towards `MaxNode::IdealI`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1551512040 From tholenstein at openjdk.org Wed May 17 14:43:48 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 May 2023 14:43:48 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly Message-ID: At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. ### Old CompileOnly format - matching a **method name** with **class name** and **package name**: `-XX:CompileOnly=package/path/Class.method` `-XX:CompileOnly=package/path/Class::method` `-XX:CompileOnly=package.path.Class::method` BUT NOT `-XX:CompileOnly=package.path.Class.method` - just matching a **single method name**: `-XX:CompileOnly=.hashCode` `-XX:CompileOnly=::hashCode` BUT NOT `-XX:CompileOnly=hashCode` - Matching **all method names** in a **class name** with **package name** `-XX:CompileOnly=java/lang/String` BUT NOT `-XX:CompileOnly=java/lang/String.` BUT NOT `-XX:CompileOnly=java.lang.String` BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) BUT NOT `-XX:CompileOnly=String` BUT NOT `-XX:CompileOnly=String.` BUT NOT `-XX:CompileOnly=String::` - Matching **all method names** in a **class name** with **NO package name** `-XX:CompileOnly=String` BUT NOT `-XX:CompileOnly=String.` BUT NOT `-XX:CompileOnly=String::` - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored e.g. 
`-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command ### CompileCommand=compileonly format `CompileCommand` allows two different forms for paths: - `package/path/Class.method` - `package.path.Class::method` In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. Valid forms: `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` `-XX:CompileCommand=compileonly,java.lang.String::*` `-XX:CompileCommand=compileonly,*::hashCode` `-XX:CompileCommand=compileonly,*ng.String::hashC*` `-XX:CompileCommand=compileonly,*String::hash*` Invalid forms (Error: Embedded * not allowed): `-XX:CompileCommand=compileonly,java.*.String::has*Code` ### Use CompileCommand syntax for CompileOnly At the moment, in some cases it is not possible to just take a pattern used with `CompileOnly` and plug it into a compile command file. The syntax used by `CompileOnly` is also not very intuitive. `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. With this PR `CompileOnly` becomes an alias for `CompileCommand=compileonly` with the possibility to take lists as input. New syntax for `CompileOnly`: - `-XX:CompileOnly=pattern1,pattern2` is an **alias** for: - `-XX:CompileCommand=compileonly,pattern1 -XX:CompileCommand=compileonly,pattern2` ### Handling invalid syntax Previously, `CompileOnly` just ignored invalid syntax. `CompileCommand` at least prints an error message for invalid patterns: CompileOnly: An error occurred during parsing Error: Could not parse method pattern Line: 'pattern' Since `CompileOnly` now maps to `CompileCommand`, it also prints the error message for invalid inputs. 
In the future `CompileCommand` (and `CompileOnly`) parsing errors could exit the VM https://bugs.openjdk.org/browse/JDK-8282797 ### Changed test cases In the following we mean with `-XX:CompileOnly=oldPattern` -> `-XX:CompileOnly=newPattern` that we changed the tests from the `oldPattern` to the `newPattern`: - `Class.method` -> `Class::method` AND `package/path/Class.method` -> `package.path.Class::method` Prefer the `package.path.Class::method` format because it is used by `-XX:+PrintCompilation` - `Class` -> `Class::*` The `CompileCommand` format requires the `::` to define if `Class` is a class name or a method name. - `::method` -> `*::method` The `CompileCommand` format requires the `*`(wildcard) if no class name is given. - `package/path/Class` -> `package.path.Class::*` The `CompileCommand` format requires the `*`(wildcard) if no method name is given. Prefer the `package.path.Class::method` format because it is used by `-XX:+PrintCompilation` - `::get,::get1` -> `*Class::get*` The `CompileCommand` format requires the `*`(wildcard) if no method name is given. Therefore `*::get` would work as well, but this matches many other methods as well like `java.util.HashMap::get`. `*Class::get,*Class::get1` matches the wanted class name - `*Class::get*` is just the short form. - `package/path/Class::method` -> `package.path.Class::method` The old format of `CompileOnly` combining `/` with `::`. The `CompileCommand` is either `package/path/Class.method` or `package.path.Class::method`. - _BUG_: `package.path.Class::` -> `package.path.Class::*` There was a bug in the old format of `CompileOnly` : when the pattern ended with `::` the `CompileOnly` was just ignored and all methods compiled. The `CompileCommand` format requires the `*`(wildcard) if no method name is given. 
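To make the pattern rules in this description concrete, here is a small standalone sketch of a matcher that follows them. It is not the HotSpot implementation: `NamePattern`, `MethodPattern`, `parse_name` and `parse_pattern` are hypothetical names, and the sketch assumes that exactly the two path forms listed above are accepted (`package/path/Class.method` and `package.path.Class::method`), that `*` may only appear at the start and/or end of the class and method parts, and that an empty class or method part is rejected rather than treated as a wildcard.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper types; HotSpot's real matcher is not reproduced here.
struct NamePattern {
  std::string text;         // pattern text without the leading/trailing '*'
  bool open_front = false;  // pattern started with '*'
  bool open_back = false;   // pattern ended with '*'

  bool matches(const std::string& name) const {
    if (open_front && open_back) return name.find(text) != std::string::npos;
    if (open_front) {  // suffix match
      return name.size() >= text.size() &&
             name.compare(name.size() - text.size(), text.size(), text) == 0;
    }
    if (open_back) return name.compare(0, text.size(), text) == 0;  // prefix match
    return name == text;
  }
};

struct MethodPattern {
  NamePattern klass;
  NamePattern method;

  // class_name is expected in dotted form, e.g. "java.lang.String".
  bool matches(const std::string& class_name, const std::string& method_name) const {
    return klass.matches(class_name) && method.matches(method_name);
  }
};

// Parses one name part; an embedded '*' makes the part invalid.
std::optional<NamePattern> parse_name(std::string s) {
  if (s.empty()) return std::nullopt;  // assumption: a part must not be empty
  NamePattern p;
  if (s.front() == '*') { p.open_front = true; s.erase(0, 1); }
  if (!s.empty() && s.back() == '*') { p.open_back = true; s.pop_back(); }
  if (s.find('*') != std::string::npos) return std::nullopt;  // embedded '*'
  p.text = s;
  return p;
}

// Parses "package.path.Class::method" or "package/path/Class.method".
std::optional<MethodPattern> parse_pattern(const std::string& pattern) {
  std::string klass, method;
  std::size_t sep = pattern.find("::");
  if (sep != std::string::npos) {
    klass = pattern.substr(0, sep);
    method = pattern.substr(sep + 2);
  } else {
    sep = pattern.rfind('.');
    if (sep == std::string::npos) return std::nullopt;
    klass = pattern.substr(0, sep);
    method = pattern.substr(sep + 1);
    // Assumption: in the '.'-separated form the class part uses '/' only.
    if (klass.find('.') != std::string::npos) return std::nullopt;
  }
  for (char& c : klass) {
    if (c == '/') c = '.';  // normalize to the dotted form
  }
  std::optional<NamePattern> k = parse_name(klass);
  std::optional<NamePattern> m = parse_name(method);
  if (!k || !m) return std::nullopt;
  return MethodPattern{*k, *m};
}
```

For example, `parse_pattern("*ng.String::hashC*")` matches class `java.lang.String` with method `hashCode`, while `parse_pattern("java.*.String::has*Code")` and `parse_pattern("String::")` both return `std::nullopt` under the assumptions stated above.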
------------- Commit messages: - Updated copyright - fix CompileOnly tests - JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly Changes: https://git.openjdk.org/jdk/pull/13802/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8027711 Stats: 376 lines in 71 files changed: 29 ins; 73 del; 274 mod Patch: https://git.openjdk.org/jdk/pull/13802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13802/head:pull/13802 PR: https://git.openjdk.org/jdk/pull/13802 From tholenstein at openjdk.org Wed May 17 14:43:55 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 May 2023 14:43:55 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly In-Reply-To: References: Message-ID: On Thu, 4 May 2023 13:36:22 GMT, Tobias Holenstein wrote: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. > > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a 
bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileC... The CSR can be found here: https://bugs.openjdk.org/browse/JDK-8308287 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13802#issuecomment-1551524277 From xuelei at openjdk.org Wed May 17 15:04:50 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 17 May 2023 15:04:50 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v4] In-Reply-To: References: Message-ID: > Hi, > > This is a redo of JDK-8307855, where issues were found after integration. 
> > The sprintf is deprecated in Xcode 14 and Microsoft Visual Studio because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for build failures, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in the src/utils directory. > > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: size_t to int ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13995/files - new: https://git.openjdk.org/jdk/pull/13995/files/dd6ddbc4..244278a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13995.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13995/head:pull/13995 PR: https://git.openjdk.org/jdk/pull/13995 From xuelei at openjdk.org Wed May 17 15:04:57 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Wed, 17 May 2023 15:04:57 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: References: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> Message-ID: <-qvQkvH8SylX3unheSpOdsjz-mhrnyvqgxtNLKiOmGg=.41f065ea-f856-4436-88d3-8c7b8b01726d@github.com> On Wed, 17 May 2023 08:48:37 GMT, Kim Barrett wrote: >> Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: >> >> check returned value of snprintf > > src/utils/hsdis/binutils/hsdis-binutils.c line 246: > >> 244: >> 245: size_t used_size = snprintf(buf, bufsize, "%s", close); >> 246: if ((used_size < 0) || (used_size >= bufsize)) { > > (used_size < 
0) is tautologically false, since used_size is a size_t, so unsigned. I'm somewhat surprised > this doesn't trigger a warning from some compiler. Updated to use `int` instead of `size_t`. Thank you for catching this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1196646900 From kvn at openjdk.org Wed May 17 15:38:47 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 15:38:47 GMT Subject: RFR: 8260943: C2 SuperWord: Revisit vectorization optimization added by 8076284 In-Reply-To: References: Message-ID: On Thu, 11 May 2023 12:15:08 GMT, Emanuel Peter wrote: > I suggest we remove this dead `_do_vector_loop_experimental` code. > @vnkozlov disabled it 2.5 years ago [JDK-8251994](https://bugs.openjdk.org/browse/JDK-8251994) https://github.com/openjdk/jdk/commit/a7fa1b70f212566e95068936841b6e9702eccaed. > His [analysis](https://bugs.openjdk.org/browse/JDK-8251994?focusedCommentId=14364507&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14364507). > His conclusion back then: > > Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. > Even if pack_parallel() method is able created packs they are all removed by filter_packs() method. > And additionally the above cases are vectorized without hoisting loads and pack_parallel - I verified it. > That code is useless now and I will put it under flag to not run it. It needs more work to be useful. > I reluctant to remove the code because may be in a future we will have time to invest into it. > > > He disabled it by renaming many occurrences of `_do_vector_loop` with `_do_vector_loop_experimental = false`. > > I don't believe anybody wants to fix this code any time soon. Current `SuperWord` can do almost everything that this code promises. 
If we really want to have parallel iterations for the Stream API, then we should do this in the dependency graph directly, by removing the inter-iteration edges. > > If you care, you can read my arguments below. > I am also using this opportunity to think back: what were the motivations for this code. > And I am thinking forward: what could we do to improve our `SuperWord` algorithm? > > **Testing** > > Up to tier5 and stress testing, with and without `-XX:CompileCommand=option,path.to.Class::method,Vectorize`. **Running...** > > ----------- > > **Background** > > "Seeding" is crucial: > The SPL algorithm (Super Word Parallelism) relies on good detection of parallel instruction that can be packed. This is usually done with "seeding": one finds loads or stores that can be packed - preferrably they are adjacent so that we can use a vectorized load or store (alternatively gather and scatter can be used for strided or random accesses). After this "seeding", the vectorization is extended to non-seed operations (usually greedily). > > In `C2`'s `SuperWord` algorithm, we have two approaches for this "seeding": > 1. Normally, we simply try to find adjacent loads and stores for the same `base` (array). Second, we require load/store packs to be aligned to each other in the same memory slice... Nice analysis. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13930#pullrequestreview-1431017893 From kvn at openjdk.org Wed May 17 15:45:44 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 15:45:44 GMT Subject: RFR: 8260943: C2 SuperWord: Revisit vectorization optimization added by 8076284 In-Reply-To: References: Message-ID: On Thu, 11 May 2023 12:15:08 GMT, Emanuel Peter wrote: > I suggest we remove this dead `_do_vector_loop_experimental` code. 
> @vnkozlov disabled it 2.5 years ago [JDK-8251994](https://bugs.openjdk.org/browse/JDK-8251994) https://github.com/openjdk/jdk/commit/a7fa1b70f212566e95068936841b6e9702eccaed. > His [analysis](https://bugs.openjdk.org/browse/JDK-8251994?focusedCommentId=14364507&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14364507). > His conclusion back then: > > Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. > Even if pack_parallel() method is able created packs they are all removed by filter_packs() method. > And additionally the above cases are vectorized without hoisting loads and pack_parallel - I verified it. > That code is useless now and I will put it under flag to not run it. It needs more work to be useful. > I reluctant to remove the code because may be in a future we will have time to invest into it. > > > He disabled it by renaming many occurances of `_do_vector_loop` with `_do_vector_loop_experimental = false`. > > I don't believe anybody wants to fix this code any time soon. Current `SuperWord` can do almost everything that this code promises. If we really want to have parallel iterations for the Stream API, then we should do this in the dependency graph directly, by removing the inter-iteration edges. > > If you care, you can read my arguments below. > I am also using this opportunity to think back: what were the motivations for this code. > And I am thinking forward: what could we do to improve our `SuperWord` algorithm? > > **Testing** > > Up to tier5 and stress testing, with and without `-XX:CompileCommand=option,path.to.Class::method,Vectorize`. **Running...** > > ----------- > > **Background** > > "Seeding" is crucial: > The SPL algorithm (Super Word Parallelism) relies on good detection of parallel instruction that can be packed. 
This is usually done with "seeding": one finds loads or stores that can be packed - preferrably they are adjacent so that we can use a vectorized load or store (alternatively gather and scatter can be used for strided or random accesses). After this "seeding", the vectorization is extended to non-seed operations (usually greedily). > > In `C2`'s `SuperWord` algorithm, we have two approaches for this "seeding": > 1. Normally, we simply try to find adjacent loads and stores for the same `base` (array). Second, we require load/store packs to be aligned to each other in the same memory slice... Very nice analysis and I agree with you to introduce cost-model. As you correctly pointed we do not unroll enough sometimes to get full advantage of wider vectors. And thank you for verifying that we do vectorize `Stream.forEach` ------------- PR Comment: https://git.openjdk.org/jdk/pull/13930#issuecomment-1551635002 From kvn at openjdk.org Wed May 17 15:50:33 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 15:50:33 GMT Subject: RFR: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 10:19:53 GMT, Emanuel Peter wrote: >> Bug fixing to make the verification pass for https://github.com/openjdk/jdk/pull/13951 [JDK-8305073](https://bugs.openjdk.org/browse/JDK-8305073). >> >> There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. 
>> >> **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: >> We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: >> https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 >> >> So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - merge from master after Assertion Predicate renaming > - 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13980#pullrequestreview-1431039166 From kvn at openjdk.org Wed May 17 15:55:34 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 15:55:34 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 13:12:34 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. 
>> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> -------------------------------------------------------------
>> int add      2063   2085    660    530    415    283 |      |
>> int mul      2272   2257   1189    733    908    439 |      |
>> int min      2527   2520   2516   2579   2585   2542 |    1 |
>> int max      2548   2525   2551   2516   2515   2517 |    1 |
>> int and      2410   2414    602    480    353    263 |      |
>> int or       2149   2151    597    498    354    262 |      |
>> int xor      2059   2062    605    476    364    263 |      |
>> long add     1776   1790   2000   1000   1683    591 |    2 |
>> long mul     2135   2199   2185   2001   2176   1307 |    2 |
>> long min     1439   1424   1421   1420   1430   1427 |    3 |
>> long max     2299   2287   2303   2305   1433   1425 |    3 |
>> long and     1657   1667   2015   1003   1679    568 |    4 |
>> long or      1776   1783   2032   1009   1680    569 |    4 |
>> long xor     1834   1783   2012   1024   1679    570 |    4 |
>> float add    2779   2644   2633   2648   2632   2639 |    5 |
>> float mul    2779   2871   2810   2776   2732   2791 |    5 |
>> float min    2294   2620   1725   1286    872    672 |      |
>> float max    2371   2519   1697   1265    841    468 |      |
>> double add   2634   2636   2635   2650   2635   2648 |    5 |
>> double mul   3053   2955   2881   3030   2979   2927 |    5 |
>> double min   2364   2400   2439   2399   2486   2398 |    6 |
>> double max   2488   2459   2501 ...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > added missing float/double cases to VectorNode::scalar_opcode This looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13056#pullrequestreview-1431049694 From lmesnik at openjdk.org Wed May 17 15:59:43 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 17 May 2023 15:59:43 GMT Subject: RFR: 8308292: Problemlist vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java Message-ID: Trivial problemlisting of failing test.
------------- Commit messages: - 8308292: Problemlist vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java Changes: https://git.openjdk.org/jdk/pull/14035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308292 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14035/head:pull/14035 PR: https://git.openjdk.org/jdk/pull/14035 From kvn at openjdk.org Wed May 17 16:05:31 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 16:05:31 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: On Wed, 17 May 2023 12:04:36 GMT, Christian Hagedorn wrote: >> This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). >> >> Changes include: >> - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. >> - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. 
As a consequence, I've tried to not clean the code up too much in these classes. >> - Cleanup of touched code (dead code, variable renaming, code style) >> - Added comments (e.g. for some special case in Loop Predication) >> >> For more background, have a look at the first PR: #13864 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Tobias' review Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14017#pullrequestreview-1431068081 From kvn at openjdk.org Wed May 17 16:06:36 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 16:06:36 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:38:52 GMT, Emanuel Peter wrote: >> This is the second step in the `VerifyLoopOptimizations` revival. >> >> Last step: >> [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure >> See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 >> >> Bug fixing for this step: >> [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate >> (https://github.com/openjdk/jdk/pull/13980) >> >> Next step: >> [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop >> >> I added `TestVerifyLoopOptimizations.java` per @robcasloz 's request. It works just like `TestVerifyIterativeGVN.java`, with a simple `-Xcomp -XX:+VerifyLoopOptimizations` on a basically empty test. It fails until this patch is integrated: https://github.com/openjdk/jdk/pull/13980 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add TestVerifyLoopOptimizations.java Good. 
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13951#pullrequestreview-1431075523 From kvn at openjdk.org Wed May 17 16:29:00 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 16:29:00 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v2] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 17:41:32 GMT, Jasmine Karthikeyan wrote: >> This patch fixes a minor bug in adlc where the wrong resource names are printed when the flag TraceOptoOutput is enabled to debug instruction scheduling. >> As an example, the output: >> >> *** Bundle: 1 instr, resources: D0 BR >> 126 salI_rReg_imm === _ 240 |271 [[ 127 125 ]] #5/0x00000005 >> >> states that the bundle is using resources D0 and BR, but the second resource used is actually ALU0. >> >> The issue arises because `pipeline->_rescount` is only incremented for discrete resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1612), i.e. resources specified without `=`. However, the list of names is added to for *all* resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1652), so using `_rescount` to index the names causes it to go out of sync. The fix is found in [output_h.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/output_h.cpp#L2231), where it uses the iterator to go through all the resources and use only the ones that are discrete. I applied that fix to this case, and also fixed the other instances of this bug. Reviews on this fix would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright years I agree with this fix. I have only one comment, about the copyright line update. src/hotspot/share/adlc/formsopt.cpp line 2: > 1: /* > 2: * Copyright (c) 1998, 2023, Oracle and/or its affiliates. All rights reserved.
You mistakenly updated .cpp file instead of formsopt.hpp ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13403#pullrequestreview-1431094978 PR Review Comment: https://git.openjdk.org/jdk/pull/13403#discussion_r1196765974 From kvn at openjdk.org Wed May 17 16:32:59 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 16:32:59 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: <22-5ad0kCNHI1nnKIANsfeLamK_3dXwpwalDhPsXJ_I=.70d0e6b7-3bdd-4233-bffe-da73e4858c57@github.com> On Wed, 17 May 2023 02:13:39 GMT, Ashutosh Mehra wrote: > This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14024#pullrequestreview-1431128315 From kvn at openjdk.org Wed May 17 16:45:37 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 16:45:37 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Sat, 29 Apr 2023 02:19:23 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add benchmark Is it possible to do this in `c2_MacroAssembler_x86` instead (as for `verified_entry`)? We are trying to move complex coding from .ad files to macroassembler. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13602#pullrequestreview-1431148183 From kvn at openjdk.org Wed May 17 16:54:49 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 16:54:49 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly In-Reply-To: References: Message-ID: <1iF_2pFExGpBX1dxqyM6TiQecD8o1qSJWeIv4HVG0vE=.930245c2-8033-402e-a0e8-0a7e3ffaff6c@github.com> On Thu, 4 May 2023 13:36:22 GMT, Tobias Holenstein wrote: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. > > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. 
`-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileC... Thank you for fixing this finally! FTR. We planned to do this for long time. Main motivations: unify syntax and catch invalid commands. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1431162594 From duke at openjdk.org Wed May 17 17:00:54 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 17 May 2023 17:00:54 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: On Wed, 17 May 2023 13:46:31 GMT, Tobias Hartmann wrote: >> This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. > > Okay, thanks for the details. That sounds reasonable to me. @TobiHartmann @vnkozlov thanks for reviewing it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14024#issuecomment-1551760036 From sspitsyn at openjdk.org Wed May 17 17:17:55 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 17:17:55 GMT Subject: RFR: 8308292: Problemlist vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java In-Reply-To: References: Message-ID: On Wed, 17 May 2023 15:50:09 GMT, Leonid Mesnik wrote: > Trivial problemlisting of failing test. Looks good and trivial. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14035#pullrequestreview-1431200014 From lmesnik at openjdk.org Wed May 17 17:27:57 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 17 May 2023 17:27:57 GMT Subject: Integrated: 8308292: Problemlist vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java In-Reply-To: References: Message-ID: On Wed, 17 May 2023 15:50:09 GMT, Leonid Mesnik wrote: > Trivial problemlisting of failing test. This pull request has now been integrated. 
Changeset: 8bedf2ef Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/8bedf2efd7671834b3f7ff42bc33008821545d9f Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8308292: Problemlist vmTestbase/nsk/jvmti/AttachOnDemand/attach020/TestDescription.java Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/14035 From never at openjdk.org Wed May 17 17:35:59 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 17 May 2023 17:35:59 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Wed, 10 May 2023 14:00:51 GMT, Doug Simon wrote: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. > * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. 
> > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13905#pullrequestreview-1431230593 From qamai at openjdk.org Wed May 17 18:28:51 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 17 May 2023 18:28:51 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Wed, 17 May 2023 16:43:06 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add benchmark > > Is it possible to do this in `c2_MacroAssembler_x86` instead (as for `verified_entry`)? > We are trying to move complex coding from .ad files to macroassembler. @vnkozlov Yes we can explicitly define a stub without relying on code generation, it may be more preferable since it avoids adding complexity to adlc generation. 
The only downside is that there is some boilerplate for each usage but I think the boilerplate is not too terrible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1551869074 From jbhateja at openjdk.org Wed May 17 18:56:00 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 17 May 2023 18:56:00 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 13:12:34 GMT, Emanuel Peter wrote: >> https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 >> >> I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. >> >> The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. >> >> **Performance results** >> I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. >> >> I disabled `turbo-boost`. >> Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. >> Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>>
>>
>> operation   M-N-2  M-N-3    M-2    M-3    P-2    P-3 | note |
>> -------------------------------------------------------------
>> int add      2063   2085    660    530    415    283 |      |
>> int mul      2272   2257   1189    733    908    439 |      |
>> int min      2527   2520   2516   2579   2585   2542 |    1 |
>> int max      2548   2525   2551   2516   2515   2517 |    1 |
>> int and      2410   2414    602    480    353    263 |      |
>> int or       2149   2151    597    498    354    262 |      |
>> int xor      2059   2062    605    476    364    263 |      |
>> long add     1776   1790   2000   1000   1683    591 |    2 |
>> long mul     2135   2199   2185   2001   2176   1307 |    2 |
>> long min     1439   1424   1421   1420   1430   1427 |    3 |
>> long max     2299   2287   2303   2305   1433   1425 |    3 |
>> long and     1657   1667   2015   1003   1679    568 |    4 |
>> long or      1776   1783   2032   1009   1680    569 |    4 |
>> long xor     1834   1783   2012   1024   1679    570 |    4 |
>> float add    2779   2644   2633   2648   2632   2639 |    5 |
>> float mul    2779   2871   2810   2776   2732   2791 |    5 |
>> float min    2294   2620   1725   1286    872    672 |      |
>> float max    2371   2519   1697   1265    841    468 |      |
>> double add   2634   2636   2635   2650   2635   2648 |    5 |
>> double mul   3053   2955   2881   3030   2979   2927 |    5 |
>> double min   2364   2400   2439   2399   2486   2398 |    6 |
>> double max   2488   2459   2501 ...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > added missing float/double cases to VectorNode::scalar_opcode src/hotspot/share/opto/loopopts.cpp line 4192: > 4190: > 4191: // Convert opcode from vector-reduction -> scalar -> normal-vector-op > 4192: const int sopc = VectorNode::scalar_opcode(last_ur->Opcode(), bt); The other changes look good to me. Can you rename _VectorNode::scalar_opcode_ to _ReductionNode::scalar_opcode_, and also move the vector opcode cases out into a separate vector-to-scalar mapping routine if needed?
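As background for the quoted split between reorderable and non-reorderable reductions, here is a minimal Java sketch of why `int add` may be accumulated per lane and reduced once after the loop, while `float add` may not. It is scalar code that only emulates four vector lanes; the class and method names are hypothetical and it does not show what C2 actually emits.

```java
// Hypothetical illustration only: scalar emulation of a 4-lane reduction.
class ReductionOrderSketch {
    // Strictly ordered int sum: one accumulator, element after element.
    static int sumOrdered(int[] a) {
        int s = 0;
        for (int v : a) s += v;
        return s;
    }

    // Four independent lane accumulators, combined once after the loop.
    // Assumes a.length is a multiple of 4.
    static int sumLanes(int[] a) {
        int l0 = 0, l1 = 0, l2 = 0, l3 = 0;
        for (int i = 0; i < a.length; i += 4) {
            l0 += a[i];
            l1 += a[i + 1];
            l2 += a[i + 2];
            l3 += a[i + 3];
        }
        return (l0 + l1) + (l2 + l3);
    }

    // Same two shapes for float, where addition is not associative.
    static float sumOrdered(float[] a) {
        float s = 0f;
        for (float v : a) s += v;
        return s;
    }

    static float sumLanes(float[] a) {
        float l0 = 0f, l1 = 0f, l2 = 0f, l3 = 0f;
        for (int i = 0; i < a.length; i += 4) {
            l0 += a[i];
            l1 += a[i + 1];
            l2 += a[i + 2];
            l3 += a[i + 3];
        }
        return (l0 + l1) + (l2 + l3);
    }

    public static void main(String[] args) {
        int[] ia = {Integer.MAX_VALUE, 1, -5, 7, 3, 9, -2, 4};
        // Wrap-around int addition is associative, so both orders agree.
        System.out.println(sumOrdered(ia) == sumLanes(ia));
        float[] fa = {1e8f, 1f, -1e8f, 1f};
        // Rounding depends on the order, so the two float results differ.
        System.out.println(sumOrdered(fa) + " vs " + sumLanes(fa));
    }
}
```

With the float input above, the small addends are absorbed at different points depending on the order, which is the kind of observable difference that rules out reordering `float/double add/mul`.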
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1196935662 From thartmann at openjdk.org Wed May 17 19:11:53 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 May 2023 19:11:53 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly In-Reply-To: References: Message-ID: On Thu, 4 May 2023 13:36:22 GMT, Tobias Holenstein wrote: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. > > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. 
`*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileC... Thanks for taking care of this tedious changes, Toby. Looks good to me too! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1431379325 From kvn at openjdk.org Wed May 17 22:13:51 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 May 2023 22:13:51 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Wed, 17 May 2023 16:43:06 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add benchmark > > Is it possible to do this in `c2_MacroAssembler_x86` instead (as for `verified_entry`)? > We are trying to move complex coding from .ad files to macroassembler. 
> @vnkozlov Yes we can explicitly define a stub without relying on code generation, it may be preferable since it avoids adding complexity to adlc generation. The only downside is that there is some boilerplate for each usage but I think the boilerplate is not too terrible. Can you look into that? There could be other cases in Macroassembler which can use this ------------- PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1552153942 From xxinliu at amazon.com Wed May 17 23:48:13 2023 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 17 May 2023 23:48:13 +0000 Subject: Update on PEA in C2 (Episode 3) Message-ID: Hi, I would like to give an update on what we have done in C2 PEA. We managed to compile the java.base module with PEA and the inliner. It contains 7,442 classes and 62,210 methods. Here is the number of objects we track and materialize. PEA: num allocations tracked = 24741, num materializations = 16037 We also ran CTW on the jdk.compiler and java.compiler modules. No compilation errors were found. We fixed those compiler errors mainly by correcting allocation state. We verified behavior with one microbenchmark that we ported to JMH. It shows the allocation rate drops as expected. Because PEA is flow-sensitive, it can allocate on demand. The allocation rate drops by 75% when the object has a 25% chance to escape (odd = 4), and drops to 1/8 when the object has only a 12.5% chance to escape. https://github.com/navyxliu/jdk/pull/36 Remaining problems: 1. In order to curb complexity, we disable passive materialization for the time being. Passive materialization takes place only at a merging point because one of the predecessors has already materialized the object. We have proved that it is still correct to skip passive materialization. The downside is that we may have partially redundant allocations because C2 can't guarantee to eliminate the original object now. Currently, JDK-8287061 is working on this problem.
The patch unravels 'reducible phi nodes' and then the original AllocateNodes are eliminated by ScalarReplacement. More details can be found here. https://gist.github.com/navyxliu/6239ce24f1ae447060302cc8562cbb71?permalink_comment_id=4520588#gistcomment-4520588 If JDK-8287061 processes all reducible phi nodes, PEA will have a synergy effect with it. Our design goal is to punt complex jobs to the C2 optimizer. If PEA introduces a severe performance problem, we will revisit 'passive materialization'. 2. There are still 400+ runtime errors when we try to run hotspot:tier1 tests. Most of them are from javac. Here is what we have so far.

$ make test TEST="hotspot:tier1" CONF=linux-x86_64-server-fastdebug JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+DoPartialEscapeAnalysis"

Test summary
==============================
   TEST                              TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1     2150  1717   145   288 <<
==============================

Our understanding is that PEA can't guarantee to replace all the old objects with the new objects in the debug sections of GraphKit::add_safepoint_edges(). If deoptimization happens, the runtime will rematerialize objects based on the wrong debuginfo, and we end up with the wrong objects. Our next goal is to fix those runtime errors. We have posted a draft PR for curious readers. We will port those tests to jtreg once we fix the tier1 tests.
https://github.com/openjdk/jdk/pull/14041 thanks, --lx From jkarthikeyan at openjdk.org Thu May 18 04:11:04 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 04:11:04 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v6] In-Reply-To: <0LZmbabcnX0kf4HLiNPu-XX5IcMkTPvR1CAG6yEAat0=.cc5bd2fd-2029-4e41-86e5-3b899b1b523f@github.com> References: <2ODJH1IFMOVjRgjQIeobF2eb_nxTCgnxcV__ttNz9nw=.7cbf388a-0a65-4d1c-8b60-d29ae3502123@github.com> <0LZmbabcnX0kf4HLiNPu-XX5IcMkTPvR1CAG6yEAat0=.cc5bd2fd-2029-4e41-86e5-3b899b1b523f@github.com> Message-ID: On Fri, 28 Apr 2023 05:48:39 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into conv2b-x86-lowering >> - Whitespace tweak >> - Make transform conditional >> - Remove Conv2B from backend as it's macro expanded now >> - Re-work transform to happen in macro expansion >> - Fix whitespace and add bug tag to IR test >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - ... and 1 more: https://git.openjdk.org/jdk/compare/bad6aa68...295b9a67 > > src/hotspot/share/opto/cfgnode.cpp line 1576: > >> 1574: Node *n = new Conv2BNode(cmp->in(1)); >> 1575: if( flipped ) >> 1576: n = new XorINode( phase->transform(n), phase->intcon(1) ); > > This lives under the `if (flipped)`, maybe move into a block for more clarity. Fixed, thanks. > src/hotspot/share/opto/macro.cpp line 44: > >> 42: #include "opto/macro.hpp" >> 43: #include "opto/memnode.hpp" >> 44: #include "opto/movenode.hpp" > > Unnecessary change? Oops, I forgot to remove the include from the old macro expansion logic. Thanks for the catch! 
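For readers following the `if (flipped)` snippet quoted above, here is a minimal sketch, in plain Java rather than C2 IR, of why a `Conv2B` followed by an xor with 1 can be replaced by the same select with the comparison inverted. The class and method names are made up for illustration and do not appear in the patch.

```java
final class Conv2BXorSketch {
    // Models the ideal Conv2B node: 0 stays 0, any other value becomes 1.
    static int conv2b(int x) {
        return x != 0 ? 1 : 0;
    }

    // The "flipped" shape from the quoted snippet: Conv2B followed by xor with 1.
    static int flipped(int x) {
        return conv2b(x) ^ 1;
    }

    // The same result expressed as a select with the comparison inverted,
    // so no separate xor is needed.
    static int invertedCompare(int x) {
        return x == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (flipped(x) != invertedCompare(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("flipped Conv2B matches the inverted comparison on all samples");
    }
}
```

This only checks the arithmetic identity on a few values; whether the inverted form is actually cheaper depends on the backend rules discussed in the thread.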
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197336184 PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197336090 From jkarthikeyan at openjdk.org Thu May 18 04:17:48 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 04:17:48 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7] In-Reply-To: References: Message-ID: > Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>
>
>                                           Baseline                     Patch           Improvement
> Benchmark                      Mode  Cnt   Score   Error  Units     Score   Error  Units
> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op  + 28.0%
>
> Reviews would be greatly appreciated!
> > Testing: tier1-2 on linux x64, GHA Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Changes from code review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13345/files - new: https://git.openjdk.org/jdk/pull/13345/files/295b9a67..69e914a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13345&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13345&range=05-06 Stats: 23 lines in 5 files changed: 8 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13345/head:pull/13345 PR: https://git.openjdk.org/jdk/pull/13345 From jkarthikeyan at openjdk.org Thu May 18 04:17:53 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 04:17:53 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v6] In-Reply-To: <0LZmbabcnX0kf4HLiNPu-XX5IcMkTPvR1CAG6yEAat0=.cc5bd2fd-2029-4e41-86e5-3b899b1b523f@github.com> References: <2ODJH1IFMOVjRgjQIeobF2eb_nxTCgnxcV__ttNz9nw=.7cbf388a-0a65-4d1c-8b60-d29ae3502123@github.com> <0LZmbabcnX0kf4HLiNPu-XX5IcMkTPvR1CAG6yEAat0=.cc5bd2fd-2029-4e41-86e5-3b899b1b523f@github.com> Message-ID: <9ehfUKghTaErxeSL_G0j5l9Fx0NoMXE9xvUuZJGiuUs=.31c225a7-291f-4c85-9afc-6220b1bde519@github.com> On Fri, 28 Apr 2023 05:51:06 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: >> >> - Merge branch 'master' into conv2b-x86-lowering >> - Whitespace tweak >> - Make transform conditional >> - Remove Conv2B from backend as it's macro expanded now >> - Re-work transform to happen in macro expansion >> - Fix whitespace and add bug tag to IR test >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - ... and 1 more: https://git.openjdk.org/jdk/compare/bad6aa68...295b9a67 > > src/hotspot/share/opto/addnode.cpp line 890: > >> 888: } >> 889: >> 890: // Try to convert (c ? 1 : 0) ^ 1 into !c ? 1 : 0. This pattern can occur after expansion of Conv2B nodes. > > Be more general? `Xor (CMove cond, iftrue, iffalse), op == CMove cond, (Xor iftrue op), (Xor iffalse op)`. You can be conservative and apply this only if `op`, `iftrue` and `iffalse` are all constant. I think that's a good idea, I've made this change. I wonder if other associative operations would also benefit from a similar patch? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197338164 From jkarthikeyan at openjdk.org Thu May 18 04:17:55 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 04:17:55 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v6] In-Reply-To: <26rzNFCK7HGbnD6uJkCZ7niSgLDZz4fEPl7OeVkxqrQ=.8622554c-eefb-4a72-8682-297ec3c27cf3@github.com> References: <2ODJH1IFMOVjRgjQIeobF2eb_nxTCgnxcV__ttNz9nw=.7cbf388a-0a65-4d1c-8b60-d29ae3502123@github.com> <26rzNFCK7HGbnD6uJkCZ7niSgLDZz4fEPl7OeVkxqrQ=.8622554c-eefb-4a72-8682-297ec3c27cf3@github.com> Message-ID: On Wed, 10 May 2023 00:54:08 GMT, Sandhya Viswanathan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: >> >> - Merge branch 'master' into conv2b-x86-lowering >> - Whitespace tweak >> - Make transform conditional >> - Remove Conv2B from backend as it's macro expanded now >> - Re-work transform to happen in macro expansion >> - Fix whitespace and add bug tag to IR test >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - Merge branch 'master' into conv2b-x86-lowering >> - ... and 1 more: https://git.openjdk.org/jdk/compare/bad6aa68...295b9a67 > > src/hotspot/share/opto/cfgnode.cpp line 1530: > >> 1528: if (phase->C->post_loop_opts_phase()) { >> 1529: return nullptr; >> 1530: } > > Should this only be done if (!Matcher::match_rule_supported(Op_Conv2B))? Yes, I think adding that would help in not accidentally removing optimization opportunities here, for other platforms. Thanks for this! > src/hotspot/share/opto/convertnode.hpp line 36: > >> 34: class Conv2BNode : public Node { >> 35: public: >> 36: Conv2BNode(Node* i) : Node(nullptr, i) {} > > Need to also update the copyright year to 2023 for convertnode.hpp. Thanks for the catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197338729 PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197338854 From jkarthikeyan at openjdk.org Thu May 18 05:20:51 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 05:20:51 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 04:17:48 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. 
This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                           Baseline                     Patch           Improvement
>> Benchmark                      Mode  Cnt   Score   Error  Units     Score   Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Changes from code review

Apologies for the delayed update, and thanks for the reviews! I have aarch64 performance results, from an M1 mac:

                                          Baseline                     Patch           Improvement
Benchmark                      Mode  Cnt   Score   Error  Units     Score   Error  Units
Conv2BRules.testEquals0        avgt   12  41.697 ± 0.127  ns/op  /  40.724 ± 0.086  ns/op  + 2.4%
Conv2BRules.testNotEquals0     avgt   12  39.522 ± 0.143  ns/op  /  40.608 ± 0.046  ns/op  - 2.7%
Conv2BRules.testEquals1        avgt   12  40.168 ± 0.136  ns/op  /  40.679 ± 0.044  ns/op  (unchanged)
Conv2BRules.testEqualsNull     avgt   12  48.922 ± 0.498  ns/op  /  42.046 ± 0.018  ns/op  + 15.1%
Conv2BRules.testNotEqualsNull  avgt   12  41.725 ± 0.264  ns/op  /  42.063 ± 0.043  ns/op  - 0.8%

It seems like the patch doesn't have much of an impact other than `testEqualsNull`, which would make sense as the Conv2B rule is using the same `cset` instruction as the 0 and 1 rule for CMoveI. I was unfortunately not able to test for arm32, but I think it should still be beneficial as the Conv2B rules there used two cmoves and had a fixme, whereas with this patch it would only use one cmove. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1552411461 From jkarthikeyan at openjdk.org Thu May 18 05:36:07 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 05:36:07 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v3] In-Reply-To: References: Message-ID: <1GETXMI3qOJBGRxZTM-3Y5yGf9FHkV7i9XKypO9f_4E=.ceeb9d8f-ac96-4295-b949-d8acfeddbc98@github.com> > This patch fixes a minor bug in adlc where the wrong resource names are printed when the flag TraceOptoOutput is enabled to debug instruction scheduling.
> As an example, the output:
>
> *** Bundle: 1 instr, resources: D0 BR
> 126 salI_rReg_imm === _ 240 |271 [[ 127 125 ]] #5/0x00000005
>
> states that the bundle is using resources D0 and BR, but the second resource used is actually ALU0.
>
> The issue arises because `pipeline->_rescount` is only incremented for discrete resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1612), resources specified without `=`. However, names are appended to the list for *all* resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1652), so using `_rescount` to index the names causes it to go out of sync.
The fix is found in [output_h.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/output_h.cpp#L2231), where it uses the iterator to go through all the resources and use only the ones that are discrete. I applied that fix to this case, and also fixed the other instances of this bug. Reviews on this fix would be appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13403/files - new: https://git.openjdk.org/jdk/pull/13403/files/79e3f744..c875474d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13403&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13403&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13403.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13403/head:pull/13403 PR: https://git.openjdk.org/jdk/pull/13403 From jkarthikeyan at openjdk.org Thu May 18 05:36:10 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 05:36:10 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v2] In-Reply-To: References: Message-ID: <4FF4tgN9U3zBZOW1kzb9_Lti73sc7yoRw2pfBGmfWEw=.5bfe64df-c567-4708-97ff-1b57cbec15f2@github.com> On Wed, 17 May 2023 16:10:25 GMT, Vladimir Kozlov wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright years > > src/hotspot/share/adlc/formsopt.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 1998, 2023, Oracle and/or its affiliates. All rights reserved. > > You mistakenly updated .cpp file instead of formsopt.hpp Done, thanks! 
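A minimal sketch of the indexing mismatch described in the 8305787 patch summary above, written in Java rather than adlc's C++. The resource list is illustrative only (an assumption, not copied from any `.ad` file), and the class and method names are hypothetical. It shows how using a discrete-resource bit position as an index into the list of *all* names can print `BR` where `ALU0` is meant.

```java
import java.util.ArrayList;
import java.util.List;

final class ResourceNameSketch {
    // Illustrative resource names in declaration order (assumed, not taken from a real .ad file).
    static final String[] NAMES = {
        "D0", "D1", "D2", "DECODE", "MS0", "MS1", "MS2", "MEM", "BR", "FPU", "ALU0"
    };
    // true for discrete resources; false for composites declared with '='.
    static final boolean[] DISCRETE = {
        true, true, true, false, true, true, true, false, true, true, true
    };

    // Out-of-sync lookup: uses the mask bit position directly as an index into the full list.
    static List<String> namesByRawIndex(int mask) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if ((mask & (1 << i)) != 0) {
                out.add(NAMES[i]);
            }
        }
        return out;
    }

    // In-sync lookup: walks every resource but advances the bit position only for discrete ones.
    static List<String> namesByDiscreteIndex(int mask) {
        List<String> out = new ArrayList<>();
        int bit = 0;
        for (int i = 0; i < NAMES.length; i++) {
            if (!DISCRETE[i]) {
                continue; // composite resources do not own a bit
            }
            if ((mask & (1 << bit)) != 0) {
                out.add(NAMES[i]);
            }
            bit++;
        }
        return out;
    }

    public static void main(String[] args) {
        int mask = (1 << 0) | (1 << 8); // bits 0 and 8 among the discrete resources
        System.out.println(namesByRawIndex(mask));      // [D0, BR]
        System.out.println(namesByDiscreteIndex(mask)); // [D0, ALU0]
    }
}
```

The second method mirrors the idea of iterating over all resources and counting only the discrete ones; it is not the actual adlc code.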
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13403#discussion_r1197387981 From jkarthikeyan at openjdk.org Thu May 18 05:36:39 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 05:36:39 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 11:13:19 GMT, Tobias Hartmann wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright years > > Looks reasonable to me but I'm not an expert in this code. Another review would be good. Thanks a lot for the reviews @TobiHartmann and @vnkozlov! I've updated the PR with the copyright line fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13403#issuecomment-1552428524 From kvn at openjdk.org Thu May 18 06:14:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 May 2023 06:14:52 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v3] In-Reply-To: <1GETXMI3qOJBGRxZTM-3Y5yGf9FHkV7i9XKypO9f_4E=.ceeb9d8f-ac96-4295-b949-d8acfeddbc98@github.com> References: <1GETXMI3qOJBGRxZTM-3Y5yGf9FHkV7i9XKypO9f_4E=.ceeb9d8f-ac96-4295-b949-d8acfeddbc98@github.com> Message-ID: On Thu, 18 May 2023 05:36:07 GMT, Jasmine Karthikeyan wrote: >> This patch fixes a minor bug in adlc where the wrong resource names are printed when the flag TraceOptoOutput is enabled to debug instruction scheduling. >> As an example, the output: >> >> *** Bundle: 1 instr, resources: D0 BR >> 126 salI_rReg_imm === _ 240 |271 [[ 127 125 ]] #5/0x00000005 >> >> states that the bundle is using resources D0 and BR, but the second resource used is actually ALU0. >> >> The issue arises because `pipeline->_rescount` is only incremented for discrete resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1612), resources specified without `=`.
However, names are appended to the list for *all* resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1652), so using `_rescount` to index the names causes it to go out of sync. The fix is found in [output_h.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/output_h.cpp#L2231), where it uses the iterator to go through all the resources and uses only the ones that are discrete. I applied that fix to this case, and also fixed the other instances of this bug. Reviews on this fix would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright I submitted our internal testing before sponsoring. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13403#issuecomment-1552491164 From dzhang at openjdk.org Thu May 18 07:19:53 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 18 May 2023 07:19:53 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v8] In-Reply-To: <4x9xG70rreKenuWX5xxv2j7WlkoI4b-GgcB8l0AGnJA=.91c78fb7-ba28-49c7-ac20-ed1501f9fb9f@github.com> References: <4x9xG70rreKenuWX5xxv2j7WlkoI4b-GgcB8l0AGnJA=.91c78fb7-ba28-49c7-ac20-ed1501f9fb9f@github.com> Message-ID: On Wed, 17 May 2023 12:52:27 GMT, Fei Yang wrote: > Shouldn't this be: `__ vcpop_m($dst$$Register, as_VectorRegister($tmp$$reg), Assembler::v0_t);`? And did we miss `VectorMaskLastTrue`? Based on our research, we have found that RISC-V does not require the addition of the `vmask_firsttrue_masked` node (currently only present in ARM64). This is because we currently already have the application vector length (AVL) setting in RISC-V.
Therefore, the `VectorMaskFirstTrue` node on RISC-V does not need a partial operation to substitute vector length for vector register size, as in ARM64: https://github.com/openjdk/jdk/blob/3c9ec26370dfae5d1230b6b69ae26122fe42e51d/src/hotspot/cpu/aarch64/aarch64_vector.ad#L290-L294 We will remove `vmask_firsttrue_masked` next. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1197487563 From duke at openjdk.org Thu May 18 09:23:59 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 18 May 2023 09:23:59 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v2] In-Reply-To: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: > In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. > > For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations. 
>
> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lanes (value of 0x01) of its input.
>
> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations.
>
> For example,
>
>
> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
> m.not().trueCount();
>
>
> will produce the following assembly on a Neon machine before this patch:
>
>
> ...
> mvn v16.16b, v16.16b // VectorMask.not()
> xtn v16.4h, v16.4s
> xtn v16.8b, v16.8h
> neg v16.8b, v16.8b // VectorStoreMask
> addv b17, v16.8b
> umov w0, v17.b[0] // VectorMask.trueCount()
> ...
>
>
> After this patch:
>
>
> ...
> mvn v16.16b, v16.16b // VectorMask.not()
> addv s17, v16.4s
> smov x0, v17.b[0]
> neg x0, x0 // Optimized VectorMask.trueCount()
> ...
>
>
> In this case, we can save two xtn insns.
>
> Performance:
>
> Benchmark    Before            After              Unit
> testInt      723.822 ± 1.029   1182.375 ± 12.363  ops/ms
> testLong     632.154 ± 0.197   1382.74 ± 2.188    ops/ms
> testShort    788.665 ± 1.852   1152.38 ± 3.77     ops/ms
>
> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740
> [2]: https://github.com/openjdk/jdk/b...

Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into optimize_truecount_neon - 8307795: AArch64: Optimize VectorMask.truecount() on Neon In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g.
Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [4] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[3] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask [2] is generated to convert the mask from in-register format to in-memory format before those operations. However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lanes (value of 0x01) of its input. This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations. For example,

```
var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
m.not().trueCount();
```

will produce the following assembly on a Neon machine before this patch:

```
...
mvn v16.16b, v16.16b // VectorMask.not()
xtn v16.4h, v16.4s
xtn v16.8b, v16.8h
neg v16.8b, v16.8b // VectorStoreMask
addv b17, v16.8b
umov w0, v17.b[0] // VectorMask.trueCount()
...
```

After this patch:

```
...
mvn v16.16b, v16.16b // VectorMask.not()
addv s17, v16.4s
smov x0, v17.b[0]
neg x0, x0 // Optimized VectorMask.trueCount()
...
```

In this case, we can save two xtn insns.

Performance:

Benchmark    Before            After              Unit
testInt      723.822 ± 1.029   1182.375 ± 12.363  ops/ms
testLong     632.154 ± 0.197   1382.74 ± 2.188    ops/ms
testShort    788.665 ± 1.852   1152.38 ± 3.77     ops/ms

[1]: https://developer.arm.com/documentation/dui0801/h/A64-SIMD-Vector-Instructions/XTN--XTN2--vector-
[2]: https://github.com/openjdk/jdk/blob/f968da97a5a5c68c28ad29d13fdfbe3a4adf5ef7/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4841
[3]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount()
[4]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740

------------- Changes: - all: https://git.openjdk.org/jdk/pull/13974/files - new: https://git.openjdk.org/jdk/pull/13974/files/b0eb5324..49e35b63 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=00-01 Stats: 445618 lines in 4990 files changed: 371359 ins; 38351 del; 35908 mod Patch: https://git.openjdk.org/jdk/pull/13974.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13974/head:pull/13974 PR: https://git.openjdk.org/jdk/pull/13974 From dzhang at openjdk.org Thu May 18 09:40:07 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 18 May 2023 09:40:07 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v9] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`.
All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Add vmask_lasttrue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/1fc880e3..f831f83c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=07-08 Stats: 41 lines in 1 file changed: 17 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From duke at openjdk.org Thu May 18 09:50:13 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 18 May 2023 09:50:13 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3] In-Reply-To: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: > In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. > > For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. 
So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations.
>
> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lanes (value of 0x01) of its input.
>
> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations.
>
> For example,
>
>
> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
> m.not().trueCount();
>
>
> will produce the following assembly on a Neon machine before this patch:
>
>
> ...
> mvn v16.16b, v16.16b // VectorMask.not()
> xtn v16.4h, v16.4s
> xtn v16.8b, v16.8h
> neg v16.8b, v16.8b // VectorStoreMask
> addv b17, v16.8b
> umov w0, v17.b[0] // VectorMask.trueCount()
> ...
>
>
> After this patch:
>
>
> ...
> mvn v16.16b, v16.16b // VectorMask.not()
> addv s17, v16.4s
> smov x0, v17.b[0]
> neg x0, x0 // Optimized VectorMask.trueCount()
> ...
>
>
> In this case, we can save two xtn insns.
>
> Performance:
>
> Benchmark       Before              After           Unit
> testInt    723.822 ± 1.029   1182.375 ± 12.363   ops/ms
> testLong   632.154 ± 0.197   1382.74  ± 2.188    ops/ms
> testShort  788.665 ± 1.852   1152.38  ± 3.77     ops/ms
>
> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740
> [2]: https://github.com/openjdk/jdk/b...
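The argument in the description above (narrowing does not change how many lanes are active, so the lanes can be summed first and negated once) can be checked with a scalar model. This is a minimal Java sketch on plain arrays, not the patch itself and not Vector API code; the class and method names are hypothetical, and the casts only loosely model the `xtn`, `neg` and `addv` steps.

```java
public class TrueCountSketch {
    // In-register mask format: every lane is 0 (false) or -1 (true).
    static int[] toInRegister(boolean[] mask) {
        int[] lanes = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            lanes[i] = mask[i] ? -1 : 0;
        }
        return lanes;
    }

    // Baseline shape: narrow each lane to a byte, negate it to get 0x00/0x01
    // (the in-memory format), then add the bytes up.
    static int trueCountViaStoreMask(int[] lanes) {
        int sum = 0;
        for (int lane : lanes) {
            byte narrowed = (byte) lane;     // loosely models the xtn steps
            byte stored = (byte) -narrowed;  // loosely models neg: -1 becomes 1
            sum += stored;
        }
        return sum;
    }

    // Optimized shape: add the 0/-1 lanes directly and negate the scalar once.
    static int trueCountDirect(int[] lanes) {
        int sum = 0;
        for (int lane : lanes) {
            sum += lane;
        }
        return -sum;
    }

    public static void main(String[] args) {
        int[] lanes = toInRegister(new boolean[] {true, false, true, true});
        System.out.println(trueCountViaStoreMask(lanes));
        System.out.println(trueCountDirect(lanes));
    }
}
```

Both methods return the same count for any 0/-1 input, which is why the narrowing step is redundant for this particular operation.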
Chang Peng has updated the pull request incrementally with one additional commit since the last revision: Update benchmark to avoid potential optimization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13974/files - new: https://git.openjdk.org/jdk/pull/13974/files/49e35b63..567f69a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=01-02 Stats: 20 lines in 4 files changed: 0 ins; 5 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/13974.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13974/head:pull/13974 PR: https://git.openjdk.org/jdk/pull/13974 From duke at openjdk.org Thu May 18 09:54:54 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 18 May 2023 09:54:54 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3] In-Reply-To: References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> <4hN6zn1gNhhXUIKHPlKhqqVolJkX-Hcrp2SgvI6zcU0=.9ecce499-66a2-41fa-be6e-93ed7615e02d@github.com> <5NfBPdiTQS9KBSWSgJSjfwW_IT7UdKwi__REZAvtxo4=.35641a45-6e4d-4b8a-865f-76784e7cc173@github.com> Message-ID: On Mon, 15 May 2023 10:59:11 GMT, Andrew Haley wrote: > > > This looks like it might be removed by loop opts. I think you might need a blackhole somewhere. > > > > > > `m` will be updated in every iteration of this loop, so `m` is not a loop-invariants actually. I can see the assembly code of this loop by using JMH perfasm. > > Isn't it? Looks to me like all it does is flip `m` each time. Whether or not this code is optimized today isn't relevant. > > So it's the same as > > ``` > for (int i = 0; i < LENGTH/2; i++) { > res += m.trueCount(); > } > m = m.not(); > for (int i = 0; i < LENGTH/2; i++) { > res += m.trueCount(); > } > ``` > > ... which is trivially optimizable, no? Sorry for the delay. Yes, actually they do the same thing, though current C2 compiler cannot do such optimization so far. 
Anyway, I have updated this benchmark to avoid potential optimization and ensure that we can measure performance effectively. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1197626911 From fyang at openjdk.org Thu May 18 13:06:49 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 18 May 2023 13:06:49 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 08:35:45 GMT, Xiaolin Zheng wrote: >> The `iRegIHeapbase()` matching operand has no usage on both AArch64 and RISC-V platforms after [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449) and [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667), respectively. As the following-up action discussed in the code review process of [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667) (#13577), this is a small cleanup for the `iRegIHeapbase()` matching operand. >> >> Passed fastdebug/release build on both AArch64/RISC-V platforms. >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Further cleanups Updated change LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13983#pullrequestreview-1432671883 From qamai at openjdk.org Thu May 18 13:20:34 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 May 2023 13:20:34 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v3] In-Reply-To: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: > Hi, > > This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. 
This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. > > Thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - trailing new lines - add microbenchmark - refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13602/files - new: https://git.openjdk.org/jdk/pull/13602/files/3b13e9e6..262e7a3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13602&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13602&range=01-02 Stats: 183807 lines in 2968 files changed: 138810 ins; 22591 del; 22406 mod Patch: https://git.openjdk.org/jdk/pull/13602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13602/head:pull/13602 PR: https://git.openjdk.org/jdk/pull/13602 From qamai at openjdk.org Thu May 18 14:15:22 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 May 2023 14:15:22 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v4] In-Reply-To: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: > Hi, > > This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. > > Thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13602/files - new: https://git.openjdk.org/jdk/pull/13602/files/262e7a3b..a17bcb76 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13602&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13602&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13602/head:pull/13602 PR: https://git.openjdk.org/jdk/pull/13602 From dnsimon at openjdk.org Thu May 18 14:18:10 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 18 May 2023 14:18:10 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v2] In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. > * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. 
> > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 - send JVMCI exception info to hs-err log and/or tty - remove unused callToString method - make JMCI more robust in low resource conditions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13905/files - new: https://git.openjdk.org/jdk/pull/13905/files/26a1c426..29cbdebc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=00-01 Stats: 85182 lines in 1213 files changed: 69307 ins; 7212 del; 8663 mod Patch: https://git.openjdk.org/jdk/pull/13905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13905/head:pull/13905 PR: https://git.openjdk.org/jdk/pull/13905 From qamai at openjdk.org Thu May 18 14:18:54 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 May 2023 14:18:54 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Wed, 17 May 2023 11:42:02 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. > > Great work. I'm just wondering if the extra complexity is justified for optimizing only the floating point conversions. Do you plan to use this for other optimizations? @TobiHartmann @vnkozlov I have reworked the patch, now it relies on template instead of adlc generation to achieve the desired behaviours, I think this is a much more preferable approach. 
-------------
PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1553128882

From qamai at openjdk.org Thu May 18 14:41:55 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 18 May 2023 14:41:55 GMT
Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 18 May 2023 04:17:48 GMT, Jasmine Karthikeyan wrote:

>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                           Baseline                    Patch                  Improvement
>> Benchmark                      Mode  Cnt  Score    Error  Units       Score    Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op   /   34.130 ± 0.177  ns/op  + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op   /   34.185 ± 0.258  ns/op  + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op   /   34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op   /   34.330 ± 0.625  ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op   /   34.142 ± 0.303  ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated!
>> >> Testing: tier1-2 on linux x64, GHA > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Changes from code review Otherwise LGTM. Thanks a lot. src/hotspot/share/opto/addnode.cpp line 892: > 890: // Propagate xor through constant cmoves. This pattern can occur after expansion of Conv2B nodes. > 891: if (in1->Opcode() == Op_CMoveI && in2->is_Con()) { > 892: if (in1->in(2)->is_Con() && in1->in(3)->is_Con()) { `CMoveNode::IfTrue` and `CMoveNode::IfFalse` instead of 3 and 2. src/hotspot/share/opto/addnode.cpp line 900: > 898: > 899: if (cmp_op == Op_CmpI || cmp_op == Op_CmpP) { > 900: // Flip the sense of comparison in the bool and return a new cmove Mistaken comment ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/13345#pullrequestreview-1432831514 PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197905158 PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197906007 From qamai at openjdk.org Thu May 18 14:41:56 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 18 May 2023 14:41:56 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7] In-Reply-To: References: Message-ID: <3i_J-b5e5rWGvwniI16n9_9uPlM8A4nmy07WXAEUeFA=.e54fa3ad-701d-41ff-a0cf-6e0f227a40ac@github.com> On Thu, 18 May 2023 14:33:37 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Changes from code review > > src/hotspot/share/opto/addnode.cpp line 892: > >> 890: // Propagate xor through constant cmoves. This pattern can occur after expansion of Conv2B nodes. >> 891: if (in1->Opcode() == Op_CMoveI && in2->is_Con()) { >> 892: if (in1->in(2)->is_Con() && in1->in(3)->is_Con()) { > > `CMoveNode::IfTrue` and `CMoveNode::IfFalse` instead of 3 and 2. `CMoveNode::Condition` instead of `in1->in(1)` below, too. 
You need to check for the node actually being a `BoolNode` in case a constant-condition `CMove` has not been folded yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1197908427 From duke at openjdk.org Thu May 18 15:03:49 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 18 May 2023 15:03:49 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: On Wed, 17 May 2023 13:46:31 GMT, Tobias Hartmann wrote: >> This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. > > Okay, thanks for the details. That sounds reasonable to me. @TobiHartmann, @vnkozlov would you mind sponsoring this PR please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14024#issuecomment-1553195254 From kvn at openjdk.org Thu May 18 15:10:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 May 2023 15:10:52 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v4] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Thu, 18 May 2023 14:15:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Clever. Let me test it. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13602#pullrequestreview-1432893022 From kvn at openjdk.org Thu May 18 15:13:04 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 May 2023 15:13:04 GMT Subject: RFR: 8305787: Wrong debugging information printed with TraceOptoOutput [v3] In-Reply-To: <1GETXMI3qOJBGRxZTM-3Y5yGf9FHkV7i9XKypO9f_4E=.ceeb9d8f-ac96-4295-b949-d8acfeddbc98@github.com> References: <1GETXMI3qOJBGRxZTM-3Y5yGf9FHkV7i9XKypO9f_4E=.ceeb9d8f-ac96-4295-b949-d8acfeddbc98@github.com> Message-ID: On Thu, 18 May 2023 05:36:07 GMT, Jasmine Karthikeyan wrote: >> This patch fixes a minor bug in aldc where the wrong resource names are printed when the flag TraceOptoOutput is enabled to debug instruction scheduling. >> As an example, the output: >> >> *** Bundle: 1 instr, resources: D0 BR >> 126 salI_rReg_imm === _ 240 |271 [[ 127 125 ]] #5/0x00000005 >> >> states that the bundle is using resources D0 and BR, but the second resource used is actually ALU0. >> >> The issue is caused because `pipeline->_rescount` is only incremented for discrete resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1612), resources specified without `=`. However, the list of names is added to for *all* resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1652), so using `_rescount` to index the names causes it to go out of sync. The fix is found in [output_h.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/output_h.cpp#L2231), where it uses the iterator to go through all the resources and use only the ones that are discrete. I applied that fix to this case, and also fixed the other instances of this bug. Reviews on this fix would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright My testing passed. 
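The index mismatch described in the quoted TraceOptoOutput report (a counter that only advances for discrete resources, used to index a name list that holds all resources) can be reproduced with a small model. The following is a minimal Java sketch, not adlc code; the resource list, the class name and the method names are hypothetical and chosen only so that the drift is visible.

```java
import java.util.ArrayList;
import java.util.List;

public class ResourceNameSketch {
    // Hypothetical declarations: an entry containing '=' models a composite
    // resource (a mask of others); every other entry is a discrete resource.
    static final String[] DECLARED = {
        "D0", "D1", "D2", "DECODE=D0|D1|D2",
        "MS0", "MS1", "MS2", "MEM=MS0|MS1|MS2",
        "BR", "FPU", "ALU0"
    };

    static boolean isDiscrete(String decl) {
        return !decl.contains("=");
    }

    static String nameOf(String decl) {
        int eq = decl.indexOf('=');
        return eq < 0 ? decl : decl.substring(0, eq);
    }

    // Out-of-sync lookup: the bit number of a discrete resource is used
    // directly as an index into the list of ALL declared names.
    static List<String> namesByRawIndex(int usedMask) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < DECLARED.length; bit++) {
            if ((usedMask & (1 << bit)) != 0) {
                out.add(nameOf(DECLARED[bit]));
            }
        }
        return out;
    }

    // In-sync lookup: iterate over all declarations, but advance the bit
    // counter only when the declaration is a discrete resource.
    static List<String> namesByDiscreteIndex(int usedMask) {
        List<String> out = new ArrayList<>();
        int bit = 0;
        for (String decl : DECLARED) {
            if (!isDiscrete(decl)) {
                continue;
            }
            if ((usedMask & (1 << bit)) != 0) {
                out.add(decl);
            }
            bit++;
        }
        return out;
    }

    public static void main(String[] args) {
        int usedMask = (1 << 0) | (1 << 8); // discrete resources number 0 and 8
        System.out.println(namesByRawIndex(usedMask));
        System.out.println(namesByDiscreteIndex(usedMask));
    }
}
```

With this made-up list, the two composite entries ahead of bit 8 shift the raw lookup by two positions, so the same mask is labeled with a different resource name depending on which lookup is used.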
------------- PR Comment: https://git.openjdk.org/jdk/pull/13403#issuecomment-1553207647 From jkarthikeyan at openjdk.org Thu May 18 15:13:06 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 18 May 2023 15:13:06 GMT Subject: Integrated: 8305787: Wrong debugging information printed with TraceOptoOutput In-Reply-To: References: Message-ID: <6xTygCTBQ5oXcafuyYpEz-3eIxBCD3l4EAmFJwXkHMg=.05b1ccaf-c520-446e-97d2-7bda21b305be@github.com> On Mon, 10 Apr 2023 03:43:17 GMT, Jasmine Karthikeyan wrote: > This patch fixes a minor bug in aldc where the wrong resource names are printed when the flag TraceOptoOutput is enabled to debug instruction scheduling. > As an example, the output: > > *** Bundle: 1 instr, resources: D0 BR > 126 salI_rReg_imm === _ 240 |271 [[ 127 125 ]] #5/0x00000005 > > states that the bundle is using resources D0 and BR, but the second resource used is actually ALU0. > > The issue is caused because `pipeline->_rescount` is only incremented for discrete resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1612), resources specified without `=`. However, the list of names is added to for *all* resources [(here)](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/adlparse.cpp#L1652), so using `_rescount` to index the names causes it to go out of sync. The fix is found in [output_h.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/adlc/output_h.cpp#L2231), where it uses the iterator to go through all the resources and use only the ones that are discrete. I applied that fix to this case, and also fixed the other instances of this bug. Reviews on this fix would be appreciated! This pull request has now been integrated. 
Changeset: cc5c9b5d Author: Jasmine Karthikeyan Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/cc5c9b5da2de4229c0244169bcbd6496f68db5ab Stats: 76 lines in 3 files changed: 37 ins; 2 del; 37 mod 8305787: Wrong debugging information printed with TraceOptoOutput Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13403 From kvn at openjdk.org Thu May 18 15:28:49 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 May 2023 15:28:49 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: <9FTt1c4dwJsamVN11SqgsJ2wDmolh43hqD3p-esEwDY=.31463a9e-2e9b-480e-a30b-e337720d91dd@github.com> On Wed, 17 May 2023 02:13:39 GMT, Ashutosh Mehra wrote: > This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. I submitted our internal testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14024#issuecomment-1553230043 From kbarrett at openjdk.org Thu May 18 15:49:51 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 18 May 2023 15:49:51 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: <-qvQkvH8SylX3unheSpOdsjz-mhrnyvqgxtNLKiOmGg=.41f065ea-f856-4436-88d3-8c7b8b01726d@github.com> References: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> <-qvQkvH8SylX3unheSpOdsjz-mhrnyvqgxtNLKiOmGg=.41f065ea-f856-4436-88d3-8c7b8b01726d@github.com> Message-ID: <24zfRJ2Ir1egB-U5XJd37qJZUliAqAXkKIaHqE8gG-8=.3e0fc1f7-5eae-4fc9-8135-42a40f917a1e@github.com> On Wed, 17 May 2023 14:51:49 GMT, Xue-Lei Andrew Fan wrote: >> src/utils/hsdis/binutils/hsdis-binutils.c line 246: >> >>> 244: >>> 245: size_t used_size = snprintf(buf, bufsize, "%s", close); >>> 246: if ((used_size < 0) || (used_size >= bufsize)) { >> >> (used_size < 0) is tautologically false, since used_size is a size_t, so unsigned. 
I'm somewhat surprised
>> this doesn't trigger a warning from some compiler.
>
> Updated to use `int` instead of `size_t`. Thank you for catching this.

bufsize is size_t, so that's a comparison between signed and unsigned values, which I think some compilers
will warn about. Maybe the preceding check for negative is getting rid of that? But will that still occur in
a slowdebug build, or will the lack of optimization lead to a warning?

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1197985875

From Divino.Cesar at microsoft.com Thu May 18 17:40:24 2023
From: Divino.Cesar at microsoft.com (Cesar Soares Lucas)
Date: Thu, 18 May 2023 17:40:24 +0000
Subject: Update on PEA in C2 (Episode 3)
In-Reply-To:
References:
Message-ID:

Hi, Xin Liu.

Thank you for working on this. I'm glad to see the progress.

> PEA: num allocations tracked = 24741, num materializations = 16037

Can you give more details on what these numbers are? Is the "num allocations tracked" all allocations happening in the methods (including no escape?) and "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?

Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping - requiring no merge? If that happens often, perhaps it's a low-hanging fruit that you could pursue instead of the general PEA problem. I.e., something like this:

Point p = new Point(...);
if (...) {
    method(p);
}

Thanks,
Cesar

From: hotspot-compiler-dev on behalf of Liu, Xin
Date: Wednesday, May 17, 2023 at 4:48 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Update on PEA in C2 (Episode 3)

Hi,

I would like to give an update on what we have done in C2 PEA. We managed to compile the java.base module with PEA and the inliner. It contains 7,442 classes and 62,210 methods. Here are the numbers of objects we track and materialize.
PEA: num allocations tracked = 24741, num materializations = 16037

We also CTW jdk.compiler and java.compiler modules. No compilation error is found. We fixed those compiler errors mainly by correcting allocation state.

We verified behavior with one microbenchmark that we ported to JMH. It shows the allocation rate drops as expected. Because PEA is flow-sensitive, it can allocate on demand. The allocation rate drops by 75% when the object has a 25% chance to escape (odd = 4), and drops to 1/8 when the object has only a 12.5% chance to escape.
https://github.com/navyxliu/jdk/pull/36

Remaining problems:

1. In order to curb complexity, we disable passive materialization for the time being. Passive materialization takes place only at a merging point because one of the predecessors has already materialized the object. We prove that it is still correct to skip passive materialization. The downside is that we may have partially redundant allocation because C2 can't guarantee to eliminate the original object now.

Currently, JDK-8287061 is working on this problem. The patch unravels 'reducible phi nodes' and then the original AllocateNodes are eliminated by ScalarReplacement. More details can be found here.
https://gist.github.com/navyxliu/6239ce24f1ae447060302cc8562cbb71?permalink_comment_id=4520588#gistcomment-4520588

If JDK-8287061 processes all reducible phi nodes, PEA will have a synergistic effect with it. Our design goal is to punt complex jobs to the C2 optimizer. If PEA introduces a severe performance problem, we will revisit 'passive materialization'.

2. There are still 400+ runtime errors when we try to run hotspot:tier1 tests. Most of them are from javac. Here is what we have so far.

$ make test TEST="hotspot:tier1" CONF=linux-x86_64-server-fastdebug JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+DoPartialEscapeAnalysis"

Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1                     2150  1717   145   288 <<
==============================

Our understanding is that PEA can't guarantee to replace all the old objects with the new objects in the debug sections of GraphKit::add_safepoint_edges(). If deoptimization happens, the runtime will rematerialize objects based on the wrong debuginfo. We then end up with the wrong objects. Our next goal is to fix those runtime errors.

We have posted a draft PR for curious readers. We will port those tests to jtreg once we fix tier1 tests.
https://github.com/openjdk/jdk/pull/14041

thanks,
--lx

From qamai at openjdk.org Thu May 18 17:46:55 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 18 May 2023 17:46:55 GMT
Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 18 May 2023 04:17:48 GMT, Jasmine Karthikeyan wrote:

>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                           Baseline                    Patch                  Improvement
>> Benchmark                      Mode  Cnt  Score    Error  Units       Score    Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op   /   34.130 ± 0.177  ns/op  + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op   /   34.185 ± 0.258  ns/op  + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op   /   34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op   /   34.330 ± 0.625  ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op   /   34.142 ± 0.303  ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Changes from code review

A tiny point: maybe just remove `setne` and use `setb(Assembler::notZero`, I don't think having a dedicated `setne` achieves much.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1553403253

From xuelei at openjdk.org Thu May 18 17:50:22 2023
From: xuelei at openjdk.org (Xue-Lei Andrew Fan)
Date: Thu, 18 May 2023 17:50:22 GMT
Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v5]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This is a redo of JDK-8307855, where issues were found after integration.
>
> The sprintf is deprecated in Xcode 14, and Microsoft Visual Studio, because of security concerns. The issue was addressed in [JDK-8296812](https://bugs.openjdk.org/browse/JDK-8296812) for the build failure, and [JDK-8299378](https://bugs.openjdk.org/browse/JDK-8299378)/[JDK-8299635](https://bugs.openjdk.org/browse/JDK-8299635)/[JDK-8301132](https://bugs.openjdk.org/browse/JDK-8301132) for testing issues. This is a break-down update for sprintf uses in the src/utils directory.
> > Thanks, > Xuelei Xue-Lei Andrew Fan has updated the pull request incrementally with one additional commit since the last revision: compare between int and size_t ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13995/files - new: https://git.openjdk.org/jdk/pull/13995/files/244278a0..ccef71e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13995&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13995.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13995/head:pull/13995 PR: https://git.openjdk.org/jdk/pull/13995 From xuelei at openjdk.org Thu May 18 17:50:41 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Thu, 18 May 2023 17:50:41 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: <24zfRJ2Ir1egB-U5XJd37qJZUliAqAXkKIaHqE8gG-8=.3e0fc1f7-5eae-4fc9-8135-42a40f917a1e@github.com> References: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> <-qvQkvH8SylX3unheSpOdsjz-mhrnyvqgxtNLKiOmGg=.41f065ea-f856-4436-88d3-8c7b8b01726d@github.com> <24zfRJ2Ir1egB-U5XJd37qJZUliAqAXkKIaHqE8gG-8=.3e0fc1f7-5eae-4fc9-8135-42a40f917a1e@github.com> Message-ID: <1tYQTu-AGF4FDHDYVVh75wXiRwR8qt2YnojKtTLFNXk=.c2d20be8-8616-4c8d-85bf-dc331bd1e812@github.com> On Thu, 18 May 2023 15:46:46 GMT, Kim Barrett wrote: >> Updated to use `int` to replace `size_t.`. Thank you for the catching. > > bufsize is size_t, so that's a comparison between signed and unsigned values, which I think some compilers > will warn about. Maybe the preceding check for negative is getting rid of that? But will that still occur in > a slowdebug build, or will the lack of optimization lead to a warning? As always, this comment helps a lot. Thank you! Updated to cast `int` to `size_t` explicitly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1198111051 From duke at openjdk.org Thu May 18 19:52:59 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 18 May 2023 19:52:59 GMT Subject: Integrated: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: <6eu_Z40bsj2Dy_-fB4sDnsLmaxdmaR5CTq0kBE2X5Io=.0366005e-11ca-4742-ada2-28478ce201b3@github.com> On Wed, 17 May 2023 02:13:39 GMT, Ashutosh Mehra wrote: > This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. This pull request has now been integrated. Changeset: d3feedf5 Author: Ashutosh Mehra Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/d3feedf5114542078c10abec0612038c88e005d6 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8308192: Error in parsing replay file when staticfield is an array of single dimension Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14024 From kvn at openjdk.org Thu May 18 19:52:58 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 May 2023 19:52:58 GMT Subject: RFR: 8308192: Error in parsing replay file when staticfield is an array of single dimension In-Reply-To: References: Message-ID: On Wed, 17 May 2023 02:13:39 GMT, Ashutosh Mehra wrote: > This fixes the parsing error caused by not consuming all the tokens in the `staticfield` command in a replay file. My testing passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14024#issuecomment-1553563580 From dnsimon at openjdk.org Thu May 18 20:47:26 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 18 May 2023 20:47:26 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v3] In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. > * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. > > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. 
Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 - make JMCI more robust in low resource conditions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13905/files - new: https://git.openjdk.org/jdk/pull/13905/files/29cbdebc..ef9ac32d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=01-02 Stats: 1497 lines in 27 files changed: 1080 ins; 338 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/13905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13905/head:pull/13905 PR: https://git.openjdk.org/jdk/pull/13905 From dnsimon at openjdk.org Thu May 18 20:47:28 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 18 May 2023 20:47:28 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v2] In-Reply-To: References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Thu, 18 May 2023 14:18:10 GMT, Doug Simon wrote: >> This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: >> * Tracks upcalls into libjvmci or creation of libjvmci. >> * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). 
>> >> When JVMCI compilation is disabled, a warning is emitted: >> >> [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. >> >> >> With `-Xlog:jit+compilation`, the extra detail shown is: >> >> [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I >> Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError >> java.lang.InternalError: aborting compilation of HotSpotMethod()> >> at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) >> >> >> Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 > - send JVMCI exception info to hs-err log and/or tty > - remove unused callToString method > - make JMCI more robust in low resource conditions I rebased this PR to remove commits from https://github.com/openjdk/jdk/pull/14000 that accidentally got cherry-picked into this PR. 
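For readers of the archive, the "10% or more of these calls fail" rule from the PR description can be sketched as a small counter. This is an illustration under stated assumptions, not the PR's implementation: the class name `UpcallFailureTracker` and its members are hypothetical, the description does not say whether a minimum number of calls is required before the check applies (so that is left as a constructor parameter), and treating the disabled state as permanent is also an assumption.

```cpp
#include <cstdint>

// Hypothetical illustration of a failure-rate cutoff (names are not from the PR).
class UpcallFailureTracker {
 public:
  // min_calls: assumed minimum sample size before the 10% rule is applied.
  explicit UpcallFailureTracker(uint32_t min_calls) : _min_calls(min_calls) {}

  // Records one upcall outcome and returns true while compilation stays enabled.
  bool record(bool failed) {
    _total++;
    if (failed) {
      _failed++;
    }
    // "10% or more" is tested as failed * 10 >= total to avoid floating point;
    // the multiplication is done in 64 bits so it cannot overflow.
    if (_total >= _min_calls && static_cast<uint64_t>(_failed) * 10 >= _total) {
      _disabled = true;  // assumption: once disabled, stays disabled
    }
    return !_disabled;
  }

  bool enabled() const { return !_disabled; }

 private:
  uint32_t _min_calls;
  uint32_t _total = 0;
  uint32_t _failed = 0;
  bool _disabled = false;
};
```

With `min_calls` set to 1 the first failing upcall disables compilation immediately; a larger value delays the decision until more calls have been observed.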
------------- PR Comment: https://git.openjdk.org/jdk/pull/13905#issuecomment-1553619797 From xxinliu at amazon.com Thu May 18 21:57:58 2023 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 18 May 2023 21:57:58 +0000 Subject: Update on PEA in C2 (Episode 3) In-Reply-To: References: Message-ID: <2886E2C4-6D99-4F83-83A7-2C5C8B0922F1@amazon.com> Hi, Cesar, > Is the "num allocations tracked" all allocations happening in the methods (including no escape?) Yes, we intercept Parse::do_new() and increment the counter if we register the Object idx to the allocation state. > "num materializations" the number of (escaping) allocations that you had to rematerialize at least once? This counter is the number of materializations. One object may be materialized multiple times in different branches. E.g., we track one object, but num materializations = 3. Object o = new Object(); if (a) escape(o); else if (b) escape(o); else escape(o); > Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping - requiring no merge? We are not tracking this case. Your example is very similar to the ArgEscape case in the microbenchmark. https://github.com/navyxliu/jdk/pull/36/files#diff-c96245c7aa8950a261e64f01570331420bf00e76ba1861130f7381458b345f33R76 What are you going to do in your case? Our scheme is like this. In a nutshell, our PEA materialization splits the lifecycle of an object. Let's say there's an object which will be marked 'Escape' by C2 EA. PEA keeps cloning this object at escaping points in a flow-sensitive way. After parsing, the original object certainly becomes NonEscaped from the perspective of C2 EA. Point p = new Point(...); // NonEscaped if (...) { Point p' = materialize(p); method(p'); } We don't clean it up. We just leave this to the C2 optimizer. There are 3 cases: 1. The object is useless and is removed by the C2 optimizer, as in this case. 2. As long as the NonEscaped object is 'unique typing', Scalar Replacement will process it.
3. It's NSR. I wish I could leverage your work on this case. Thanks, --lx From: Cesar Soares Lucas Date: Thursday, May 18, 2023 at 10:40 AM To: "Liu, Xin" , "hotspot-compiler-dev at openjdk.java.net" Subject: RE: [EXTERNAL]Update on PEA in C2 (Episode 3) Hi, Xin Liu. Thank you for working on this. I'm glad to see the progress. > PEA: num allocations tracked = 24741, num materializations = 16037 Can you give more details on what these numbers are? Is the "num allocations tracked" all allocations happening in the methods (including no escape?) and "num materializations" the number of (escaping) allocations that you had to rematerialize at least once? Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping - requiring no merge? If that happens often perhaps it's a low hanging fruit that you could pursue instead of the general PEA problem. I.e., something like this: Point p = new Point(...); if (...) { method(p); } Thanks, Cesar From: hotspot-compiler-dev on behalf of Liu, Xin Date: Wednesday, May 17, 2023 at 4:48 PM To: hotspot-compiler-dev at openjdk.java.net Subject: Update on PEA in C2 (Episode 3) Hi, I would like to give an update on what we have done in C2 PEA. We managed to compile the java.base module with PEA and the inliner. It contains 7,442 classes and 62,210 methods. Here are the numbers of objects we track and materialize. PEA: num allocations tracked = 24741, num materializations = 16037 We also CTW the jdk.compiler and java.compiler modules. No compilation error is found. We fixed those compiler errors mainly by correcting allocation state. We verified behavior with one microbenchmark that we ported to JMH. It shows the allocation rate drops as expected. Because PEA is flow-sensitive, it can allocate on demand.
The allocation rate drops by 75% when the object has a 25% chance to escape (odd = 4), and drops to 1/8 when the object has only a 12.5% chance to escape. https://github.com/navyxliu/jdk/pull/36 Remaining problems: 1. In order to curb complexity, we disable passive materialization for the time being. Passive materialization takes place only at a merging point because one of the predecessors has already materialized the object. We prove that it is still correct to skip passive materialization. The downside is that we may have partially redundant allocations because C2 can't guarantee to eliminate the original object now. Currently, JDK-8287061 is working on this problem. The patch unravels 'reducible phi nodes' and then the original AllocateNodes are eliminated by ScalarReplacement. More details can be found here. https://gist.github.com/navyxliu/6239ce24f1ae447060302cc8562cbb71?permalink_comment_id=4520588#gistcomment-4520588 If JDK-8287061 processes all reducible phi nodes, PEA will have a synergy effect with it. Our design goal is to punt complex jobs to the C2 optimizer. If PEA introduces a severe performance problem, we will revisit 'passive materialization'. 2. There are still 400+ runtime errors when we try to run hotspot:tier1 tests. Most of them are from javac. Here is what we have so far. $ make test TEST="hotspot:tier1" CONF=linux-x86_64-server-fastdebug JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+DoPartialEscapeAnalysis" Test summary ============================== TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier1 2150 1717 145 288 << ============================== Our understanding is that PEA can't guarantee to replace all the old objects with the new objects in the debug sections of GraphKit::add_safepoint_edges(). If deoptimization happens, the runtime will rematerialize objects based on the wrong debuginfo. We then end up with wrong objects. Our next goal is to fix those runtime errors.
We posted a draft PR for curious readers. We will port those tests to jtreg once we fix tier1 tests. https://github.com/openjdk/jdk/pull/14041 thanks, --lx From kvn at openjdk.org Thu May 18 23:11:54 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 May 2023 23:11:54 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v4] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Thu, 18 May 2023 14:15:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds support for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > whitespace My testing passed. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13602#pullrequestreview-1433566460 From fjiang at openjdk.org Fri May 19 00:49:53 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 19 May 2023 00:49:53 GMT Subject: RFR: 8308277: RISC-V: Improve vectorization of Match.sqrt() on floats In-Reply-To: References: Message-ID: On Wed, 17 May 2023 12:56:04 GMT, Fei Yang wrote: >> [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) added `VSqrtF` and `SqrtF` nodes to support the vectorization of Match.sqrt() on floats. For riscv port, however, the scalar version of `sqrtF` still uses the old match rule that converts Float to Double first. It can be simplified to just use `SqrtF`. >> >> The old match rule also affects the vectorization of Math.sqrt() on float.
The current implementation will convert float to double with `vcvtFtoD`, then do `vsqrtD`, and finally convert the result back to float with `vcvtDtoF`. If we use the new `SqrtF` match rule, it will only use `vsqrtF` to do the conversion. Take the test (Sqrt.java) from [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) as an example, here is the output with `-XX:+PrintOptoAssembly` and `-XX:+UseRVV`: >> >> before: >> >> >> 19a loadV V1, [R13] # vector (rvv) >> 1a2 vcvtFtoD V2, V1 >> 1ae vfsqrt.v V1, V2 #@vsqrtD >> 1b6 vcvtDtoF V1, V1 >> 1c2 storeV [R14], V1 # vector (rvv) >> >> >> after: >> >> 1be loadV V1, [R12] # vector (rvv) >> 1c6 vfsqrt.v V1, V1 #@vsqrtF >> 1ce addi R12, R29, #144 # ptr, #@addP_reg_imm >> 1d2 storeV [R12], V1 # vector (rvv) >> >> >> Testing: >> - [x] tier1 tests on Unmatched board without `-XX:+UseRVV` (release build) >> - [x] hotspot_tier1/jdk_tier1 on QEMU with `-XX:+UseRVV` (release build) > > Looks reasonable. Thanks. @RealFYang -- Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14029#issuecomment-1553857011 From fjiang at openjdk.org Fri May 19 00:52:56 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 19 May 2023 00:52:56 GMT Subject: Integrated: 8308277: RISC-V: Improve vectorization of Match.sqrt() on floats In-Reply-To: References: Message-ID: On Wed, 17 May 2023 11:00:07 GMT, Feilong Jiang wrote: > [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) added `VSqrtF` and `SqrtF` nodes to support the vectorization of Match.sqrt() on floats. For riscv port, however, the scalar version of `sqrtF` still uses the old match rule that converts Float to Double first. It can be simplified to just use `SqrtF`. > > The old match rule also affects the vectorization of Math.sqrt() on float. The current implementation will convert float to double with `vcvtFtoD`, then do `vsqrtD`, and finally convert the result back to float with `vcvtDtoF`. 
If we use the new `SqrtF` match rule, it will only use `vsqrtF` to do the conversion. Take the test (Sqrt.java) from [JDK-8190800](https://bugs.openjdk.org/browse/JDK-8190800) as an example, here is the output with `-XX:+PrintOptoAssembly` and `-XX:+UseRVV`: > > before: > > > 19a loadV V1, [R13] # vector (rvv) > 1a2 vcvtFtoD V2, V1 > 1ae vfsqrt.v V1, V2 #@vsqrtD > 1b6 vcvtDtoF V1, V1 > 1c2 storeV [R14], V1 # vector (rvv) > > > after: > > 1be loadV V1, [R12] # vector (rvv) > 1c6 vfsqrt.v V1, V1 #@vsqrtF > 1ce addi R12, R29, #144 # ptr, #@addP_reg_imm > 1d2 storeV [R12], V1 # vector (rvv) > > > Testing: > - [x] tier1 tests on Unmatched board without `-XX:+UseRVV` (release build) > - [x] hotspot_tier1/jdk_tier1 on QEMU with `-XX:+UseRVV` (release build) This pull request has now been integrated. Changeset: e520cdc8 Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/e520cdc882a778260181a2162a01ceff7cc41ca0 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8308277: RISC-V: Improve vectorization of Match.sqrt() on floats Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/14029 From dzhang at openjdk.org Fri May 19 01:20:17 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 19 May 2023 01:20:17 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v10] In-Reply-To: References: Message-ID: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. 
> > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. 
> > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge remote-tracking branch 'upstream/master' into JDK-8307609 - Remove masked vmaskall - Add vmask_lasttrue - Add some comments - Merge remote-tracking branch 'upstream/master' into JDK-8307609 - Adjust some params order in c2_MacroAssembler_riscv - Adjust some function in c2_MacroAssembler_riscv - Remove trailing whitespace - Fix minmax_fp_masked_v - Change some iRegI to iRegIorL2I and small refactoring of minmax_fp_masked_v - ... 
and 11 more: https://git.openjdk.org/jdk/compare/e520cdc8...04ff9333 ------------- Changes: https://git.openjdk.org/jdk/pull/13862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=09 Stats: 1720 lines in 6 files changed: 1447 ins; 139 del; 134 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From fyang at openjdk.org Fri May 19 01:47:57 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 19 May 2023 01:47:57 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v10] In-Reply-To: References: Message-ID: <1E3SCm2CqU2Kp23YwBrGTSXTTVmyqu6mZTE1b7tujII=.794fda9b-90a4-4ea6-aa8f-6f5d6a08784b@github.com> On Fri, 19 May 2023 01:20:17 GMT, Dingli Zhang wrote: >> Hi all, >> >> We have added support for Extract, Compress, Expand and other nodes for Vector >> API. It was implemented by referring to RVV v1.0 [1]. Please take a look and >> have some reviews. Thanks a lot. >> >> In this PR, we will support these new nodes: >> >> CompressM/CompressV/ExpandV >> LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> Extract >> VectorLongToMask/VectorMaskToLong >> PopulateIndex >> VectorLongToMask/VectorMaskToLong >> VectorMaskTrueCount/VectorMaskFirstTrue >> VectorInsert >> >> >> At the same time, we refactored methods such as >> `match_rule_supported_vector_mask`. All implemented vector nodes support mask >> operations by default now, so we also added mask nodes for all implemented >> nodes. >> >> By the way, we will implement the VectorTest node in the next PR. >> >> We can use the tests under `test/jdk/jdk/incubator/vector` to print the >> compilation log for most of the new nodes. 
And we can use the following >> command to print the compilation log of a jtreg test case: >> >> >> $ jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=log_name.log \ >> -jdk:build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:build/linux-x86_64-server-release/images/jdk \ >> >> >> >> >> >> ### CompressM/CompressV/ExpandV >> >> There is no inverse vdecompress provided in RVV, as this operation can be >> readily synthesized using iota and a masked vrgather in `ExpandV`. >> >> We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit >> these nodes and the compilation log is as follows: >> >> >> ## CompressM >> 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm >> 2ae mcompress V0, V30 # KILL R30 >> 2c2 vstoremask V2, V0 >> 2ce storeV [R7], V2 # vector (rvv) >> 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## CompressV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vcompress V1, V2, V0 >> 0fe storeV [R7], V1 # vector (rvv) >> 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## ExpandV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vexpand V3, V2, V0 >> 102 storeV [R7], V3 # vector (rvv) >> 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> >> >> ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> >> We use the vs... > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: > > - Merge remote-tracking branch 'upstream/master' into JDK-8307609 > - Remove masked vmaskall > - Add vmask_lasttrue > - Add some comments > - Merge remote-tracking branch 'upstream/master' into JDK-8307609 > - Adjust some params order in c2_MacroAssembler_riscv > - Adjust some function in c2_MacroAssembler_riscv > - Remove trailing whitespace > - Fix minmax_fp_masked_v > - Change some iRegI to iRegIorL2I and small refactoring of minmax_fp_masked_v > - ... and 11 more: https://git.openjdk.org/jdk/compare/e520cdc8...04ff9333 Some extra nit-picking suggestions. Otherwise, looks good. Thanks. src/hotspot/cpu/riscv/riscv_v.ad line 3942: > 3940: match(Set dst (ExpandV src v0)); > 3941: effect(TEMP_DEF dst, TEMP tmp); > 3942: format %{ "vexpand $dst, $src, $v0" %} Suggestion: `format %{ "vexpand $dst, $src, $v0\t# KILL $tmp" %}` src/hotspot/cpu/riscv/riscv_v.ad line 4035: > 4033: // ------------------------------ Populate Index to a Vector ------------------- > 4034: > 4035: instruct populateindex(vReg dst, iRegIorL2I src1, iRegIorL2I src2, vReg tmp1) %{ Suggestion: Rename `tmp1` to `tmp` src/hotspot/cpu/riscv/riscv_v.ad line 4073: > 4071: %} > 4072: > 4073: instruct insertI_index(vReg dst, vReg src, iRegIorL2I val, iRegIorL2I idx, vReg tmp1, vRegMask_V0 v0) %{ Suggestion: Rename `tmp1` to `tmp` src/hotspot/cpu/riscv/riscv_v.ad line 4111: > 4109: %} > 4110: > 4111: instruct insertL_index(vReg dst, vReg src, iRegL val, iRegIorL2I idx, vReg tmp1, vRegMask_V0 v0) %{ Suggestion: Rename `tmp1` to `tmp` src/hotspot/cpu/riscv/riscv_v.ad line 4146: > 4144: %} > 4145: > 4146: instruct insertF_index(vReg dst, vReg src, fRegF val, iRegIorL2I idx, vReg tmp1, vRegMask_V0 v0) %{ Suggestion: Rename `tmp1` to `tmp` src/hotspot/cpu/riscv/riscv_v.ad line 4180: > 4178: %} > 4179: > 4180: instruct insertD_index(vReg dst, vReg src, fRegD val, iRegIorL2I idx, vReg tmp1, vRegMask_V0 v0) %{ Suggestion: Rename `tmp1` to `tmp` ------------- Changes 
requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13862#pullrequestreview-1433666573 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1198465264 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1198465979 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1198466117 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1198466179 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1198466235 PR Review Comment: https://git.openjdk.org/jdk/pull/13862#discussion_r1198466274 From dzhang at openjdk.org Fri May 19 03:01:17 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 19 May 2023 03:01:17 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v11] In-Reply-To: References: Message-ID: <5uMY1423PkvXGHNBJeg9y5UKbxrJEeXOW9CmS_gtiDY=.12e24933-affa-4d0b-9ba0-52c68acc6af1@github.com> > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. 
And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... 
Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Rename some tmp1 to tmp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13862/files - new: https://git.openjdk.org/jdk/pull/13862/files/04ff9333..75f8437e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13862&range=09-10 Stats: 26 lines in 1 file changed: 0 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/13862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13862/head:pull/13862 PR: https://git.openjdk.org/jdk/pull/13862 From fyang at openjdk.org Fri May 19 03:01:43 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 19 May 2023 03:01:43 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v11] In-Reply-To: <5uMY1423PkvXGHNBJeg9y5UKbxrJEeXOW9CmS_gtiDY=.12e24933-affa-4d0b-9ba0-52c68acc6af1@github.com> References: <5uMY1423PkvXGHNBJeg9y5UKbxrJEeXOW9CmS_gtiDY=.12e24933-affa-4d0b-9ba0-52c68acc6af1@github.com> Message-ID: On Fri, 19 May 2023 03:01:17 GMT, Dingli Zhang wrote: >> Hi all, >> >> We have added support for Extract, Compress, Expand and other nodes for Vector >> API. It was implemented by referring to RVV v1.0 [1]. Please take a look and >> have some reviews. Thanks a lot. >> >> In this PR, we will support these new nodes: >> >> CompressM/CompressV/ExpandV >> LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> Extract >> VectorLongToMask/VectorMaskToLong >> PopulateIndex >> VectorLongToMask/VectorMaskToLong >> VectorMaskTrueCount/VectorMaskFirstTrue >> VectorInsert >> >> >> At the same time, we refactored methods such as >> `match_rule_supported_vector_mask`. All implemented vector nodes support mask >> operations by default now, so we also added mask nodes for all implemented >> nodes. 
>> >> By the way, we will implement the VectorTest node in the next PR. >> >> We can use the tests under `test/jdk/jdk/incubator/vector` to print the >> compilation log for most of the new nodes. And we can use the following >> command to print the compilation log of a jtreg test case: >> >> >> $ jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=log_name.log \ >> -jdk:build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:build/linux-x86_64-server-release/images/jdk \ >> >> >> >> >> >> ### CompressM/CompressV/ExpandV >> >> There is no inverse vdecompress provided in RVV, as this operation can be >> readily synthesized using iota and a masked vrgather in `ExpandV`. >> >> We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit >> these nodes and the compilation log is as follows: >> >> >> ## CompressM >> 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm >> 2ae mcompress V0, V30 # KILL R30 >> 2c2 vstoremask V2, V0 >> 2ce storeV [R7], V2 # vector (rvv) >> 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## CompressV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vcompress V1, V2, V0 >> 0fe storeV [R7], V1 # vector (rvv) >> 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> ## ExpandV >> 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm >> 0f2 vexpand V3, V2, V0 >> 102 storeV [R7], V3 # vector (rvv) >> 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 >> >> >> >> >> ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked >> >> We use the vs... > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Rename some tmp1 to tmp Marked as reviewed by fyang (Reviewer). 
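
For readers following the quoted description: the remark that `ExpandV` can be synthesized from iota plus a masked vrgather can be modelled in scalar Java. This is only an illustrative sketch, not code from the PR; the class and method names (`ExpandSketch`, `compress`, `expand`) are hypothetical, and unselected lanes are assumed to be zeroed.

```java
import java.util.Arrays;

public class ExpandSketch {
    // Scalar model of compress: pack the mask-selected lanes to the front,
    // leaving the remaining lanes zero.
    static int[] compress(int[] src, boolean[] mask) {
        int[] dst = new int[src.length];
        int next = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {
                dst[next++] = src[i];
            }
        }
        return dst;
    }

    // Scalar model of expand built from two steps:
    //   1. iota: for each lane, count the set mask bits in the lanes before it;
    //   2. masked gather: each active lane i reads src[iota[i]].
    static int[] expand(int[] src, boolean[] mask) {
        int n = src.length;
        int[] iota = new int[n];
        int count = 0;
        for (int i = 0; i < n; i++) {
            iota[i] = count;
            if (mask[i]) {
                count++;
            }
        }
        int[] dst = new int[n]; // inactive lanes stay zero (assumption of this sketch)
        for (int i = 0; i < n; i++) {
            if (mask[i]) {
                dst[i] = src[iota[i]];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        boolean[] m = {true, false, true, false};
        int[] packed = compress(v, m);
        System.out.println(Arrays.toString(packed));
        System.out.println(Arrays.toString(expand(packed, m)));
    }
}
```

Expanding the compressed vector with the same mask puts the selected elements back in their original lanes, which is why a dedicated inverse instruction is not strictly needed.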
------------- PR Review: https://git.openjdk.org/jdk/pull/13862#pullrequestreview-1433709910 From dzhang at openjdk.org Fri May 19 03:08:58 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 19 May 2023 03:08:58 GMT Subject: RFR: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API [v6] In-Reply-To: <6QB1ModTlrzwc-GfwGJ0U6XgRPbhJsOawE_P9yHZnAM=.7fbce6b6-1978-4570-abef-7fd9c88902d7@github.com> References: <6QB1ModTlrzwc-GfwGJ0U6XgRPbhJsOawE_P9yHZnAM=.7fbce6b6-1978-4570-abef-7fd9c88902d7@github.com> Message-ID: On Wed, 17 May 2023 01:43:13 GMT, Feilong Jiang wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove trailing whitespace > > Looks good, thanks. @feilongjiang @RealFYang Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13862#issuecomment-1553937394 From dzhang at openjdk.org Fri May 19 03:12:03 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 19 May 2023 03:12:03 GMT Subject: Integrated: 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API In-Reply-To: References: Message-ID: On Mon, 8 May 2023 11:04:09 GMT, Dingli Zhang wrote: > Hi all, > > We have added support for Extract, Compress, Expand and other nodes for Vector > API. It was implemented by referring to RVV v1.0 [1]. Please take a look and > have some reviews. Thanks a lot. > > In this PR, we will support these new nodes: > > CompressM/CompressV/ExpandV > LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > Extract > VectorLongToMask/VectorMaskToLong > PopulateIndex > VectorLongToMask/VectorMaskToLong > VectorMaskTrueCount/VectorMaskFirstTrue > VectorInsert > > > At the same time, we refactored methods such as > `match_rule_supported_vector_mask`. 
All implemented vector nodes support mask > operations by default now, so we also added mask nodes for all implemented > nodes. > > By the way, we will implement the VectorTest node in the next PR. > > We can use the tests under `test/jdk/jdk/incubator/vector` to print the > compilation log for most of the new nodes. And we can use the following > command to print the compilation log of a jtreg test case: > > > $ jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=log_name.log \ > -jdk:build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:build/linux-x86_64-server-release/images/jdk \ > > > > > > ### CompressM/CompressV/ExpandV > > There is no inverse vdecompress provided in RVV, as this operation can be > readily synthesized using iota and a masked vrgather in `ExpandV`. > > We can use `test/jdk/jdk/incubator/vector/Float256VectorTests.java` to emit > these nodes and the compilation log is as follows: > > > ## CompressM > 2aa addi R29, R10, #16 # ptr, #@addP_reg_imm > 2ae mcompress V0, V30 # KILL R30 > 2c2 vstoremask V2, V0 > 2ce storeV [R7], V2 # vector (rvv) > 2d6 bgeu R29, R28, B47 #@cmpP_branch P=0.000100 C=-1.000000 > > ## CompressV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vcompress V1, V2, V0 > 0fe storeV [R7], V1 # vector (rvv) > 106 bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > ## ExpandV > 0ee addi R29, R10, #16 # ptr, #@addP_reg_imm > 0f2 vexpand V3, V2, V0 > 102 storeV [R7], V3 # vector (rvv) > 10a bgeu R29, R28, B10 #@cmpP_branch P=0.000100 C=-1.000000 > > > > > ### LoadVectorGather/StoreVectorScatter/LoadVectorGatherMasked/StoreVectorScatterMasked > > We use the vsoxei32_v instruction regardless of what sew is set to. The > indexMap in fromArr... This pull request has now been integrated. 
Changeset: 97ade57f Author: Dingli Zhang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/97ade57fb244b17e93b150b7f9e025a5ba906bb2 Stats: 1720 lines in 6 files changed: 1447 ins; 139 del; 134 mod 8307609: RISC-V: Added support for Extract, Compress, Expand and other nodes for Vector API Co-authored-by: zifeihan Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/13862 From vlivanov at openjdk.org Fri May 19 04:10:00 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 19 May 2023 04:10:00 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 12 May 2023 21:09:01 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. 
>> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR review 5: refactor on rematerialization & add tests. Very nice, Cesar. I like how the code shapes now. I verified that the new test cases do trigger SR+NSR scenario. How do you test that deoptimization works as expected? Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). 
On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information, which are essential to make sense of debug information at a safepoint in the code. FTR `_skip_rematerialization` flag is unused now. Speaking of `_only_merge_candidate` flag, I find it easier to reason about the code when the property being tracked is whether the `ObjectValue` is referenced from the corresponding JVM state or not. (Maybe call it `is_root()`?) So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()`, which doesn't need to adjust the flag before returning the selected object. Also, it's safer to track the flag status for every `ObjectValue`, even for `ObjectMergeValue`. Are you sure there's no way to end up with nested `ObjectMergeValue`s in the presence of iterative EA? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1553966589 From epeter at openjdk.org Fri May 19 05:00:58 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 05:00:58 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 18:52:43 GMT, Jatin Bhateja wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> added missing float/double cases to VectorNode::scalar_opcode > > src/hotspot/share/opto/loopopts.cpp line 4192: > >> 4190: >> 4191: // Convert opcode from vector-reduction -> scalar -> normal-vector-op >> 4192: const int sopc = VectorNode::scalar_opcode(last_ur->Opcode(), bt); > > Other changes look good to me, can you rename _VectorNode::scalar_opcode_ to _ReductionNode::scalar_opcode_ > , also move out vector opcode cases into a separate vector-to-scalar mapping routine if needed. Is it not better to have `VectorNode::scalar_opcode`? 
It is more general - maybe it is useful in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1198544502 From epeter at openjdk.org Fri May 19 05:21:50 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 05:21:50 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: References: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> Message-ID: On Mon, 15 May 2023 07:47:19 GMT, Fei Gao wrote: >>> The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`. >>> >>> But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here. >>> >>> How would you improve my comments? >> >> Thanks for your clarification. >> >> Your comment is quite clear already. Maybe just highlight the mismatch between `VectorMaskCmp` and `bol_test` here, like: >> >> // >> // But with these two cases, which `VectorMaskCmp` interprets as ordered, >> // we must convert the unordered into an ordered comparison: >> // BoolTest::lt: Case -1 -> LT_U >> // BoolTest::le: Case -1, 0 -> LE_U >> // > >> @fg1417 Are you ok with how I worded it now? > > Oh, yes. Clear enough! @fg1417 Is there anything you still want me to change before you could approve this PR? 
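
To make the ordered/unordered distinction in the quoted discussion concrete, here is a small scalar sketch. It is not taken from the patch; `FcmplDemo` and its methods are hypothetical names. `fcmpl` follows the JVM specification's result for the `fcmpl` bytecode (-1 when either operand is `NaN`), so a "less than" test on that result behaves as an unordered comparison, whereas Java's `<` operator is ordered.

```java
public class FcmplDemo {
    // Result of the fcmpl bytecode: 1 if a > b, 0 if a == b,
    // and -1 if a < b or if either operand is NaN.
    static int fcmpl(float a, float b) {
        if (a > b) return 1;
        if (a == b) return 0;
        return -1;
    }

    // "lt" applied to the fcmpl result: true for a < b and also for NaN inputs.
    static boolean ltUnordered(float a, float b) {
        return fcmpl(a, b) < 0;
    }

    // "le" applied to the fcmpl result: accepts the -1 and 0 cases.
    static boolean leUnordered(float a, float b) {
        return fcmpl(a, b) <= 0;
    }

    // Ordered comparison: any NaN input makes the result false.
    static boolean ltOrdered(float a, float b) {
        return a < b;
    }

    public static void main(String[] args) {
        System.out.println("unordered lt(NaN, 1): " + ltUnordered(Float.NaN, 1.0f));
        System.out.println("ordered   lt(NaN, 1): " + ltOrdered(Float.NaN, 1.0f));
        System.out.println("unordered le(1, NaN): " + leUnordered(1.0f, Float.NaN));
    }
}
```

The two flavours agree on all non-`NaN` inputs and differ only when a `NaN` is involved, which is the mismatch the comment wording is about.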
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1198554476 From fgao at openjdk.org Fri May 19 06:14:01 2023 From: fgao at openjdk.org (Fei Gao) Date: Fri, 19 May 2023 06:14:01 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 08:45:45 GMT, Emanuel Peter wrote: >> **Bug** >> In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN`s correctly (ordered vs unordered comparison, leads to wrong results). >> >> The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973) >> On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 >> >> The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching: >> https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394 >> >> The wrong results with `NaN` are because of a bug in `x`: >> https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106 >> The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for Java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078). >> >> **Solution** >> @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. 
So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions. >> >> This has a few benefits: >> - `VectorMaskCmp + VectorBlend` is more powerful: >> - `CMoveVF/D` required the same inputs to the compare than to the move itself. >> - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize. >> - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`). >> - We need less code (I completely removed all code for `CMoveVF/D`). >> >> I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`. >> >> As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Improved comment on request of @fg1417 Nice work! Mid-end part and aarch64 part are generally good to me. Maybe you need another review from an expert on x86. ------------- Marked as reviewed by fgao (Committer). PR Review: https://git.openjdk.org/jdk/pull/13493#pullrequestreview-1433830934 From fgao at openjdk.org Fri May 19 06:14:02 2023 From: fgao at openjdk.org (Fei Gao) Date: Fri, 19 May 2023 06:14:02 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: References: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> Message-ID: On Mon, 15 May 2023 07:47:19 GMT, Fei Gao wrote: >>> The issue is this: `CmpF -> Bool [lt/le]` is unordered, because they both accept the return code `-1` from the `CmpF`, which also makes comparisons with `NaN` true. This means that such comparisons are `unordered`. 
>>> >>> But `VectorMaskCmp` would interpret `lt/le` test-codes as `ordered`, so they would return false for `NaN` comparisons. So that is why we need to make a transformation here. >>> >>> How would you improve my comments? >> >> Thanks for your clarification. >> >> Your comment is quite clear already. Maybe just highlight the mismatch between `VectorMaskCmp` and `bol_test` here, like: >> >> // >> // But with these two cases, which `VectorMaskCmp` interprets as ordered, >> // we must convert the unordered into an ordered comparison: >> // BoolTest::lt: Case -1 -> LT_U >> // BoolTest::le: Case -1, 0 -> LE_U >> // > >> @fg1417 Are you ok with how I worded it now? > > Oh, yes. Clear enough! > @fg1417 Is there anything you still want me to change before you could approve this PR? Sorry for my delay. I assume that the flags `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` over all jtreg passed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1198582643 From epeter at openjdk.org Fri May 19 06:17:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 06:17:49 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: References: <5q6Ofnx4dUWUeA1b8CCMyZGlfIoe2ZEZK97GrUbjqwg=.a0581e27-9900-4ab7-b377-d4636de9eb5b@github.com> Message-ID: On Fri, 19 May 2023 06:11:22 GMT, Fei Gao wrote: >>> @fg1417 Are you ok with how I worded it now? >> >> Oh, yes. Clear enough! > >> @fg1417 Is there anything you still want me to change before you could approve this PR? > > Sorry for my delay. I assume that the flags `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` over all jtreg passed. Ah yes, they have passed indeed. With and without those flags, up to tier5 and stress testing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13493#discussion_r1198584506 From epeter at openjdk.org Fri May 19 06:22:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 06:22:49 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: On Thu, 11 May 2023 03:56:42 GMT, Fei Gao wrote: >> @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix? > >> @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix? > > Hi @eme64 , nice rewrite! > > BTW, have you tested your patch with `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` for all jtreg? Thanks. Thanks @fg1417 for the review! Yes, the testing passes up to at least tier5 and stress testing. With and without `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` Yes, I hope that someone from intel / x86 specialists can review this too :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1554066872 From chagedorn at openjdk.org Fri May 19 14:15:50 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 May 2023 14:15:50 GMT Subject: RFR: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes [v2] In-Reply-To: References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: On Wed, 17 May 2023 12:04:36 GMT, Christian Hagedorn wrote: >> This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. 
The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). >> >> Changes include: >> - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. >> - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. >> - Cleanup of touched code (dead code, variable renaming, code style) >> - Added comments (e.g. for some special case in Loop Predication) >> >> For more background, have a look at the first PR: #13864 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Tobias' review Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14017#issuecomment-1554650369 From jbhateja at openjdk.org Fri May 19 16:02:00 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 19 May 2023 16:02:00 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 04:58:16 GMT, Emanuel Peter wrote: > Is it not better to have `VectorNode::scalar_opcode`? It is more general - maybe it is useful in the future. 
Not a blocker, but we intend to get a scalar opcode for ReductionNode, we have different factory method for Vector/Reduction Nodes, you can keep it for now Best Regards, Jatin ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1199124775 From qamai at openjdk.org Fri May 19 16:27:22 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 May 2023 16:27:22 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative Message-ID: Hi, This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. ------------- Commit messages: - flag to use imm16 - Merge branch 'master' into getandadd - fix tests - fix missing xadds_reg_no_res - Merge branch 'master' into getandadd - should not ignore blackhole - improve GetAndAdd Changes: https://git.openjdk.org/jdk/pull/14061/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14061&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308444 Stats: 273 lines in 6 files changed: 247 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/14061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14061/head:pull/14061 PR: https://git.openjdk.org/jdk/pull/14061 From sviswanathan at openjdk.org Fri May 19 17:17:51 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 19 May 2023 17:17:51 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D In-Reply-To: References: Message-ID: On Fri, 19 May 2023 06:20:30 GMT, Emanuel Peter wrote: >>> @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix? >> >> Hi @eme64 , nice rewrite! >> >> BTW, have you tested your patch with `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` for all jtreg? Thanks. > > Thanks @fg1417 for the review! 
> > Yes, the testing passes up to at least tier5 and stress testing. With and without `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` > > Yes, I hope that someone from intel / x86 specialists can review this too :) > These are candidates: @jatin-bhateja @sviswa7 @merykitty @eme64 I will take a look at it next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1554992957 From kvn at openjdk.org Sat May 20 00:30:00 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 20 May 2023 00:30:00 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative In-Reply-To: References: Message-ID: On Fri, 19 May 2023 16:19:42 GMT, Quan Anh Mai wrote: > Hi, > > This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. Implementation looks good to me. I have few comments about test. You need second review. src/hotspot/share/opto/memnode.cpp line 3016: > 3014: } > 3015: > 3016: bool LoadStoreNode::result_not_used() const { Add comment to this method explaining what it is for and cases when it returns `true` or `false`. test/hotspot/jtreg/compiler/c2/x86/TestGetAndAdd.java line 1: > 1: /* Please, use `c2/varhandle` directory for this test. We don't use platform in dir names. The test could be updated later for other platforms. test/hotspot/jtreg/compiler/c2/x86/TestGetAndAdd.java line 34: > 32: * bug 8308444 > 33: * @summary verify that the correct node is matched for atomic getAndAdd > 34: * @requires os.arch=="amd64" | os.arch=="x86_64" You can use os.simpleArch == "x64" to cover both cases. And put second `requires` after it so they stay together. 
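
As an illustration of the header layout suggested above (both `@requires` lines adjacent, with `os.simpleArch == "x64"` covering amd64 and x86_64), a test skeleton might look roughly like the following. This is a hypothetical sketch, not the test from the PR: the class name `TestGetAndAddSketch`, the `@run` line and the second `@requires` condition (`vm.compiler2.enabled`) are assumptions made only for the example.

```java
/*
 * @test
 * @bug 8308444
 * @summary verify that the correct node is matched for atomic getAndAdd
 * @requires os.simpleArch == "x64"
 * @requires vm.compiler2.enabled
 * @run main TestGetAndAddSketch
 */
import java.util.concurrent.atomic.AtomicInteger;

public class TestGetAndAddSketch {
    static final AtomicInteger COUNTER = new AtomicInteger();

    // The returned old value is dropped here, the case in which the
    // compiler may pick the variant that produces no result.
    static void addIgnoringResult(int delta) {
        COUNTER.getAndAdd(delta);
    }

    // The returned old value is consumed here, so it must be produced.
    static int addUsingResult(int delta) {
        return COUNTER.getAndAdd(delta);
    }

    public static void main(String[] args) {
        addIgnoringResult(5);
        int old = addUsingResult(3);
        if (old != 5 || COUNTER.get() != 8) {
            throw new AssertionError("unexpected counter state");
        }
    }
}
```

The sketch only checks functional behaviour; verifying which node is actually matched would need the IR-level checks used by the real test.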
-------------

PR Review: https://git.openjdk.org/jdk/pull/14061#pullrequestreview-1435280691
PR Review Comment: https://git.openjdk.org/jdk/pull/14061#discussion_r1199508347
PR Review Comment: https://git.openjdk.org/jdk/pull/14061#discussion_r1199507955
PR Review Comment: https://git.openjdk.org/jdk/pull/14061#discussion_r1199509039

From duke at openjdk.org Sat May 20 07:35:52 2023
From: duke at openjdk.org (Yi-Fan Tsai)
Date: Sat, 20 May 2023 07:35:52 GMT
Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic
Message-ID:

Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic.

**Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once.

The original snippet loads the value (step 3) soon after it was written to the memory (step 2).

    md5_loop:
      __ ldrw(a, Address(state, 0));          // step 3: load the value from memory
      ...                                     // loop body
      __ ldrw(rscratch1, Address(state, 0));  // step 1: load the value at Address(state, 0)
      __ addw(rscratch1, rscratch1, a);
      __ strw(rscratch1, Address(state, 0));  // step 2: write the value to memory
      ...
      __ br(Assembler::LE, md5_loop);

The snippet is optimized to avoid memory loads and writes in the loop.

      __ ldp(s0, s1, Address(state, 0));      // load the value at Address(state, 0) to a register
      __ ubfx(a, s0, 0, 32);
    md5_loop:
      ..                                      // body
      __ ubfx(rscratch1, s0, 0, 32);          // step 1: extract the value from the register
      __ addw(a, rscratch1, a);
      __ orr(s0, a, b, Assembler::LSL, 32);   // step 2: preserve the value in the register
      ....
      __ br(Assembler::LE, md5_loop);
      ....
      __ str(s0, Address(state, 0));          // write the result to memory only once

**Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed.

The original snippet, generated by two `md5_FF`s and `md5_GG`s, shows the same data was repeatedly read.

      __ ldrw(rscratch1, Address(buf, 0));    // from md5_FF(.., k = 0, ..)
      ...
      __ ldrw(rscratch1, Address(buf, 4));    // from md5_FF(.., k = 1, ..)
      ...
      __ ldrw(rscratch1, Address(buf, 4));    // from md5_GG(.., k = 1, ..)
      ...
      __ ldrw(rscratch1, Address(buf, 0));    // from md5_GG(.., k = 0, ..)

The snippet is optimized by caching the values in registers and removing the redundant loads.

      __ ldp (buf0, buf1, Address(buf, 0));   // load both values into buf0
      ...
      __ ubfx(rscratch1, buf0, 0, 32);        // extract the value of k = 0 from the lower 32 bits of buf0
      ...
      __ ubfx(rscratch1, buf0, 32, 32);       // extract the value of k = 1 from the higher 32 bits of buf0
      ...
      __ ubfx(rscratch1, buf0, 32, 32);
      ...
      __ ubfx(rscratch1, buf0, 0, 32);

**Test**

The following tests have passed.

    test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
    test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java

**Performance**

The performance is improved by ~ 1-2% with `micro:org.openjdk.bench.java.security.MessageDigests` on larger inputs.

*MessageDigests.digest* improvement

| Input length (bytes) | 64 | 256 | 1,024 | 4,096 | 16,384 |
|----------------------|--------|--------|-------|-------|--------|
| Graviton 2 | -1.41% | 0.43% | 1.81% | 2.20% | 2.28% |
| Graviton 3 | -3.63% | -0.43% | 0.73% | 1.05% | 1.14% |

*MessageDigests.getAndDigest* improvement

| Input length (bytes) | 64 | 256 | 1,024 | 4,096 | 16,384 |
|----------------------|--------|-------|-------|-------|--------|
| Graviton 2 | -0.97% | 0.55% | 1.46% | 1.84% | 1.91% |
| Graviton 3 | -0.20% | 0.49% | 1.03% | 1.13% | 1.17% |

Graviton 2

    Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
    ---- baseline -------------------------------------------------------------------------------------------
    MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3709.849 ±  30.327  ops/ms
    MessageDigests.digest                   md5       256     DEFAULT  thrpt   15  1513.543 ±   0.616  ops/ms
    MessageDigests.digest                   md5      1024     DEFAULT  thrpt   15   462.135 ±   0.382  ops/ms
    MessageDigests.digest                   md5      4096     DEFAULT  thrpt   15   122.360 ±   0.024  ops/ms
    MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    31.037 ±   0.010  ops/ms
    MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2902.714 ±  92.908  ops/ms
    MessageDigests.getAndDigest             md5       256     DEFAULT  thrpt   15  1395.815 ±   2.292  ops/ms
    MessageDigests.getAndDigest             md5      1024     DEFAULT  thrpt   15   448.729 ±   7.343  ops/ms
    MessageDigests.getAndDigest             md5      4096     DEFAULT  thrpt   15   120.616 ±   0.038  ops/ms
    MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    31.010 ±   0.007  ops/ms
    ---- optimized ------------------------------------------------------------------------------------------
    MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3657.658 ±  40.255  ops/ms
    MessageDigests.digest                   md5       256     DEFAULT  thrpt   15  1520.086 ±   6.095  ops/ms
    MessageDigests.digest                   md5      1024     DEFAULT  thrpt   15   470.505 ±   0.395  ops/ms
    MessageDigests.digest                   md5      4096     DEFAULT  thrpt   15   125.048 ±   0.044  ops/ms
    MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    31.744 ±   0.050  ops/ms
    MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  2874.460 ±  95.028  ops/ms
    MessageDigests.getAndDigest             md5       256     DEFAULT  thrpt   15  1403.462 ±   4.536  ops/ms
    MessageDigests.getAndDigest             md5      1024     DEFAULT  thrpt   15   455.260 ±   6.794  ops/ms
    MessageDigests.getAndDigest             md5      4096     DEFAULT  thrpt   15   122.836 ±   0.046  ops/ms
    MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    31.602 ±   0.024  ops/ms

Graviton 3

    Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
    ---- baseline -------------------------------------------------------------------------------------------
    MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4122.050 ±   8.495  ops/ms
    MessageDigests.digest                   md5       256     DEFAULT  thrpt   15  1634.045 ±   0.341  ops/ms
    MessageDigests.digest                   md5      1024     DEFAULT  thrpt   15   490.091 ±   0.072  ops/ms
    MessageDigests.digest                   md5      4096     DEFAULT  thrpt   15   129.017 ±   0.007  ops/ms
    MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    32.687 ±   0.002  ops/ms
    MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3212.170 ±  81.253  ops/ms
    MessageDigests.getAndDigest             md5       256     DEFAULT  thrpt   15  1504.159 ±   1.091  ops/ms
    MessageDigests.getAndDigest             md5      1024     DEFAULT  thrpt   15   476.164 ±   3.869  ops/ms
    MessageDigests.getAndDigest             md5      4096     DEFAULT  thrpt   15   126.983 ±   0.011  ops/ms
    MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    32.546 ±   0.004  ops/ms
    ---- optimized ------------------------------------------------------------------------------------------
    MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3972.523 ±   8.753  ops/ms
    MessageDigests.digest                   md5       256     DEFAULT  thrpt   15  1627.038 ±   1.855  ops/ms
    MessageDigests.digest                   md5      1024     DEFAULT  thrpt   15   493.648 ±   0.064  ops/ms
    MessageDigests.digest                   md5      4096     DEFAULT  thrpt   15   130.371 ±   0.012  ops/ms
    MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    33.058 ±   0.002  ops/ms
    MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3205.779 ±  76.897  ops/ms
    MessageDigests.getAndDigest             md5       256     DEFAULT  thrpt   15  1511.463 ±   2.209  ops/ms
    MessageDigests.getAndDigest             md5      1024     DEFAULT  thrpt   15   481.071 ±   3.479  ops/ms
    MessageDigests.getAndDigest             md5      4096     DEFAULT  thrpt   15   128.423 ±   0.015  ops/ms
    MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    32.928 ±   0.005  ops/ms

-------------

Commit messages:
 - 8308465: Reduce memory reads in AArch64 MD5 intrinsic

Changes: https://git.openjdk.org/jdk/pull/14068/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14068&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308465
Stats: 139 lines in 1 file changed: 45 ins; 8 del; 86 mod
Patch: https://git.openjdk.org/jdk/pull/14068.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14068/head:pull/14068

PR: https://git.openjdk.org/jdk/pull/14068

From aph at openjdk.org Sat May 20 10:02:00 2023
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 20 May 2023 10:02:00 GMT
Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic
In-Reply-To:
References:
Message-ID:

On Sat, 20 May 2023 07:29:13 GMT, Yi-Fan Tsai wrote:

> Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic.
> > **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once. > > The original snippet loads the value (step 3) soon after it was written to the memory (step 2). > > md5_loop: > __ ldrw(a, Address(state, 0)); // step 3: load the value from memory > ... // loop body > __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) > __ addw(rscratch1, rscratch1, a); > __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory > ... > __ br(Assembler::LE, md5_loop); > > > The snippet is optimized to avoid memory loads and writes in the loop. > > __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register > __ ubfx(a, s0, 0, 32); > md5_loop: > .. // body > __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register > __ addw(a, rscratch1, a); > __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register > .... > __ br(Assembler::LE, md5_loop); > .... > __ str(s0, Address(state, 0)); // write the result to memory only once > > > **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. > > The original snippet, generated by two `md5_FF`s and `md5_GG`s, shows the same data was repeatedly read. > > __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) > > > The snippet is optimized by caching the values in registers and removing the redundant loads. > > __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 > ... > __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 > ... 
> __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); > ... > __ ubfx(rscratch1, buf0, 0, 32); > > > > **Test** > The following tests have passed. > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > **Per... src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3346: > 3344: assert(rs.size() == 8, "%u registers are used to cache 16 4-byte data", rs.size()); > 3345: auto it = rs.begin(); > 3346: for (int i = 0; i < 8; ++i, ++it) { We don't need a counter here. Just loop over the `RegSet`. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3358: > 3356: > 3357: // Generate code extracting i-th unsigned word (4 bytes) from cached 64 bytes. > 3358: void gen_unsigned_word_extract(Register dest, int i) { Suggestion: void extract_u32(Register dest, int i) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14068#discussion_r1199590247 PR Review Comment: https://git.openjdk.org/jdk/pull/14068#discussion_r1199590448 From aph at openjdk.org Sat May 20 10:06:48 2023 From: aph at openjdk.org (Andrew Haley) Date: Sat, 20 May 2023 10:06:48 GMT Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic In-Reply-To: References: Message-ID: On Sat, 20 May 2023 07:29:13 GMT, Yi-Fan Tsai wrote: > Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic. > > **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once. > > The original snippet loads the value (step 3) soon after it was written to the memory (step 2). > > md5_loop: > __ ldrw(a, Address(state, 0)); // step 3: load the value from memory > ... 
// loop body > __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) > __ addw(rscratch1, rscratch1, a); > __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory > ... > __ br(Assembler::LE, md5_loop); > > > The snippet is optimized to avoid memory loads and writes in the loop. > > __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register > __ ubfx(a, s0, 0, 32); > md5_loop: > .. // body > __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register > __ addw(a, rscratch1, a); > __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register > .... > __ br(Assembler::LE, md5_loop); > .... > __ str(s0, Address(state, 0)); // write the result to memory only once > > > **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. > > The original snippet, generated by two `md5_FF`s and `md5_GG`s, shows the same data was repeatedly read. > > __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) > > > The snippet is optimized by caching the values in registers and removing the redundant loads. > > __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 > ... > __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); > ... > __ ubfx(rscratch1, buf0, 0, 32); > > > > **Test** > The following tests have passed. > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > **Per... 
In general this patch looks pretty good. Just a few minor nits. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3458: > 3456: RegSet saved_regs = RegSet::range(r16, r22) - r18_tls; > 3457: Cached64Bytes reg_cache(_masm, buf, RegSet::of(r14, r15) + saved_regs); > 3458: Perhaps add a note here to the effect that the rest of this patch requires there to be **exactly** 8 registers in this set. Maybe assert that here? ------------- PR Review: https://git.openjdk.org/jdk/pull/14068#pullrequestreview-1435389799 PR Review Comment: https://git.openjdk.org/jdk/pull/14068#discussion_r1199590951 From aph at openjdk.org Sat May 20 10:28:48 2023 From: aph at openjdk.org (Andrew Haley) Date: Sat, 20 May 2023 10:28:48 GMT Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic In-Reply-To: References: Message-ID: On Sat, 20 May 2023 07:29:13 GMT, Yi-Fan Tsai wrote: > Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic. > > **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once. > > The original snippet loads the value (step 3) soon after it was written to the memory (step 2). > > md5_loop: > __ ldrw(a, Address(state, 0)); // step 3: load the value from memory > ... // loop body > __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) > __ addw(rscratch1, rscratch1, a); > __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory > ... > __ br(Assembler::LE, md5_loop); > > > The snippet is optimized to avoid memory loads and writes in the loop. > > __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register > __ ubfx(a, s0, 0, 32); > md5_loop: > .. // body > __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register > __ addw(a, rscratch1, a); > __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register > .... 
> __ br(Assembler::LE, md5_loop); > .... > __ str(s0, Address(state, 0)); // write the result to memory only once > > > **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. > > The original snippet, generated by two `md5_FF`s and `md5_GG`s, shows the same data was repeatedly read. > > __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) > > > The snippet is optimized by caching the values in registers and removing the redundant loads. > > __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 > ... > __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); > ... > __ ubfx(rscratch1, buf0, 0, 32); > > > > **Test** > The following tests have passed. > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > **Per... 
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3553: > 3551: __ addw(c, rscratch3, c); > 3552: __ addw(d, rscratch4, d); > 3553: Suggestion: __ addw(a, state_regs[0], a); __ ubfx(rscratch2, state_regs[0], 32, 32); __ addw(b, rscratch2, b); __ addw(c, state_regs[1], c); __ ubfx(rscratch4, state_regs[1], 32, 32); __ addw(d, rscratch4, d); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14068#discussion_r1199593377 From duke at openjdk.org Sat May 20 20:11:52 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Sat, 20 May 2023 20:11:52 GMT Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic [v2] In-Reply-To: References: Message-ID: > Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic. > > **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once. > > The original snippet loads the value (step 3) soon after it was written to the memory (step 2). > > md5_loop: > __ ldrw(a, Address(state, 0)); // step 3: load the value from memory > ... // loop body > __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) > __ addw(rscratch1, rscratch1, a); > __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory > ... > __ br(Assembler::LE, md5_loop); > > > The snippet is optimized to avoid memory loads and writes in the loop. > > __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register > __ ubfx(a, s0, 0, 32); > md5_loop: > .. // body > __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register > __ addw(a, rscratch1, a); > __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register > .... > __ br(Assembler::LE, md5_loop); > .... 
> __ str(s0, Address(state, 0)); // write the result to memory only once > > > **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. > > The original snippet, generated by two `md5_FF`s and `md5_GG`s, shows the same data was repeatedly read. > > __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) > > > The snippet is optimized by caching the values in registers and removing the redundant loads. > > __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 > ... > __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); > ... > __ ubfx(rscratch1, buf0, 0, 32); > > > > **Test** > The following tests have passed. > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > **Per... 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Rename and optimize ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14068/files - new: https://git.openjdk.org/jdk/pull/14068/files/c9ae28a1..0fcb9d42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14068&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14068&range=00-01 Stats: 22 lines in 1 file changed: 2 ins; 6 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/14068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14068/head:pull/14068 PR: https://git.openjdk.org/jdk/pull/14068 From duke at openjdk.org Sat May 20 20:11:53 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Sat, 20 May 2023 20:11:53 GMT Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic [v2] In-Reply-To: References: Message-ID: On Sat, 20 May 2023 10:03:32 GMT, Andrew Haley wrote: >> Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and optimize > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3458: > >> 3456: RegSet saved_regs = RegSet::range(r16, r22) - r18_tls; >> 3457: Cached64Bytes reg_cache(_masm, buf, RegSet::of(r14, r15) + saved_regs); >> 3458: > > Perhaps add a note here to the effect that the rest of this patch requires there to be **exactly** 8 registers in this set. Maybe assert that here? This requirement has been asserted in the constructor. 
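
For readers following the "exactly 8 registers" discussion, here is a minimal standalone C++ model of the idea, not the HotSpot code itself. It assumes a little-endian 64-byte block held in eight 64-bit values; the names `Cached64BytesModel` and `load_block` are hypothetical, while `extract_u32` borrows the name suggested in the review. The shift-and-truncate in `extract_u32` stands in for `ubfx(dest, reg, lsb, 32)`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: eight 64-bit "registers" cache one 64-byte block,
// i.e. sixteen 32-bit words, two words per register.
class Cached64BytesModel {
  std::array<uint64_t, 8> regs_{};

 public:
  // Mirrors the constructor-side check mentioned in the thread:
  // the cache only works with exactly 8 registers.
  explicit Cached64BytesModel(const std::vector<uint64_t>& regs) {
    assert(regs.size() == 8 && "8 registers are needed to cache 16 4-byte words");
    std::copy(regs.begin(), regs.end(), regs_.begin());
  }

  // Models ubfx(dest, reg, lsb, 32): word i lives in register i / 2,
  // in the low half for even i and in the high half for odd i.
  uint32_t extract_u32(int i) const {
    assert(i >= 0 && i < 16);
    const uint64_t reg = regs_[static_cast<std::size_t>(i / 2)];
    const unsigned lsb = static_cast<unsigned>(i % 2) * 32u;
    return static_cast<uint32_t>(reg >> lsb);
  }
};

// Hypothetical helper: assemble eight little-endian 64-bit values from 64 bytes,
// standing in for the ldp loads that fill the register cache.
std::vector<uint64_t> load_block(const unsigned char* buf) {
  std::vector<uint64_t> regs(8);
  for (int r = 0; r < 8; ++r) {
    uint64_t v = 0;
    for (int b = 0; b < 8; ++b) {
      v |= static_cast<uint64_t>(buf[r * 8 + b]) << (8 * b);
    }
    regs[static_cast<std::size_t>(r)] = v;
  }
  return regs;
}
```

In this model each word is read from memory once, and every later use of word `k` becomes a register extract, which is the effect Optimization 2 in the quoted description aims for.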
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14068#discussion_r1199654480 From aph at openjdk.org Sun May 21 09:34:48 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 21 May 2023 09:34:48 GMT Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic [v2] In-Reply-To: References: Message-ID: On Sat, 20 May 2023 20:07:01 GMT, Yi-Fan Tsai wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3458: >> >>> 3456: RegSet saved_regs = RegSet::range(r16, r22) - r18_tls; >>> 3457: Cached64Bytes reg_cache(_masm, buf, RegSet::of(r14, r15) + saved_regs); >>> 3458: >> >> Perhaps add a note here to the effect that the rest of this patch requires there to be **exactly** 8 registers in this set. Maybe assert that here? > > This requirement has been asserted in the constructor. Sure, but a note here would have made it easier to understand. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14068#discussion_r1199730360 From aph at openjdk.org Sun May 21 09:58:49 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 21 May 2023 09:58:49 GMT Subject: RFR: 8308465: Reduce memory reads in AArch64 MD5 intrinsic [v2] In-Reply-To: References: Message-ID: On Sat, 20 May 2023 20:11:52 GMT, Yi-Fan Tsai wrote: >> Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic. >> >> **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once. >> >> The original snippet loaded the value (step 3) soon after it was written to the memory (step 2). >> >> md5_loop: >> __ ldrw(a, Address(state, 0)); // step 3: load the value from memory >> ... // loop body >> __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) >> __ addw(rscratch1, rscratch1, a); >> __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory >> ... 
>> __ br(Assembler::LE, md5_loop); >> >> >> The snippet is optimized to avoid memory loads and writes in the loop. >> >> __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register >> __ ubfx(a, s0, 0, 32); >> md5_loop: >> .. // body >> __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register >> __ addw(a, rscratch1, a); >> __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register >> .... >> __ br(Assembler::LE, md5_loop); >> .... >> __ str(s0, Address(state, 0)); // write the result to memory only once >> >> >> **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. >> >> The original snippet, generated by two `md5_FF`s and `md5_GG`s, read the same data repeatedly. >> >> __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) >> ... >> __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) >> ... >> __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) >> ... >> __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) >> >> >> The snippet is optimized by caching the values in registers and removing the redundant loads. >> >> __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 >> ... >> __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 >> ... >> __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 >> ... >> __ ubfx(rscratch1, buf0, 32, 32); >> ... >> __ ubfx(rscratch1, buf0, 0, 32); >> >> >> >> **Test** >> The following tests have passed. >> >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java >> test/hotspot/jtreg/compiler/intrinsics... > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Rename and optimize Marked as reviewed by aph (Reviewer). 
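
Optimization 1 in the quoted description can be sketched in plain C++ as follows. This is an illustrative model under stated assumptions, not the stub generator code: `fake_rounds` is a hypothetical placeholder for the MD5 round functions, only two of the four state words are modelled, and `pack_pair` / `ubfx32` stand in for `orr(s0, a, b, LSL, 32)` and `ubfx(dest, src, lsb, 32)`.

```cpp
#include <cassert>
#include <cstdint>

// Models orr(s0, a, b, LSL, 32): low word in bits 0..31, high word in bits 32..63.
inline uint64_t pack_pair(uint32_t lo, uint32_t hi) {
  return static_cast<uint64_t>(lo) | (static_cast<uint64_t>(hi) << 32);
}

// Models ubfx(dest, src, lsb, 32) for the two lsb values used in the thread.
inline uint32_t ubfx32(uint64_t src, unsigned lsb) {
  assert(lsb == 0 || lsb == 32);
  return static_cast<uint32_t>(src >> lsb);
}

// Hypothetical stand-in for the per-block MD5 rounds; any deterministic mixing works here.
inline void fake_rounds(uint32_t& a, uint32_t& b, uint32_t block) {
  a = a * 33u + block;
  b = (b ^ a) + 0x9e3779b9u;
}

// Original shape: the state is reloaded from and rewritten to memory for every block.
void update_via_memory(uint32_t state[2], const uint32_t* blocks, int n) {
  for (int i = 0; i < n; ++i) {
    uint32_t a = state[0];  // "step 3": reload what was just stored
    uint32_t b = state[1];
    fake_rounds(a, b, blocks[i]);
    state[0] += a;          // "steps 1 and 2": load, add, store
    state[1] += b;
  }
}

// Optimized shape: one load before the loop, one store after it.
void update_via_register(uint32_t state[2], const uint32_t* blocks, int n) {
  uint64_t s0 = pack_pair(state[0], state[1]);
  uint32_t a = ubfx32(s0, 0);
  uint32_t b = ubfx32(s0, 32);
  for (int i = 0; i < n; ++i) {
    fake_rounds(a, b, blocks[i]);
    a += ubfx32(s0, 0);     // extract the previous state from the register
    b += ubfx32(s0, 32);
    s0 = pack_pair(a, b);   // keep the new state in the register
  }
  state[0] = ubfx32(s0, 0); // write the result to memory only once
  state[1] = ubfx32(s0, 32);
}
```

Both functions compute the same state for the same input; the second simply keeps the running state in `s0` for the whole loop.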
------------- PR Review: https://git.openjdk.org/jdk/pull/14068#pullrequestreview-1435507726 From jiangli at openjdk.org Mon May 22 04:03:42 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 22 May 2023 04:03:42 GMT Subject: RFR: 8308458: Windows build failure with disassembler.cpp(792): warning C4267: '=': conversion from 'size_t' to 'int' Message-ID: Trivial fix with casting `strlen` return value to `int`. ------------- Commit messages: - 8308458: Windows build failure with disassembler.cpp(792): warning C4267: '=': conversion from 'size_t' to 'int' Changes: https://git.openjdk.org/jdk/pull/14074/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14074&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308458 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14074.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14074/head:pull/14074 PR: https://git.openjdk.org/jdk/pull/14074 From jiefu at openjdk.org Mon May 22 05:58:49 2023 From: jiefu at openjdk.org (Jie Fu) Date: Mon, 22 May 2023 05:58:49 GMT Subject: RFR: 8308458: Windows build failure with disassembler.cpp(792): warning C4267: '=': conversion from 'size_t' to 'int' In-Reply-To: References: Message-ID: On Mon, 22 May 2023 03:56:05 GMT, Jiangli Zhou wrote: > Trivial fix with casting `strlen` return value to `int`. Looks good and trivial. ------------- Marked as reviewed by jiefu (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14074#pullrequestreview-1435817056

From epeter at openjdk.org Mon May 22 06:23:52 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 22 May 2023 06:23:52 GMT
Subject: RFR: 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java [v2]
In-Reply-To: <-gNauO0wYVhnu4p1bvZVjqMkXb3XppyehYD2qfdO5Gw=.c053171f-6c86-487d-973b-c24fe0ce4607@github.com>
References: <-gNauO0wYVhnu4p1bvZVjqMkXb3XppyehYD2qfdO5Gw=.c053171f-6c86-487d-973b-c24fe0ce4607@github.com>
Message-ID:

On Mon, 15 May 2023 11:18:26 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> refactor with @chhagedorn's suggestions
>
> That looks good to me! As we've discussed offline, I'm also afraid that there are more such cases where we do not handle `top` correctly during CCP. Might be worth investigating further at some point.

@chhagedorn @TobiHartmann thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13908#issuecomment-1556606708

From epeter at openjdk.org Mon May 22 06:26:58 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 22 May 2023 06:26:58 GMT
Subject: Integrated: 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java
In-Reply-To:
References:
Message-ID:

On Wed, 10 May 2023 14:37:35 GMT, Emanuel Peter wrote:

> **The Problem**
>
> During CCP, we get to a state like this:
>
>     x (int:1)   Phi (int:4)
>     |           |
>     |     +-----+
>     |     |
>     LShiftI (int:16)
>     |
>     CastII (top)     ConI (int:3)
>     |                |
>     +----+ +---------+
>          | |
>          AndI
>
>
> We call `AndINode::Value` during CCP, and in `MulNode::AndIL_shift_and_mask_is_always_zero` we `uncast` both inputs, which leaves us with `LShiftI` and `ConI` as the "true" inputs. They both have non-top types, and so we determine that this `AndI-LShiftI` combination always leads to `zero`: The `Phi` has a constant type (`int:4`). So this leaves the lowest 4 bits zero after the `LShiftI`. And-ing that with `int:3` (mask `0b11`) extracts only the lowest 2 bits, which are zero. So the result is provably always zero - that is the idea.
>
> Then, we have some type updates (here of `x` and `Phi` and `LShiftI`), and the graph looks like this:
>
>     x (int)     Phi (int:0..4)
>     |           |
>     |     +-----+
>     |     |
>     LShiftI (int)
>     |
>     CastII (top)     ConI (int:3)
>     |                |
>     +----+ +---------+
>          | |
>          AndI
>
>
> This leads to `shift2` failing to have constant type:
> https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L1964-L1967
>
> And with that, we fall back to `MulNode::Value`:
> https://github.com/openjdk/jdk/blob/a6b72f56f56b4f33ac163e90b115d79b2b844999/src/hotspot/share/opto/mulnode.cpp#L559-L566
>
> In `MulNode::Value` we detect that the `CastII` has type `top`, and return `top` for `AndI`.
>
> CCP expects the types to become wider over time, so going from `int:0` to `top` is the wrong direction.
>
> **Solution**
>
> The problem is that the `CastII` is still `top` at this point, which seems to be very rare. But the new regression test `TestShiftCastAndNotification.java` seems to create exactly that case, in combination with `-XX:+StressCCP`.
>
> We should guard against `top` in one of the `AndI` inputs inside `MulNode::AndIL_shift_and_mask_is_always_zero`. This will prevent it from detecting the zero-case until `MulNode::Value` gets a chance to compute a non-top type.
>
> **Argument for Solution**
>
> Is there still a threat from `MulNode::AndIL_shift_and_mask_is_always_zero` computing a zero first, and `MulNode::Value` computing a type that does not include zero afterwards?
> As types only widen during CCP, having a zero first means that all inputs now are non-top - in fact they are all `T_INT`. Since types only widen in the input...

This pull request has now been integrated.
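
The guard described in the quoted **Solution** can be modelled outside the compiler. The following is a heavily simplified sketch, not the actual change to `MulNode::AndIL_shift_and_mask_is_always_zero`: `TypeModel` and the function signature are hypothetical, and only constant shift amounts and non-negative constant masks are handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified lattice element: top, a constant, or "some int range".
struct TypeModel {
  enum Kind { Top, Const, Range } kind;
  int32_t value;  // meaningful only when kind == Const
};

// (x << shift) & mask is always zero when every set bit of mask lies below bit `shift`.
// and_in1 / and_in2 are the types of the direct AndI inputs (possibly casts);
// shift / mask are the types found after looking through casts.
bool and_shift_mask_is_always_zero(const TypeModel& and_in1,
                                   const TypeModel& and_in2,
                                   const TypeModel& shift,
                                   const TypeModel& mask) {
  // The guard: if a direct input is still top, do not claim "zero" yet.
  // Otherwise a later generic evaluation could return top, which would be
  // a step in the wrong direction for CCP.
  if (and_in1.kind == TypeModel::Top || and_in2.kind == TypeModel::Top) {
    return false;
  }
  if (shift.kind != TypeModel::Const || mask.kind != TypeModel::Const) {
    return false;
  }
  if (shift.value < 0 || shift.value > 31 || mask.value < 0) {
    return false;  // keep the model to the simple, unambiguous cases
  }
  return (static_cast<uint32_t>(mask.value) >> shift.value) == 0;
}
```

With a shift of 4 and a mask of 3 the model reports "always zero" only once neither direct input is `top`, which is the ordering the quoted argument relies on.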
Changeset: b6a9f5c3 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/b6a9f5c304d9ffe74161d25af84f7c5bc1c09b33 Stats: 10 lines in 1 file changed: 7 ins; 2 del; 1 mod 8307619: C2 failed: Not monotonic (AndI CastII LShiftI) in TestShiftCastAndNotification.java Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13908 From epeter at openjdk.org Mon May 22 06:30:00 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 06:30:00 GMT Subject: RFR: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 15:47:29 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - merge from master after Assertion Predicate renaming >> - 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate > > Marked as reviewed by kvn (Reviewer). @vnkozlov @TobiHartmann @chhagedorn Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13980#issuecomment-1556609843 From epeter at openjdk.org Mon May 22 06:30:02 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 06:30:02 GMT Subject: Integrated: 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:12:42 GMT, Emanuel Peter wrote: > Bug fixing to make the verification pass for https://github.com/openjdk/jdk/pull/13951 [JDK-8305073](https://bugs.openjdk.org/browse/JDK-8305073). > > There only seemed to be one bug with idom that I could find up to **tier6 and stress testing**. That one bug already showed up with a simple `java -Xcomp --version`. But it is possible that there are more that we would find in the future, maybe with the fuzzer. 
> > **Details about the bug** I fixed in `PhaseIdealLoop::create_new_if_for_predicate`: > We computed the `dom_lca_internal` for `rgn` too early - the following line can change the CFG such that the idom would change: > https://github.com/openjdk/jdk/blob/1e1abc4c086298060ccb13b63f646a298bbe3ef7/src/hotspot/share/opto/loopPredicate.cpp#L216 > > So I moved the idom computation down, until after we do not change the CFG anymore, and idom should be stable from there on. This pull request has now been integrated. Changeset: 41beb448 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/41beb448d2ac5d432558f25362a787a9511a5d83 Stats: 15 lines in 1 file changed: 8 ins; 7 del; 0 mod 8308084: C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate Reviewed-by: chagedorn, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13980 From epeter at openjdk.org Mon May 22 06:37:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 06:37:11 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v4] In-Reply-To: References: Message-ID: > This is the second step in the `VerifyLoopOptimizations` revival. > > Last step: > [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure > See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 > > Bug fixing for this step: > [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate > (https://github.com/openjdk/jdk/pull/13980) > > Next step: > [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop > > I added `TestVerifyLoopOptimizations.java` per @robcasloz 's request. It works just like `TestVerifyIterativeGVN.java`, with a simple `-Xcomp -XX:+VerifyLoopOptimizations` on a basically empty test. 
It fails until this patch is integrated: https://github.com/openjdk/jdk/pull/13980 Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' into JDK-8305073 - add TestVerifyLoopOptimizations.java - remove bug fix, is fixed in JDK-8308084 - 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom ------------- Changes: https://git.openjdk.org/jdk/pull/13951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13951&range=03 Stats: 52 lines in 2 files changed: 38 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13951/head:pull/13951 PR: https://git.openjdk.org/jdk/pull/13951 From epeter at openjdk.org Mon May 22 06:37:12 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 06:37:12 GMT Subject: RFR: 8305073: Fix VerifyLoopOptimizations - step 2 - verify idom [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:38:52 GMT, Emanuel Peter wrote: >> This is the second step in the `VerifyLoopOptimizations` revival. >> >> Last step: >> [JDK-8173709](https://bugs.openjdk.org/browse/JDK-8173709) Fix VerifyLoopOptimizations - step 1 - minimal infrastructure >> See PR for all the planned steps: https://github.com/openjdk/jdk/pull/13207 >> >> Bug fixing for this step: >> [JDK-8308084](https://bugs.openjdk.org/browse/JDK-8308084) C2 fix idom bug in PhaseIdealLoop::create_new_if_for_predicate >> (https://github.com/openjdk/jdk/pull/13980) >> >> Next step: >> [JDK-8307982](https://bugs.openjdk.org/browse/JDK-8307982) Fix VerifyLoopOptimizations - step 3 - fix ctrl/loop >> >> I added `TestVerifyLoopOptimizations.java` per @robcasloz 's request. It works just like `TestVerifyIterativeGVN.java`, with a simple `-Xcomp -XX:+VerifyLoopOptimizations` on a basically empty test. 
It fails until this patch is integrated: https://github.com/openjdk/jdk/pull/13980 > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > add TestVerifyLoopOptimizations.java Update: the bug fix is pushed (https://github.com/openjdk/jdk/pull/13980) now I can run testing here again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13951#issuecomment-1556618298 From epeter at openjdk.org Mon May 22 06:52:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 06:52:55 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: <9NJQ9_MLd4zfnEGSrEIDLbnU2doHmhvpLEkAdpjDIK4=.29100439-9c75-43d2-81b0-8aee43b37663@github.com> On Fri, 12 May 2023 00:44:09 GMT, Sandhya Viswanathan wrote: >> @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. > > @eme64 Very nice and clean work. Thanks a lot for taking this up. @sviswa7 @pfustc @vnkozlov @jatin-bhateja Thanks for all the help! Let me know if there is still any concern, otherwise I will integrate this in 24h. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1556635350 From epeter at openjdk.org Mon May 22 06:52:58 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 06:52:58 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v9] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 15:59:08 GMT, Jatin Bhateja wrote: >> Is it not better to have `VectorNode::scalar_opcode`? It is more general - maybe it is useful in the future. 
> >> Is it not better to have `VectorNode::scalar_opcode`? It is more general - maybe it is useful in the future. > > Not a blocker, but we intend to get a scalar opcode for ReductionNode, we have different factory method for Vector/Reduction Nodes, you can keep it for now > > Best Regards, > Jatin @jatin-bhateja I see your point. On the other hand, we would have quite some code duplication handling all the BasicType cases for every operation. I'll leave it the way I have it now, and we can still reconsider it if we want to in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13056#discussion_r1200037772 From aph at openjdk.org Mon May 22 09:52:58 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 May 2023 09:52:58 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: On Fri, 12 May 2023 00:44:09 GMT, Sandhya Viswanathan wrote: >> @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. > > @eme64 Very nice and clean work. Thanks a lot for taking this up. > > @sviswa7 thanks for your quick response! > > I can confirm: we do not "intrinsify" (ie turn into `MinL/MaxL`), rather we just inline the `java.lang.Math::Min/Max` methods, implemented with `CmpL` / `If`-branching. Do you think this makes sense, or should we intrinsify, at least when the hardware supports it? > > @eme64 We should intrinsify MinL/MaxL when the hardware supports it. I doubt it, unless there really is a performance payoff. 
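For readers following the `MinL`/`MaxL` question, here is a minimal sketch of the kind of loop being discussed, a scalar `long` min-reduction written with `Math.min`. The class and method names are hypothetical and are not taken from the PR or its tests. Per the thread, C2 currently inlines `Math.min(long, long)` as a compare and branch instead of creating a `MinL` node; the sketch only shows the source shape, not what any particular JIT emits for it.

```java
public class LongMinReduction {
    // Hypothetical helper: reduces an array of longs to its minimum.
    // The loop-carried use of Math.min is the reduction pattern in question.
    static long minOf(long[] values) {
        long acc = Long.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            acc = Math.min(acc, values[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] data = {42L, -7L, 13L, 0L};
        System.out.println(minOf(data));
    }
}
```

Whether intrinsifying this pays off is exactly the open point above; it would have to be measured on the target hardware.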
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1556906978 From eliu at openjdk.org Mon May 22 11:46:54 2023 From: eliu at openjdk.org (Eric Liu) Date: Mon, 22 May 2023 11:46:54 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3] In-Reply-To: References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: <-bS3aIFdUhBDKMi5yZO9rHTnaPvEi3OSewrDNZfgau8=.e309e65b-a566-456b-bb45-2db59ad460fd@github.com> On Thu, 18 May 2023 09:50:13 GMT, Chang Peng wrote: >> In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. >> >> For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations. >> >> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input. >> >> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations. 
>> >> For example, >> >> >> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0); >> m.not().trueCount(); >> >> >> will produce following assembly on a Neon machine before this patch: >> >> >> ... >> mvn v16.16b, v16.16b // VectorMask.not() >> xtn v16.4h, v16.4s >> xtn v16.8b, v16.8h >> neg v16.8b, v16.8b // VectorStoreMask >> addv b17, v16.8b >> umov w0, v17.b[0] // VectorMask.trueCount() >> ... >> >> >> After this patch: >> >> >> ... >> mvn v16.16b, v16.16b // VectorMask.not() >> addv s17, v16.4s >> smov x0, v17.b[0] >> neg x0, x0 // Optimized VectorMask.trueCount() >> ... >> >> >> In this case, we can save two xtn insns. >> >> Performance: >> >> Benchmark Before After Unit >> testInt 723.822 ± 1.029 1182.375 ± 12.363 ops/ms >> testLong 632.154 ± 0.197 1382.74 ± 2.188 ops/ms >> testShort 788.665 ± 1.852 1152.38 ± 3.77 ops/ms >> >> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vect... > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update benchmark to avoid potential optimization LGTM. ------------- Marked as reviewed by eliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/13974#pullrequestreview-1436444533 From dnsimon at openjdk.org Mon May 22 12:26:54 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 22 May 2023 12:26:54 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v3] In-Reply-To: <9AgB36HtSjwiSZMH3rTs2FolM3LDZeTm0dR-lk3m1FE=.b2b773a7-879b-48e4-80d1-7132c1a2b256@github.com> References: <9AgB36HtSjwiSZMH3rTs2FolM3LDZeTm0dR-lk3m1FE=.b2b773a7-879b-48e4-80d1-7132c1a2b256@github.com> Message-ID: On Tue, 16 May 2023 22:04:54 GMT, Doug Simon wrote: >> When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception.
Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in an hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). >> >> This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): >> >> JVMCI Events (11 events): >> ... >> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError >> Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) >> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) >> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) >> >> >> It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: >> >> COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] >> >> >> [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > replace JVMCICompileMethodExceptionIsFatal VM flag with test.jvmci.compileMethodExceptionIsFatal system property src/hotspot/share/jvmci/jvmciRuntime.cpp line 2048: > 2046: (jlong) compile_state, compile_state->task()->compile_id()); > 2047: if (JVMCIENV->has_pending_exception()) { > 2048: const char* val = Arguments::PropertyList_get_value(Arguments::system_properties(), "test.jvmci.compileMethodExceptionIsFatal"); Note that this view on system
properties is restricted to properties set at VM startup (e.g. on the command line) and will not see the result of calls to `System.setProperty()` made by an application. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14000#discussion_r1200441783 From dnsimon at openjdk.org Mon May 22 12:59:06 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 22 May 2023 12:59:06 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v4] In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. > * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. > > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. 
For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 - avoid VM exit in more cases when creating or attaching to a libjvmci isolate - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 - make JMCI more robust in low resource conditions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13905/files - new: https://git.openjdk.org/jdk/pull/13905/files/ef9ac32d..94f4ba18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=02-03 Stats: 15911 lines in 500 files changed: 9445 ins; 4757 del; 1709 mod Patch: https://git.openjdk.org/jdk/pull/13905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13905/head:pull/13905 PR: https://git.openjdk.org/jdk/pull/13905 From dnsimon at openjdk.org Mon May 22 13:00:51 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 22 May 2023 13:00:51 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v4] In-Reply-To: References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Wed, 17 May 2023 17:32:33 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 >> - avoid VM exit in more cases when creating or attaching to a libjvmci isolate >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 >> - make JMCI more robust in low resource conditions > > Marked as reviewed by never (Reviewer). @tkrodriguez could you please review https://github.com/openjdk/jdk/pull/13905/commits/fb2250921fc731a47a945a3e578b384a6849c331 which I added to handle cases such as https://github.com/oracle/graal/discussions/6216#discussioncomment-5942123. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13905#issuecomment-1557174453 From roland at openjdk.org Mon May 22 15:10:33 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 22 May 2023 15:10:33 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v3] In-Reply-To: References: Message-ID: > pre/main/post loops are created for an inner loop of a loop nest but > assert predicates cause the main and post loops to be removed. The > OpaqueZeroTripGuard nodes for the loops are not removed: there's no > logic to trigger removal of the opaque nodes once the loops are no > longer there. With the inner loops gone, the outer loop becomes > candidate for optimizations and is unrolled which causes the zero trip > guards of the now removed loops to be duplicated and the opaque nodes > to have more than one use. > > The fix I propose is, using logic similar to > `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop > opts if every OpaqueZeroTripGuard node guards a loop and if not, > remove it. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - test failure - Merge branch 'master' into JDK-8305189 - review - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13901/files - new: https://git.openjdk.org/jdk/pull/13901/files/d360f92a..8f37817c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=01-02 Stats: 141494 lines in 2421 files changed: 107547 ins; 17403 del; 16544 mod Patch: https://git.openjdk.org/jdk/pull/13901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13901/head:pull/13901 PR: https://git.openjdk.org/jdk/pull/13901 From roland at openjdk.org Mon May 22 15:33:54 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 22 May 2023 15:33:54 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> Message-ID: On Mon, 15 May 2023 13:39:12 GMT, Tobias Hartmann wrote: > I see the following failure with `TestMissingMulLOptimization` from JDK-8299546 and `-XX:StressLongCountedLoop=2000000`: What happens here is that a counted loop ends up in an infinite loop. So the `IdealLoopTree` tree object is unreachable from the root loop (for this round of loop opts as a `NeverBranch` is added so the `IdealLoopTree` should become reachable again at next pass of loop opts). So the new logic removes an opaque node for a loop that still exists. I suppose it's rare and mostly harmless and I tweaked the assert to cover that corner case. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1557431091 From never at openjdk.org Mon May 22 16:04:00 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 22 May 2023 16:04:00 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v4] In-Reply-To: References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Mon, 22 May 2023 12:59:06 GMT, Doug Simon wrote: >> This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: >> * Tracks upcalls into libjvmci or creation of libjvmci. >> * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). >> >> When JVMCI compilation is disabled, a warning is emitted: >> >> [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. >> >> >> With `-Xlog:jit+compilation`, the extra detail shown is: >> >> [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I >> Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError >> java.lang.InternalError: aborting compilation of HotSpotMethod()> >> at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) >> >> >> Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. 
Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 > - avoid VM exit in more cases when creating or attaching to a libjvmci isolate > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 > - make JMCI more robust in low resource conditions I think the new changes look ok. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13905#issuecomment-1557486091 From dnsimon at openjdk.org Mon May 22 16:11:49 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 22 May 2023 16:11:49 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v5] In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. > * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. 
> > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). Doug Simon has updated the pull request incrementally with one additional commit since the last revision: [skip ci] update copyright headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13905/files - new: https://git.openjdk.org/jdk/pull/13905/files/94f4ba18..d547c457 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13905&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13905/head:pull/13905 PR: https://git.openjdk.org/jdk/pull/13905 From dnsimon at openjdk.org Mon May 22 16:11:51 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 22 May 2023 16:11:51 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v5] In-Reply-To: References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Mon, 22 
May 2023 16:06:31 GMT, Doug Simon wrote: >> This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: >> * Tracks upcalls into libjvmci or creation of libjvmci. >> * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). >> >> When JVMCI compilation is disabled, a warning is emitted: >> >> [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. >> >> >> With `-Xlog:jit+compilation`, the extra detail shown is: >> >> [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I >> Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError >> java.lang.InternalError: aborting compilation of HotSpotMethod()> >> at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) >> >> >> Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). 
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] update copyright headers src/hotspot/share/jvmci/jvmciRuntime.cpp line 1233: > 1231: JavaVM* javaVM = _shared_library_javavm; > 1232: if (javaVM == nullptr) { > 1233: const char* val = Arguments::PropertyList_get_value(Arguments::system_properties(), "test.jvmci.forceEnomemOnLibjvmciInit"); Note that this view on system properties is restricted to properties set at VM startup (e.g. on the command line) and will not see the result of calls to `System.setProperty()` made by an application. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13905#discussion_r1200728362 From jiangli at openjdk.org Mon May 22 16:24:50 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 22 May 2023 16:24:50 GMT Subject: RFR: 8308458: Windows build failure with disassembler.cpp(792): warning C4267: '=': conversion from 'size_t' to 'int' In-Reply-To: References: Message-ID: <0aoq7t6WaprZ300u7cCwL0wQhztYTc6OsiM8QWvM9q8=.0bde9935-a19a-48e6-b044-1110694ba4b7@github.com> On Mon, 22 May 2023 03:56:05 GMT, Jiangli Zhou wrote: > Trivial fix with casting `strlen` return value to `int`. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14074#issuecomment-1557529845 From jiangli at openjdk.org Mon May 22 16:28:01 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 22 May 2023 16:28:01 GMT Subject: Integrated: 8308458: Windows build failure with disassembler.cpp(792): warning C4267: '=': conversion from 'size_t' to 'int' In-Reply-To: References: Message-ID: On Mon, 22 May 2023 03:56:05 GMT, Jiangli Zhou wrote: > Trivial fix with casting `strlen` return value to `int`. This pull request has now been integrated. 
Changeset: 491bdeaa Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/491bdeaa90aaafd15615d2c4e42aaff5940938e3 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8308458: Windows build failure with disassembler.cpp(792): warning C4267: '=': conversion from 'size_t' to 'int' Reviewed-by: jiefu ------------- PR: https://git.openjdk.org/jdk/pull/14074 From phh at openjdk.org Mon May 22 16:53:53 2023 From: phh at openjdk.org (Paul Hohensee) Date: Mon, 22 May 2023 16:53:53 GMT Subject: RFR: 8308465: Reduce memory accesses in AArch64 MD5 intrinsic [v2] In-Reply-To: References: Message-ID: On Sat, 20 May 2023 20:11:52 GMT, Yi-Fan Tsai wrote: >> Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic. >> >> **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. The final results are only written to memory once. >> >> The original snippet loaded the value (step 3) soon after it was written to the memory (step 2). >> >> md5_loop: >> __ ldrw(a, Address(state, 0)); // step 3: load the value from memory >> ... // loop body >> __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) >> __ addw(rscratch1, rscratch1, a); >> __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory >> ... >> __ br(Assembler::LE, md5_loop); >> >> >> The snippet is optimized to avoid memory loads and writes in the loop. >> >> __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register >> __ ubfx(a, s0, 0, 32); >> md5_loop: >> .. // body >> __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register >> __ addw(a, rscratch1, a); >> __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register >> .... >> __ br(Assembler::LE, md5_loop); >> .... 
>> __ str(s0, Address(state, 0)); // write the result to memory only once >> >> >> **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. >> >> The original snippet, generated by two `md5_FF`s and `md5_GG`s, read the same data repeatedly. >> >> __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) >> ... >> __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) >> ... >> __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) >> ... >> __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) >> >> >> The snippet is optimized by caching the values in registers and removing the redundant loads. >> >> __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 >> ... >> __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 >> ... >> __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 >> ... >> __ ubfx(rscratch1, buf0, 32, 32); >> ... >> __ ubfx(rscratch1, buf0, 0, 32); >> >> >> >> **Test** >> The following tests have passed. >> >> test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java >> test/hotspot/jtreg/compiler/intrinsics... > > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Rename and optimize Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14068#pullrequestreview-1437042848 From duke at openjdk.org Mon May 22 16:56:58 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Mon, 22 May 2023 16:56:58 GMT Subject: Integrated: 8308465: Reduce memory accesses in AArch64 MD5 intrinsic In-Reply-To: References: Message-ID: On Sat, 20 May 2023 07:29:13 GMT, Yi-Fan Tsai wrote: > Two optimizations have been implemented in this change to reduce memory reads in AArch64 MD5 intrinsic. > > **Optimization 1:** Memory loads and stores updating hash values are moved out of the loop. 
The final results are only written to memory once. > > The original snippet loaded the value (step 3) soon after it was written to the memory (step 2). > > md5_loop: > __ ldrw(a, Address(state, 0)); // step 3: load the value from memory > ... // loop body > __ ldrw(rscratch1, Address(state, 0)); // step 1: load the value at Address(state, 0) > __ addw(rscratch1, rscratch1, a); > __ strw(rscratch1, Address(state, 0)); // step 2: write the value to memory > ... > __ br(Assembler::LE, md5_loop); > > > The snippet is optimized to avoid memory loads and writes in the loop. > > __ ldp(s0, s1, Address(state, 0)); // load the value at Address(state, 0) to a register > __ ubfx(a, s0, 0, 32); > md5_loop: > .. // body > __ ubfx(rscratch1, s0, 0, 32); // step 1: extract the value from the register > __ addw(a, rscratch1, a); > __ orr(s0, a, b, Assembler::LSL, 32); // step 2: preserve the value in the register > .... > __ br(Assembler::LE, md5_loop); > .... > __ str(s0, Address(state, 0)); // write the result to memory only once > > > **Optimization 2**: Redundant loads generated by `md5_GG`, `md5_HH`, and `md5_II` are removed. > > The original snippet, generated by two `md5_FF`s and `md5_GG`s, read the same data repeatedly. > > __ ldrw(rscratch1, Address(buf, 0)); // from md5_FF(.., k = 0, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_FF(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 4)); // from md5_GG(.., k = 1, ..) > ... > __ ldrw(rscratch1, Address(buf, 0)); // from md5_GG(.., k = 0, ..) > > > The snippet is optimized by caching the values in registers and removing the redundant loads. > > __ ldp (buf0, buf1, Address(buf, 0)); // load both values into buf0 > ... > __ ubfx(rscratch1, buf0, 0, 32); // extract the value of k = 0 from the lower 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); // extract the value of k = 1 from the higher 32 bits of buf0 > ... > __ ubfx(rscratch1, buf0, 32, 32); > ... 
> __ ubfx(rscratch1, buf0, 0, 32); > > > > **Test** > The following tests have passed. > > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > **Performance*... This pull request has now been integrated. Changeset: 8474e693 Author: Yi-Fan Tsai Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/8474e693b4404ba62927fe0e43e68b904d66fbde Stats: 138 lines in 1 file changed: 44 ins; 11 del; 83 mod 8308465: Reduce memory accesses in AArch64 MD5 intrinsic Reviewed-by: aph, phh ------------- PR: https://git.openjdk.org/jdk/pull/14068 From sviswanathan at openjdk.org Mon May 22 17:05:15 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 22 May 2023 17:05:15 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized Message-ID: This PR fixes the problem with double reduction on x86_64. In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. With this PR the vector_reduction_double node is generated. Please review. 
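As an illustration of the loop shape this message is about, here is a minimal sketch of a double product reduction. It is written in the spirit of `ProdRed_Double::prodReductionImplement`, but the exact body of that test method is not shown in this thread, so the class name, signature, and loop body below are assumptions.

```java
public class ProdRedDoubleSketch {
    // Hypothetical product reduction: multiplies 'total' by (a[i] - b[i])
    // for every index. The loop-carried multiply is the reduction that
    // SuperWord would have to turn into a vector reduction.
    static double prodReduction(double[] a, double[] b, double total) {
        for (int i = 0; i < a.length; i++) {
            total *= a[i] - b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] a = {3.0, 5.0};
        double[] b = {1.0, 2.0};
        System.out.println(prodReduction(a, b, 1.0));
    }
}
```

Running such a loop hot with the `PrintAssembly` compile command quoted above is how one would look for the `vector_reduction_double` node in the log.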
Best Regards, Sandhya ------------- Commit messages: - Update IR test - 8300865: C2: product reduction in ProdRed_Double is not vectorized Changes: https://git.openjdk.org/jdk/pull/14065/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8300865 Stats: 9 lines in 2 files changed: 6 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14065/head:pull/14065 PR: https://git.openjdk.org/jdk/pull/14065 From never at openjdk.org Mon May 22 17:40:55 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 22 May 2023 17:40:55 GMT Subject: RFR: 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1 Message-ID: adjust requires declaration ------------- Commit messages: - 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1 Changes: https://git.openjdk.org/jdk/pull/14091/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14091&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308291 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14091.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14091/head:pull/14091 PR: https://git.openjdk.org/jdk/pull/14091 From cslucas at openjdk.org Mon May 22 18:00:00 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 22 May 2023 18:00:00 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 19 May 2023 04:06:47 GMT, Vladimir Ivanov wrote: > I verified that the new test cases do trigger SR+NSR scenario. > > How do you test that deoptimization works as expected? 
> I have a copy of the tests in AllocationMergesTests.java in a separate file (not included in this PR) and I run the tests with a tool that compares the output of the test with RAM enabled and disabled. So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. > Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information which is essential to make sense of debug information at a safepoint in the code. > I'll take care of that. I was testing only with PrintDebugInfo. > FTR `_skip_rematerialization` flag is unused now. > yeah, I forgot to remove that. Thanks. > Speaking of `_only_merge_candidate` flag, I find it easier about the code when the property being tracked is whether the `ObjectValue` is referenced from corresponding JVM state or not. (Maybe call it `is_root()`?) So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()` which doesn't need to adjust the flag before returning selected object. Also, it's safer to track the flag status for every `ObjectValues`, even for `ObjectMergeValue`. > Sounds like a good idea. I'll do that. Thanks. > Are you sure there's no way to end up with nested `ObjectMergeValue`s in presence of iterative EA? I don't think so. This current patch only handle Phis that don't have NULL as input. As part of the reduction process we set at least one of the reducible Phi inputs to NULL. Therefore, subsequent iterations of EA won't reduce the same Phi. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1557655811 From jsjolen at openjdk.org Mon May 22 19:00:31 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 22 May 2023 19:00:31 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ [v2] In-Reply-To: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks!
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14009/files - new: https://git.openjdk.org/jdk/pull/14009/files/241df265..ace81e7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14009&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14009&range=00-01 Stats: 27 lines in 9 files changed: 0 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/14009.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14009/head:pull/14009 PR: https://git.openjdk.org/jdk/pull/14009 From jsjolen at openjdk.org Mon May 22 19:45:00 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 22 May 2023 19:45:00 GMT Subject: RFR: 8300086: Replace NULL with nullptr in share/c1/ [v2] In-Reply-To: References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Mon, 22 May 2023 19:00:31 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Fixes Passes tier1! Thank you for the in-depth reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14009#issuecomment-1557839963 From jsjolen at openjdk.org Mon May 22 19:45:01 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 22 May 2023 19:45:01 GMT Subject: Integrated: 8300086: Replace NULL with nullptr in share/c1/ In-Reply-To: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> References: <-Lzk6hh6ABzc4zZxBBBvfn1FjLKeveb44691GlIm-S8=.7d89e9e1-b2b3-4d76-b4f2-73f491585dc0@github.com> Message-ID: On Tue, 16 May 2023 12:08:47 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/c1. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated. Changeset: 90d5041b Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/90d5041b6a055d6266140ffea2aa9a3b08b32209 Stats: 1549 lines in 33 files changed: 0 ins; 0 del; 1549 mod 8300086: Replace NULL with nullptr in share/c1/ Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14009 From kvn at openjdk.org Mon May 22 20:31:47 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 May 2023 20:31:47 GMT Subject: RFR: 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1 In-Reply-To: References: Message-ID: On Mon, 22 May 2023 17:33:43 GMT, Tom Rodriguez wrote: > adjust requires declaration Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14091#pullrequestreview-1437445212 From fgao at openjdk.org Tue May 23 03:02:49 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 23 May 2023 03:02:49 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized In-Reply-To: References: Message-ID: <5RPtOn1e4TR6lq06RFrXlo4M1UbFj4M0ixalBtfTsA8=.2a61c0ec-55cb-486d-8d30-174c52738228@github.com> On Fri, 19 May 2023 23:27:32 GMT, Sandhya Viswanathan wrote: > This PR fixes the problem with double reduction on x86_64.
> > In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: > jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java > The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. > > This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. > > With this PR the vector_reduction_double node is generated. > > Please review. > > Best Regards, > Sandhya src/hotspot/share/opto/superword.cpp line 3724: > 3722: // For vector reduction implemented check we need atleast two elements. > 3723: int min_vec_size = MAX2(Matcher::min_vector_size(bt), 2); > 3724: if (ReductionNode::implemented(use->Opcode(), min_vec_size, bt)) { Hi @sviswa7, can we use `superword_max_vector_size()` as the input here? `MAX2(Matcher::min_vector_size(bt), 2);` may not tally with the actual situation on other 64-bit platforms. WDYT? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14065#discussion_r1201446626 From chagedorn at openjdk.org Tue May 23 06:36:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 May 2023 06:36:53 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v3] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 15:10:33 GMT, Roland Westrelin wrote: >> pre/main/post loops are created for an inner loop of a loop nest but >> assert predicates cause the main and post loops to be removed. The >> OpaqueZeroTripGuard nodes for the loops are not removed: there's no >> logic to trigger removal of the opaque nodes once the loops are no >> longer there. 
With the inner loops gone, the outer loop becomes >> candidate for optimizations and is unrolled which causes the zero trip >> guards of the now removed loops to be duplicated and the opaque nodes >> to have more than one use. >> >> The fix I propose is, using logic similar to >> `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop >> opts if every OpaqueZeroTripGuard node guards a loop and if not, >> remove it. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - test failure > - Merge branch 'master' into JDK-8305189 > - review > - fix & test Would it still be required/useful to keep an `OpaqueZeroTripGuardNode` for such a counted loop inside an infinite loop? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1558618408 From chagedorn at openjdk.org Tue May 23 06:37:49 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 May 2023 06:37:49 GMT Subject: RFR: 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1 In-Reply-To: References: Message-ID: On Mon, 22 May 2023 17:33:43 GMT, Tom Rodriguez wrote: > adjust requires declaration Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14091#pullrequestreview-1438712035 From jkarthikeyan at openjdk.org Tue May 23 07:09:42 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 23 May 2023 07:09:42 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7] In-Reply-To: <3i_J-b5e5rWGvwniI16n9_9uPlM8A4nmy07WXAEUeFA=.e54fa3ad-701d-41ff-a0cf-6e0f227a40ac@github.com> References: <3i_J-b5e5rWGvwniI16n9_9uPlM8A4nmy07WXAEUeFA=.e54fa3ad-701d-41ff-a0cf-6e0f227a40ac@github.com> Message-ID: On Thu, 18 May 2023 14:36:42 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/addnode.cpp line 892: >> >>> 890: // Propagate xor through constant cmoves. This pattern can occur after expansion of Conv2B nodes. >>> 891: if (in1->Opcode() == Op_CMoveI && in2->is_Con()) { >>> 892: if (in1->in(2)->is_Con() && in1->in(3)->is_Con()) { >> >> `CMoveNode::IfTrue` and `CMoveNode::IfFalse` instead of 3 and 2. > > `CMoveNode::Condition` instead of `in1->in(1)` below, too. You need to check for the node actually being a `BoolNode` in case a constant-condition `CMove` has not been folded yet. Fixed, thanks! 
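Editorial sketch of the identity behind the "Propagate xor through constant cmoves" comment quoted above: xor-ing the result of a conditional move with a constant gives the same value as xor-ing each constant arm first. This is a scalar model only, not the C2 code from the PR; the function names `cmove`, `xor_after_cmove` and `xor_folded_into_arms` are hypothetical, and the argument order (false arm before true arm) is an assumption chosen for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-in for a conditional move with two constant arms.
// (Hypothetical helper; not a HotSpot API.)
constexpr int32_t cmove(bool cond, int32_t if_false, int32_t if_true) {
  return cond ? if_true : if_false;
}

// Original shape: the xor is applied to the result of the cmove.
constexpr int32_t xor_after_cmove(bool cond, int32_t if_false,
                                  int32_t if_true, int32_t k) {
  return cmove(cond, if_false, if_true) ^ k;
}

// Rewritten shape: the xor is folded into each constant arm, so the
// separate xor operation disappears.
constexpr int32_t xor_folded_into_arms(bool cond, int32_t if_false,
                                       int32_t if_true, int32_t k) {
  return cmove(cond, if_false ^ k, if_true ^ k);
}
```

Both shapes select on the same condition, so the rewrite is value-preserving for either outcome of the comparison.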
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1201643759 From jkarthikeyan at openjdk.org Tue May 23 07:09:45 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 23 May 2023 07:09:45 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 14:34:27 GMT, Quan Anh Mai wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Changes from code review > > src/hotspot/share/opto/addnode.cpp line 900: > >> 898: >> 899: if (cmp_op == Op_CmpI || cmp_op == Op_CmpP) { >> 900: // Flip the sense of comparison in the bool and return a new cmove > > Mistaken comment Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1201643599 From jkarthikeyan at openjdk.org Tue May 23 07:09:37 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 23 May 2023 07:09:37 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v8] In-Reply-To: References: Message-ID: > Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. 
Performance results from my (Zen 2) machine:
>
>                                                 Baseline                   Patch           Improvement
> Benchmark                      Mode  Cnt   Score   Error  Units      Score   Error  Units
> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op  + 28.0%
>
> Reviews would be greatly appreciated! > > Testing: tier1-2 on linux x64, GHA Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Cleanup from code review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13345/files - new: https://git.openjdk.org/jdk/pull/13345/files/69e914a5..9dcad187 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13345&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13345&range=06-07 Stats: 36 lines in 5 files changed: 0 ins; 20 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13345/head:pull/13345 PR: https://git.openjdk.org/jdk/pull/13345 From jkarthikeyan at openjdk.org Tue May 23 07:09:40 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 23 May 2023 07:09:40 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v7] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 04:17:48 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1.
This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>                                                 Baseline                   Patch           Improvement
>> Benchmark                      Mode  Cnt   Score   Error  Units      Score   Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated! >> >> Testing: tier1-2 on linux x64, GHA > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Changes from code review Thanks for your review! I've updated the code accordingly. I also noticed that in addition to `setne`, there was also `setl` and `sete` with just a handful of uses, so I've modified those to also use `setb` with a condition code instead as well.
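Editorial sketch of why the bit flip can be absorbed into the comparison, as the PR description above states: `Conv2B` maps zero to 0 and anything else to 1, so xor-ing its result with 1 is the same as selecting on the inverted comparison. The code is a scalar model under that reading of `Conv2B`; the function names are hypothetical and none of this is taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of Conv2B: 0 stays 0, every other value becomes 1.
constexpr int32_t conv2b(int32_t x) { return x != 0 ? 1 : 0; }

// Pattern before the change: convert to a boolean, then flip the bit.
constexpr int32_t conv2b_then_flip(int32_t x) { return conv2b(x) ^ 1; }

// Pattern after the change: one conditional select on the inverted
// comparison, with no separate xor.
constexpr int32_t select_on_inverted_compare(int32_t x) {
  return x == 0 ? 1 : 0;
}
```

The two functions agree for every input, which is what lets a conditional move with the opposite condition replace the `Conv2B` plus `xor` pair.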
------------- PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1558659341 From dnsimon at openjdk.org Tue May 23 07:24:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 23 May 2023 07:24:02 GMT Subject: RFR: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit [v4] In-Reply-To: References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Mon, 22 May 2023 16:01:04 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 >> - avoid VM exit in more cases when creating or attaching to a libjvmci isolate >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8306992 >> - make JMCI more robust in low resource conditions > > I think the new changes look ok. Thanks for the reviews @tkrodriguez . ------------- PR Comment: https://git.openjdk.org/jdk/pull/13905#issuecomment-1558679026 From dnsimon at openjdk.org Tue May 23 07:24:04 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 23 May 2023 07:24:04 GMT Subject: Integrated: 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit In-Reply-To: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> References: <9m8yas5gpr5EPwn9QJEBkyt6eZYSU5oTtwgn-Xvl2ww=.1b28e5fa-059b-4c1c-bf2d-6fa3be32fd7c@github.com> Message-ID: On Wed, 10 May 2023 14:00:51 GMT, Doug Simon wrote: > This PR makes the following changes to mitigate against an OOME or other recoverable error in JVMCI from causing the VM to exit: > * Tracks upcalls into libjvmci or creation of libjvmci. 
> * If 10% or more of these calls fail with an uncaught exception, then JVMCI compilation is disabled (i.e. future compilations fall back to Tier 1). > > When JVMCI compilation is disabled, a warning is emitted: > > [0.064s][warning][jit,compilation] JVMCI compiler disabled after 11 of 15 upcalls had errors (Last error: "uncaught exception in call_HotSpotJVMCIRuntime_compileMethod"). Use -Xlog:jit+compilation for more detail. > > > With `-Xlog:jit+compilation`, the extra detail shown is: > > [0.182s][info][jit,compilation] uncaught exception in call_HotSpotJVMCIRuntime_compileMethod while compiling java.util.stream.StreamOpFlag.fromCharacteristics(Ljava/util/Spliterator;)I > Exception in thread "JVMCI-native CompilerThread0": java.lang.InternalError > java.lang.InternalError: aborting compilation of HotSpotMethod()> > at jdk.internal.vm.ci at 20.0.2-internal/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:923) > > > Note that the errors treated by the changes in the PR are expected to be exceedingly rare. For example, an OOME while starting a libgraal isolate or initializing the JVMCI compiler. Exceptions thrown during compilation are already handled by the Graal [CompilationWrapper](https://github.com/oracle/graal/blob/431ecf7d26f56cee49708854fe0e89b05514492b/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/core/CompilationWrapper.java#L65). This pull request has now been integrated. 
Changeset: 422128b7 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/422128b70a57c8c6a997938fbf8d8cb19bed65e4 Stats: 278 lines in 10 files changed: 201 ins; 18 del; 59 mod 8306992: [JVMCI] mitigate more against JVMCI related OOME causing VM to exit Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/13905 From epeter at openjdk.org Tue May 23 08:06:02 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 May 2023 08:06:02 GMT Subject: RFR: 8302652: [SuperWord] Reduction should happen after loop, when possible [v5] In-Reply-To: References: <40OAlkC3bZBSrA2zV4iqTLLF5u7XQChZrW3BWYlD8_I=.1d607a20-ebcb-4732-8f23-73374771fb7f@github.com> <0GXFL0XmBgmvz2Ft9yfvfmdKbtHevTv35CdWyzvolpU=.66f9ced9-fae9-40b8-bf66-8f670d59f578@github.com> Message-ID: <2aCK60GP_jtRtKpf6zDBrijnsrHhcTq9hBvHxTDCUs0=.5720effe-1b6a-4d2a-a147-c7043e147871@github.com> On Thu, 11 May 2023 07:56:01 GMT, Jatin Bhateja wrote: >> Removed the noisy comment from the patch! >> >> With VectorAPI users are expected to be more intelligent and your optimizations can be directly implemented in kernel which perform VectorADD operations in main loop followed by Reduction out of loop e.g. >> >> >> outer_loop : >> hand_unrolled_vector_loop: >> v1 = VectorADD(broadcast(0)) >> v2 = v1.VectorADD(LoadVector) >> v3 = v2.VectorADD(LoadVector) >> ... >> ... >> inner_loop_end >> res += v3.ReductionAdd() >> outer_loop_end >> >> >> So its not a pressing issue anyways for us. > >> @jatin-bhateja exactly. With the Vector API the vector reduction can be explicitly put outside the loop. With SuperWord, we need to take care of it in the compiler. > > Your changes looks good to me. Thanks! @jatin-bhateja @sviswa7 @fg1417 @vnkozlov @pfustc Thanks to everybody for the help and reviews! 
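Editorial sketch of the idea discussed in the thread above (accumulate lane-wise inside the loop, reduce once after it). This is plain scalar C++ that only models the two loop shapes; it is not the SuperWord transformation or Vector API code. The lane count `kLanes` and both function names are hypothetical, and unsigned arithmetic is used so that wrap-around is well defined in C++, mirroring Java `int` addition, which is one of the reorderable reductions listed in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 8;  // hypothetical vector width

// Shape 1: every chunk of kLanes elements is reduced to a scalar
// inside the loop (one horizontal reduction per iteration).
uint32_t sum_reduce_in_loop(const std::vector<uint32_t>& a) {
  uint32_t res = 0;
  std::size_t i = 0;
  for (; i + kLanes <= a.size(); i += kLanes) {
    uint32_t chunk = 0;
    for (std::size_t l = 0; l < kLanes; ++l) chunk += a[i + l];
    res += chunk;
  }
  for (; i < a.size(); ++i) res += a[i];  // scalar tail
  return res;
}

// Shape 2: the loop only does lane-wise adds into a vector
// accumulator; the single horizontal reduction happens after the loop.
uint32_t sum_reduce_after_loop(const std::vector<uint32_t>& a) {
  std::array<uint32_t, kLanes> acc{};  // all lanes start at 0
  std::size_t i = 0;
  for (; i + kLanes <= a.size(); i += kLanes) {
    for (std::size_t l = 0; l < kLanes; ++l) acc[l] += a[i + l];
  }
  uint32_t res = 0;
  for (uint32_t lane : acc) res += lane;  // one reduction, outside the loop
  for (; i < a.size(); ++i) res += a[i];  // scalar tail
  return res;
}
```

The second shape changes the order in which elements are added, which is harmless for wrapping integer addition but would change results for `float/double add/mul`; that matches the PR's statement that those reductions do not allow reordering.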
------------- PR Comment: https://git.openjdk.org/jdk/pull/13056#issuecomment-1558743466 From epeter at openjdk.org Tue May 23 08:09:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 May 2023 08:09:23 GMT Subject: Integrated: 8302652: [SuperWord] Reduction should happen after loop, when possible In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 07:29:44 GMT, Emanuel Peter wrote: > https://github.com/openjdk/jdk/blob/cc9e7e8e773e773af87615fdae037a8f8ea82635/src/hotspot/share/opto/loopopts.cpp#L4125-L4171 > > I introduced a new abstract node type `UnorderedReductionNode` (subtype of `ReductionNode`). All of the reductions that can be re-ordered are to extend from this node type: `int/long add/mul/and/or/xor/min/max`, as well as `float/double min/max`. `float/double add/mul` do not allow for reordering of operations. > > The optimization is part of loop-opts, and called after `SuperWord` in `PhaseIdealLoop::build_and_optimize`. > > **Performance results** > I ran `test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java`, with `2_000` warmup and `100_000` perf iterations. I also increased the array length to `RANGE = 16*1024`. > > I disabled `turbo-boost`. > Machine: `11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16`. > Full `avx512` support, including `avx512dq` required for `MulReductionVL`.
>
>
> operation    M-N-2 M-N-3   M-2   M-3   P-2   P-3 | note |
> ---------------------------------------------------------
> int add       2063  2085   660   530   415   283 |      |
> int mul       2272  2257  1189   733   908   439 |      |
> int min       2527  2520  2516  2579  2585  2542 |    1 |
> int max       2548  2525  2551  2516  2515  2517 |    1 |
> int and       2410  2414   602   480   353   263 |      |
> int or        2149  2151   597   498   354   262 |      |
> int xor       2059  2062   605   476   364   263 |      |
> long add      1776  1790  2000  1000  1683   591 |    2 |
> long mul      2135  2199  2185  2001  2176  1307 |    2 |
> long min      1439  1424  1421  1420  1430  1427 |    3 |
> long max      2299  2287  2303  2305  1433  1425 |    3 |
> long and      1657  1667  2015  1003  1679   568 |    4 |
> long or       1776  1783  2032  1009  1680   569 |    4 |
> long xor      1834  1783  2012  1024  1679   570 |    4 |
> float add     2779  2644  2633  2648  2632  2639 |    5 |
> float mul     2779  2871  2810  2776  2732  2791 |    5 |
> float min     2294  2620  1725  1286   872   672 |      |
> float max     2371  2519  1697  1265   841   468 |      |
> double add    2634  2636  2635  2650  2635  2648 |    5 |
> double mul    3053  2955  2881  3030  2979  2927 |    5 |
> double min    2364  2400  2439  2399  2486  2398 |    6 |
> double max    2488  2459  2501  2451  2493  2498 |    6 |
>
> Legen... This pull request has now been integrated.
Changeset: 06b0a5e0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/06b0a5e03852dfed9f1dee4791fc71b4e4e1eeda Stats: 1084 lines in 16 files changed: 845 ins; 52 del; 187 mod 8302652: [SuperWord] Reduction should happen after loop, when possible Reviewed-by: kvn, pli, jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/13056 From chagedorn at openjdk.org Tue May 23 08:10:09 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 May 2023 08:10:09 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly In-Reply-To: References: Message-ID: <2DEZv0Dqrai9gFHdJ4RN5SUno7CZzhMVXMEDxr7WB1c=.e260673d-db9d-42f2-8560-496e7c1a37b5@github.com> On Thu, 4 May 2023 13:36:22 GMT, Tobias Holenstein wrote: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. > > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. 
`-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileC... Nice unification! I just have some small comments. Otherwise, looks good! src/hotspot/share/compiler/compilerOracle.cpp line 1015: > 1013: > 1014: void CompilerOracle::parse_compile_only(char* line) { > 1015: if (line[0] == '\0') return; I suggest to use braces for single line ifs: Suggestion: if (line[0] == '\0') { return; } src/hotspot/share/compiler/compilerOracle.cpp line 1017: > 1015: if (line[0] == '\0') return; > 1016: ResourceMark rm; > 1017: char error_buf[1024] = {0}; Wouldn't it be sufficient to only initialize the first character with `\0`? 
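Editorial sketch relating to the review question above about `char error_buf[1024] = {0};`. In C++, the aggregate initializer zero-fills the whole array, while assigning only `buf[0] = '\0'` is already enough for the buffer to be a valid empty C string. The helper names below are hypothetical, and whether the parsing code in `compilerOracle.cpp` ever reads beyond the first terminator is not stated in the thread, so the second variant is only safe under the assumption that the buffer is used strictly as a C string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kBufSize = 1024;  // same size as in the quoted snippet

// Variant 1: `= {0}` sets the first element explicitly and
// value-initializes (zeroes) every remaining element.
bool all_bytes_zero_after_aggregate_init() {
  char buf[kBufSize] = {0};
  for (char c : buf) {
    if (c != '\0') return false;
  }
  return true;
}

// Variant 2: only the first byte is written. The rest of the buffer is
// left uninitialized and must not be read, but strlen stops at the
// terminator in buf[0], so the buffer is a valid empty string.
std::size_t strlen_after_first_byte_init() {
  char buf[kBufSize];
  buf[0] = '\0';
  return std::strlen(buf);
}
```

The trade-off is a 1024-byte fill on every call versus a single store; which one is preferable depends on how the buffer is consumed afterwards.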
src/hotspot/share/compiler/compilerOracle.cpp line 1021: > 1019: char* method_pattern; > 1020: do { > 1021: if (line[0] == '\0') break; Suggestion: if (line[0] == '\0') { break; } test/hotspot/jtreg/compiler/loopopts/Test8211698.java line 57: > 55: } > 56: } > 57: You can leave this empty line in at the end of the file. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1438882451 PR Review Comment: https://git.openjdk.org/jdk/pull/13802#discussion_r1201705460 PR Review Comment: https://git.openjdk.org/jdk/pull/13802#discussion_r1201720761 PR Review Comment: https://git.openjdk.org/jdk/pull/13802#discussion_r1201705793 PR Review Comment: https://git.openjdk.org/jdk/pull/13802#discussion_r1201731777 From ngasson at openjdk.org Tue May 23 08:23:57 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Tue, 23 May 2023 08:23:57 GMT Subject: RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> References: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 02:01:20 GMT, Chang Peng wrote: > To avoid dead code elimination, a use-point laneIsSet() is added in each benchmark method in MaskFromLongBenchmark.java. > > However, currently laneIsSet() [1] is implemented by toLong(). So it may generate a fromLong-toLong pair [2], making this benchmark to be noneffective after inlining laneIsSet() into the outer method. The assembly of maskFromLong_byte128 benchmark on SVE2 is shown in [3]. We cannot see the bdep instruction used by fromLong on AArch64 [4]. So, in this case, we cannot measure fromLong()'s performance by using this benchmark. > > This patch uses trueCount() [5] instead of toLong() to measure the fromLong()'s performance effectively. 
After this patch, we can see the bdep instruction in the hot loop [6] of maskFromLong_byte128 benchmark. > > Since using Blackhole to consume VectorMask will generate a heavy vector box, we don't use Blackhole to fix this benchmark. > > [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70 > [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736 > [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa > [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099 > [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount() > [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d Marked as reviewed by ngasson (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13851#pullrequestreview-1438956873 From dnsimon at openjdk.org Tue May 23 08:46:52 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 23 May 2023 08:46:52 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v4] In-Reply-To: References: Message-ID: > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... 
> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8308151 - replace JVMCICompileMethodExceptionIsFatal VM flag with test.jvmci.compileMethodExceptionIsFatal system property - append elision comment to end of last stack trace line - send JVMCI exception info to hs-err log and/or tty - remove unused callToString method - make JMCI more robust in low resource conditions ------------- Changes: https://git.openjdk.org/jdk/pull/14000/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=03 Stats: 404 lines in 11 files changed: 341 ins; 31 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/14000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14000/head:pull/14000 PR: https://git.openjdk.org/jdk/pull/14000 From thartmann at openjdk.org Tue May 23 08:53:49 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 May 2023 08:53:49 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand
[v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 08:35:45 GMT, Xiaolin Zheng wrote: >> The `iRegIHeapbase()` matching operand is no longer used on either the AArch64 or the RISC-V platform after [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449) and [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667), respectively. As the follow-up action discussed in the code review process of [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667) (#13577), this is a small cleanup for the `iRegIHeapbase()` matching operand. >> >> Passed fastdebug/release build on both AArch64/RISC-V platforms. >> >> Thanks, >> Xiaolin > > Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Further cleanups Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13983#pullrequestreview-1439049242 From thartmann at openjdk.org Tue May 23 09:07:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 May 2023 09:07:03 GMT Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations In-Reply-To: References: Message-ID: On Wed, 3 May 2023 12:43:29 GMT, Doug Simon wrote: > This PR adds JVMCI API to reflect the fact that [deferred locals are not supported on virtual threads](https://bugs.openjdk.org/browse/JDK-8307125?focusedCommentId=14578728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578728). The test should be removed from `test/hotspot/jtreg/ProblemList-Virtual.txt` (added by [JDK-8307370](https://bugs.openjdk.org/browse/JDK-8307370)).
------------- PR Comment: https://git.openjdk.org/jdk/pull/13777#issuecomment-1558859099 From tholenstein at openjdk.org Tue May 23 09:08:20 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 23 May 2023 09:08:20 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v2] In-Reply-To: References: Message-ID: <3tF5Nlnp_YCBvjfj38WxdiRrYeOyfA9Wz4wJ8tm5z2o=.4719cce4-0d72-485b-9494-cd97594c42fa@github.com> > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. > > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. 
`-XX:CompileOnly=String::` compiles as many methods as when the `-XX:CompileOnly=` option is omitted > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take a pattern used with `CompileOnly` and plug it into a compile command file. The syntax used by `CompileOnly` is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileC...
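The wildcard rule quoted above (`*` only at the beginning and/or end of a class or method name, never embedded) can be illustrated with a small Java sketch. This is not HotSpot's implementation, which lives in C++ in compilerOracle.cpp; the class `WildcardSketch` and all of its methods are hypothetical names made up for illustration, and the sketch only handles patterns that carry an explicit method part.

```java
class WildcardSketch {
    // True if '*' appears only as the first and/or last character of the name.
    static boolean hasOnlyEdgeWildcards(String name) {
        for (int i = 1; i < name.length() - 1; i++) {
            if (name.charAt(i) == '*') {
                return false; // embedded '*', e.g. "has*Code"
            }
        }
        return true;
    }

    // Splits "package.path.Class::method" or "package/path/Class.method"
    // into { classPart, methodPart }. Assumption of this sketch: patterns
    // without a method part are rejected rather than interpreted.
    static String[] split(String pattern) {
        int sep = pattern.indexOf("::");
        if (sep >= 0) {
            return new String[] { pattern.substring(0, sep), pattern.substring(sep + 2) };
        }
        sep = pattern.lastIndexOf('.');
        if (sep < 0) {
            throw new IllegalArgumentException("sketch expects Class::method or Class.method");
        }
        return new String[] { pattern.substring(0, sep), pattern.substring(sep + 1) };
    }

    // A pattern is accepted if neither part contains an embedded '*'.
    static boolean isValid(String pattern) {
        String[] parts = split(pattern);
        return hasOnlyEdgeWildcards(parts[0]) && hasOnlyEdgeWildcards(parts[1]);
    }

    // Matches one name against one part of a pattern:
    // "*x*" is a substring match, "*x" a suffix match, "x*" a prefix match,
    // and a part without '*' must match exactly.
    static boolean matches(String part, String name) {
        if (part.equals("*")) {
            return true;
        }
        boolean lead = part.startsWith("*");
        boolean trail = part.endsWith("*");
        String core = part.substring(lead ? 1 : 0, part.length() - (trail ? 1 : 0));
        if (lead && trail) return name.contains(core);
        if (lead) return name.endsWith(core);
        if (trail) return name.startsWith(core);
        return name.equals(core);
    }

    public static void main(String[] args) {
        System.out.println(isValid("*.lang.*::*shCo*"));
        System.out.println(isValid("java.*.String::has*Code"));
        System.out.println(matches("*shCo*", "hashCode"));
    }
}
```

With the patterns from the lists above, `isValid` accepts every "valid form" and rejects `java.*.String::has*Code`, because both its class part and its method part contain an embedded `*`.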
Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision: - Update Test8211698.java - Update src/hotspot/share/compiler/compilerOracle.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/compiler/compilerOracle.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13802/files - new: https://git.openjdk.org/jdk/pull/13802/files/d6fca2b8..40b17296 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=00-01 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13802/head:pull/13802 PR: https://git.openjdk.org/jdk/pull/13802 From tholenstein at openjdk.org Tue May 23 09:08:20 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 23 May 2023 09:08:20 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v2] In-Reply-To: <2DEZv0Dqrai9gFHdJ4RN5SUno7CZzhMVXMEDxr7WB1c=.e260673d-db9d-42f2-8560-496e7c1a37b5@github.com> References: <2DEZv0Dqrai9gFHdJ4RN5SUno7CZzhMVXMEDxr7WB1c=.e260673d-db9d-42f2-8560-496e7c1a37b5@github.com> Message-ID: On Tue, 23 May 2023 08:01:07 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update Test8211698.java >> - Update src/hotspot/share/compiler/compilerOracle.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/compiler/compilerOracle.cpp >> >> Co-authored-by: Christian Hagedorn > > test/hotspot/jtreg/compiler/loopopts/Test8211698.java line 57: > >> 55: } >> 56: } >> 57: > > You can leave this empty line in at the end of the file. 
done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13802#discussion_r1201860361 From dnsimon at openjdk.org Tue May 23 09:24:54 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 23 May 2023 09:24:54 GMT Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations [v2] In-Reply-To: References: Message-ID: > This PR adds JVMCI API to reflect the fact that [deferred locals are not supported on virtual threads](https://bugs.openjdk.org/browse/JDK-8307125?focusedCommentId=14578728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578728). Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - remove MaterializeVirtualObjectTest.java from ProblemList-Virtual.txt - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8307125 - materializing frames on virtual threads is not supported ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13777/files - new: https://git.openjdk.org/jdk/pull/13777/files/3f5be9fc..ee33378a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13777&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13777&range=00-01 Stats: 152690 lines in 2740 files changed: 113916 ins; 18852 del; 19922 mod Patch: https://git.openjdk.org/jdk/pull/13777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13777/head:pull/13777 PR: https://git.openjdk.org/jdk/pull/13777 From thartmann at openjdk.org Tue May 23 09:43:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 May 2023 09:43:04 GMT Subject: RFR: 8260943: C2 SuperWord: Remove dead vectorization optimization added by 8076284 In-Reply-To: References:
Message-ID: On Thu, 11 May 2023 12:15:08 GMT, Emanuel Peter wrote: > I suggest we remove this dead `_do_vector_loop_experimental` code. > @vnkozlov disabled it 2.5 years ago [JDK-8251994](https://bugs.openjdk.org/browse/JDK-8251994) https://github.com/openjdk/jdk/commit/a7fa1b70f212566e95068936841b6e9702eccaed. > His [analysis](https://bugs.openjdk.org/browse/JDK-8251994?focusedCommentId=14364507&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14364507). > His conclusion back then: > > Using unrolling and cloning information to vectorize is interesting idea but as I see it is not complete. > Even if pack_parallel() method is able created packs they are all removed by filter_packs() method. > And additionally the above cases are vectorized without hoisting loads and pack_parallel - I verified it. > That code is useless now and I will put it under flag to not run it. It needs more work to be useful. > I reluctant to remove the code because may be in a future we will have time to invest into it. > > > He disabled it by replacing many occurrences of `_do_vector_loop` with `_do_vector_loop_experimental = false`. > > I don't believe anybody wants to fix this code any time soon. Current `SuperWord` can do almost everything that this code promises. If we really want to have parallel iterations for the Stream API, then we should do this in the dependency graph directly, by removing the inter-iteration edges. > > If you care, you can read my arguments below. > I am also using this opportunity to think back: what were the motivations for this code. > And I am thinking forward: what could we do to improve our `SuperWord` algorithm? > > **Testing** > > Up to tier5 and stress testing, with and without `-XX:CompileCommand=option,path.to.Class::method,Vectorize`. **Running...** > > ----------- > > **Background** > > "Seeding" is crucial: > The SLP algorithm (Superword Level Parallelism) relies on good detection of parallel instructions that can be packed.
This is usually done with "seeding": one finds loads or stores that can be packed - preferably they are adjacent so that we can use a vectorized load or store (alternatively gather and scatter can be used for strided or random accesses). After this "seeding", the vectorization is extended to non-seed operations (usually greedily). > > In `C2`'s `SuperWord` algorithm, we have two approaches for this "seeding": > 1. Normally, we simply try to find adjacent loads and stores for the same `base` (array). Second, we require load/store packs to be aligned to each other in the same memory slice... Nice analysis, Emanuel. Looks good to me but the title should be renamed from "Revisit" to "Remove". ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13930#pullrequestreview-1439110543 From thartmann at openjdk.org Tue May 23 15:09:44 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 May 2023 15:09:44 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative In-Reply-To: References: Message-ID: <_kalS-Dc9TGfOur-1wreV9Kuhk7sJD46UZ8EKHJJMOc=.235fad3f-e0e6-40ee-83e7-2e33aa2d6d25@github.com> On Fri, 19 May 2023 16:19:42 GMT, Quan Anh Mai wrote: > Hi, > > This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot.
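To make the distinction behind `result_not_used()` in the quoted description concrete, here is a minimal Java sketch of the two source-level shapes involved. It is not taken from the patch; the class `GetAndAddSketch` and its methods are hypothetical. When the value returned by `getAndAdd` is discarded, only the atomic add itself is needed; when it is consumed, the previous value has to be produced as well. Which machine instruction C2 selects in each case is platform-specific and not shown here.

```java
import java.util.concurrent.atomic.AtomicLong;

class GetAndAddSketch {
    static final AtomicLong counter = new AtomicLong();

    // Result discarded: only the side effect (the atomic add) is observable.
    static void addIgnoringResult(long delta) {
        counter.getAndAdd(delta);
    }

    // Result consumed: the value held before the add must be returned.
    static long addUsingResult(long delta) {
        return counter.getAndAdd(delta);
    }

    public static void main(String[] args) {
        addIgnoringResult(5);
        long previous = addUsingResult(7);
        System.out.println(previous + " " + counter.get());
    }
}
```

Both methods must leave the counter in the same state; they differ only in whether the caller observes the old value, which is the property the compiler has to determine correctly.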
I'm seeing failures with `compiler/unsafe/JdkInternalMiscUnsafeAccessTestLong.java` and `JdkInternalMiscUnsafeAccessTestInt.java`: test compiler.unsafe.JdkInternalMiscUnsafeAccessTestLong.testArray(): failure java.lang.AssertionError: getAndAdd long expected [81985529216486895] but found [33520196056] at org.testng.Assert.fail(Assert.java:99) at org.testng.Assert.failNotEquals(Assert.java:1037) at org.testng.Assert.assertEqualsImpl(Assert.java:140) at org.testng.Assert.assertEquals(Assert.java:122) at org.testng.Assert.assertEquals(Assert.java:797) at compiler.unsafe.JdkInternalMiscUnsafeAccessTestLong.testAccess(JdkInternalMiscUnsafeAccessTestLong.java:361) at compiler.unsafe.JdkInternalMiscUnsafeAccessTestLong.testArray(JdkInternalMiscUnsafeAccessTestLong.java:130) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14061#issuecomment-1559018427 From vkempik at openjdk.org Tue May 23 15:10:52 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 23 May 2023 15:10:52 GMT Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers Message-ID: Please review this fix. The vstring_compare intrinsic (from c2_MacroAssembler_riscv.cpp) uses vector registers v6 and v7 (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482, vstr1 == v4, lmul=4), but doesn't declare their usage in the riscv_v.ad file. This fix resolves that. No noticeable difference is expected in the generated code for now. Testing: build testing only.
------------- Commit messages: - 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers Changes: https://git.openjdk.org/jdk/pull/14102/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14102&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308656 Stats: 42 lines in 2 files changed: 34 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14102.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14102/head:pull/14102 PR: https://git.openjdk.org/jdk/pull/14102 From thartmann at openjdk.org Tue May 23 15:13:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 May 2023 15:13:03 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v4] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Thu, 18 May 2023 14:15:22 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds support for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Nice! The new version looks good to me. src/hotspot/share/opto/c2_CodeStubs.hpp line 116: > 114: > 115: template > 116: class C2GeneralStub : public C2CodeStub { A comment describing the purpose of this class would be good. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13602#pullrequestreview-1439128798 PR Review Comment: https://git.openjdk.org/jdk/pull/13602#discussion_r1201897039 From dnsimon at openjdk.org Tue May 23 15:13:55 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 23 May 2023 15:13:55 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v5] In-Reply-To: References: Message-ID: > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... 
> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe Doug Simon has updated the pull request incrementally with one additional commit since the last revision: [skip ci] make TestUncaughtErrorInCompileMethod more robust ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14000/files - new: https://git.openjdk.org/jdk/pull/14000/files/fb50cbbe..fe2ca698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=03-04 Stats: 32 lines in 4 files changed: 21 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/14000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14000/head:pull/14000 PR: https://git.openjdk.org/jdk/pull/14000 From duke at openjdk.org Tue May 23 15:18:41 2023 From: duke at openjdk.org (Chang Peng) Date: Tue, 23 May 2023 15:18:41 GMT Subject: Integrated: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java In-Reply-To: <32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> References:
<32EIbtPLqejQLdpxMJTXaXuQ35FqT3zcw7pCcLaMXqM=.280bd95e-42ba-424c-8a53-730e08eb6c32@github.com> Message-ID: On Sat, 6 May 2023 02:01:20 GMT, Chang Peng wrote: > To avoid dead code elimination, a use-point laneIsSet() is added in each benchmark method in MaskFromLongBenchmark.java. > > However, currently laneIsSet() [1] is implemented by toLong(). So it may generate a fromLong-toLong pair [2], making this benchmark ineffective after inlining laneIsSet() into the outer method. The assembly of maskFromLong_byte128 benchmark on SVE2 is shown in [3]. We cannot see the bdep instruction used by fromLong on AArch64 [4]. So, in this case, we cannot measure fromLong()'s performance by using this benchmark. > > This patch uses trueCount() [5] instead of toLong() to measure fromLong()'s performance effectively. After this patch, we can see the bdep instruction in the hot loop [6] of maskFromLong_byte128 benchmark. > > Since using Blackhole to consume VectorMask will generate a heavy vector box, we don't use Blackhole to fix this benchmark. > > [1]: https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70 > [2]: https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736 > [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa > [4]: https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099 > [5]: https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount() > [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d This pull request has now been integrated.
Changeset: 97d3b273 Author: changpeng1997 Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/97d3b2731ebd7594cbc3579f4c375ae70bb489a3 Stats: 136 lines in 1 file changed: 80 ins; 6 del; 50 mod 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java Reviewed-by: qamai, xgong, ngasson ------------- PR: https://git.openjdk.org/jdk/pull/13851 From roland at openjdk.org Tue May 23 15:32:01 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 23 May 2023 15:32:01 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v4] In-Reply-To: References: Message-ID: > pre/main/post loops are created for an inner loop of a loop nest but > assert predicates cause the main and post loops to be removed. The > OpaqueZeroTripGuard nodes for the loops are not removed: there's no > logic to trigger removal of the opaque nodes once the loops are no > longer there. With the inner loops gone, the outer loop becomes > candidate for optimizations and is unrolled which causes the zero trip > guards of the now removed loops to be duplicated and the opaque nodes > to have more than one use. > > The fix I propose is, using logic similar to > `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop > opts if every OpaqueZeroTripGuard node guards a loop and if not, > remove it. 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13901/files - new: https://git.openjdk.org/jdk/pull/13901/files/8f37817c..5e26e81d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13901&range=02-03 Stats: 33 lines in 3 files changed: 22 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13901/head:pull/13901 PR: https://git.openjdk.org/jdk/pull/13901 From chagedorn at openjdk.org Tue May 23 15:32:03 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 May 2023 15:32:03 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v4] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 10:45:05 GMT, Roland Westrelin wrote: >> pre/main/post loops are created for an inner loop of a loop nest but >> assert predicates cause the main and post loops to be removed. The >> OpaqueZeroTripGuard nodes for the loops are not removed: there's no >> logic to trigger removal of the opaque nodes once the loops are no >> longer there. With the inner loops gone, the outer loop becomes >> candidate for optimizations and is unrolled which causes the zero trip >> guards of the now removed loops to be duplicated and the opaque nodes >> to have more than one use. >> >> The fix I propose is, using logic similar to >> `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop >> opts if every OpaqueZeroTripGuard node guards a loop and if not, >> remove it. 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - review I agree that it is better to be safe here - update looks good! I'll resubmit some testing. src/hotspot/share/opto/loopnode.cpp line 4170: > 4168: // unreachable from _ltree_root: zero trip guard is in a newly discovered infinite loop. > 4169: // We can't tell if the opaque node is useful or not > 4170: assert(guarded_loop == nullptr || guarded_loop->is_in_infinite_subgraph(), ""); Indentation: Suggestion: assert(guarded_loop == nullptr || guarded_loop->is_in_infinite_subgraph(), ""); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13901#pullrequestreview-1439241021 PR Review: https://git.openjdk.org/jdk/pull/13901#pullrequestreview-1439258383 PR Review Comment: https://git.openjdk.org/jdk/pull/13901#discussion_r1201972405 From roland at openjdk.org Tue May 23 15:32:07 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 23 May 2023 15:32:07 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v3] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 06:33:37 GMT, Christian Hagedorn wrote: > Would it still be required/useful to keep an `OpaqueZeroTripGuardNode` for such a counted loop inside an infinite loop? You're right that it's probably safer to be conservative and rather than remove the opaque node anyway, keep it if we can't tell if it's useful or not. I updated the change with a new commit that does that.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1558893085 From dnsimon at openjdk.org Tue May 23 15:39:44 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 23 May 2023 15:39:44 GMT Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v6] In-Reply-To: References: Message-ID: > When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log). > > This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err): > > JVMCI Events (11 events): > ... > Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError > Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829) > Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943) > > > It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation: > > COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError] > > > [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe Doug Simon has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: [skip ci] make TestUncaughtErrorInCompileMethod more robust ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14000/files - new: https://git.openjdk.org/jdk/pull/14000/files/fe2ca698..0471e7c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14000&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14000.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14000/head:pull/14000 PR: https://git.openjdk.org/jdk/pull/14000 From qamai at openjdk.org Tue May 23 16:12:27 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 May 2023 16:12:27 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v5] In-Reply-To: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: > Hi, > > This patch adds support for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. > > Thanks a lot.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: comments describe C2GeneralStub ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13602/files - new: https://git.openjdk.org/jdk/pull/13602/files/a17bcb76..76d181b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13602&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13602&range=03-04 Stats: 21 lines in 1 file changed: 21 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13602.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13602/head:pull/13602 PR: https://git.openjdk.org/jdk/pull/13602 From qamai at openjdk.org Tue May 23 16:12:28 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 May 2023 16:12:28 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v2] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: <_klgLAYJdRAhAGVrpe82LVL56U7yUD0s9XMXhQ6C2ro=.4e7c7394-2425-436b-9da1-66053659e0a3@github.com> On Wed, 17 May 2023 22:10:38 GMT, Vladimir Kozlov wrote: >> Is it possible to do this in `c2_MacroAssembler_x86` instead (as for `verified_entry`)? >> We are trying to move complex coding from .ad files to macroassembler. > >> @vnkozlov Yes we can explicitly define a stub without relying on code generation, it may be more preferable since it avoids adding complexity to adlc generation. The only downside is that there is some boilerplate for each usage but I think the boilerplate is not too terrible. > > Can you look on that? 
There could be other cases in Macroassembler which can use this @vnkozlov Thanks for your reviews and testing @TobiHartmann Thanks for your suggestion, I have added comments to describe the purpose of `C2GeneralStub` ------------- PR Comment: https://git.openjdk.org/jdk/pull/13602#issuecomment-1559745291 From kvn at openjdk.org Tue May 23 16:32:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 May 2023 16:32:03 GMT Subject: RFR: 8306706: Support out-of-line code generation for MachNodes [v5] In-Reply-To: References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Tue, 23 May 2023 16:12:27 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. >> >> Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > comments describe C2GeneralStub Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13602#pullrequestreview-1440156733 From cslucas at openjdk.org Tue May 23 16:39:22 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 23 May 2023 16:39:22 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 22 May 2023 17:56:41 GMT, Cesar Soares Lucas wrote: > Speaking of _only_merge_candidate flag, I find it easier about the code when the property being tracked is whether the ObjectValue is referenced from corresponding JVM state or not. (Maybe call it is_root()?) 
So, ScopeDesc::objects_to_rematerialize() would skip everything not referenced from JVM state [...] @iwanowww - I want to make sure I understood "is_root(sv)" correctly. Are you suggesting to implement it as `ScopeDesc::is_root(ScopeValue* sv)` and the body of the method would just check if the `sv` is referenced in locals/expressions/monitor? Did I get it right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1559796992 From qamai at openjdk.org Tue May 23 17:10:15 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 May 2023 17:10:15 GMT Subject: Integrated: 8306706: Support out-of-line code generation for MachNodes In-Reply-To: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> References: <3haQdXHxlUHKAqi4MNWaVz3gVcFB9M8A20tGPQIok3c=.940d6d13-9764-449a-a9e1-36247f08b68e@github.com> Message-ID: On Sun, 23 Apr 2023 18:22:35 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds supports for MachNodes to emit an out-of-line piece of code in the stub section of the compiled method. This allows the separation of the uncommon path from the common one, which speeds up the common path a little bit and increases compiled code density. Please take a look and leave reviews. > > Thanks a lot. This pull request has now been integrated. Changeset: ab241b34 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/ab241b3428839fd121ee4ce5fdafeb649f453550 Stats: 282 lines in 9 files changed: 269 ins; 4 del; 9 mod 8306706: Support out-of-line code generation for MachNodes Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13602 From vkempik at openjdk.org Tue May 23 17:10:55 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 23 May 2023 17:10:55 GMT Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers In-Reply-To: References: Message-ID: On Tue, 23 May 2023 13:31:18 GMT, Vladimir Kempik wrote: > Please review this fix. 
> vstring_compare intrinsic (from c2_MacroAssembler_riscv.cpp) uses vector registers v6 and v7 (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482, vstr1 == v4, lmul=4), but doesn't manifest their usage in the riscv_v.ad file.
> This fix resolves this situation.
> No noticeable difference is expected in the generated code for now.
>
> Testing: build testing only.

The macOS build failure is unrelated. No code is touched outside of cpu/riscv in this PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14102#issuecomment-1559839537

From vlivanov at openjdk.org Tue May 23 17:15:07 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 23 May 2023 17:15:07 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13]
In-Reply-To:
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

On Tue, 23 May 2023 16:36:32 GMT, Cesar Soares Lucas wrote:

> Are you suggesting to implement it as ScopeDesc::is_root(ScopeValue* sv) and the body of the method would just check if the sv is referenced in locals/expressions/monitor? Did I get it right?

I didn't propose exactly that, but I like your idea. I'm not against having it cached on `ScopeValue` side (and serialized in debug info), but implementing it as a query on `ScopeDesc` does look like a better alternative. (If it turns out to matter from performance POV, the check can be then turned into an assert and the cached value is used.) Maybe call it `ScopeDesc::has_reference_to(ScopeValue* sv)` then.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1559844211

From dcubed at openjdk.org Tue May 23 20:48:16 2023
From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Tue, 23 May 2023 20:48:16 GMT Subject: Integrated: 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 Message-ID: Trivial fixes to ProblemList some tests: [JDK-8308716](https://bugs.openjdk.org/browse/JDK-8308716) ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 [JDK-8308718](https://bugs.openjdk.org/browse/JDK-8308718) ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode [JDK-8308720](https://bugs.openjdk.org/browse/JDK-8308720) ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 ------------- Commit messages: - 8308720: ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 - 8308718: ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode - 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 Changes: https://git.openjdk.org/jdk/pull/14106/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14106&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308716 Stats: 6 lines in 3 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14106/head:pull/14106 PR: https://git.openjdk.org/jdk/pull/14106 From azvegint at openjdk.org Tue May 23 20:48:16 2023 From: azvegint at openjdk.org (Alexander Zvegintsev) Date: Tue, 23 May 2023 20:48:16 GMT Subject: Integrated: 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 In-Reply-To: References: Message-ID: On Tue, 23 May 2023 20:24:18 GMT, Daniel D. 
Daugherty wrote: > Trivial fixes to ProblemList some tests: > [JDK-8308716](https://bugs.openjdk.org/browse/JDK-8308716) ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 > [JDK-8308718](https://bugs.openjdk.org/browse/JDK-8308718) ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode > [JDK-8308720](https://bugs.openjdk.org/browse/JDK-8308720) ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 Marked as reviewed by azvegint (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14106#pullrequestreview-1440533061 From darcy at openjdk.org Tue May 23 20:48:17 2023 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 23 May 2023 20:48:17 GMT Subject: Integrated: 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 In-Reply-To: References: Message-ID: On Tue, 23 May 2023 20:24:18 GMT, Daniel D. Daugherty wrote: > Trivial fixes to ProblemList some tests: > [JDK-8308716](https://bugs.openjdk.org/browse/JDK-8308716) ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 > [JDK-8308718](https://bugs.openjdk.org/browse/JDK-8308718) ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode > [JDK-8308720](https://bugs.openjdk.org/browse/JDK-8308720) ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 Marked as reviewed by darcy (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14106#pullrequestreview-1440535048 From dcubed at openjdk.org Tue May 23 20:48:17 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 23 May 2023 20:48:17 GMT Subject: Integrated: 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 In-Reply-To: References: Message-ID: On Tue, 23 May 2023 20:39:02 GMT, Alexander Zvegintsev wrote: >> Trivial fixes to ProblemList some tests: >> [JDK-8308716](https://bugs.openjdk.org/browse/JDK-8308716) ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 >> [JDK-8308718](https://bugs.openjdk.org/browse/JDK-8308718) ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode >> [JDK-8308720](https://bugs.openjdk.org/browse/JDK-8308720) ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 > > Marked as reviewed by azvegint (Reviewer). @azvegint and @jddarcy - Thanks for the fast reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14106#issuecomment-1560097131 From dcubed at openjdk.org Tue May 23 20:48:18 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 23 May 2023 20:48:18 GMT Subject: Integrated: 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 In-Reply-To: References: Message-ID: On Tue, 23 May 2023 20:24:18 GMT, Daniel D. Daugherty wrote: > Trivial fixes to ProblemList some tests: > [JDK-8308716](https://bugs.openjdk.org/browse/JDK-8308716) ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 > [JDK-8308718](https://bugs.openjdk.org/browse/JDK-8308718) ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode > [JDK-8308720](https://bugs.openjdk.org/browse/JDK-8308720) ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 This pull request has now been integrated. Changeset: ed0e956f Author: Daniel D. 
Daugherty URL: https://git.openjdk.org/jdk/commit/ed0e956fc28a54a0eb49bab70a7d010095ce2544 Stats: 6 lines in 3 files changed: 6 ins; 0 del; 0 mod 8308716: ProblemList java/util/concurrent/ScheduledThreadPoolExecutor/BasicCancelTest.java with genzgc on windows-x64 8308718: ProblemList three mlvm/indy/func/jvmti tests on windows-x64 in Xcomp mode 8308720: ProblemList java/awt/event/SequencedEvent/MultipleContextsFunctionalTest.java on macosx-x64 Reviewed-by: azvegint, darcy ------------- PR: https://git.openjdk.org/jdk/pull/14106 From sviswanathan at openjdk.org Tue May 23 22:24:37 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 23 May 2023 22:24:37 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v2] In-Reply-To: References: Message-ID: > This PR fixes the problem with double reduction on x86_64. > > In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: > jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java > The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. > > This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. > > With this PR the vector_reduction_double node is generated. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: use max_vector_size instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14065/files - new: https://git.openjdk.org/jdk/pull/14065/files/0cc5157a..d0acd1e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14065/head:pull/14065 PR: https://git.openjdk.org/jdk/pull/14065 From sviswanathan at openjdk.org Tue May 23 22:35:04 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 23 May 2023 22:35:04 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3] In-Reply-To: References: Message-ID: > This PR fixes the problem with double reduction on x86_64. > > In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: > jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java > The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. > > This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. > > With this PR the vector_reduction_double node is generated. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: change to superword_max_vector_size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14065/files - new: https://git.openjdk.org/jdk/pull/14065/files/d0acd1e6..ba3b5dfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14065/head:pull/14065 PR: https://git.openjdk.org/jdk/pull/14065 From sviswanathan at openjdk.org Tue May 23 22:35:06 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 23 May 2023 22:35:06 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3] In-Reply-To: <5RPtOn1e4TR6lq06RFrXlo4M1UbFj4M0ixalBtfTsA8=.2a61c0ec-55cb-486d-8d30-174c52738228@github.com> References: <5RPtOn1e4TR6lq06RFrXlo4M1UbFj4M0ixalBtfTsA8=.2a61c0ec-55cb-486d-8d30-174c52738228@github.com> Message-ID: <77EKge9623Hd4f65bKgcII_F9sxhBsq5lcU702bvlZ0=.0c1da309-223d-4568-996a-96a22184e0c2@github.com> On Tue, 23 May 2023 02:59:41 GMT, Fei Gao wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> change to superword_max_vector_size > > src/hotspot/share/opto/superword.cpp line 3724: > >> 3722: // For vector reduction implemented check we need atleast two elements. >> 3723: int min_vec_size = MAX2(Matcher::min_vector_size(bt), 2); >> 3724: if (ReductionNode::implemented(use->Opcode(), min_vec_size, bt)) { > > Hi @sviswa7, can we use `superword_max_vector_size()` as the input here? `MAX2(Matcher::min_vector_size(bt), 2);` may not tally with the actual situation on other 64-bit platforms. WDYT? @fg1417 Thanks a lot for the review. 
Yes, we could use superword_max_vector_size here. I have made the change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14065#discussion_r1203099854

From fgao at openjdk.org Wed May 24 02:23:00 2023
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 24 May 2023 02:23:00 GMT
Subject: RFR: 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file
Message-ID:

If a match rule belongs to one of the following situations, we can remove extra `UseSVE` Predicate:

1. If any src operand type is `pReg`, which is SVE specific, we can remove `Predicate(UseSVE > 0)`. But if only dst operand type is `pReg`, we can't remove `Predicate(UseSVE > 0)`, since the DFA of matcher selects by src operands and instruction cost, not involving dst operand.
2. If matcher can use src operand type, i.e., `pReg` or `vReg`, to distinguish sve from neon, we can remove `Predicate(UseSVE == 0)` for rules on neon.
3. When the condition in `Predicate()` is false on current platform, it's definitely impossible to generate the corresponding node pattern from C2. Then we can remove `Predicate()`, like removing `predicate(UseSVE > 0)` for all `PopulateIndex` rules.

After the patch, the code size of libjvm.so decreased from 25.42M to 25.39M, by 25.3K.

Testing:
No new failures found on tier 1 - 3.
No significant performance regression compared with master.
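
The reasoning in point 1 can be sketched with a toy model. This is only an illustration of the stated claim (selection looks at the source operand and the cost, never at the destination), not HotSpot's matcher or ADLC-generated code; `Operand`, `Rule`, `select_rule`, `demo_rules` and the rule names are all hypothetical.

```cpp
#include <string>
#include <vector>

// Toy model only: every name here is hypothetical, not part of HotSpot or ADLC.
enum class Operand { VReg, PReg };

struct Rule {
    std::string name;
    Operand src;      // source operand kind: the only operand the selector inspects
    Operand dst;      // destination operand kind: ignored during selection
    int cost;         // lower is better
    int min_use_sve;  // stands in for Predicate(UseSVE >= n); 0 means "no predicate"
};

// Pick the cheapest enabled rule whose *source* operand kind matches.
// Returns an empty string if no rule applies.
std::string select_rule(const std::vector<Rule>& rules, Operand src, int use_sve) {
    const Rule* best = nullptr;
    for (const Rule& r : rules) {
        if (r.src != src) continue;             // matched by source operand only
        if (use_sve < r.min_use_sve) continue;  // predicate is false: rule disabled
        if (best == nullptr || r.cost < best->cost) best = &r;
    }
    return best ? best->name : "";
}

// Two rules that share a vReg source; only the second produces a pReg result.
std::vector<Rule> demo_rules(bool keep_predicate) {
    return {
        Rule{"neon_rule",     Operand::VReg, Operand::VReg, 100, 0},
        Rule{"sve_mask_rule", Operand::VReg, Operand::PReg,  50, keep_predicate ? 1 : 0},
    };
}
```

In this model, `select_rule(demo_rules(true), Operand::VReg, 0)` picks `neon_rule`, while dropping the predicate with `demo_rules(false)` lets the cheaper `sve_mask_rule` win even though SVE is off. That is the situation the retained `Predicate(UseSVE > 0)` guards against when only the dst operand is `pReg`.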
------------- Commit messages: - 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file Changes: https://git.openjdk.org/jdk/pull/14112/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14112&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308339 Stats: 302 lines in 2 files changed: 0 ins; 202 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/14112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14112/head:pull/14112 PR: https://git.openjdk.org/jdk/pull/14112 From xlinzheng at openjdk.org Wed May 24 03:30:56 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Wed, 24 May 2023 03:30:56 GMT Subject: RFR: 8308091: Remove unused iRegIHeapbase() matching operand [v2] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 13:04:30 GMT, Fei Yang wrote: >> Xiaolin Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> Further cleanups > > Updated change LGTM. Thanks. Thanks for reviewing! @RealFYang @TobiHartmann This is a trivial patch, so integrate then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13983#issuecomment-1560408272 From yzhu at openjdk.org Wed May 24 04:13:54 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Wed, 24 May 2023 04:13:54 GMT Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers In-Reply-To: References: Message-ID: On Tue, 23 May 2023 13:31:18 GMT, Vladimir Kempik wrote: > Please review this fix. > vstring_compare instrinsic ( from c2_MacroAssembler_riscv.cpp ) uses vector registers v6 and v7 ( https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482 , vstr1 == v4, lmul=4) , but doesn't manifest their usage in riscv_v.ad file. > This fix resolves this situation. > No noticable difference one might see in generated code for now. > > Testing: build testing only. 
src/hotspot/cpu/riscv/riscv_v.ad line 2227:

> 2225: iRegI_R10 result, vReg_V1 v1, vReg_V2 v2, vReg_V3 v3, vReg_V4 v4,
> 2226: vReg_V5 v5, vReg_V6 v6, vReg_V7 v7,
> 2227: vRegMask_V0 v0, iRegP_R28 tmp1, iRegL_R29 tmp2)

Hi, when the StrCompNode is `StrIntrinsicNode::UU` or `StrIntrinsicNode::LL`, the if-branch (element_compare) is executed in `C2_MacroAssembler::string_compare_v` and `lmul` is set to 2, so v6 and v7 are not used in `string_compareL` and `string_compareU`. Here is the code:

C2_MacroAssembler::string_compare_v

    if (str1_isL == str2_isL) { // LL or UU
      element_compare(str1, str2, zr, cnt2, tmp1, tmp2, v2, v4, v1, encLL, DIFFERENCE);
      j(DONE);
    } else { // LU or UL
      Register strL = encLU ? str1 : str2;
      Register strU = encLU ? str2 : str1;

C2_MacroAssembler::element_compare

    bind(loop);
    vsetvli(tmp1, cnt, sew, Assembler::m2);
    vlex_v(vr1, a1, sew);
    vlex_v(vr2, a2, sew);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14102#discussion_r1203391244

From jbhateja at openjdk.org Wed May 24 05:04:02 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 24 May 2023 05:04:02 GMT
Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D
In-Reply-To:
References:
Message-ID:

On Thu, 11 May 2023 03:56:42 GMT, Fei Gao wrote:

>> @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix?
>
>> @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix?
>
> Hi @eme64 , nice rewrite!
>
> BTW, have you tested your patch with `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` for all jtreg? Thanks.

> Thanks @fg1417 for the review!
>
> Yes, the testing passes up to at least tier5 and stress testing. With and without `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov`
>
> Yes, I hope that someone from intel / x86 specialists can review this too :) These are candidates: @jatin-bhateja @sviswa7 @merykitty

Your patch looks good to me.
Patch testing with UseSSE=2 shows failure in following tests, failure is unrelated to your changes, we need to add a strict feature based check in test tag * @requires os.simpleArch == "x64" & (vm.cpu.features ~= ".*avx.*") or use applyIfCPUFeature make test TEST="test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxReductions.java" CONF=linux-x86_64-server-fastdebug JTREG="RETAIN=all;JOBS=8;TIMEOUT_FACTOR=8;JAVA_OPTIONS=-XX:UseSSE=2" ------------- PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1560459854 From jbhateja at openjdk.org Wed May 24 05:09:57 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 24 May 2023 05:09:57 GMT Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D [v2] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 08:45:45 GMT, Emanuel Peter wrote: >> **Bug** >> In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN's` correctly (ordered vs unordered comparision, leads to wrong results). >> >> The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973) >> On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309 >> >> The problematic line wants to find a Cmp above the Bool, and compare its inputs. 
But we have no Cmp there, just a constant, that we have set during matching:
>> https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394
>>
>> The wrong results with `NaN` are because of a bug in `x`:
>> https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106
>> The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for Java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078).
>>
>> **Solution**
>> @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions.
>>
>> This has a few benefits:
>> - `VectorMaskCmp + VectorBlend` is more powerful:
>>   - `CMoveVF/D` required the same inputs for the compare as for the move itself.
>>   - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize.
>>   - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`).
>> - We need less code (I completely removed all code for `CMoveVF/D`).
>>
>> I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`.
>>
>> As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, ...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> Improved comment on request of @fg1417

Marked as reviewed by jbhateja (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/13493#pullrequestreview-1440940243

From thartmann at openjdk.org Wed May 24 06:23:57 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 24 May 2023 06:23:57 GMT
Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v8]
In-Reply-To:
References:
Message-ID:

On Tue, 23 May 2023 07:09:37 GMT, Jasmine Karthikeyan wrote:

>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                                Baseline                  Patch          Improvement
>> Benchmark                     Mode  Cnt   Score   Error Units      Score   Error Units
>> Conv2BRules.testEquals0       avgt   10  47.566 ± 0.346 ns/op  /  34.130 ± 0.177 ns/op  + 28.2%
>> Conv2BRules.testNotEquals0    avgt   10  37.167 ± 0.211 ns/op  /  34.185 ± 0.258 ns/op  +  8.0%
>> Conv2BRules.testEquals1       avgt   10  35.059 ± 0.280 ns/op  /  34.847 ± 0.160 ns/op  (unchanged)
>> Conv2BRules.testEqualsNull    avgt   10  56.768 ± 2.600 ns/op  /  34.330 ± 0.625 ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull avgt   10  47.447 ± 1.193 ns/op  /  34.142 ± 0.303 ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Cleanup from code review

I ran this through some quick testing and I'm seeing the following crash with `compiler/vectorapi/TestMaskedMacroLogicVector.java` and `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline`:

    # Internal Error (/workspace/open/src/hotspot/share/opto/type.hpp:1967), pid=3563106, tid=3563120
    # assert(_base == Int) failed: Not an Int
    #
    # JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-05-23-0846313.tobias.hartmann.jdk2)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2023-05-23-0846313.tobias.hartmann.jdk2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # V [libjvm.so+0x5cca5f] Type::is_int() const [clone .part.0]+0x2f
    #

    Current CompileTask:
    C2: 9002 777 % b compiler.vectorapi.TestMaskedMacroLogicVector::verifyInt2 @ 3 (136 bytes)

    Stack: [0x00007f83d89e0000,0x00007f83d8ae1000], sp=0x00007f83d8adb530, free space=1005k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V [libjvm.so+0x5cca5f] Type::is_int() const [clone .part.0]+0x2f (type.hpp:1967)
    V [libjvm.so+0x5d2488] XorINode::Ideal(PhaseGVN*, bool)+0x468 (node.hpp:394)
    V [libjvm.so+0x151c7fe] PhaseIterGVN::transform_old(Node*)+0x22e (phaseX.cpp:833)
    V [libjvm.so+0x1516141] PhaseIterGVN::optimize()+0x81 (phaseX.cpp:1218)
    V [libjvm.so+0x9f3a12] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x4d2 (loopnode.hpp:1203)
    V [libjvm.so+0x9efdae] Compile::Optimize()+0x10fe (compile.cpp:2350)
    V [libjvm.so+0x9f24c5] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1ae5 (compile.cpp:839)
    V [libjvm.so+0x84c414] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x3c4 (c2compiler.cpp:118)
    V [libjvm.so+0x9fe2c0] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa00 (compileBroker.cpp:2265)
    V [libjvm.so+0x9ff148] CompileBroker::compiler_thread_loop()+0x618 (compileBroker.cpp:1944)
    V [libjvm.so+0xeb96fc] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:719)
    V [libjvm.so+0x179b59a] Thread::call_run()+0xba (thread.cpp:217)
    V [libjvm.so+0x14980cc] thread_native_entry(Thread*)+0x11c (os_linux.cpp:775)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1560518615

From fgao at openjdk.org Wed May 24 06:31:54 2023
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 24 May 2023 06:31:54 GMT
Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 23 May 2023 22:35:04 GMT, Sandhya Viswanathan wrote:

>> This PR fixes the problem with double reduction on x86_64.
>>
>> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows:
>> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java
>> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node.
>>
>> This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements.
>>
>> With this PR the vector_reduction_double node is generated.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> change to superword_max_vector_size

LGTM

-------------

Marked as reviewed by fgao (Committer).

PR Review: https://git.openjdk.org/jdk/pull/14065#pullrequestreview-1441024956

From never at openjdk.org Wed May 24 06:58:58 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Wed, 24 May 2023 06:58:58 GMT
Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v6]
In-Reply-To:
References:
Message-ID: <0pFHulKR_iivOKeNxbX6aIrljWkUWXWAP2CTVrZ0Krs=.fc9dc852-7f05-4788-a263-af64ac7cbaf9@github.com>

On Tue, 23 May 2023 15:39:44 GMT, Doug Simon wrote:

>> When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log).
>>
>> This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err):
>>
>> JVMCI Events (11 events):
>> ...
>> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError
>> Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147)
>> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829)
>> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943)
>>
>>
>> It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation:
>>
>> COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError]
>>
>>
>> [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe
>
> Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   [skip ci] make TestUncaughtErrorInCompileMethod more robust

The latest version looks ok to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14000#issuecomment-1560552464

From epeter at openjdk.org Wed May 24 07:04:11 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 24 May 2023 07:04:11 GMT
Subject: RFR: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D
In-Reply-To:
References:
Message-ID:

On Wed, 24 May 2023 05:00:39 GMT, Jatin Bhateja wrote:

>>> @fg1417 Since I'm basically implementing your suggestion: do you agree with this fix?
>>
>> Hi @eme64 , nice rewrite!
>>
>> BTW, have you tested your patch with `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov` for all jtreg? Thanks.
>
>> Thanks @fg1417 for the review!
>>
>> Yes, the testing passes up to at least tier5 and stress testing. With and without `-XX:+UseCMoveUnconditionally` and `-XX:+UseVectorCmov`
>>
>> Yes, I hope that someone from intel / x86 specialists can review this too :) These are candidates: @jatin-bhateja @sviswa7 @merykitty
>
> Your patch looks good to me.
>
> Patch testing with UseSSE=2 shows failure in following tests, failure is unrelated to your changes, we need to add a strict feature based check in test tag * @requires os.simpleArch == "x64" & (vm.cpu.features ~= ".*avx.*") or use applyIfCPUFeature
>
> make test TEST="test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxReductions.java" CONF=linux-x86_64-server-fastdebug JTREG="RETAIN=all;JOBS=8;TIMEOUT_FACTOR=8;JAVA_OPTIONS=-XX:UseSSE=2"

@jatin-bhateja I could reproduce it on master, so it is indeed unrelated. Filed the bug https://bugs.openjdk.org/browse/JDK-8308746

Thanks @jatin-bhateja @fg1417 thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13493#issuecomment-1560556252

From epeter at openjdk.org Wed May 24 07:04:13 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 24 May 2023 07:04:13 GMT
Subject: Integrated: 8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D
In-Reply-To:
References:
Message-ID:

On Mon, 17 Apr 2023 13:14:37 GMT, Emanuel Peter wrote:

> **Bug**
> In `x86`, `CMoveVF/D` were not correctly implemented for the `eq` and `neq` case (leads to assert). And the `lt/le/gt/ge` cases did not all handle `NaN's` correctly (ordered vs unordered comparison, leads to wrong results).
>
> The assert gets triggered in the code from this change: [JDK-8285973](https://bugs.openjdk.org/browse/JDK-8285973)
> On this line: https://github.com/openjdk/jdk/commit/c1db70d827f7ac81aa6c6646e2431f672c71c8dc#diff-e5266a3774f26ac663dcc67e0be18608b1735f38c0576673ce36e0cd689bab4aR4309
>
> The problematic line wants to find a Cmp above the Bool, and compare its inputs. But we have no Cmp there, just a constant, that we have set during matching:
> https://github.com/openjdk/jdk/blob/af4d5600e37ec6d331e62c5d37491ee97cad5311/src/hotspot/share/opto/matcher.cpp#L2394
>
> The wrong results with `NaN` are because of a bug in `x`:
> https://github.com/openjdk/jdk/commit/0485593fbc4a3264b79969de192e8e7d36e5b590#diff-7070c036c7d88ba4a8467e404d8d88aee646b97bf7bacc8b73cbc93f3ef11d2dR2106
> The cases `lt` and `le` include the `-1` case, which should return `true` if any comparison input is a `NaN`, just as defined for java bytecode `fcmpl/dcmpl`. But they were mapped to ordered comparison codes, not unordered ones. More [here](https://bugs.openjdk.org/browse/JDK-8306302?focusedCommentId=14579078&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579078).
>
> **Solution**
> @fg1417 suggested that `CMoveVF/D` is perfectly composed of `VectorMaskCmp + VectorBlend`. So instead of fixing `CMoveVF/D`, I replaced it. Performance should be the same, as it goes down to the same assembly instructions.
>
> This has a few benefits:
> - `VectorMaskCmp + VectorBlend` is more powerful:
>   - `CMoveVF/D` required the same inputs to the compare as to the move itself.
>   - `CMoveVF/D` on x86 was only implemented for 32 bytes. Any other size would simply fail to vectorize.
>   - `VectorMaskCmp` and `VectorBlend` can have different compare inputs, and even different types. For now, the input types must have the same data-width (`float` and `int`, `double` and `long`).
> - We need less code (I completely removed all code for `CMoveVF/D`).
>
> I also moved the whole `CMove` code in `SuperWord` into `SuperWord::output`, rather than the complex code `SuperWord::merge_packs_to_cmove / CMoveKit`.
>
> As reported in [JDK-8306088](https://bugs.openjdk.org/browse/JDK-8306088) https://github.com/openjdk/jdk/pull/13354, the CMove code did not prop...

This pull request has now been integrated.

Changeset: beb75e65
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/beb75e651f1e4a9bd21f611f9abc7ca28afbae31
Stats: 1370 lines in 13 files changed: 787 ins; 550 del; 33 mod

8306302: C2 Superword fix: use VectorMaskCmp and VectorBlend instead of CMoveVF/D

Reviewed-by: fgao, jbhateja

-------------

PR: https://git.openjdk.org/jdk/pull/13493

From dnsimon at openjdk.org Wed May 24 07:18:07 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 24 May 2023 07:18:07 GMT
Subject: RFR: 8308151: [JVMCI] capture JVMCI exceptions in hs-err [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 23 May 2023 15:39:44 GMT, Doug Simon wrote:

>> When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log).
>>
>> This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err):
>>
>> JVMCI Events (11 events):
>> ...
>> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError
>> Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147)
>> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829)
>> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943)
>>
>>
>> It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation:
>>
>> COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError]
>>
>>
>> [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe
>
> Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   [skip ci] make TestUncaughtErrorInCompileMethod more robust

Thanks Tom for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14000#issuecomment-1560574273

From dnsimon at openjdk.org Wed May 24 07:18:09 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 24 May 2023 07:18:09 GMT
Subject: Integrated: 8308151: [JVMCI] capture JVMCI exceptions in hs-err
In-Reply-To:
References:
Message-ID:

On Tue, 16 May 2023 08:02:11 GMT, Doug Simon wrote:

> When there is a pending exception after a JVMCI upcall into libjvmci, the VM calls the ExceptionDescribe JNI function to print the exception. Unfortunately, this output goes to "a system error-reporting channel" [1] which may not be tty. It also means the output is not in a hs-err log should the VM then exit with a fatal error. This has historically made it harder to triage libgraal bugs (i.e. the console output is usually required in addition to the hs-err crash log).
>
> This PR addresses these shortcomings by printing the exception info to a string which is added to the JVMCI event log (for hs-err):
>
> JVMCI Events (11 events):
> ...
> Event: 0.274 Thread 0x0000000146819210 compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError
> Event: 0.274 Thread 0x0000000146819210 at compiler.jvmci.TestUncaughtErrorInCompileMethod$1.createCompiler(TestUncaughtErrorInCompileMethod.java:147)
> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.getCompiler(HotSpotJVMCIRuntime.java:829)
> Event: 0.274 Thread 0x0000000146819210 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:943)
>
>
> It is also used to enhance the `-XX:+PrintCompilation` message issued for a failed compilation:
>
> COMPILE SKIPPED: uncaught exception in call_HotSpotJVMCIRuntime_compileMethod [compiler.jvmci.TestUncaughtErrorInCompileMethod$CompilerCreationError]
>
>
> [1] https://docs.oracle.com/en/java/javase/17/docs/specs/jni/functions.html#exceptiondescribe

This pull request has now been integrated.

Changeset: 05c095cf
Author: Doug Simon
URL: https://git.openjdk.org/jdk/commit/05c095cf39447d8becb3094c38c84a2c0853112b
Stats: 427 lines in 12 files changed: 362 ins; 32 del; 33 mod

8308151: [JVMCI] capture JVMCI exceptions in hs-err

Reviewed-by: never

-------------

PR: https://git.openjdk.org/jdk/pull/14000

From aph at openjdk.org Wed May 24 09:42:55 2023
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 24 May 2023 09:42:55 GMT
Subject: RFR: 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file
In-Reply-To:
References:
Message-ID:

On Wed, 24 May 2023 02:16:16 GMT, Fei Gao wrote:

> If a match rule belongs to one of the following situations, we can remove extra `UseSVE` Predicate:
>
> 1. If any src operand type is `pReg`, which is SVE specific, we can remove `Predicate(UseSVE > 0)`. But if only dst operand type is `pReg`, we can't remove `Predicate(UseSVE > 0)`, since the DFA of matcher selects by src operands and instruction cost, not involving dst operand.
>
> 2. If matcher can use src operand type, i.e., `pReg` or `vReg`, to distinguish sve from neon, we can remove
> `Predicate(UseSVE == 0)` for rules on neon.
>
> 3. When the condition in `Predicate()` is false on current platform, it's definitely impossible to generate the corresponding node pattern from C2. Then we can remove `Predicate()`, like removing `predicate(UseSVE > 0)` for all `PopulateIndex` rules.
>
> After the patch, the code size of libjvm.so decreased from 25.42M to 25.39M, by 25.3K.
>
> Testing:
> No new failures found on tier 1 - 3.
> No significant performance regression compared with master.

I appreciate the space saving, but this doesn't help maintainability. A predicate `UseSVE` is also a flag for the reader that this is SVE-only. It's also a simple rule to follow. Simple rules are good.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14112#issuecomment-1560788510

From xlinzheng at openjdk.org Wed May 24 09:46:04 2023
From: xlinzheng at openjdk.org (Xiaolin Zheng)
Date: Wed, 24 May 2023 09:46:04 GMT
Subject: Integrated: 8308091: Remove unused iRegIHeapbase() matching operand
In-Reply-To:
References:
Message-ID:

On Mon, 15 May 2023 10:56:26 GMT, Xiaolin Zheng wrote:

> The `iRegIHeapbase()` matching operand has no usage on both AArch64 and RISC-V platforms after [JDK-8242449](https://bugs.openjdk.org/browse/JDK-8242449) and [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667), respectively. As the following-up action discussed in the code review process of [JDK-8306667](https://bugs.openjdk.org/browse/JDK-8306667) (#13577), this is a small cleanup for the `iRegIHeapbase()` matching operand.
>
> Passed fastdebug/release build on both AArch64/RISC-V platforms.
>
> Thanks,
> Xiaolin

This pull request has now been integrated.

Changeset: 2d4d8508
Author: Xiaolin Zheng
Committer: Tobias Hartmann
URL: https://git.openjdk.org/jdk/commit/2d4d850813235a7533cd3bbf776adf69f90f02e6
Stats: 39 lines in 2 files changed: 0 ins; 31 del; 8 mod

8308091: Remove unused iRegIHeapbase() matching operand

Reviewed-by: fyang, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/13983

From fgao at openjdk.org Wed May 24 10:01:56 2023
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 24 May 2023 10:01:56 GMT
Subject: RFR: 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file
In-Reply-To:
References:
Message-ID:

On Wed, 24 May 2023 09:40:07 GMT, Andrew Haley wrote:

> I appreciate the space saving, but this doesn't help maintainability. A predicate `UseSVE` is also a flag for the reader that this is SVE-only. It's also a simple rule to follow. Simple rules are good.

@theRealAph thanks for your review! How about moving these predicates into the encode part as assertion lines? That way, they could also serve as an indication.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14112#issuecomment-1560817570

From chagedorn at openjdk.org Wed May 24 11:17:59 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 24 May 2023 11:17:59 GMT
Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 23 May 2023 15:32:01 GMT, Roland Westrelin wrote:

>> pre/main/post loops are created for an inner loop of a loop nest but
>> assert predicates cause the main and post loops to be removed. The
>> OpaqueZeroTripGuard nodes for the loops are not removed: there's no
>> logic to trigger removal of the opaque nodes once the loops are no
>> longer there. With the inner loops gone, the outer loop becomes
>> candidate for optimizations and is unrolled which causes the zero trip
>> guards of the now removed loops to be duplicated and the opaque nodes
>> to have more than one use.
>>
>> The fix I propose is, using logic similar to
>> `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop
>> opts if every OpaqueZeroTripGuard node guards a loop and if not,
>> remove it.
>
> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Update src/hotspot/share/opto/loopnode.cpp
>
>    Co-authored-by: Christian Hagedorn
>  - review

Marked as reviewed by chagedorn (Reviewer).

All testing passed!

-------------

PR Review: https://git.openjdk.org/jdk/pull/13901#pullrequestreview-1441577446
PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1560929054

From tholenstein at openjdk.org Wed May 24 11:23:01 2023
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Wed, 24 May 2023 11:23:01 GMT
Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v2]
In-Reply-To: <2DEZv0Dqrai9gFHdJ4RN5SUno7CZzhMVXMEDxr7WB1c=.e260673d-db9d-42f2-8560-496e7c1a37b5@github.com>
References: <2DEZv0Dqrai9gFHdJ4RN5SUno7CZzhMVXMEDxr7WB1c=.e260673d-db9d-42f2-8560-496e7c1a37b5@github.com>
Message-ID:

On Tue, 23 May 2023 07:55:35 GMT, Christian Hagedorn wrote:

>> Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision:
>>
>>  - Update Test8211698.java
>>  - Update src/hotspot/share/compiler/compilerOracle.cpp
>>
>>    Co-authored-by: Christian Hagedorn
>>  - Update src/hotspot/share/compiler/compilerOracle.cpp
>>
>>    Co-authored-by: Christian Hagedorn
>
> src/hotspot/share/compiler/compilerOracle.cpp line 1017:
>
>> 1015:   if (line[0] == '\0') return;
>> 1016:   ResourceMark rm;
>> 1017:   char error_buf[1024] = {0};
>
> Wouldn't it be sufficient to only initialize the first character with `\0`?

Probably, but all the other error buffers are also initialized with zero. I will leave it for now. We can update them all together in the future.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13802#discussion_r1203930148

From roland at openjdk.org Wed May 24 11:35:55 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 24 May 2023 11:35:55 GMT
Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v4]
In-Reply-To:
References:
Message-ID: <4SG2y0uI1feXyvIbKy74v7OMubFuzllY4t6ax3vgfJk=.20f170ff-0bfa-4ba1-9fd6-472232a91da1@github.com>

On Wed, 24 May 2023 11:14:21 GMT, Christian Hagedorn wrote:

> All testing passed!

Thanks for the review and testing!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1560953280

From jwaters at openjdk.org Wed May 24 14:04:03 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Wed, 24 May 2023 14:04:03 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows
Message-ID: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com>

On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition.
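
A minimal sketch of why the two definitions behave differently even though both are 32 bits wide on Windows: `long` and `int` are distinct types in C++, so overloads can tell them apart and pointers to them do not convert implicitly. The names `jint_as_long`, `jint_as_int`, and `picks` are hypothetical stand-ins for illustration, not the real `jni_md.h` definitions.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the two candidate definitions of jint.
typedef long jint_as_long;  // a long-based definition, as described above
typedef int  jint_as_int;   // the proposed int-based definition

// long and int are distinct types regardless of their size, so code written
// against int does not automatically accept a long-based jint.
static_assert(!std::is_same<jint_as_long, int>::value, "long is not int");
static_assert(std::is_same<jint_as_int, int>::value, "a typedef of int is int");

// Overload resolution can tell them apart: 'i' for int, 'l' for long.
constexpr char picks(int)  { return 'i'; }
constexpr char picks(long) { return 'l'; }
static_assert(picks(jint_as_long{0}) == 'l', "long-based jint selects the long overload");
static_assert(picks(jint_as_int{0}) == 'i', "int-based jint selects the int overload");

// Pointers do not convert implicitly either; uncommenting the next two lines
// is expected to be rejected by a conforming compiler:
//   jint_as_long value = 0;
//   int* p = &value;
```

The sketch only shows the type-identity point; which code in the JDK actually relies on `jint` being interchangeable with `int` is not something it attempts to reproduce.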
It's better to get this over and done with sooner rather than later, when a future version of Visual C++ finally starts to break on existing code.

-------------

Commit messages:
 - Fix the Java Integer types on Windows

Changes: https://git.openjdk.org/jdk/pull/14125/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308780
Stats: 24 lines in 11 files changed: 0 ins; 5 del; 19 mod
Patch: https://git.openjdk.org/jdk/pull/14125.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125

PR: https://git.openjdk.org/jdk/pull/14125

From vkempik at openjdk.org Wed May 24 15:37:07 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 24 May 2023 15:37:07 GMT
Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers [v2]
In-Reply-To:
References:
Message-ID:

> Please review this fix.
> The vstring_compare intrinsic (from c2_MacroAssembler_riscv.cpp) uses vector registers v6 and v7 (https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482 , vstr1 == v4, lmul=4), but doesn't manifest their usage in the riscv_v.ad file.
> This fix resolves this situation.
> No noticeable difference should be seen in the generated code for now.
>
> Testing: build testing only.

Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:

  Revert part of the fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14102/files
  - new: https://git.openjdk.org/jdk/pull/14102/files/59612b8d..f385c66f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14102&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14102&range=00-01

Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/14102.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14102/head:pull/14102

PR: https://git.openjdk.org/jdk/pull/14102

From vkempik at openjdk.org Wed May 24 15:37:10 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 24 May 2023 15:37:10 GMT
Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 24 May 2023 04:07:10 GMT, Yanhong Zhu wrote:

>> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Revert part of the fix
>
> src/hotspot/cpu/riscv/riscv_v.ad line 2227:
>
>> 2225: iRegI_R10 result, vReg_V1 v1, vReg_V2 v2, vReg_V3 v3, vReg_V4 v4,
>> 2226: vReg_V5 v5, vReg_V6 v6, vReg_V7 v7,
>> 2227: vRegMask_V0 v0, iRegP_R28 tmp1, iRegL_R29 tmp2)
>
> Hi,
> When StrCompNode is `StrIntrinsicNode::UU` or `StrIntrinsicNode::LL`, if-branch (element_compare) will be executed in `C2_MacroAssembler::string_compare_v`, and `lmul` is set to 2, so v6 and v7 are not used in `string_compareL` and `string_compareU`, here is the code:
> C2_MacroAssembler::string_compare_v
>
>   if (str1_isL == str2_isL) { // LL or UU
>     element_compare(str1, str2, zr, cnt2, tmp1, tmp2, v2, v4, v1, encLL, DIFFERENCE);
>     j(DONE);
>   } else { // LU or UL
>     Register strL = encLU ? str1 : str2;
>     Register strU = encLU ? str2 : str1;
>
> C2_MacroAssembler::element_compare
>
>   bind(loop);
>   vsetvli(tmp1, cnt, sew, Assembler::m2);
>   vlex_v(vr1, a1, sew);
>   vlex_v(vr2, a2, sew);
>
> So we don't need to do this for the LL and UU case.

Right, thank you

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14102#discussion_r1204399890

From jsjolen at openjdk.org Wed May 24 18:01:10 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 24 May 2023 18:01:10 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
Message-ID:

Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.

Here are some typical things to look out for:

1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.

An example of this:

```c++
// This function returns null
void* ret_null();
// This function returns true if *x == nullptr
bool is_nullptr(void** x);
```

Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.

Thanks!

-------------

Commit messages:
 - Merge remote-tracking branch 'origin/master' into JDK-8299974
 - Last one?
 - Anotehr
 - Missing fail
 - Fixes
 - Replace NULL with nullptr in share/adlc/

Changes: https://git.openjdk.org/jdk/pull/14008/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14008&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8299974
Stats: 1265 lines in 19 files changed: 0 ins; 0 del; 1265 mod
Patch: https://git.openjdk.org/jdk/pull/14008.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14008/head:pull/14008

PR: https://git.openjdk.org/jdk/pull/14008

From jsjolen at openjdk.org Wed May 24 18:01:28 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 24 May 2023 18:01:28 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID:

On Tue, 16 May 2023 11:54:20 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Mostly alignment and comment issues.

Found all (most?) of the faulty NULL -> null conversions.

Hm. I think that this is completely broken.
This is a compiler/code generation tool so any string containing `null` instead of `NULL` now is incorrect. I'll leave this until a bit later.

Builds on linux-x64, running tier1. Open for review.

src/hotspot/share/adlc/adlArena.cpp line 56:

> 54:
> 55: AdlChunk::AdlChunk(size_t length) {
> 56: _next = nullptr; // Chain on the linked list

Align

src/hotspot/share/adlc/adlArena.cpp line 168:

> 166: AdlArena *a = new AdlArena(this); // New empty arena
> 167: _first = _chunk = nullptr; // Normal, new-arena initialization
> 168: _hwm = _max = nullptr;

Align

src/hotspot/share/adlc/adlparse.cpp line 871:

> 869: do {
> 870: char *pType = nullptr; // parameter type
> 871: char *pName = nullptr; // parameter name

Align?

src/hotspot/share/adlc/adlparse.cpp line 973:

> 971: void ADLParser::frame_parse(void) {
> 972: FrameForm *frame; // Information about stack-frame layout
> 973: char *desc = nullptr; // String representation of frame

align

src/hotspot/share/adlc/adlparse.cpp line 1067:

> 1065: }
> 1066: // !!!!! !!!!!
> 1067: // if(frame->_interpreter_frame_ptr_reg == null) {

nullptr

src/hotspot/share/adlc/adlparse.cpp line 1219:

> 1217: //------------------------------return_value_parse-----------------------------
> 1218: char *ADLParser::return_value_parse() {
> 1219: char *desc = nullptr; // String representation of return_value

align

src/hotspot/share/adlc/adlparse.cpp line 2033:

> 2031: void ADLParser::peep_parse(void) {
> 2032: Peephole *peep; // Pointer to current peephole rule form
> 2033: char *desc = nullptr; // String representation of rule

align

src/hotspot/share/adlc/adlparse.cpp line 3159:

> 3157: // if ( _curchar != ',' && _curchar != ')' ) {
> 3158: // parse_err(SYNERR, "expected ',' or ')' after encode method inside ins_encode.\n");
> 3159: // return null;

nullptr

src/hotspot/share/adlc/adlparse.cpp line 3837:

> 3835: MatchRule *ADLParser::match_parse(FormDict &operands) {
> 3836: MatchRule *match; // Match Rule class for instruction/operand
> 3837: char *cnstr = nullptr; // Code for constructor

align

src/hotspot/share/adlc/adlparse.cpp line 3847:

> 3845: skipws(); // Skip whitespace
> 3846: if ( _curchar == ';' ) { // Semicolon is valid terminator
> 3847: cnstr = nullptr; // no constructor for this form

align

src/hotspot/share/adlc/adlparse.cpp line 3853:

> 3851: parse_err(SYNERR, "invalid construction of match rule\n"
> 3852: "Missing ';' or invalid '%%{' and '%%}' constructor\n");
> 3853: return nullptr; // No MatchRule to return

align

src/hotspot/share/adlc/adlparse.cpp line 3871:

> 3869: skipws(); // Skip whitespace
> 3870: if ( _curchar == ';' ) { // Semicolon is valid terminator
> 3871: desc = nullptr; // no constructor for this form

align

src/hotspot/share/adlc/adlparse.cpp line 4304:

> 4302: Attribute *ADLParser::attr_parse(char* ident) {
> 4303: Attribute *attrib; // Attribute class
> 4304: char *cost = nullptr; // String representation of cost attribute

align

src/hotspot/share/adlc/adlparse.cpp line 4362:

> 4360: // Lookup the root value in the operands dict to perform substitution
> 4361: const char *result = nullptr; // Result type will be filled in later
> 4362: const char *name = token; // local name associated with this node

align

src/hotspot/share/adlc/adlparse.cpp line 4457:

> 4455: char* ADLParser::find_cpp_block(const char* description) {
> 4456: char *next; // Pointer for finding block delimiters
> 4457: char* cppBlock = nullptr; // Beginning of C++ code block

align

src/hotspot/share/adlc/adlparse.cpp line 4794:

> 4792: int result; // Storage for integer result
> 4793:
> 4794: if( _curline == nullptr ) // Return null at EOF.

align

src/hotspot/share/adlc/adlparse.cpp line 4830:

> 4828:
> 4829: if( _curline == nullptr ) // Return null at EOF.
> 4830: return nullptr;

align

src/hotspot/share/adlc/adlparse.cpp line 5188:

> 5186: }
> 5187: }
> 5188: while(_curline != nullptr) { // Check for end of file

align

src/hotspot/share/adlc/adlparse.cpp line 5202:

> 5200: if (*_ptr == '\n') { // keep proper track of new lines
> 5201: next_line(); // skip newlines within comments
> 5202: if (_curline == nullptr) { // check for end of file

align

src/hotspot/share/adlc/adlparse.cpp line 5241:

> 5239: else { ++_ptr; ++next; }
> 5240: }
> 5241: if( _curline != nullptr ) // at end of file _curchar isn't valid

align

src/hotspot/share/adlc/archDesc.cpp line 394:

> 392: // const Form *form = operands[_result];
> 393: // OpClassForm *opcForm = form ? form->is_opclass() : null;
> 394: // assert(opcForm != null, "Match Rule contains invalid operand name.");

nullptr

src/hotspot/share/adlc/dfa.cpp line 196:

> 194: // If unpredicated vector unary operation, add one extra check, i.e. right
> 195: // child should be null, to distinguish from the predicated version.
> 196: fprintf(fp, " && _kids[1] == null");

nullptr

src/hotspot/share/adlc/dfa.cpp line 952:

> 950: printf("%s", (_result == null ? "null" : _result ) );
> 951: printf("%s", (_constraint == null ? "null" : _constraint ) );
> 952: printf("%s", (_valid == null ? "null" : _valid ) );

nullptr

src/hotspot/share/adlc/dict2.cpp line 207:

> 205: b->_keyvals[b->_cnt+b->_cnt+1] = val;
> 206: b->_cnt++;
> 207: return nullptr; // Nothing found prior

align

src/hotspot/share/adlc/forms.cpp line 306:

> 304: // Form *cur = _root;
> 305: // Form *next = null;
> 306: // for( ; (cur = next) != null; ) {

nullptr

src/hotspot/share/adlc/forms.hpp line 517:

> 515: Max = 0x7fffffff
> 516: };
> 517: const char *_external_name; // if !null, then print this instead of _expr

if not

src/hotspot/share/adlc/formssel.cpp line 301:

> 299: if (strcmp(opType,"ThreadLocal")==0) {
> 300: fprintf(stderr, "Warning: ThreadLocal instruction %s should be named 'tlsLoadP_*'\n",
> 301: (_ident == nullptr ? "null" : _ident));

nullptr

src/hotspot/share/adlc/formssel.cpp line 714:

> 712: // // unique def, some uses
> 713: // // must return bottom unless all uses match def
> 714: // unique = null;

nullptr?

src/hotspot/share/adlc/formssel.cpp line 984:

> 982: } else {
> 983: // This would be a nice warning but it triggers in a few places in a benign way
> 984: // if (_matrule != null && !expands()) {

nullptr

src/hotspot/share/adlc/formssel.cpp line 2947:

> 2945: // // This list may not own its elements if copied via assignment
> 2946: // Component *component;
> 2947: // for (reset(); (component = iter()) != null;) {

nullptr

src/hotspot/share/adlc/formssel.cpp line 3385:

> 3383: : mnode->_rChild->_opType;
> 3384: }
> 3385: // Else, May be simple chain rule: (Set dst operand_form), rightStr=null;

nullptr

src/hotspot/share/adlc/output_c.cpp line 779:

> 777: fprintf(fp_cpp, "static const Pipeline pipeline_class_Zero_Instructions(0, 0, true, 0, 0, false, false, false, false, null, null, null, Pipeline_Use(0, 0, 0, null));\n\n");
> 778: fprintf(fp_cpp, "static const Pipeline pipeline_class_Unknown_Instructions(0, 0, true, 0, 0, false, true, true, false, null, null, null, Pipeline_Use(0, 0, 0, null));\n\n");
> 779:

nullptr

src/hotspot/share/adlc/output_c.cpp line 892:

> 890: }
> 891: else
> 892: fprintf(fp_cpp, " null,");

nullptr

src/hotspot/share/adlc/output_c.cpp line 903:

> 901: pipeline_res_mask_index+1);
> 902: else
> 903: fprintf(fp_cpp, "null");

nullptr

src/hotspot/share/adlc/output_c.cpp line 1050:

> 1048: print_block_index(fp, inst_position);
> 1049: fprintf(fp, ");\n inst%d = (n->is_Mach()) ? ", inst_position);
> 1050: fprintf(fp, "n->as_Mach() : null;\n }\n");

nullptr

src/hotspot/share/adlc/output_c.cpp line 1056:

> 1054: // Test we have the correct instruction by comparing the rule.
> 1055: if( parent != -1 ) {
> 1056: fprintf(fp, " matches = matches && (inst%d != null) && (inst%d->rule() == %s_rule);\n",

nullptr

src/hotspot/share/adlc/output_c.cpp line 1363:

> 1361: for (int i = 0; i <= max_position; i++) {
> 1362: fprintf(fp, " inst%d->set_removed();\n", i);
> 1363: fprintf(fp, " cfg_->map_node_to_block(inst%d, null);\n", i);

nullptr

src/hotspot/share/adlc/output_c.cpp line 1388:

> 1386: // ...
> 1387: // MachNode *instMAX = null;
> 1388: //

nullptr

src/hotspot/share/adlc/output_c.cpp line 1403:

> 1401: fprintf(fp, " MachNode *inst0 = this;\n");
> 1402: } else {
> 1403: fprintf(fp, " MachNode *inst%d = null;\n", i);

nullptr

src/hotspot/share/adlc/output_c.cpp line 1524:

> 1522: }
> 1523: else {
> 1524: fprintf(fp," MachNode *tmp%d = null;\n", i);

nullptr

src/hotspot/share/adlc/output_c.cpp line 1554:

> 1552:
> 1553: // Declare variable to hold root of expansion
> 1554: fprintf(fp," MachNode *result = null;\n");

nullptr

src/hotspot/share/adlc/output_c.cpp line 1642:

> 1640: cnt, new_pos, exp_pos-node->num_opnds(), opid);
> 1641: // Check for who defines this operand & add edge if needed
> 1642: fprintf(fp," if(tmp%d != null)\n", exp_pos);

nullptr

src/hotspot/share/adlc/output_c.cpp line 2862:

> 2860: fprintf(fp," }\n");
> 2861: fprintf(fp," ShouldNotReachHere();\n");
> 2862: fprintf(fp," return null;\n");

nullptr

src/hotspot/share/adlc/output_c.cpp line 3354:

> 3352: fprintf(_CPP_PIPELINE_file._fp, "const Pipeline * %*s::pipeline_class() { return %s; }\n",
> 3353: max_ident_len, "MachNode", _pipeline ? "(&pipeline_class_Unknown_Instructions)" : "null");
> 3354: fprintf(_CPP_PIPELINE_file._fp, "const Pipeline * %*s::pipeline() const { return pipeline_class(); }\n",

nullptr

src/hotspot/share/adlc/output_c.cpp line 3903:

> 3901: // Generate the case statement for this opcode
> 3902: fprintf(fp_cpp, " case %s:", opEnumName);
> 3903: fprintf(fp_cpp, " return null;\n");

nullptr

src/hotspot/share/adlc/output_c.cpp line 3915:

> 3913:
> 3914: // Generate the closing for method Matcher::MachOperGenerator
> 3915: fprintf(fp_cpp, " return null;\n");

nullptr

src/hotspot/share/adlc/output_c.cpp line 4174:

> 4172:
> 4173: // Generate the closing for method Matcher::MachNodeGenerator
> 4174: fprintf(fp_cpp, " return null;\n");

nullptr

src/hotspot/share/adlc/output_h.cpp line 583:

> 581: oper.ext_format(fp, globals, 0);
> 582: }
> 583: } else { // oper._format == null

is null

src/hotspot/share/adlc/output_h.cpp line 715:

> 713: else if( inst.is_ideal_mem() ) {
> 714: // Print out the field name if available to improve readability
> 715: fprintf(fp, " if (ra->C->alias_type(adr_type())->field() != null) {\n");

nullptr

src/hotspot/share/adlc/output_h.cpp line 1115:

> 1113:
> 1114: // const char *classname;
> 1115: // for (_pipeline->_classlist.reset(); (classname = _pipeline->_classlist.iter()) != null; ) {

nullptr

-------------

PR Review: https://git.openjdk.org/jdk/pull/14008#pullrequestreview-1428402551
PR Review: https://git.openjdk.org/jdk/pull/14008#pullrequestreview-1442331985
PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1549688257
PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1561699876
PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195049718
PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195049916
PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195050812
PR Review Comment:
https://git.openjdk.org/jdk/pull/14008#discussion_r1195051007 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195051165 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195051376 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195051690 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195052827 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195053371 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195053476 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195053564 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195053647 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195054014 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195054110 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195054282 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195054859 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195055035 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195055284 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195055381 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195055471 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195055895 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204483422 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204484056 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195056915 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195057492 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195057741 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204485353 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14008#discussion_r1195059076 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195059369 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195060351 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195060588 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204488502 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204488791 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204488867 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204489279 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204489338 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204489705 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195061423 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204489843 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204490104 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204490222 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204490341 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204491334 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204492272 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204492782 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204492858 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204493048 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195062510 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204493636 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1195062686 From dlong at openjdk.org Wed May 24 18:30:01 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 24 May 2023 18:30:01 GMT Subject: RFR: 8299974: 
Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID: <9IltgApBXNZd3Oznbx3UO_9cQtXRKhvyD1VGAl6EW8Q=.b8152734-7365-4796-9014-10f717ed3df0@github.com>

On Tue, 16 May 2023 11:54:20 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

src/hotspot/share/adlc/archDesc.cpp line 85:

> 83: output(stderr);
> 84: }
> 85:

Missing copyright year update.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204607799

From never at openjdk.org Wed May 24 18:43:08 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Wed, 24 May 2023 18:43:08 GMT
Subject: RFR: 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1
In-Reply-To:
References:
Message-ID:

On Mon, 22 May 2023 17:33:43 GMT, Tom Rodriguez wrote:

> adjust requires declaration

Thanks.

------------- PR Comment: https://git.openjdk.org/jdk/pull/14091#issuecomment-1561754858 From never at openjdk.org Wed May 24 18:43:08 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 24 May 2023 18:43:08 GMT Subject: Integrated: 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1 In-Reply-To: References: Message-ID: <2MedffH2ovgb6Kvoqu_NKrB1gweiOWLEhG2DKmhjSjk=.c3cc711f-5f7c-45cd-bfe2-5966b837d257@github.com> On Mon, 22 May 2023 17:33:43 GMT, Tom Rodriguez wrote: > adjust requires declaration This pull request has now been integrated. Changeset: ac89e304 Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/ac89e3045b653969dfce48a2b34fd37078a2b958 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8308291: compiler/jvmci/meta/ProfilingInfoTest.java fails with -XX:TieredStopAtLevel=1 Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14091 From duke at openjdk.org Wed May 24 18:47:21 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 24 May 2023 18:47:21 GMT Subject: RFR: 8308672: Add version number in the replay file generated by DumpInline Message-ID: Please review this PR that adds version to the replay file generated by DumpInline. 
-------------

Commit messages:
 - 8308672: Add version number in the replay file generated by DumpInline

Changes: https://git.openjdk.org/jdk/pull/14131/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14131&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308672
Stats: 31 lines in 3 files changed: 10 ins; 5 del; 16 mod
Patch: https://git.openjdk.org/jdk/pull/14131.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14131/head:pull/14131

PR: https://git.openjdk.org/jdk/pull/14131

From dlong at openjdk.org Wed May 24 18:57:58 2023
From: dlong at openjdk.org (Dean Long)
Date: Wed, 24 May 2023 18:57:58 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID:

On Tue, 16 May 2023 11:54:20 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Looks good, except I found one missing copyright update.

-------------

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14008#pullrequestreview-1442560227

From kvn at openjdk.org Wed May 24 19:19:09 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 24 May 2023 19:19:09 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID:

On Tue, 16 May 2023 11:54:20 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

I have a few comments.

src/hotspot/share/adlc/adlparse.cpp line 4595:

> 4593:
> 4594: if( _curline == nullptr ) // Return null at EOF.
> 4595: return nullptr;

Spacing of comment. Please fix the code style to:

    if (_curline == nullptr) { // Return null at EOF.
      return nullptr;
    }

src/hotspot/share/adlc/adlparse.cpp line 4627:

> 4625:
> 4626: // Make sure we do not try to use #defined identifiers. If start is
> 4627: // null an error was already reported.

Maybe use `nullptr` here, since we are talking about the value in the `start` variable.

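A small illustration of the two conventions raised in this review may help. The sketch below is not adlc code: `LineSource` and `next_line` are hypothetical names invented for the example. It shows prose comments using lower-case "null", a comment that quotes a code expression and therefore keeps `nullptr`, and the braced `if` style asked for in the first review comment.

```cpp
// Hypothetical stand-in for a parser's line buffer; not the real adlc type.
struct LineSource {
  const char* _curline;  // Current line, or null at EOF.

  // Returns the current line. The result is null at EOF.
  // Callers can rely on: (next_line() == nullptr) iff (_curline == nullptr).
  const char* next_line() const {
    if (_curline == nullptr) {  // Return null at EOF.
      return nullptr;
    }
    return _curline;
  }
};
```

The first two comments describe behaviour in prose, so they say "null"; the third spells out an expression a reader could paste into code, so it keeps the literal `nullptr`.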
src/hotspot/share/adlc/adlparse.cpp line 4795: > 4793: > 4794: if( _curline == nullptr ) // Return null at EOF. > 4795: return 0; Code style. src/hotspot/share/adlc/adlparse.cpp line 4830: > 4828: > 4829: if( _curline == nullptr ) // Return null at EOF. > 4830: return nullptr; Code style. src/hotspot/share/adlc/adlparse.cpp line 5242: > 5240: } > 5241: if( _curline != nullptr ) // at end of file _curchar isn't valid > 5242: _curchar = *_ptr; // reset _curchar to maintain invariant Code style. ------------- PR Review: https://git.openjdk.org/jdk/pull/14008#pullrequestreview-1442573130 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204642438 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204644334 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204645287 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204645560 PR Review Comment: https://git.openjdk.org/jdk/pull/14008#discussion_r1204646229 From kvn at openjdk.org Wed May 24 19:23:55 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 May 2023 19:23:55 GMT Subject: RFR: 8308672: Add version number in the replay file generated by DumpInline In-Reply-To: References: Message-ID: On Wed, 24 May 2023 18:27:26 GMT, Ashutosh Mehra wrote: > Please review this PR that adds version to the replay file generated by DumpInline. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14131#pullrequestreview-1442598626 From duke at openjdk.org Wed May 24 21:46:54 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 24 May 2023 21:46:54 GMT Subject: RFR: 8308672: Add version number in the replay file generated by DumpInline In-Reply-To: References: Message-ID: On Wed, 24 May 2023 19:21:27 GMT, Vladimir Kozlov wrote: >> Please review this PR that adds version to the replay file generated by DumpInline. > > Looks good. @vnkozlov thanks for reviewing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14131#issuecomment-1561964987 From never at openjdk.org Wed May 24 23:54:01 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 24 May 2023 23:54:01 GMT Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations [v2] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 09:24:54 GMT, Doug Simon wrote: >> This PRs adds JVMCI API to reflect the fact that [deferred locals are not supported on virtual threads](https://bugs.openjdk.org/browse/JDK-8307125?focusedCommentId=14578728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578728). > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - remove MaterializeVirtualObjectTest.java from ProblemList-Virtual.txt > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8307125 > - materializing frames on virtual threads is not supported Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13777#pullrequestreview-1442895848 From yzhu at openjdk.org Thu May 25 01:22:56 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Thu, 25 May 2023 01:22:56 GMT Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers [v2] In-Reply-To: References: Message-ID: On Wed, 24 May 2023 15:37:07 GMT, Vladimir Kempik wrote: >> Please review this fix. 
>> vstring_compare intrinsic ( from c2_MacroAssembler_riscv.cpp ) uses vector registers v6 and v7 ( https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482 , vstr1 == v4, lmul=4) , but doesn't manifest their usage in riscv_v.ad file.
>> This fix resolves this situation.
>> No noticeable difference one might see in generated code for now.
>>
>> Testing: build testing only.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert part of the fix

Looks good.

-------------

Marked as reviewed by yzhu (Author).

PR Review: https://git.openjdk.org/jdk/pull/14102#pullrequestreview-1442945945

From sviswanathan at openjdk.org Thu May 25 01:23:59 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 25 May 2023 01:23:59 GMT
Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v8]
In-Reply-To:
References:
Message-ID:

On Tue, 23 May 2023 07:09:37 GMT, Jasmine Karthikeyan wrote:

>> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                                Baseline                     Patch             Improvement
>> Benchmark                      Mode  Cnt   Score   Error  Units      Score   Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op   + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op   + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op   (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op   + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op   + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Cleanup from code review

src/hotspot/share/opto/addnode.cpp line 902:

> 900: return new CMoveINode(in1->in(CMoveNode::Condition), phase->intcon(l_val ^ in2_val), phase->intcon(r_val ^ in2_val), TypeInt::INT);
> 901: }
> 902: }

An isa_int() check is needed before doing is_int()->get_con(). Something like below:

```
const TypeInt* in2type = phase->type(in2)->isa_int();
const TypeInt* ltype = phase->type(in1->in(CMoveNode::IfFalse))->isa_int();
const TypeInt* rtype = phase->type(in1->in(CMoveNode::IfTrue))->isa_int();
if (in2type && ltype && rtype) {
  int in2_val = in2type->get_con();
  int l_val = ltype->get_con();
  int r_val = rtype->get_con();
  if (cmp_op == Op_CmpI || cmp_op == Op_CmpP) {
    return new CMoveINode(in1->in(CMoveNode::Condition), phase->intcon(l_val ^ in2_val), phase->intcon(r_val ^ in2_val), TypeInt::INT);
  }
}
```

This should fix the crash seen in the vectorapi test.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1204900169

From gcao at openjdk.org Thu May 25 01:33:54 2023
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 25 May 2023 01:33:54 GMT
Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 24 May 2023 15:37:07 GMT, Vladimir Kempik wrote:

>> Please review this fix.
>> vstring_compare intrinsic ( from c2_MacroAssembler_riscv.cpp ) uses vector registers v6 and v7 ( https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482 , vstr1 == v4, lmul=4) , but doesn't manifest their usage in riscv_v.ad file.
>> This fix resolves this situation.
>> No noticeable difference one might see in generated code for now.
>>
>> Testing: build testing only.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert part of the fix

Hi, I tested tier1 and `test/jdk/jdk/incubator/vector` before and after applying this PR, and the test results are consistent; no new errors are introduced.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14102#issuecomment-1562134928

From dholmes at openjdk.org Thu May 25 01:36:55 2023
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 25 May 2023 01:36:55 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows
In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com>
References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com>
Message-ID:

On Wed, 24 May 2023 13:56:05 GMT, Julian Waters wrote:

> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner rather than later, when a future version of Visual C++ finally starts to break on existing code.

I think the JNI type definition change is okay. However, many of the other changes appear to me not to involve Java variables and so don't need to be Java types, i.e. they should be `int` rather than `jint` - though as these are native Windows types there may not actually be any reason to change them from `long`. This is for the client-libs folk to decide.

src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp line 325:

> 323: }
> 324:
> 325: jint sx, sy, ex, ey;

These are not Java variables. They get passed to the win32 GDI Arc function below, which expects `int`.

src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp line 605:

> 603: return;
> 604: }
> 605: jint sx, sy, ex, ey;

Again, these don't seem to need to be Java types.

-------------

PR Review: https://git.openjdk.org/jdk/pull/14125#pullrequestreview-1442950619
PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1204903525
PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1204904147

From fyang at openjdk.org Thu May 25 01:49:04 2023
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 25 May 2023 01:49:04 GMT
Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers [v2]
In-Reply-To:
References:
Message-ID: <8MQAfGZdQLxim8mq_8udyaoxUI0dCTuvD92WHGQLg-8=.92d81988-52b2-4fa6-aaca-b9c5eae369d0@github.com>

On Wed, 24 May 2023 15:37:07 GMT, Vladimir Kempik wrote:

>> Please review this fix.
>> vstring_compare intrinsic ( from c2_MacroAssembler_riscv.cpp ) uses vector registers v6 and v7 ( https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482 , vstr1 == v4, lmul=4) , but doesn't manifest their usage in riscv_v.ad file.
>> This fix resolves this situation.
>> No noticeable difference one might see in generated code for now.
>>
>> Testing: build testing only.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert part of the fix

Looks reasonable.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14102#pullrequestreview-1442960458

From gcao at openjdk.org Thu May 25 03:29:50 2023
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 25 May 2023 03:29:50 GMT
Subject: RFR: 8308817: RISC-V: Support VectorTest node for Vector API
Message-ID:

Hi, we have added the VectorTest node. It was implemented by referring to RVV v1.0 [1]. Please take a look and review. Thanks a lot.

We can use Int256VectorTests.java [2] to print the compilation log, and to verify and observe the generation of nodes.

For example, we can use the following command to print the compilation log of a jtreg test case:

    /home/zifeihan/jdk-tools/jtreg/bin/jtreg \
    -v:default \
    -concurrency:16 -timeout:50 \
    -javaoption:-XX:+UnlockExperimentalVMOptions \
    -javaoption:-XX:+UseRVV \
    -javaoption:-XX:+PrintOptoAssembly \
    -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/Int256VectorTests_PrintOptoAssembly_20230525.log \
    -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \
    -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \
    /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java

Also, here's a more concise test case, VectorTestDemo:

    import jdk.incubator.vector.ByteVector;
    import jdk.incubator.vector.VectorMask;

    public class VectorTestDemo {

        static boolean[] d = new boolean[]{true, false, false, false, false, false, false, false};

        static VectorMask avmask = VectorMask.fromArray(ByteVector.SPECIES_64, d, 0);

        public static void main(String[] args) {
            for (int i = 0; i < 300000; i++) {
                final boolean alltrue = alltrue();
                if (alltrue != false) {
                    throw new RuntimeException("alltrue");
                }
                final boolean anytrue = anytrue();
                if (anytrue != true) {
                    throw new RuntimeException("anytrue");
                }
            }
        }

        public static boolean anytrue() {
            return avmask.anyTrue();
        }

        public static boolean alltrue() {
            return avmask.allTrue();
        }
    }

We can compile `VectorTestDemo.java` using `javac --add-modules jdk.incubator.vector VectorTestDemo.java`, and use `./java -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+UseRVV -XX:+PrintOptoAssembly -XX:+LogCompilation -XX:LogFile=compile.log VectorTestDemo > aaa.log` to start the test case. We can then inspect the specified compilation log `compile.log`, which contains the VectorTest node implemented by this PR.

Some of the compilation logs of the VectorTestDemo#anytrue method are as follows.

    05e     lwu  R28, [R7, #12]  # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant)
    062     decode_heap_oop  R7, R28  #@decodeHeapOop
    066     addi  R7, R7, #16  # ptr, #@addP_reg_imm
    068     loadV V1, [R7]  # vector (rvv)
    070     vloadmask V0, V1
    078     CMove R10, (vectortest eq V0 V0), zero, one  #@cmovI_vtest_anytrue_negate

Some of the compilation logs of the VectorTestDemo#alltrue method are as follows.

    05e     lwu  R28, [R7, #12]  # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (constant)
    062     decode_heap_oop  R7, R28  #@decodeHeapOop
    066     addi  R7, R7, #16  # ptr, #@addP_reg_imm
    068     loadV V1, [R7]  # vector (rvv)
    070     vloadmask V0, V1
    078     CMove R10, (vectortest ne V0 V0), zero, one  #@cmovI_vtest_alltrue_negate

Some of the compilation logs of the VectorTestDemo#main method are as follows.

    0b2     decode_heap_oop  R7, R7  #@decodeHeapOop
    0b4     addi  R7, R7, #16  # ptr, #@addP_reg_imm
    0b6     loadV V1, [R7]  # vector (rvv)
    0be     vloadmask V0, V1
    0c6     CMove R28, (vectortest eq V0 V0), zero, one  #@cmovI_vtest_anytrue_negate
    0d2     beq (vectortest overflow V0, V0) B8  #@vtest_alltrue_branch  P=0.000001 C=-1.000000

As commented in the PR on the cmovI_vtest_anytrue instruct, cmpOp is negated in CMoveINode::Ideal. So we keep this version for better understanding of the code change.

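As a reference point when reading the logs, here is a scalar model of the two predicates that VectorTestDemo exercises. It is only a sketch in plain C++: the names `any_true`, `all_true` and `demo_mask` are made up for illustration, and it says nothing about how the RVV instruction sequence computes the result. It only states what the demo expects for a mask in which just the first lane is set.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Scalar model of VectorMask.anyTrue(): true if at least one lane is set.
template <std::size_t N>
bool any_true(const std::array<bool, N>& mask) {
  return std::any_of(mask.begin(), mask.end(), [](bool lane) { return lane; });
}

// Scalar model of VectorMask.allTrue(): true only if every lane is set.
template <std::size_t N>
bool all_true(const std::array<bool, N>& mask) {
  return std::all_of(mask.begin(), mask.end(), [](bool lane) { return lane; });
}

// The 8-lane mask used by VectorTestDemo: only lane 0 is set.
inline std::array<bool, 8> demo_mask() {
  return {true, false, false, false, false, false, false, false};
}
```

For this mask `any_true` yields true and `all_true` yields false, which matches the two checks in the demo's loop.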
[1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java ### Testing: qemu with UseRVV: - [ ] Tier1 tests (release) - [ ] Tier2 tests (release) - [ ] Tier3 tests (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - Support VectorTest node for Vector API Changes: https://git.openjdk.org/jdk/pull/14138/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14138&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308817 Stats: 108 lines in 2 files changed: 107 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14138/head:pull/14138 PR: https://git.openjdk.org/jdk/pull/14138 From vkempik at openjdk.org Thu May 25 05:12:04 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 25 May 2023 05:12:04 GMT Subject: RFR: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers [v2] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 01:31:06 GMT, Gui Cao wrote: > Hi, I tested tier1 and `test/jdk/jdk/incubator/vector` before and after using this PR and the test results are the same, no new errors were introduced (using QEMU and with UseRVV enabled in JDK). Thanks a lot, these vector changes aren't easy to test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14102#issuecomment-1562275154 From vkempik at openjdk.org Thu May 25 05:12:05 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 25 May 2023 05:12:05 GMT Subject: Integrated: 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers In-Reply-To: References: Message-ID: On Tue, 23 May 2023 13:31:18 GMT, Vladimir Kempik wrote: > Please review this fix. 
> The vstring_compare intrinsic (from c2_MacroAssembler_riscv.cpp) uses vector registers v6 and v7 ( https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1482 , vstr1 == v4, lmul=4), but doesn't manifest their usage in the riscv_v.ad file. > This fix resolves this situation. > No noticeable difference should be visible in the generated code for now. > > Testing: build testing only. This pull request has now been integrated. Changeset: 2a18e537 Author: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/2a18e537d60c88c015bea738764eef2ca610abf1 Stats: 36 lines in 2 files changed: 32 ins; 0 del; 4 mod 8308656: RISC-V: vstring_compare doesnt manifest usage of all vector registers Reviewed-by: yzhu, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14102 From epeter at openjdk.org Thu May 25 06:43:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 May 2023 06:43:55 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3] In-Reply-To: References: Message-ID: <_Pm4QfF6KyuXOqiFqcxuHuBwdrY6RZggOIC7GlSNkRs=.20e78329-a7b0-4b2d-a8a5-2c9afc63067a@github.com> On Tue, 23 May 2023 22:35:04 GMT, Sandhya Viswanathan wrote: >> This PR fixes the problem with double reduction on x86_64. >> >> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: >> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java >> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. >> >> This happened because ReductionNode::implemented was passed a vector size of one element. To check whether the vector reduction is implemented, we need to query with a vector size of at least two elements. >> >> With this PR the vector_reduction_double node is generated.
>> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > change to superword_max_vector_size Running testing at commit 4. Will report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1562361887 From roland at openjdk.org Thu May 25 06:49:13 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 25 May 2023 06:49:13 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast Message-ID: At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. ------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/14123/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14123&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308583 Stats: 103 lines in 3 files changed: 101 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14123/head:pull/14123 PR: https://git.openjdk.org/jdk/pull/14123 From jwaters at openjdk.org Thu May 25 07:25:54 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 25 May 2023 07:25:54 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Thu, 25 May 2023 01:34:15 GMT, David Holmes wrote: > I think the JNI type definition change is okay. 
> > However many of the other changes appear to me to not involve Java variables and so don't need to be Java types i.e they should be `int` rather than `jint` - though as these are native Windows types there may not actually be any reason to change them from `long`. This is for the client-libs folk to decide. All the changes from long were done since there was conversion from or to a jint somewhere down the line, and the compilation would fail if not done otherwise. I also changed them to jint rather than int so there wouldn't be a need to keep the variables in sync with the jni.h declarations, but I guess I'll wait for more reviews to see what to do here ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1562408466 From jwaters at openjdk.org Thu May 25 07:28:57 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 25 May 2023 07:28:57 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Wed, 24 May 2023 13:56:05 GMT, Julian Waters wrote: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Going to page for @aivanov-jdk for `client-libs` review ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1562412122 From dholmes at openjdk.org Thu May 25 07:33:54 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 25 May 2023 07:33:54 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <3gHtTfGRCwS2s9JxA8cpAdZmaw9ShU1v1VPIT--Rn5I=.2414e309-e34f-437d-b4dd-6a734e5bb88f@github.com> On Thu, 25 May 2023 07:22:50 GMT, Julian Waters wrote: > All the changes from long were done since there was conversion from or to a jint somewhere down the line, Okay I see that now. It is a messy situation - at some point the incoming jint's need to be "converted" to a native type. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1562418414 From jwaters at openjdk.org Thu May 25 08:51:55 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 25 May 2023 08:51:55 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Wed, 24 May 2023 13:56:05 GMT, Julian Waters wrote: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. 
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code I'll see what I can do, I'll check the parameter type for the methods that are called in relevant code, though if they take an int as an argument I'm not really sure what to change ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1562526271 From aph at openjdk.org Thu May 25 09:02:13 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 May 2023 09:02:13 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3] In-Reply-To: References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: On Thu, 18 May 2023 09:50:13 GMT, Chang Peng wrote: >> In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. >> >> For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. 
So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations.
>>
>> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input.
>>
>> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations.
>>
>> For example,
>>
>> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
>> m.not().trueCount();
>>
>> will produce following assembly on a Neon machine before this patch:
>>
>> ...
>> mvn v16.16b, v16.16b // VectorMask.not()
>> xtn v16.4h, v16.4s
>> xtn v16.8b, v16.8h
>> neg v16.8b, v16.8b // VectorStoreMask
>> addv b17, v16.8b
>> umov w0, v17.b[0] // VectorMask.trueCount()
>> ...
>>
>> After this patch:
>>
>> ...
>> mvn v16.16b, v16.16b // VectorMask.not()
>> addv s17, v16.4s
>> smov x0, v17.b[0]
>> neg x0, x0 // Optimized VectorMask.trueCount()
>> ...
>>
>> In this case, we can save two xtn insns.
>>
>> Performance:
>>
>> Benchmark   Before            After               Unit
>> testInt     723.822 ± 1.029   1182.375 ± 12.363   ops/ms
>> testLong    632.154 ± 0.197   1382.74 ± 2.188     ops/ms
>> testShort   788.665 ± 1.852   1152.38 ± 3.77      ops/ms
>>
>> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vect...
>
> Chang Peng has updated the pull request incrementally with one additional commit since the last revision:
>
> Update benchmark to avoid potential optimization

Marked as reviewed by aph (Reviewer).

src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 3825:

> 3823: %}
> 3824:
> 3825: // Combined rule for VectorStoreMask + VectorMaskTrueCount when the vector element type is not T_BYTE.

Suggestion:

// Combined rule for VectorMaskTrueCount (VectorStoreMask) when the vector element type is not T_BYTE.

Using `+` is unnecessarily confusing, as is swapping the operands.
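For readers following the quoted description, here is a minimal plain-Java sketch of why the narrowing step can be dropped before counting. It does not use the Vector API or any HotSpot code; the class and method names (`TrueCountSketch`, `toRegisterFormat`, `storeMask`, `trueCountViaStoreMask`, `trueCountDirect`) are hypothetical and only model the two mask formats on ordinary arrays, assuming int-sized lanes. It also ignores details of the real instruction sequence, such as how the scalar result is extracted from the vector register.

```java
class TrueCountSketch {
    // Models the in-register mask format: each int lane is 0 (false) or -1 (true).
    static int[] toRegisterFormat(boolean[] mask) {
        int[] lanes = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            lanes[i] = mask[i] ? -1 : 0;
        }
        return lanes;
    }

    // Models VectorStoreMask: narrow each lane to a byte, then negate it,
    // giving the in-memory format (0x00 or 0x01 per element).
    static byte[] storeMask(int[] lanes) {
        byte[] bytes = new byte[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            byte narrowed = (byte) lanes[i]; // 0 or -1
            bytes[i] = (byte) -narrowed;     // 0 or 1
        }
        return bytes;
    }

    // Unoptimized shape: convert to in-memory format first, then add the bytes.
    static int trueCountViaStoreMask(int[] lanes) {
        int sum = 0;
        for (byte b : storeMask(lanes)) {
            sum += b;
        }
        return sum;
    }

    // Optimized shape: add the 0/-1 lanes directly and negate the scalar once.
    static int trueCountDirect(int[] lanes) {
        int sum = 0;
        for (int lane : lanes) {
            sum += lane;
        }
        return -sum;
    }

    public static void main(String[] args) {
        int[] lanes = toRegisterFormat(new boolean[] {true, false, true, true});
        System.out.println(trueCountViaStoreMask(lanes)); // prints 3
        System.out.println(trueCountDirect(lanes));       // prints 3
    }
}
```

Both methods return the same count because every true lane contributes exactly -1 to the direct sum, so a single scalar negation replaces the per-lane narrowing and negation.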
------------- PR Review: https://git.openjdk.org/jdk/pull/13974#pullrequestreview-1443414797 PR Review Comment: https://git.openjdk.org/jdk/pull/13974#discussion_r1205210387 From jwaters at openjdk.org Thu May 25 09:25:59 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 25 May 2023 09:25:59 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <18Hb8qz70ED7U2vLbTCN11B28K117KPmEh8CIVASDMk=.136e983b-1cad-43e5-b8c9-b9495a60ff84@github.com> On Wed, 24 May 2023 13:56:05 GMT, Julian Waters wrote: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code

For reference, these are the parameter types accepted by the calls in the affected code, so future reviewers don't need to dig through the code:

GDIRenderer.cpp:
- Java_sun_java2d_windows_GDIRenderer_doFillArc
  * AngleToCoord takes jints as arguments
  * ::Pie takes ints as arguments
- Java_sun_java2d_windows_GDIRenderer_doDrawArc
  * AngleToCoord takes jints as arguments
  * ::Arc takes ints as arguments

GDIWindowSurfaceData.cpp:
- GDIWinSD_GetRasInfo
  * SurfaceDataRasInfo's lutBase field is a jint*

awt_MenuBar.cpp (encompasses the changes in awt_Menu.h and awt_MenuBar.h as well):
- AwtMenuBar::GetItem passes the only relevant jint (formerly long) into env->CallObjectMethod()

AccessInfo.cpp:
- getAccessibleInfo
  * start and end (both formerly long) are both passed to GetAccessibleTextLineBounds and GetAccessibleTextRange, which take jints

The only outlier is jaccesswalker, which I think I may have edited wrongly

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1562575536

From thartmann at openjdk.org Thu May 25 09:35:58 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 25 May 2023 09:35:58 GMT
Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast
In-Reply-To:
References:
Message-ID:

On Wed, 24 May 2023 12:31:37 GMT, Roland Westrelin wrote:

> At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way.

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14123#pullrequestreview-1443489618 From rcastanedalo at openjdk.org Thu May 25 09:36:01 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 25 May 2023 09:36:01 GMT Subject: RFR: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 Message-ID: `TestFpMinMaxReductions.java` requires `UseAVX > 0` to enforce that floating-point min/max computations are matched by specialized implementations in x64 (see [x86.ad](https://github.com/openjdk/jdk/blob/2a18e537d60c88c015bea738764eef2ca610abf1/src/hotspot/cpu/x86/x86.ad#L1539)). This changeset guards the test file with an `avx.*` CPU feature instead, which is more robust because it reflects the final AVX configuration after all flags are processed (including flags such as `UseSSE` that might affect the final AVX configuration). #### Testing - `TestFpMinMaxReductions.java` in tier1-5 (windows-x64, linux-x64, and macosx-x64). - `TestFpMinMaxReductions.java` with all possible combinations of `UseAVX` and `UseSSE`.
------------- Commit messages: - Require avx.* CPU feature instead of UseAVX for robustness Changes: https://git.openjdk.org/jdk/pull/14141/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14141&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308746 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14141.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14141/head:pull/14141 PR: https://git.openjdk.org/jdk/pull/14141 From chagedorn at openjdk.org Thu May 25 09:43:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 May 2023 09:43:08 GMT Subject: Integrated: 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes In-Reply-To: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> References: <0B3rIF7gU5vYveLCLF4ML1JapupYhzZ321DkQb0A9Xs=.b692ef86-8dd2-4a82-9511-c7ec1480222b@github.com> Message-ID: On Tue, 16 May 2023 15:27:26 GMT, Christian Hagedorn wrote: > This is the second PR towards fixing the issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch still does not change anything in the way the old Assertion Predicates work. The only observable change in the IR is the introduction of a new `ParsePredicateNode` instead of using an `IfNode` to better distinguish these dedicated Parse Predicates added during parsing (they still use the same inputs with `Opaque1Nodes` as before). > > Changes include: > - New `ParsePredicateNode` as subclass of `IfNode` and related code updates to make this work. > - Moving predicate access code (skipping, matching etc.), including the called predicate methods found in `PhaseIdealLoop`, to dedicated `Predicates/ParsePredicates` classes. This is only a first step and these classes are further updated in the next PR. 
They can therefore be seen as an intermediate state to make the entire update to predicate classes easier to follow. As a consequence, I've tried to not clean the code up too much in these classes. > - Cleanup of touched code (dead code, variable renaming, code style) > - Added comments (e.g. for some special case in Loop Predication) > > For more background, have a look at the first PR: #13864 > > Thanks, > Christian This pull request has now been integrated. Changeset: 4f096eb7 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4f096eb7c9066e5127d9ab8c1c893e991a23d316 Stats: 623 lines in 12 files changed: 295 ins; 195 del; 133 mod 8305635: Replace Parse Predicate IfNode with new ParsePredicateNode and route predicate queries through dedicated classes Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14017 From chagedorn at openjdk.org Thu May 25 09:49:59 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 May 2023 09:49:59 GMT Subject: RFR: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 In-Reply-To: References: Message-ID: <4EzTynEiFU6j-SXZiWXljNj-CN7J7xchzoXzPLr_-is=.0c80f92c-e9a0-41cc-b1e9-5675b5721a76@github.com> On Thu, 25 May 2023 07:04:54 GMT, Roberto Castañeda Lozano wrote: > `TestFpMinMaxReductions.java` requires `UseAVX > 0` to enforce that floating-point min/max computations are matched by specialized implementations in x64 (see [x86.ad](https://github.com/openjdk/jdk/blob/2a18e537d60c88c015bea738764eef2ca610abf1/src/hotspot/cpu/x86/x86.ad#L1539)). This changeset guards the test file with an `avx.*` CPU feature instead, which is more robust because it reflects the final AVX configuration after all flags are processed (including flags such as `UseSSE` that might affect the final AVX configuration). > > #### Testing > > - `TestFpMinMaxReductions.java` in tier1-5 (windows-x64, linux-x64, and macosx-x64).
> - `TestFpMinMaxReductions.java` with all possible combinations of `UseAVX` and `UseSSE`. Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14141#pullrequestreview-1443510725 From thartmann at openjdk.org Thu May 25 09:49:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 May 2023 09:49:59 GMT Subject: RFR: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 In-Reply-To: References: Message-ID: On Thu, 25 May 2023 07:04:54 GMT, Roberto Castañeda Lozano wrote: > `TestFpMinMaxReductions.java` requires `UseAVX > 0` to enforce that floating-point min/max computations are matched by specialized implementations in x64 (see [x86.ad](https://github.com/openjdk/jdk/blob/2a18e537d60c88c015bea738764eef2ca610abf1/src/hotspot/cpu/x86/x86.ad#L1539)). This changeset guards the test file with an `avx.*` CPU feature instead, which is more robust because it reflects the final AVX configuration after all flags are processed (including flags such as `UseSSE` that might affect the final AVX configuration). > > #### Testing > > - `TestFpMinMaxReductions.java` in tier1-5 (windows-x64, linux-x64, and macosx-x64). > - `TestFpMinMaxReductions.java` with all possible combinations of `UseAVX` and `UseSSE`. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14141#pullrequestreview-1443511017 From thartmann at openjdk.org Thu May 25 09:50:00 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 May 2023 09:50:00 GMT Subject: RFR: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 In-Reply-To: <4EzTynEiFU6j-SXZiWXljNj-CN7J7xchzoXzPLr_-is=.0c80f92c-e9a0-41cc-b1e9-5675b5721a76@github.com> References: <4EzTynEiFU6j-SXZiWXljNj-CN7J7xchzoXzPLr_-is=.0c80f92c-e9a0-41cc-b1e9-5675b5721a76@github.com> Message-ID: <52DdL82wedU-mcR1dE0NaFMqf1nem7LzLG3Ww9Zwrfg=.9480cf7d-9052-4937-befe-f0737a699b8d@github.com> On Thu, 25 May 2023 09:44:48 GMT, Christian Hagedorn wrote: >> `TestFpMinMaxReductions.java` requires `UseAVX > 0` to enforce that floating-point min/max computations are matched by specialized implementations in x64 (see [x86.ad](https://github.com/openjdk/jdk/blob/2a18e537d60c88c015bea738764eef2ca610abf1/src/hotspot/cpu/x86/x86.ad#L1539)). This changeset guards the test file with an `avx.*` CPU feature instead, which is more robust because it reflects the final AVX configuration after all flags are processed (including flags such as `UseSSE` that might affect the final AVX configuration). >> >> #### Testing >> >> - `TestFpMinMaxReductions.java` in tier1-5 (windows-x64, linux-x64, and macosx-x64). >> - `TestFpMinMaxReductions.java` with all possible combinations of `UseAVX` and `UseSSE`. > > Looks good and trivial! 
@chhagedorn We seem to be well-synced this week :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14141#issuecomment-1562608024 From chagedorn at openjdk.org Thu May 25 09:50:01 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 May 2023 09:50:01 GMT Subject: RFR: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 In-Reply-To: <52DdL82wedU-mcR1dE0NaFMqf1nem7LzLG3Ww9Zwrfg=.9480cf7d-9052-4937-befe-f0737a699b8d@github.com> References: <4EzTynEiFU6j-SXZiWXljNj-CN7J7xchzoXzPLr_-is=.0c80f92c-e9a0-41cc-b1e9-5675b5721a76@github.com> <52DdL82wedU-mcR1dE0NaFMqf1nem7LzLG3Ww9Zwrfg=.9480cf7d-9052-4937-befe-f0737a699b8d@github.com> Message-ID: <-6c2BacSRxIuJkGsB_NS3s5IQ4Vv_ovbXNB1gg5wHh4=.c4209b93-761b-434b-89ff-919a9fc11c41@github.com> On Thu, 25 May 2023 09:46:23 GMT, Tobias Hartmann wrote: >> Looks good and trivial! > > @chhagedorn We seem to be well-synced this week :) @TobiHartmann Haha, yes indeed! :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14141#issuecomment-1562608956 From rcastanedalo at openjdk.org Thu May 25 11:10:04 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 25 May 2023 11:10:04 GMT Subject: RFR: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 In-Reply-To: References: Message-ID: On Thu, 25 May 2023 07:04:54 GMT, Roberto Castañeda Lozano wrote: > `TestFpMinMaxReductions.java` requires `UseAVX > 0` to enforce that floating-point min/max computations are matched by specialized implementations in x64 (see [x86.ad](https://github.com/openjdk/jdk/blob/2a18e537d60c88c015bea738764eef2ca610abf1/src/hotspot/cpu/x86/x86.ad#L1539)). This changeset guards the test file with an `avx.*` CPU feature instead, which is more robust because it reflects the final AVX configuration after all flags are processed (including flags such as `UseSSE` that might affect the final AVX configuration).
> > #### Testing > > - `TestFpMinMaxReductions.java` in tier1-5 (windows-x64, linux-x64, and macosx-x64). > - `TestFpMinMaxReductions.java` with all possible combinations of `UseAVX` and `UseSSE`. Thanks for the fast reviews Christian and Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14141#issuecomment-1562712404 From rcastanedalo at openjdk.org Thu May 25 11:10:05 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 25 May 2023 11:10:05 GMT Subject: Integrated: 8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2 In-Reply-To: References: Message-ID: On Thu, 25 May 2023 07:04:54 GMT, Roberto Castañeda Lozano wrote: > `TestFpMinMaxReductions.java` requires `UseAVX > 0` to enforce that floating-point min/max computations are matched by specialized implementations in x64 (see [x86.ad](https://github.com/openjdk/jdk/blob/2a18e537d60c88c015bea738764eef2ca610abf1/src/hotspot/cpu/x86/x86.ad#L1539)). This changeset guards the test file with an `avx.*` CPU feature instead, which is more robust because it reflects the final AVX configuration after all flags are processed (including flags such as `UseSSE` that might affect the final AVX configuration). > > #### Testing > > - `TestFpMinMaxReductions.java` in tier1-5 (windows-x64, linux-x64, and macosx-x64). > - `TestFpMinMaxReductions.java` with all possible combinations of `UseAVX` and `UseSSE`. This pull request has now been integrated.
Changeset: 5a0a238f
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/5a0a238f67ae2a7757611881c5c713149cefe3c0
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8308746: C2 IR test failures for TestFpMinMaxReductions.java with SSE2

Co-authored-by: Jatin Bhateja
Reviewed-by: chagedorn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/14141

From jsjolen at openjdk.org Thu May 25 12:29:12 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 25 May 2023 12:29:12 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/ [v2]
In-Reply-To:
References:
Message-ID:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Style fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14008/files
  - new: https://git.openjdk.org/jdk/pull/14008/files/46653faf..393d84f4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14008&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14008&range=00-01

Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/14008.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14008/head:pull/14008

PR: https://git.openjdk.org/jdk/pull/14008

From jsjolen at openjdk.org Thu May 25 12:29:12 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 25 May 2023 12:29:12 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID:

On Tue, 16 May 2023 11:54:20 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Fixed!
Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1562814781 From duke at openjdk.org Thu May 25 14:37:10 2023 From: duke at openjdk.org (duke) Date: Thu, 25 May 2023 14:37:10 GMT Subject: Withdrawn: JDK-8304546: CompileTask::_directive leaked if CompileBroker::invoke_compiler_on_method not called In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 20:48:28 GMT, Justin King wrote: > Ensure `CompileTask::_directive` is not leaked when `CompileBroker::invoke_compiler_on_method` is not called. This can happen for stale tasks or when compilation is disabled. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13108 From duke at openjdk.org Thu May 25 14:45:21 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 25 May 2023 14:45:21 GMT Subject: RFR: 8308657: ReplayInline is not availabe in production build Message-ID: DumpInline functionality is available in product build but ReplayInline is available in non product build only. This patch makes ReplayInline functionality available in product builds by moving it out of "#ifndef PRODUCT" directive. 
------------- Commit messages: - 8308657: ReplayInline is not availabe in production build Changes: https://git.openjdk.org/jdk/pull/14152/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14152&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308657 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14152.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14152/head:pull/14152 PR: https://git.openjdk.org/jdk/pull/14152 From never at openjdk.org Thu May 25 15:17:57 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 25 May 2023 15:17:57 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast In-Reply-To: References: Message-ID: On Wed, 24 May 2023 12:31:37 GMT, Roland Westrelin wrote: > At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. Thanks for the fix. I did wonder why the type checks themselves weren't properly folding but didn't follow it through all the way. I think it would be worth including an assert or guarantee in type_check_receiver that it never injects top into the graph. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14123#issuecomment-1563086526 From kvn at openjdk.org Thu May 25 15:54:56 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 25 May 2023 15:54:56 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast In-Reply-To: References: Message-ID: On Wed, 24 May 2023 12:31:37 GMT, Roland Westrelin wrote: > At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. In addition to normal mach5 testing we need to run our stress testing too for this changes to run with `StressReflectiveCode` on. src/hotspot/share/opto/graphKit.cpp line 3525: > 3523: if (!StressReflectiveCode && inst_klass != nullptr) { > 3524: bool xklass = inst_klass->klass_is_exact(); > 3525: if (xklass || (inst_klass->isa_aryklassptr() && inst_klass->is_aryklassptr()->elem() != Type::BOTTOM)) { Can you rename `inst_klass` to `klass_ptr` because it is general klass. It is weird to see a check like `inst_klass->isa_aryklassptr()`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/14123#pullrequestreview-1444202237 PR Review Comment: https://git.openjdk.org/jdk/pull/14123#discussion_r1205715640 From roland at openjdk.org Thu May 25 16:06:37 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 25 May 2023 16:06:37 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast [v2] In-Reply-To: References: Message-ID: <4WdyJd3uDOmRybf4NSLPOplX6gkBGiMMWeu1nLCIX8Q=.bf9c1938-25d7-4703-82df-9bc4f8aa522f@github.com> > At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14123/files - new: https://git.openjdk.org/jdk/pull/14123/files/a886d516..75de9a72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14123&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14123&range=00-01 Stats: 8 lines in 1 file changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14123.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14123/head:pull/14123 PR: https://git.openjdk.org/jdk/pull/14123 From roland at openjdk.org Thu May 25 16:06:39 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 25 May 2023 16:06:39 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast In-Reply-To: References: Message-ID: On Thu, 25 May 2023 15:15:33 GMT, Tom Rodriguez wrote: > I think it would be worth including an assert or guarantee in type_check_receiver that it never injects top into the 
graph.

Right. I included one in the new commit I just pushed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14123#issuecomment-1563152232

From roland at openjdk.org Thu May 25 16:06:43 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 May 2023 16:06:43 GMT
Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 25 May 2023 15:48:39 GMT, Vladimir Kozlov wrote:

>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>> review
>
> src/hotspot/share/opto/graphKit.cpp line 3525:
>
>> 3523: if (!StressReflectiveCode && inst_klass != nullptr) {
>> 3524: bool xklass = inst_klass->klass_is_exact();
>> 3525: if (xklass || (inst_klass->isa_aryklassptr() && inst_klass->is_aryklassptr()->elem() != Type::BOTTOM)) {
>
> Can you rename `inst_klass` to `klass_ptr` because it is general klass. It is weird to see a check like `inst_klass->isa_aryklassptr()`.

I pushed a new commit that includes the rename.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14123#discussion_r1205731094

From kvn at openjdk.org Thu May 25 16:07:03 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 25 May 2023 16:07:03 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID:

On Thu, 25 May 2023 12:23:27 GMT, Johan Sjölen wrote:

> Fixed! Thank you.

Maybe I was not clear.
I asked to add missing `{}` and correct spacing: `if (cond) {` instead of current `if( cond )` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1563157255 From kvn at openjdk.org Thu May 25 16:12:55 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 25 May 2023 16:12:55 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast [v2] In-Reply-To: <4WdyJd3uDOmRybf4NSLPOplX6gkBGiMMWeu1nLCIX8Q=.bf9c1938-25d7-4703-82df-9bc4f8aa522f@github.com> References: <4WdyJd3uDOmRybf4NSLPOplX6gkBGiMMWeu1nLCIX8Q=.bf9c1938-25d7-4703-82df-9bc4f8aa522f@github.com> Message-ID: On Thu, 25 May 2023 16:06:37 GMT, Roland Westrelin wrote: >> At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14123#pullrequestreview-1444241058 From kvn at openjdk.org Thu May 25 16:25:57 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 25 May 2023 16:25:57 GMT Subject: RFR: 8308657: ReplayInline is not availabe in production build In-Reply-To: References: Message-ID: <5XWofqVzGoAQgfrut6fpmlb2_5ROvE5KP4VLw5LjzVs=.31acb360-f5fc-42ce-8e9a-bbbfd941b3bb@github.com> On Thu, 25 May 2023 14:37:54 GMT, Ashutosh Mehra wrote: > DumpInline functionality is available in product build but ReplayInline is available in non product build only. 
This patch makes ReplayInline functionality available in product builds by moving it out of "#ifndef PRODUCT" directive. Good. We have already `InlineDataFile` as product flag. And `DumpInline` compile command option is available in product too. So we missed only this place. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14152#pullrequestreview-1444263286 From never at openjdk.org Thu May 25 16:30:05 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 25 May 2023 16:30:05 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast [v2] In-Reply-To: <4WdyJd3uDOmRybf4NSLPOplX6gkBGiMMWeu1nLCIX8Q=.bf9c1938-25d7-4703-82df-9bc4f8aa522f@github.com> References: <4WdyJd3uDOmRybf4NSLPOplX6gkBGiMMWeu1nLCIX8Q=.bf9c1938-25d7-4703-82df-9bc4f8aa522f@github.com> Message-ID: On Thu, 25 May 2023 16:06:37 GMT, Roland Westrelin wrote: >> At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14123#pullrequestreview-1444269301 From dnsimon at openjdk.org Thu May 25 16:30:12 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 25 May 2023 16:30:12 GMT Subject: RFR: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations [v2] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 09:24:54 GMT, Doug Simon wrote: >> This PRs adds JVMCI API to reflect the fact that [deferred locals are not supported on virtual threads](https://bugs.openjdk.org/browse/JDK-8307125?focusedCommentId=14578728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578728). > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - remove MaterializeVirtualObjectTest.java from ProblemList-Virtual.txt > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8307125 > - materializing frames on virtual threads is not supported Thanks for the review Tom. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13777#issuecomment-1563186110 From dnsimon at openjdk.org Thu May 25 16:30:14 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 25 May 2023 16:30:14 GMT Subject: Integrated: 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations In-Reply-To: References: Message-ID: On Wed, 3 May 2023 12:43:29 GMT, Doug Simon wrote: > This PRs adds JVMCI API to reflect the fact that [deferred locals are not supported on virtual threads](https://bugs.openjdk.org/browse/JDK-8307125?focusedCommentId=14578728&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14578728). This pull request has now been integrated. Changeset: 89b3c375 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/89b3c375ac55f960dbeac8a2355e528450e610a1 Stats: 49 lines in 7 files changed: 39 ins; 7 del; 3 mod 8307125: compiler/jvmci/compilerToVM/MaterializeVirtualObjectTest.java hits assert(!Continuation::is_frame_in_continuation(thread(), fr())) failed: No support for deferred values in continuations Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/13777 From chagedorn at openjdk.org Thu May 25 16:57:28 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 May 2023 16:57:28 GMT Subject: RFR: 8307683: Loop Predication is wrongly applied to non-RangeCheckNodes without a LoadRangeNode Message-ID: [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 This, however, is only correct if we have an actual `RangeCheckNode` for an array. 
The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. 
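To make the argument above concrete, here is a minimal sketch in plain Java (not C2 code, and not the test case from the PR). The helper names `endpointsPass` and `allPass` and the condition `(i & 3) != 2` are assumptions invented for illustration. The sketch only shows that checking the first and the last value of the induction variable is sufficient for an array-style range check, but says nothing about the iterations in between for an arbitrary condition.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration only; none of these names come from HotSpot.
final class EndpointCheckSketch {
    // What a hoisted predicate effectively tests: the first and last iteration only.
    static boolean endpointsPass(IntPredicate check, int first, int last) {
        return check.test(first) && check.test(last);
    }

    // What the original loop tests: the condition in every iteration.
    static boolean allPass(IntPredicate check, int first, int last) {
        for (int i = first; i <= last; i++) {
            if (!check.test(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int length = 10;
        // Array-style range check: unsigned compare of the index against a non-negative length.
        IntPredicate rangeCheck = i -> Integer.compareUnsigned(i, length) < 0;
        // Assumed arbitrary condition that is not monotone in i (fails for i = 2 and i = 6).
        IntPredicate arbitrary = i -> (i & 3) != 2;

        // Range check: the endpoints imply every iteration in between.
        System.out.println(endpointsPass(rangeCheck, 0, 9) + " " + allPass(rangeCheck, 0, 9)); // true true
        // Arbitrary condition: the endpoints pass, yet some iterations fail.
        System.out.println(endpointsPass(arbitrary, 0, 9) + " " + allPass(arbitrary, 0, 9));   // true false
    }
}
```

In the second case a predicate built from the endpoints alone would wrongly claim that the check holds for the whole loop, which is the situation described above.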
Thanks,
Christian

-------------

Commit messages:
 - 8307683: Loop Predication is wrongly applied to non-RangeCheckNodes without LoadRangeNodes

Changes: https://git.openjdk.org/jdk/pull/14156/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14156&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8307683
Stats: 132 lines in 2 files changed: 127 ins; 0 del; 5 mod
Patch: https://git.openjdk.org/jdk/pull/14156.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14156/head:pull/14156

PR: https://git.openjdk.org/jdk/pull/14156

From jsjolen at openjdk.org Thu May 25 17:12:00 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 25 May 2023 17:12:00 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/ [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 25 May 2023 12:29:12 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Style fixes

> /Users/runner/work/jdk/jdk/test/jdk/java/foreign/TestHFA.java:53: error: cannot find symbol
> static final OfFloat FLOAT = JAVA_FLOAT.withBitAlignment(32);

Error doesn't seem relevant to this change (has to do with vector incubator).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1563240340

From jsjolen at openjdk.org Thu May 25 17:18:23 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 25 May 2023 17:18:23 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/ [v3]
In-Reply-To:
References:
Message-ID:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:

 - One more!
 - Fix the rest of the style issues

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14008/files
 - new: https://git.openjdk.org/jdk/pull/14008/files/393d84f4..39cdb84f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14008&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14008&range=01-02

Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/14008.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14008/head:pull/14008

PR: https://git.openjdk.org/jdk/pull/14008

From jsjolen at openjdk.org Thu May 25 17:18:23 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Thu, 25 May 2023 17:18:23 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/
In-Reply-To:
References:
Message-ID:

On Thu, 25 May 2023 16:04:13 GMT, Vladimir Kozlov wrote:

> > Fixed! Thank you.
>
> Maybe I was not clear. I asked to add missing `{}` and correct spacing: `if (cond) {` instead of current `if( cond )`

Aha, I only saw the spacing issues (and not even all of those). Please have a look now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1563244008

From sviswanathan at openjdk.org Thu May 25 17:30:55 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 25 May 2023 17:30:55 GMT
Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3]
In-Reply-To: <_Pm4QfF6KyuXOqiFqcxuHuBwdrY6RZggOIC7GlSNkRs=.20e78329-a7b0-4b2d-a8a5-2c9afc63067a@github.com>
References: <_Pm4QfF6KyuXOqiFqcxuHuBwdrY6RZggOIC7GlSNkRs=.20e78329-a7b0-4b2d-a8a5-2c9afc63067a@github.com>
Message-ID:

On Thu, 25 May 2023 06:40:56 GMT, Emanuel Peter wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>>
>> change to superword_max_vector_size
>
> @sviswa7 Thanks for taking care of this. Looks good, but let me run testing at commit 4. I will report back.

Thanks a lot @eme64.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1563262416

From kvn at openjdk.org Thu May 25 17:49:03 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 25 May 2023 17:49:03 GMT
Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/ [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 25 May 2023 17:18:23 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
>  - One more!
>  - Fix the rest of the style issues

Looks good now.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14008#pullrequestreview-1444387228 From duke at openjdk.org Thu May 25 18:10:55 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 25 May 2023 18:10:55 GMT Subject: RFR: 8308672: Add version number in the replay file generated by DumpInline In-Reply-To: References: Message-ID: On Wed, 24 May 2023 19:21:27 GMT, Vladimir Kozlov wrote: >> Please review this PR that adds version to the replay file generated by DumpInline. > > Looks good. @vnkozlov would you please sponsor it as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14131#issuecomment-1563307986 From duke at openjdk.org Thu May 25 18:35:08 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Thu, 25 May 2023 18:35:08 GMT Subject: Integrated: 8308672: Add version number in the replay file generated by DumpInline In-Reply-To: References: Message-ID: On Wed, 24 May 2023 18:27:26 GMT, Ashutosh Mehra wrote: > Please review this PR that adds version to the replay file generated by DumpInline. This pull request has now been integrated. Changeset: 7d2a7ce2 Author: Ashutosh Mehra Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/7d2a7ce2401bdacbfa084a502077ec98ecdcba33 Stats: 31 lines in 3 files changed: 10 ins; 5 del; 16 mod 8308672: Add version number in the replay file generated by DumpInline Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/14131 From iklam at openjdk.org Thu May 25 20:56:01 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 25 May 2023 20:56:01 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag Message-ID: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method: java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java CSR is not needed because this is a diagnostic VM option. 

-------------

Commit messages:
 - 8308906: Make CIPrintCompilerName a diagnostic flag

Changes: https://git.openjdk.org/jdk/pull/14161/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14161&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8308906
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14161.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14161/head:pull/14161

PR: https://git.openjdk.org/jdk/pull/14161

From Divino.Cesar at microsoft.com Thu May 25 21:04:29 2023
From: Divino.Cesar at microsoft.com (Cesar Soares Lucas)
Date: Thu, 25 May 2023 21:04:29 +0000
Subject: Update on PEA in C2 (Episode 3)
In-Reply-To: <2886E2C4-6D99-4F83-83A7-2C5C8B0922F1@amazon.com>
References: <2886E2C4-6D99-4F83-83A7-2C5C8B0922F1@amazon.com>
Message-ID:

> > Is the "num allocations tracked" all allocations happening in the methods (including no escape?)
> Yes, we intercept Parse::do_new() and increment the counter if we register the Object idx to the allocation state.
>
> > "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?
> This counter is the number of materializations. One object may be materialized multiple times in different branches.

Got it, thanks.

> what you are going to do in your case?

In my case nothing happens because the object escapes and also there is no Phi.

From: Liu, Xin
Date: Thursday, May 18, 2023 at 2:58 PM
To: Cesar Soares Lucas, hotspot-compiler-dev at openjdk.java.net
Subject: Re: Update on PEA in C2 (Episode 3)

Hi, Cesar,

> Is the "num allocations tracked" all allocations happening in the methods (including no escape?)

Yes, we intercept Parse::do_new() and increment the counter if we register the Object idx to the allocation state.

> "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?

This counter is the number of materializations.
One object may be materialized multiple times in different branches. E.g., we track one object, but num materializations = 3.

    Object o = new Object();
    if (a) escape(o);
    else if (b) escape(o);
    else escape(o);

> Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping, requiring no merge?

We are not tracking this case. Your example is very similar to the ArgEscape case in the microbenchmark.
https://github.com/navyxliu/jdk/pull/36/files#diff-c96245c7aa8950a261e64f01570331420bf00e76ba1861130f7381458b345f33R76

what you are going to do in your case?

Our scheme is like this. In a nutshell, our PEA materialization splits the lifecycle of an object. Let's say there's an object which will be marked 'Escape' by C2 EA. PEA keeps cloning this object at escaping points in a flow-sensitive way. After parsing, the original object is certainly NonEscaped from the perspective of C2 EA.

    Point p = new Point(...); // NonEscaped
    if (...) {
      Point p' = materialize(p);
      method(p');
    }

We don't clean it up. We just leave this to the C2 optimizer. There are 3 cases:
1. The object is useless. It is removed by the C2 optimizer, as in this case.
2. As long as the NonEscaped object is 'unique typing', Scalar Replacement will process it.
3. It's NSR. I wish I could leverage your work on this case.

Thanks,
--lx

From: Cesar Soares Lucas
Date: Thursday, May 18, 2023 at 10:40 AM
To: "Liu, Xin", "hotspot-compiler-dev at openjdk.java.net"
Subject: RE: [EXTERNAL]Update on PEA in C2 (Episode 3)

Hi, Xin Liu.

Thank you for working on this. I'm glad to see the progress.

> PEA: num allocations tracked = 24741, num materializations = 16037

Can you give more details on what these numbers are? Is the "num allocations tracked" all allocations happening in the methods (including no escape?) and "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?

Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping, requiring no merge? If that happens often, perhaps it's low-hanging fruit that you could pursue instead of the general PEA problem. I.e., something like this:

    Point p = new Point(...);
    if (...) {
      method(p);
    }

Thanks,
Cesar

From: hotspot-compiler-dev on behalf of Liu, Xin
Date: Wednesday, May 17, 2023 at 4:48 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Update on PEA in C2 (Episode 3)

Hi,

I would like to give an update on what we have done in C2 PEA. We managed to compile the java.base module with PEA and the inliner. It contains 7,442 classes and 62,210 methods. Here are the numbers of objects we track and materialize.

    PEA: num allocations tracked = 24741, num materializations = 16037

We also CTW the jdk.compiler and java.compiler modules. No compilation error is found. We fixed those compiler errors mainly by correcting the allocation state.

We verified the behavior with one microbenchmark that we ported to JMH. It shows that the allocation rate drops as expected. Because PEA is flow-sensitive, it can allocate on demand. The allocation rate drops by 75% when the object has a 25% chance to escape (odd = 4), and to 1/8 when the object has only a 12.5% chance to escape.
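The arithmetic behind those figures can be sketched with a small hand-written Java program. This is an illustration under stated assumptions, not the JMH benchmark from the thread: the class and method names are hypothetical, the escape is made deterministic (every `odd`-th iteration) instead of random, and `onDemand` is simply what one would write by hand if the allocation happened only on the escaping path.

```java
// Hypothetical sketch; not the benchmark referenced in the thread.
final class AllocateOnDemandSketch {
    static int allocations; // counts constructor calls
    static int sink;        // stands in for "the object escapes"

    static final class Point {
        final int x;
        Point(int x) {
            this.x = x;
            allocations++;
        }
    }

    static void escape(Point p) {
        sink += p.x;
    }

    // Source-level shape: the object is allocated in every iteration,
    // even though it escapes only in one out of `odd` iterations.
    static int eager(int iterations, int odd) {
        allocations = 0;
        for (int i = 0; i < iterations; i++) {
            Point p = new Point(i);
            if (i % odd == 0) {
                escape(p);
            }
        }
        return allocations;
    }

    // Hand-written equivalent of allocating only on the escaping path.
    static int onDemand(int iterations, int odd) {
        allocations = 0;
        for (int i = 0; i < iterations; i++) {
            if (i % odd == 0) {
                escape(new Point(i));
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        System.out.println(eager(1000, 4));    // 1000 allocations
        System.out.println(onDemand(1000, 4)); // 250, i.e. 75% fewer
        System.out.println(onDemand(1000, 8)); // 125, i.e. 1/8 of 1000
    }
}
```

With odd = 4 the on-demand variant allocates a quarter of the objects, and with odd = 8 an eighth, which matches the reductions quoted above.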
https://github.com/navyxliu/jdk/pull/36

Remaining problems:

1. In order to curb complexity, we disable passive materialization for the time being. Passive materialization takes place only at a merging point because one of the predecessors has already materialized the object. We proved that it is still correct to skip passive materialization. The downside is that we may have partially redundant allocations because C2 can't guarantee to eliminate the original object now.

Currently, JDK-8287061 is working on this problem. The patch unravels 'reducible phi nodes' and then the original AllocateNodes are eliminated by ScalarReplacement. More details can be found here.
https://gist.github.com/navyxliu/6239ce24f1ae447060302cc8562cbb71?permalink_comment_id=4520588#gistcomment-4520588

If JDK-8287061 processes all reducible phi nodes, PEA will have a synergy effect with it. Our design goal is to punt complex jobs to the C2 optimizer. If PEA introduces a severe performance problem, we will revisit 'passive materialization'.

2. There are still 400+ runtime errors when we try to run hotspot:tier1 tests. Most of them are from javac. Here is what we have so far.

$ make test TEST="hotspot:tier1" CONF=linux-x86_64-server-fastdebug JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+DoPartialEscapeAnalysis"

Test summary
==============================
   TEST                                  TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1         2150  1717   145   288 <<
==============================

Our understanding is that PEA can't guarantee to replace all the old objects with the new objects in the debug sections of GraphKit::add_safepoint_edges(). If deoptimization happens, the runtime will rematerialize objects based on the wrong debuginfo. We then end up with wrong objects. Our next goal is to fix those runtime errors.

We posted a draft PR for curious readers. We will port those tests to jtreg once we fix the tier1 tests.
https://github.com/openjdk/jdk/pull/14041

thanks,
--lx

From kvn at openjdk.org Thu May 25 21:05:55 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 25 May 2023 21:05:55 GMT
Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag
In-Reply-To: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com>
References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com>
Message-ID:

On Thu, 25 May 2023 20:36:42 GMT, Ioi Lam wrote:

> Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method:
>
>
> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java
>
>
> CSR is not needed because this is a diagnostic VM option.

Please, also fix name duplication in output:

    C1: 79 C1: 1 3 java.lang.Object::<init> (1 bytes)

-------------

PR Review: https://git.openjdk.org/jdk/pull/14161#pullrequestreview-1444647175

From xxinliu at amazon.com Thu May 25 21:28:32 2023
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 25 May 2023 21:28:32 +0000
Subject: Update on PEA in C2 (Episode 3)
In-Reply-To:
References: <2886E2C4-6D99-4F83-83A7-2C5C8B0922F1@amazon.com>
Message-ID: <18BBE3EC-A9DD-4460-9F7F-A19C0A044BCA@amazon.com>

Hi, Cesar,

Doing nothing is harder for me than doing something. I am in Parse. C2's Parse is traversing basic blocks. In your example, I have just encountered the escaping point at an invoke bytecode in B1. I haven't completed B1 yet, let alone B2. Therefore, it's really hard to predict how we will use 'p' in the future. If we want to complete PEA in one pass, we have to take action right away.

    [B0]  Point p = new Point(...);
          if (...) {
    [B1]    method(p);
          }
    [B2]

There's a remedy for this. If we enforce the paper's algorithm, we will do passive materialization at the merging point. That will make the original object 'p' dead at the 1st merging point.

Thanks,
--lx

From: Cesar Soares Lucas
Date: Thursday, May 25, 2023 at 2:04 PM
To: "Liu, Xin", "hotspot-compiler-dev at openjdk.java.net"
Subject: RE: [EXTERNAL]Update on PEA in C2 (Episode 3)

> > Is the "num allocations tracked" all allocations happening in the methods (including no escape?)
> Yes, we intercept Parse::do_new() and increment the counter if we register the Object idx to the allocation state.
>
> > "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?
> This counter is the number of materializations. One object may be materialized multiple times in different branches.

Got it, thanks.

> what you are going to do in your case?
In my case nothing happens because the object escapes and also there is no Phi.

From: Liu, Xin
Date: Thursday, May 18, 2023 at 2:58 PM
To: Cesar Soares Lucas, hotspot-compiler-dev at openjdk.java.net
Subject: Re: Update on PEA in C2 (Episode 3)

Hi, Cesar,

> Is the "num allocations tracked" all allocations happening in the methods (including no escape?)

Yes, we intercept Parse::do_new() and increment the counter if we register the Object idx to the allocation state.

> "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?

This counter is the number of materializations. One object may be materialized multiple times in different branches. E.g., we track one object, but num materializations = 3.

Object o = new Object();
if (a) escape(o);
else if (b) escape(o);
else escape(o);

> Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping - requiring no merge?

We are not tracking this case. Your example is very similar to the ArgEscape case in the microbenchmark.

https://github.com/navyxliu/jdk/pull/36/files#diff-c96245c7aa8950a261e64f01570331420bf00e76ba1861130f7381458b345f33R76

What are you going to do in your case? Our scheme is like this. In a nutshell, our PEA materialization splits the lifecycle of an object. Let's say there's an object which will be marked 'Escape' by C2 EA. PEA keeps cloning this object at escaping points in a flow-sensitive way. After parsing, the original object is guaranteed to be NonEscaped from the perspective of C2 EA.
Point p = new Point(...); // NonEscaped
if (...) {
  Point p' = materialize(p);
  method(p');
}

We don't clean it up. We just leave this to the C2 optimizer. There are 3 cases:
1. The object is useless and is removed by the C2 optimizer, as in this case.
2. As long as the NonEscaped object has 'unique typing', Scalar Replacement will process it.
3. It's NSR. I wish I could leverage your work on this case.

Thanks,
--lx

From: Cesar Soares Lucas
Date: Thursday, May 18, 2023 at 10:40 AM
To: "Liu, Xin", "hotspot-compiler-dev at openjdk.java.net"
Subject: RE: [EXTERNAL]Update on PEA in C2 (Episode 3)

Hi, Xin Liu.

Thank you for working on this. I'm glad to see the progress.

> PEA: num allocations tracked = 24741, num materializations = 16037

Can you give more details on what these numbers are? Is the "num allocations tracked" all allocations happening in the methods (including no escape?) and "num materializations" the number of (escaping) allocations that you had to rematerialize at least once?

Do you by any chance have an idea of how many allocations escape inside a control block and aren't used after escaping - requiring no merge? If that happens often, perhaps it's low-hanging fruit that you could pursue instead of the general PEA problem. I.e., something like this:

Point p = new Point(...);
if (...) {
  method(p);
}

Thanks,
Cesar

From: hotspot-compiler-dev on behalf of Liu, Xin
Date: Wednesday, May 17, 2023 at 4:48 PM
To: hotspot-compiler-dev at openjdk.java.net
Subject: Update on PEA in C2 (Episode 3)

Hi,

I would like to give an update on what we have done in C2 PEA. We managed to compile the java.base module with PEA and the inliner. It contains 7,442 classes and 62,210 methods. Here is the number of objects we track and materialize.
PEA: num allocations tracked = 24741, num materializations = 16037

We also ran CTW on the jdk.compiler and java.compiler modules and found no compilation errors. We fixed the compiler errors we encountered mainly by correcting the allocation state.

We verified the behavior with one microbenchmark that we ported to JMH. It shows that the allocation rate drops as expected. Because PEA is flow-sensitive, it can allocate on demand. The allocation rate drops by 75% when the object has a 25% chance of escaping (odd = 4), and drops to 1/8 when the object has only a 12.5% chance of escaping.

https://github.com/navyxliu/jdk/pull/36

Remaining problems:

1. In order to curb complexity, we have disabled passive materialization for the time being. Passive materialization takes place only at a merging point, when one of the predecessors has already materialized the object. We proved that it is still correct to skip passive materialization. The downside is that we may have partially redundant allocations because C2 currently can't guarantee that the original object is eliminated. JDK-8287061 is working on this problem. The patch unravels 'reducible phi nodes' and then the original AllocateNodes are eliminated by ScalarReplacement. More details can be found here.
https://gist.github.com/navyxliu/6239ce24f1ae447060302cc8562cbb71?permalink_comment_id=4520588#gistcomment-4520588

If JDK-8287061 processes all reducible phi nodes, PEA will work in synergy with it. Our design goal is to punt complex jobs to the C2 optimizer. If PEA introduces a severe performance problem, we will revisit 'passive materialization'.

2. There are still 400+ runtime errors when we try to run the hotspot:tier1 tests. Most of them are from javac. Here is what we have so far.

$ make test TEST="hotspot:tier1" CONF=linux-x86_64-server-fastdebug JTREG="VM_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+DoPartialEscapeAnalysis"

Test summary
==============================
TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1 2150 1717 145 288 <<
==============================

Our understanding is that PEA can't guarantee that all the old objects are replaced with the new objects in the debug sections of GraphKit::add_safepoint_edges(). If deoptimization happens, the runtime will rematerialize objects based on the wrong debuginfo, and we end up with the wrong objects. Our next goal is to fix those runtime errors. We have posted a draft PR for curious readers. We will port those tests to jtreg once we fix the tier1 tests.
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fopenjdk%2Fjdk%2Fpull%2F14041&data=05%7C01%7CDivino.Cesar%40microsoft.com%7Cb9c41a372dfe48f0b37708db57eaf63c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638200438900539346%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=7ukU3FbfcQK8rUZjLS4y%2BUnfSH6tCE0piT82IlV4n3c%3D&reserved=0 thanks, --lx -------------- next part -------------- An HTML attachment was scrubbed... URL: From cslucas at openjdk.org Thu May 25 22:54:15 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 25 May 2023 22:54:15 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. 
I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Catching up with master branch. Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address PR review 6: refactoring around rematerialization & improve test cases. - Address PR review 5: refactor on rematerialization & add tests. 
- Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address part of PR review 4 & fix a bug setting only_candidate - Catching up with master Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Fix tests. Remember previous reducible Phis. - Address PR review 3. Some comments and be able to abort compilation. - Merge with Master - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. - ... and 5 more: https://git.openjdk.org/jdk/compare/46c4da7f...8f81a7c8 ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=13 Stats: 2760 lines in 25 files changed: 2508 ins; 113 del; 139 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From vlivanov at openjdk.org Thu May 25 22:54:15 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 25 May 2023 22:54:15 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 22 May 2023 17:56:41 GMT, Cesar Soares Lucas wrote: >> Are you sure there's no way to end up with nested ObjectMergeValues in presence of iterative EA? > I don't think so. Ok. Please, add asserts to catch such situation and a check which bails out compilation (triggering recompilation w/ ReduceAllocationMerges turned off) if it happens with product binaries. > So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. 
Please, enhance `AllocationMergesTests` to cover deoptimization (e.g., using WhiteBox API or additional run w/ -XX:+DeoptimizeALot) and ensure that tests are sensitive enough to fail when wrong state is rematerialized. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1559847061 PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1559852990 From kvn at openjdk.org Thu May 25 23:32:59 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 25 May 2023 23:32:59 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag In-Reply-To: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: On Thu, 25 May 2023 20:36:42 GMT, Ioi Lam wrote: > Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method: > > > java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java > > > CSR is not needed because this is a diagnostic VM option. UL output is correct (no duplication): [0.055s][debug][jit,compilation] C1: 1 3 java.lang.Object:: (1 bytes) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14161#issuecomment-1563628628 From dzhang at openjdk.org Fri May 26 02:47:15 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Fri, 26 May 2023 02:47:15 GMT Subject: RFR: 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 Message-ID: We have some macro assembler functions that use v0 hardcoded as a temporary register currently. However, the mask value used to control execution of a masked vector instruction is always supplied by vector register v0 in RVV1.0[1]. So if v0 is not used as a mask register in subsequent instructions, it is better to replace it with other vector registers to improve code execution efficiency. 
In addition, this pr also adds several missing spaces in the format of the instructions, and fixes several pipeline classes. [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc ## Testing: QEMU w/ UseRVV: - [x] Tier1 tests (release) - [ ] Tier2 tests (release) - [ ] Tier3 tests (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 Changes: https://git.openjdk.org/jdk/pull/14166/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14166&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308915 Stats: 124 lines in 3 files changed: 61 ins; 0 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/14166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14166/head:pull/14166 PR: https://git.openjdk.org/jdk/pull/14166 From iklam at openjdk.org Fri May 26 04:37:56 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 26 May 2023 04:37:56 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag [v2] In-Reply-To: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: > Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method: > > > java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java > > > CSR is not needed because this is a diagnostic VM option. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Removed duplicated compiler name from -XX:+PrintCompilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14161/files - new: https://git.openjdk.org/jdk/pull/14161/files/86fbe398..7a4b8fcf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14161&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14161&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14161.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14161/head:pull/14161 PR: https://git.openjdk.org/jdk/pull/14161 From iklam at openjdk.org Fri May 26 04:37:58 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 26 May 2023 04:37:58 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag [v2] In-Reply-To: References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: On Thu, 25 May 2023 21:02:48 GMT, Vladimir Kozlov wrote: > Please, also fix name duplication in output: > > ``` > C1: 79 C1: 1 3 java.lang.Object:: (1 bytes) > ``` Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14161#issuecomment-1563796748 From stuefe at openjdk.org Fri May 26 04:59:54 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 May 2023 04:59:54 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag [v2] In-Reply-To: References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: On Fri, 26 May 2023 04:37:56 GMT, Ioi Lam wrote: >> Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method: >> >> >> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java >> >> >> CSR is not needed because this is a diagnostic VM option. 
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Removed duplicated compiler name from -XX:+PrintCompilation That is very useful. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14161#pullrequestreview-1445061779 From epeter at openjdk.org Fri May 26 05:02:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 May 2023 05:02:11 GMT Subject: RFR: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit Message-ID: In SuperWord::output we create a CountedLoopReserveKit, so that we can reverse edits to the loop, in case something goes wrong. As far as I understand all of these conditions should never occur, prior condition checking in SuperWord should have already verified that. We should at least add asserts so that we can catch such failures and fix them, and do not just silently bail out of SuperWord (reverse the graph to before SuperWord and continue compilation). `DoReserveCopyInSuperWord` enables `do_reserve_copy()`. It is a product flag and default true. If it is disabled, and there is such a failure we just hit a `ShouldNotReachHere()`. **Testing** TODO testing up to tier6 plus stress testing. (it already passed tier3 and stress testing) **Discussion** Do we really want to keep the `DoReserveCopyInSuperWord` flag (product, always true), which enables the use of `CountedLoopReserveKit`? It means that we always duplicate the loop (and the loops can be rather large because they were unrolled before SuperWord). It seems a bit of an edge case to want to bail out of SuperWord, but not of the whole compilation. We can later decide if it makes sense to clone the whole loop via CountedLoopReserveKit (the loops can be large!), or if we should just have a regular compilation bailout instead (could simplify the code and reduce overhead of loop cloning). Plus: it seems the checks and bailouts are very selectively applied. 
I don't see why we would nullptr check some "vector_opd" but not all of them. So if we decide to keep it, we should probably apply it more consistently. What do you think? ------------- Commit messages: - 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit Changes: https://git.openjdk.org/jdk/pull/14168/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14168&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308917 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14168.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14168/head:pull/14168 PR: https://git.openjdk.org/jdk/pull/14168 From jkarthikeyan at openjdk.org Fri May 26 05:49:15 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 26 May 2023 05:49:15 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v9] In-Reply-To: References: Message-ID: > Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine: > > > Baseline Patch Improvement > Benchmark Mode Cnt Score Error Units Score Error Units > Conv2BRules.testEquals0 avgt 10 47.566 ? 0.346 ns/op / 34.130 ? 0.177 ns/op + 28.2% > Conv2BRules.testNotEquals0 avgt 10 37.167 ? 
0.211 ns/op / 34.185 ? 0.258 ns/op + 8.0% > Conv2BRules.testEquals1 avgt 10 35.059 ? 0.280 ns/op / 34.847 ? 0.160 ns/op (unchanged) > Conv2BRules.testEqualsNull avgt 10 56.768 ? 2.600 ns/op / 34.330 ? 0.625 ns/op + 39.5% > Conv2BRules.testNotEqualsNull avgt 10 47.447 ? 1.193 ns/op / 34.142 ? 0.303 ns/op + 28.0% > > Reviews would be greatly appreciated! > > Testing: tier1-2 on linux x64, GHA Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'master' into conv2b-x86-lowering - Fix assertion from not checking int type - Cleanup from code review - Changes from code review - Merge branch 'master' into conv2b-x86-lowering - Whitespace tweak - Make transform conditional - Remove Conv2B from backend as it's macro expanded now - Re-work transform to happen in macro expansion - Fix whitespace and add bug tag to IR test - ... and 5 more: https://git.openjdk.org/jdk/compare/31683722...65e841f3 ------------- Changes: https://git.openjdk.org/jdk/pull/13345/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13345&range=08 Stats: 410 lines in 13 files changed: 258 ins; 133 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/13345.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13345/head:pull/13345 PR: https://git.openjdk.org/jdk/pull/13345 From jkarthikeyan at openjdk.org Fri May 26 05:49:16 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 26 May 2023 05:49:16 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v8] In-Reply-To: References: Message-ID: <_RDrB44pp68YFIbW1Y9mhvbQqrEPSv37ONa9bsodZoE=.2871f1c4-c61a-48a5-90fb-34e965dac938@github.com> On Tue, 23 May 2023 07:09:37 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. 
This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>
>>                                            Baseline                    Patch             Improvement
>> Benchmark                      Mode  Cnt   Score    Error  Units     Score    Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op   + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op   + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op   (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op   + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op   + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Cleanup from code review

I think the latest change should fix the error! The code now properly checks that both types are ints, and tier1 and `hotspot_vector_1` testing passes without errors for me.
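
[Editorial illustration, not part of the original message.] The bit-flip pattern that the PR description refers to can be sketched in plain Java. This only demonstrates the arithmetic identity, under the assumption that `Conv2B` maps 0 to 0 and any other value to 1; it is not the C2 code, and `Conv2BSketch`, `conv2b`, `conv2bThenFlip` and `invertedSelect` are hypothetical names chosen for this sketch.

```java
public class Conv2BSketch {
    // Models the assumed Conv2B semantics: 0 maps to 0, any other value maps to 1.
    static int conv2b(int x) {
        return x != 0 ? 1 : 0;
    }

    // Conv2B followed by "xor 1", the pattern described in the PR.
    static int conv2bThenFlip(int x) {
        return conv2b(x) ^ 1;
    }

    // The same result expressed as one select on the inverted comparison.
    static int invertedSelect(int x) {
        return x == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 2, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (conv2bThenFlip(x) != invertedSelect(x)) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("all samples agree");
    }
}
```

Because the two forms agree for every input, a compiler is free to emit the single select and drop the separate xor, which is the effect the description attributes to the conditional-move lowering.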
------------- PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1563838600 From jkarthikeyan at openjdk.org Fri May 26 05:49:16 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 26 May 2023 05:49:16 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v8] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 01:20:56 GMT, Sandhya Viswanathan wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup from code review > > src/hotspot/share/opto/addnode.cpp line 902: > >> 900: return new CMoveINode(in1->in(CMoveNode::Condition), phase->intcon(l_val ^ in2_val), phase->intcon(r_val ^ in2_val), TypeInt::INT); >> 901: } >> 902: } > > An isa_int() check is needed before doing is_int()->get_con(). Something like below: > ``` > const TypeInt* in2type = phase->type(in2)->isa_int(); > const TypeInt* ltype = phase->type(in1->in(CMoveNode::IfFalse))->isa_int(); > const TypeInt* rtype = phase->type(in1->in(CMoveNode::IfTrue))->isa_int(); > > if (in2type && ltype && rtype) { > int in2_val = in2type->get_con(); > int l_val = ltype->get_con(); > int r_val = rtype->get_con(); > > if (cmp_op == Op_CmpI || cmp_op == Op_CmpP) { > return new CMoveINode(in1->in(CMoveNode::Condition), > phase->intcon(l_val ^ in2_val), > phase->intcon(r_val ^ in2_val), TypeInt::INT); > } > } > > This should fix the crash seen in vectorapi test. Ah yep, I missed that when writing the function, and it's a simple fix. Thanks for helping diagnose the issue! 
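
[Editorial illustration, not part of the original message.] The snippet under review folds an xor with a constant into both constant arms of a conditional move. A minimal Java sketch of that identity follows, with the arguments ordered like the `CMoveINode` call in the quoted code (condition, false value, true value). The class and method names are hypothetical, and the sketch leaves out the `isa_int()` guard, which in the suggested C2 code is what confirms that all three inputs are int-typed before their constants are read.

```java
public class CMoveXorFold {
    // Select between two constants, then xor the selected value with a third constant.
    static int selectThenXor(boolean cond, int ifFalse, int ifTrue, int c) {
        int selected = cond ? ifTrue : ifFalse;
        return selected ^ c;
    }

    // Folded form: the xor is applied to each constant arm before selecting.
    static int foldedSelect(boolean cond, int ifFalse, int ifTrue, int c) {
        return cond ? (ifTrue ^ c) : (ifFalse ^ c);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean[] conds = {false, true};
        for (int l : samples) {
            for (int r : samples) {
                for (int c : samples) {
                    for (boolean cond : conds) {
                        if (selectThenXor(cond, l, r, c) != foldedSelect(cond, l, r, c)) {
                            throw new AssertionError("mismatch");
                        }
                    }
                }
            }
        }
        System.out.println("folded and unfolded forms agree");
    }
}
```

The fold is only meaningful when all three values are known constants, which is why the review asks for the type check before `get_con()` is called.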
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13345#discussion_r1206262828 From fjiang at openjdk.org Fri May 26 06:41:53 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 26 May 2023 06:41:53 GMT Subject: RFR: 8308817: RISC-V: Support VectorTest node for Vector API In-Reply-To: References: Message-ID: On Thu, 25 May 2023 03:22:18 GMT, Gui Cao wrote: > Hi, > > we have added VectorTest node, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. > > For example, we can use the following command to print the compilation log of a jtreg test case: > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/Int256VectorTests_PrintOptoAssembly_20230525.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > Also here's a more concise test case, VectorTestDemo: > > > import jdk.incubator.vector.ByteVector; > import jdk.incubator.vector.VectorMask; > > public class VectorTestDemo { > static boolean[] d = new boolean[]{true, false, false, false, false, false, false, false}; > static VectorMask avmask = VectorMask.fromArray(ByteVector.SPECIES_64, d, 0); > > public static void main(String[] args) { > for (int i = 0; i < 300000; i++) { > > final boolean alltrue = alltrue(); > if (alltrue != false) { > throw new RuntimeException("alltrue"); > } > final boolean anytrue = anytrue(); > if (anytrue != true) { > throw new RuntimeException("anytrue"); > } > } > } > > public static boolean anytrue() { > return 
avmask.anyTrue(); > } > > public static boolean alltrue() { > return avmask.allTrue(); > } > } > > > We can compile `VectorTestDemo.java` using `javac --add-modules jdk.incubator.vector VectorTestDemo.java`, and use `./java -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+UseRVV -XX:+PrintOptoAssembly -XX:+LogCompilation -XX:LogFile=compile.log VectorTestDemo > aaa.log` to start the test case, we can observe the specified compilation log `compile.log`, which contains the VectorTest node for the PR implementation. > Some of the compilation logs of VectorTestDemo#anytrue method are as follows. > > 05e lwu R28, [R7, #12] # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (con... Looks good ------------- Marked as reviewed by fjiang (Author). PR Review: https://git.openjdk.org/jdk/pull/14138#pullrequestreview-1445273900 From roland at openjdk.org Fri May 26 07:07:06 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 26 May 2023 07:07:06 GMT Subject: RFR: 8308583: SIGSEGV in GraphKit::gen_checkcast [v2] In-Reply-To: References: Message-ID: <7SpZ8_zR2fVdbG_P8n5AynBQ8shPuE5MSOf8IEY2aJ0=.2f148fe2-226b-4bc9-931f-b1c915cacf89@github.com> On Thu, 25 May 2023 09:32:40 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Looks good to me. 
@TobiHartmann @vnkozlov @tkrodriguez thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/14123#issuecomment-1563901571 From roland at openjdk.org Fri May 26 07:07:08 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 26 May 2023 07:07:08 GMT Subject: Integrated: 8308583: SIGSEGV in GraphKit::gen_checkcast In-Reply-To: References: Message-ID: <2fmsXfQ1No7oo5le5uLvcy4wZRsvpKg8WDK3nfta7Sg=.f09dad06-668b-491f-b1cd-83700dc9ec56@github.com> On Wed, 24 May 2023 12:31:37 GMT, Roland Westrelin wrote: > At an `instanceof`, a node of type `bottom[int:>=0]` is checked to be of type `cc$Word` and on the success a `CheckCastPP` is inserted to change the node type. That `CheckCastPP` constant folds to `top` but the type check doesn't fold. The reason is that the type check loads the klass from the node with a `LoadNKlass` and the type of that node is `java/lang/Object` when it should be `bottom[int:>=0]` but logic in `LoadNKlass::Value()` gets in the way. This pull request has now been integrated. Changeset: 199b1bf5 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/199b1bf5009120efd1fd37a1ddabc0c6fb84f62c Stats: 110 lines in 3 files changed: 102 ins; 0 del; 8 mod 8308583: SIGSEGV in GraphKit::gen_checkcast Reviewed-by: thartmann, kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/14123 From thartmann at openjdk.org Fri May 26 07:20:02 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 26 May 2023 07:20:02 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v9] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 05:49:15 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. 
This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine:
>>
>>                                             Baseline                   Patch              Improvement
>> Benchmark                      Mode  Cnt   Score   Error  Units      Score   Error  Units
>> Conv2BRules.testEquals0        avgt   10  47.566 ± 0.346  ns/op  /  34.130 ± 0.177  ns/op  + 28.2%
>> Conv2BRules.testNotEquals0     avgt   10  37.167 ± 0.211  ns/op  /  34.185 ± 0.258  ns/op  + 8.0%
>> Conv2BRules.testEquals1        avgt   10  35.059 ± 0.280  ns/op  /  34.847 ± 0.160  ns/op  (unchanged)
>> Conv2BRules.testEqualsNull     avgt   10  56.768 ± 2.600  ns/op  /  34.330 ± 0.625  ns/op  + 39.5%
>> Conv2BRules.testNotEqualsNull  avgt   10  47.447 ± 1.193  ns/op  /  34.142 ± 0.303  ns/op  + 28.0%
>>
>> Reviews would be greatly appreciated!
>>
>> Testing: tier1-2 on linux x64, GHA
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>
>  - Merge branch 'master' into conv2b-x86-lowering
>  - Fix assertion from not checking int type
>  - Cleanup from code review
>  - Changes from code review
>  - Merge branch 'master' into conv2b-x86-lowering
>  - Whitespace tweak
>  - Make transform conditional
>  - Remove Conv2B from backend as it's macro expanded now
>  - Re-work transform to happen in macro expansion
>  - Fix whitespace and add bug tag to IR test
>  - ...
and 5 more: https://git.openjdk.org/jdk/compare/31683722...65e841f3 Great work, the latest version looks good to me. I'll run some testing and report back once it has finished. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13345#pullrequestreview-1445367111 From thartmann at openjdk.org Fri May 26 07:40:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 26 May 2023 07:40:10 GMT Subject: RFR: JDK-8304546: CompileTask::_directive leaked if CompileBroker::invoke_compiler_on_method not called In-Reply-To: References: Message-ID: On Tue, 28 Mar 2023 14:28:08 GMT, Justin King wrote: >> Ensure `CompileTask::_directive` is not leaked when `CompileBroker::invoke_compiler_on_method` is not called. This can happen for stale tasks or when compilation is disabled. > > Poke. @jcking Any plans to open this up again? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13108#issuecomment-1563943945 From yzhu at openjdk.org Fri May 26 08:14:00 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Fri, 26 May 2023 08:14:00 GMT Subject: RFR: 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 In-Reply-To: References: Message-ID: <5B5lvQGq8Fce0oXLAdZEN58JwOc_5w8VR4nmVtWOtOY=.23498053-3092-4b25-89b9-aa0fed239b9a@github.com> On Fri, 26 May 2023 02:36:42 GMT, Dingli Zhang wrote: > We have some macro assembler functions that use v0 hardcoded as a temporary > register currently. > > However, the mask value used to control execution of a masked vector > instruction is always supplied by vector register v0 in RVV1.0[1]. If v0 is > alive holding a mask value at the same time, this will cause spilling of > this vector register. So it is better to replace v0 with other vector registers to > improve code execution efficiency. > > In addition, this PR also adds several missing spaces in the format of the > instructions, and fixes several pipeline classes. 
> > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > > ## Testing: > QEMU w/ UseRVV: > - [x] Tier1 tests (release) > - [ ] Tier2 tests (release) > - [ ] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) LGTM. ------------- Marked as reviewed by yzhu (Author). PR Review: https://git.openjdk.org/jdk/pull/14166#pullrequestreview-1445549586 From thartmann at openjdk.org Fri May 26 08:19:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 26 May 2023 08:19:57 GMT Subject: RFR: 8307683: Loop Predication is wrongly applied to non-RangeCheckNodes without a LoadRangeNode In-Reply-To: References: Message-ID: <0RDebqYP6nltBnvVTWlqu5zQ2UoyYQHfqBm7STemhzI=.8ffe64a2-2d17-48d0-b56d-ee0cdbc8f7bf@github.com> On Thu, 25 May 2023 16:48:35 GMT, Christian Hagedorn wrote: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. > > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. 
We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. > > Thanks, > Christian Looks good to me. Scary, that we didn't find this earlier. test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java line 82: > 80: // to never executing iFld2++ (we removed the check and the branch with the trap). > 81: for (int i = -1; i < 1000; i++) { > 82: if (Integer.compareUnsigned(i, 100) < 0) { // Loop Predication creates a Hoisted Range Check Predicate due to trap with Float.isNan(). Maybe add a comment explaining that this is equivalent to `i >= 0 && i < 100` and we add a predicate for the else branch, i.e. for `i < 0 || i >= 100`, and remove the if branch. test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java line 103: > 101: static void testCalendar2(boolean flag) { > 102: > 103: flag = !flag; Suggestion: flag = !flag; ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14156#pullrequestreview-1445458846 PR Review Comment: https://git.openjdk.org/jdk/pull/14156#discussion_r1206393211 PR Review Comment: https://git.openjdk.org/jdk/pull/14156#discussion_r1206357502 From roland at openjdk.org Fri May 26 08:29:59 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 26 May 2023 08:29:59 GMT Subject: RFR: 8307683: Loop Predication is wrongly applied to non-RangeCheckNodes without a LoadRangeNode In-Reply-To: References: Message-ID: On Thu, 25 May 2023 16:48:35 GMT, Christian Hagedorn wrote: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. > > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. 
no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. > > Thanks, > Christian Looks good to me. Thanks for the quick fix. Do we want to file a bug to revisit this in a correct way? Do we want to file a bug to investigate whether this can be done in a correct way? src/hotspot/share/opto/loopPredicate.cpp line 856: > 854: if (range->Opcode() != Op_LoadRange) { > 855: const TypeInteger* tinteger = phase->_igvn.type(range)->isa_integer(bt); > 856: if (!iff->is_RangeCheck() || tinteger == nullptr || tinteger->empty() || tinteger->lo_as_long() < 0) { Can the problem not happen with a LoadRange as second input? Couldn't the LoadRange constant fold and then the predicates could constant fold as well? ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14156#pullrequestreview-1445589807 PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1564007121 PR Review Comment: https://git.openjdk.org/jdk/pull/14156#discussion_r1206417031 From thartmann at openjdk.org Fri May 26 08:31:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 26 May 2023 08:31:59 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v4] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 15:32:01 GMT, Roland Westrelin wrote: >> pre/main/post loops are created for an inner loop of a loop nest but >> assert predicates cause the main and post loops to be removed. The >> OpaqueZeroTripGuard nodes for the loops are not removed: there's no >> logic to trigger removal of the opaque nodes once the loops are no >> longer there. With the inner loops gone, the outer loop becomes >> candidate for optimizations and is unrolled which causes the zero trip >> guards of the now removed loops to be duplicated and the opaque nodes >> to have more than one use. 
>> >> The fix I propose is, using logic similar to >> `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop >> opts if every OpaqueZeroTripGuard node guards a loop and if not, >> remove it. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - review Looks reasonable to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13901#pullrequestreview-1445598078 From roland at openjdk.org Fri May 26 08:35:56 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 26 May 2023 08:35:56 GMT Subject: RFR: 8308657: ReplayInline is not availabe in production build In-Reply-To: References: Message-ID: On Thu, 25 May 2023 14:37:54 GMT, Ashutosh Mehra wrote: > DumpInline functionality is available in product build but ReplayInline is available in non product build only. This patch makes ReplayInline functionality available in product builds by moving it out of "#ifndef PRODUCT" directive. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14152#pullrequestreview-1445608308 From jsjolen at openjdk.org Fri May 26 08:47:10 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 26 May 2023 08:47:10 GMT Subject: RFR: 8299974: Replace NULL with nullptr in share/adlc/ [v3] In-Reply-To: References: Message-ID: On Thu, 25 May 2023 17:18:23 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. 
No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision:
>
>  - One more!
>  - Fix the rest of the style issues

I'm integrating this. Only test failure on GHA seems entirely unrelated to this changeset.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14008#issuecomment-1564025696

From jsjolen at openjdk.org Fri May 26 08:47:11 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Fri, 26 May 2023 08:47:11 GMT Subject: Integrated: 8299974: Replace NULL with nullptr in share/adlc/ In-Reply-To: References: Message-ID: <4SLlsCS-p1dHZFGKPheoLjOfrCyjvHYFR5K9xjlQSkw=.401b652d-3f70-4ae0-91ea-f563091ff51b@github.com> On Tue, 16 May 2023 11:54:20 GMT, Johan Sjölen wrote:

> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/adlc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>
> Here are some typical things to look out for:
>
> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
> 2. 
Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>
> An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>
> Thanks!

This pull request has now been integrated. Changeset: 62537d20 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/62537d200f01d58ff1c236f31f71c5839316db9e Stats: 1270 lines in 19 files changed: 4 ins; 0 del; 1266 mod 8299974: Replace NULL with nullptr in share/adlc/ Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14008 From roland at openjdk.org Fri May 26 09:06:06 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 26 May 2023 09:06:06 GMT Subject: RFR: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" [v2] In-Reply-To: References: <0NfM51lje5HXS9Exo4CyNQldOhogABBuJazmEsFuDy0=.2517d655-8824-4adb-a35f-18b38c5fa938@github.com> Message-ID: On Mon, 15 May 2023 13:39:12 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > I see the following failure with `TestMissingMulLOptimization` from JDK-8299546 and `-XX:StressLongCountedLoop=2000000`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (c:\sb\prod\1684151031\workspace\open\src\hotspot\share\opto\loopnode.cpp:4157), pid=5368, tid=836 > # Error: assert(loop == nullptr) failed > > Current CompileTask: > C2: 267 15 b 4 compiler.ccp.TestMissingMulLOptimization::test (101 bytes) > > Stack: [0x0000002f4f600000,0x0000002f4f700000] > Native frames: 
(J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0xc53091] os::win32::platform_print_native_stack+0xf1 (os_windows_x86.cpp:236) > V [jvm.dll+0xee2a99] VMError::report+0x1019 (vmError.cpp:815) > V [jvm.dll+0xee4775] VMError::report_and_die+0x645 (vmError.cpp:1596) > V [jvm.dll+0xee4e84] VMError::report_and_die+0x64 (vmError.cpp:1361) > V [jvm.dll+0x55053b] report_vm_error+0x5b (debug.cpp:191) > V [jvm.dll+0xadcab2] PhaseIdealLoop::eliminate_useless_zero_trip_guard+0x2f2 (loopnode.cpp:4157) > V [jvm.dll+0xad0fb1] PhaseIdealLoop::build_and_optimize+0x971 (loopnode.cpp:4455) > V [jvm.dll+0x4ebc51] Compile::optimize_loops+0x1d1 (compile.cpp:2155) > V [jvm.dll+0x4de2e8] Compile::Optimize+0xef8 (compile.cpp:2391) > V [jvm.dll+0x4db378] Compile::Compile+0x1458 (compile.cpp:840) > V [jvm.dll+0x3f05ba] C2Compiler::compile_method+0x11a (c2compiler.cpp:121) > V [jvm.dll+0x4f6a81] CompileBroker::invoke_compiler_on_method+0x881 (compileBroker.cpp:2268) > V [jvm.dll+0x4f3ea6] CompileBroker::compiler_thread_loop+0x396 (compileBroker.cpp:1945) > V [jvm.dll+0x7f2ff9] JavaThread::thread_main_inner+0x279 (javaThread.cpp:720) > V [jvm.dll+0xe5434d] Thread::call_run+0x1cd (thread.cpp:222) > V [jvm.dll+0xc519c2] os::win32::thread_native_entry+0xa2 (os_windows.cpp:551) @TobiHartmann thanks for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/13901#issuecomment-1564060406 From roland at openjdk.org Fri May 26 09:09:07 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 26 May 2023 09:09:07 GMT Subject: Integrated: 8305189: C2 failed "assert(_outcnt==1) failed: not unique" In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:27:06 GMT, Roland Westrelin wrote: > pre/main/post loops are created for an inner loop of a loop nest but > assert predicates cause the main and post loops to be removed. 
The > OpaqueZeroTripGuard nodes for the loops are not removed: there's no > logic to trigger removal of the opaque nodes once the loops are no > longer there. With the inner loops gone, the outer loop becomes > candidate for optimizations and is unrolled which causes the zero trip > guards of the now removed loops to be duplicated and the opaque nodes > to have more than one use. > > The fix I propose is, using logic similar to > `PhaseIdealLoop::eliminate_useless_predicates()`, to check during loop > opts if every OpaqueZeroTripGuard node guards a loop and if not, > remove it. This pull request has now been integrated. Changeset: bac02b6e Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/bac02b6e9d9e1e93db27c7888188f29631e07f47 Stats: 154 lines in 5 files changed: 154 ins; 0 del; 0 mod 8305189: C2 failed "assert(_outcnt==1) failed: not unique" Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13901 From fyang at openjdk.org Fri May 26 09:42:55 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 May 2023 09:42:55 GMT Subject: RFR: 8308817: RISC-V: Support VectorTest node for Vector API In-Reply-To: References: Message-ID: <4LCdEWiPy5S_XKUWnRnufZHXUG_Jl2DRQKx3ivPBV7I=.c158359f-4e6d-46f5-b4f0-c5d6695d77ed@github.com> On Thu, 25 May 2023 03:22:18 GMT, Gui Cao wrote: > Hi, > > we have added VectorTest node, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
> > For example, we can use the following command to print the compilation log of a jtreg test case: > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/Int256VectorTests_PrintOptoAssembly_20230525.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > Also here's a more concise test case, VectorTestDemo: > > > import jdk.incubator.vector.ByteVector; > import jdk.incubator.vector.VectorMask; > > public class VectorTestDemo { > static boolean[] d = new boolean[]{true, false, false, false, false, false, false, false}; > static VectorMask avmask = VectorMask.fromArray(ByteVector.SPECIES_64, d, 0); > > public static void main(String[] args) { > for (int i = 0; i < 300000; i++) { > > final boolean alltrue = alltrue(); > if (alltrue != false) { > throw new RuntimeException("alltrue"); > } > final boolean anytrue = anytrue(); > if (anytrue != true) { > throw new RuntimeException("anytrue"); > } > } > } > > public static boolean anytrue() { > return avmask.anyTrue(); > } > > public static boolean alltrue() { > return avmask.allTrue(); > } > } > > > We can compile `VectorTestDemo.java` using `javac --add-modules jdk.incubator.vector VectorTestDemo.java`, and use `./java -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+UseRVV -XX:+PrintOptoAssembly -XX:+LogCompilation -XX:LogFile=compile.log VectorTestDemo > aaa.log` to start the test case, we can observe the specified compilation log `compile.log`, which contains the VectorTest node for the PR implementation. > Some of the compilation logs of VectorTestDemo#anytrue method are as follows. 
> > 05e lwu R28, [R7, #12] # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (con... Looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14138#pullrequestreview-1445818459 From fyang at openjdk.org Fri May 26 09:43:55 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 May 2023 09:43:55 GMT Subject: RFR: 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 In-Reply-To: References: Message-ID: On Fri, 26 May 2023 02:36:42 GMT, Dingli Zhang wrote: > We have some macro assembler functions that use v0 hardcoded as a temporary > register currently. > > However, the mask value used to control execution of a masked vector > instruction is always supplied by vector register v0 in RVV1.0[1]. If v0 is > alive holding a mask value at the same time, this will cause spilling of > this vector register. So it is better to replace v0 with other vector registers to > improve code execution efficiency. > > In addition, this PR also adds several missing spaces in the format of the > instructions, and fixes several pipeline classes. > > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > > ## Testing: > QEMU w/ UseRVV: > - [x] Tier1 tests (release) > - [ ] Tier2 tests (release) > - [ ] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Looks fine. Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14166#pullrequestreview-1445821108 PR Review: https://git.openjdk.org/jdk/pull/14166#pullrequestreview-1445821588 From epeter at openjdk.org Fri May 26 13:57:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 May 2023 13:57:04 GMT Subject: RFR: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit [v2] In-Reply-To: References: Message-ID: > In SuperWord::output we create a CountedLoopReserveKit, so that we can reverse edits to the loop, in case something goes wrong. As far as I understand, all of these conditions should never occur; prior condition checking in SuperWord should have already verified that. We should at least add asserts so that we can catch such failures and fix them, and do not just silently bail out of SuperWord (reverse the graph to before SuperWord and continue compilation). > > `DoReserveCopyInSuperWord` enables `do_reserve_copy()`. It is a product flag and true by default. If it is disabled, and there is such a failure, we just hit a `ShouldNotReachHere()`. > > There was one occurrence I could not assert for: `vmask = create_post_loop_vmask();`. Read more below, there is actually a bug there. > > **Testing** > > TODO testing up to tier6 plus stress testing. > (it already passed tier3 and stress testing) > > **Discussion** > > Do we really want to keep the `DoReserveCopyInSuperWord` flag (product, always true), which enables the use of `CountedLoopReserveKit`? It means that we always duplicate the loop (and the loops can be rather large because they were unrolled before SuperWord). It seems a bit of an edge case to want to bail out of SuperWord, but not of the whole compilation. > We can later decide if it makes sense to clone the whole loop via CountedLoopReserveKit (the loops can be large!), or if we should just have a regular compilation bailout instead (could simplify the code and reduce overhead of loop cloning). 
> > Plus: it seems the checks and bailouts are very selectively applied. I don't see why we would nullptr check some "vector_opd" but not all of them. So if we decide to keep it, we should probably apply it more consistently. > > What do you think? > > ------ > > **Bug: bad combination of -XX:+PostLoopMultiversioning -XX:-DoReserveCopyInSuperWord** > > I filed it here: [JDK-8308949](https://bugs.openjdk.org/browse/JDK-8308949) > > `PostLoopMultiversioning` unrolls the post-loop with the use of a vmask. Read more about post-loop vectorization here https://github.com/openjdk/jdk/pull/6828. But in `create_post_loop_vmask` we have some conditions which have to hold, and if they fail we get a `nullptr`, and bail out of SuperWord, via `CountedLoopReserveKit`. > > But if we turn off `DoReserveCopyInSuperWord`, this is not caught, and we hit an assert. > > Generally, this looks a bit unclean, what we have now: we should do the checks of `create_post_loop_vmask` before `SuperWord::output`,... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: remove rce post loop vmask assert -> it is legit there ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14168/files - new: https://git.openjdk.org/jdk/pull/14168/files/ef8b5845..83531a0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14168&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14168&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14168.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14168/head:pull/14168 PR: https://git.openjdk.org/jdk/pull/14168 From thartmann at openjdk.org Fri May 26 14:51:13 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 26 May 2023 14:51:13 GMT Subject: RFR: 8308657: ReplayInline is not availabe in production build In-Reply-To: References: Message-ID: 
<9oO1XLuYt1Gv-oDDQ8ktRND07Zh6yOP42tYs-I3PZgo=.d5045d21-1455-4478-bace-bb096b33cbc7@github.com> On Thu, 25 May 2023 14:37:54 GMT, Ashutosh Mehra wrote: > DumpInline functionality is available in product build but ReplayInline is available in non product build only. This patch makes ReplayInline functionality available in product builds by moving it out of "#ifndef PRODUCT" directive. Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14152#pullrequestreview-1446395479 From duke at openjdk.org Fri May 26 14:51:14 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Fri, 26 May 2023 14:51:14 GMT Subject: Integrated: 8308657: ReplayInline is not availabe in production build In-Reply-To: References: Message-ID: On Thu, 25 May 2023 14:37:54 GMT, Ashutosh Mehra wrote: > DumpInline functionality is available in product build but ReplayInline is available in non product build only. This patch makes ReplayInline functionality available in product builds by moving it out of "#ifndef PRODUCT" directive. This pull request has now been integrated. Changeset: ce5251af Author: Ashutosh Mehra Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/ce5251aff7b3d8fb458061ae209d713b6a5a88c8 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod 8308657: ReplayInline is not availabe in production build Reviewed-by: kvn, roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14152 From dnsimon at openjdk.org Fri May 26 15:38:09 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 26 May 2023 15:38:09 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out Message-ID: This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. 
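The commit message for this change mentions killing the subprocess in TestUncaughtErrorInCompileMethod after 10 seconds. As a rough illustration of that general pattern only (this is not the code from the PR; `BoundedSubprocess` and `runWithTimeout` are hypothetical names), a test can bound how long it waits for a child process using the standard `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()` calls:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not taken from the PR: run a child process and
// forcibly kill it if it has not exited within the given time limit.
final class BoundedSubprocess {
    /** Returns the child's exit code, or -1 if it had to be killed. */
    static int runWithTimeout(List<String> command, long timeoutSeconds) {
        Process child;
        try {
            child = new ProcessBuilder(command)
                    .redirectErrorStream(true)                        // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // drop output so pipes cannot fill up
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            if (child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                return child.exitValue();       // exited on its own within the limit
            }
            child.destroyForcibly().waitFor();  // timed out: kill and reap the child
            return -1;
        } catch (InterruptedException e) {
            child.destroyForcibly();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }
}
```

With a bound like this, a child that never reaches the expected state makes the test fail quickly with a recognizable result instead of running into the harness timeout.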
------------- Commit messages: - [skip ci] kill subprocess in TestUncaughtErrorInCompileMethod after 10 secs Changes: https://git.openjdk.org/jdk/pull/14173/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14173&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308930 Stats: 48 lines in 2 files changed: 35 ins; 11 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14173.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14173/head:pull/14173 PR: https://git.openjdk.org/jdk/pull/14173 From kvn at openjdk.org Fri May 26 15:45:01 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 May 2023 15:45:01 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag [v2] In-Reply-To: References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: <_ODC_vV_TZHwTbYrdnm5Ydkc-cqoi_kAvkB2nDsSLWA=.694f57d2-ac86-4ffb-a84e-6b74df9c96bc@github.com> On Fri, 26 May 2023 04:37:56 GMT, Ioi Lam wrote: >> Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method: >> >> >> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java >> >> >> CSR is not needed because this is a diagnostic VM option. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Removed duplicated compiler name from -XX:+PrintCompilation Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14161#pullrequestreview-1446502142 From kvn at openjdk.org Fri May 26 15:57:57 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 May 2023 15:57:57 GMT Subject: RFR: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit [v2] In-Reply-To: References: Message-ID: <83U6cThtLr0jnVlOEkgBy367XJAm2oOvHFPX5BbPxD0=.ab8c74b2-26a1-40c3-9382-c7ef7dd1db29@github.com> On Fri, 26 May 2023 13:57:04 GMT, Emanuel Peter wrote: >> In SuperWord::output we create a CountedLoopReserveKit, so that we can reverse edits to the loop, in case something goes wrong. As far as I understand, all of these conditions should never occur; prior condition checking in SuperWord should have already verified that. We should at least add asserts so that we can catch such failures and fix them, and do not just silently bail out of SuperWord (reverse the graph to before SuperWord and continue compilation). >> >> `DoReserveCopyInSuperWord` enables `do_reserve_copy()`. It is a product flag and true by default. If it is disabled, and there is such a failure, we just hit a `ShouldNotReachHere()`. >> >> There was one occurrence I could not assert for: `vmask = create_post_loop_vmask();`. Read more below, there is actually a bug there. >> >> **Testing** >> >> TODO testing up to tier6 plus stress testing. >> (it already passed tier3 and stress testing) >> >> **Discussion** >> >> Do we really want to keep the `DoReserveCopyInSuperWord` flag (product, always true), which enables the use of `CountedLoopReserveKit`? It means that we always duplicate the loop (and the loops can be rather large because they were unrolled before SuperWord). It seems a bit of an edge case to want to bail out of SuperWord, but not of the whole compilation. 
>> We can later decide if it makes sense to clone the whole loop via CountedLoopReserveKit (the loops can be large!), or if we should just have a regular compilation bailout instead (could simplify the code and reduce overhead of loop cloning). >> >> Plus: it seems the checks and bailouts are very selectively applied. I don't see why we would nullptr check some "vector_opd" but not all of them. So if we decide to keep it, we should probably apply it more consistently. >> >> What do you think? >> >> ------ >> >> **Bug: bad combination of -XX:+PostLoopMultiversioning -XX:-DoReserveCopyInSuperWord** >> >> I filed it here: [JDK-8308949](https://bugs.openjdk.org/browse/JDK-8308949) >> >> `PostLoopMultiversioning` unrolls the post-loop with the use of a vmask. Read more about post-loop vectorization here https://github.com/openjdk/jdk/pull/6828. But in `create_post_loop_vmask` we have some conditions which have to hold, and if they fail we get a `nullptr`, and bail out of SuperWord, via `CountedLoopReserveKit`. >> >> But if we turn off `DoReserveCopyInSuperWord`, this is not cought, and we hit an assert. >> >> Generally, this looks a bit unclean, what we have now: we should do the checks of `create_post_loop_vm... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > remove rce post loop vmask assert -> it is legit there At least the flag could be diagnostic if we want to keep it. But I am fine with removing this code and flag. We do wanted to bailout superword and undo its graph transformations if something goes wrong but it could be done the other way - recompile this method without superword as we do for escape analysis. Marked as reviewed by kvn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14168#pullrequestreview-1446523155 PR Review: https://git.openjdk.org/jdk/pull/14168#pullrequestreview-1446523382 From sviswanathan at openjdk.org Fri May 26 16:42:02 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 26 May 2023 16:42:02 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v9] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 05:49:15 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine: >> >> >> Baseline Patch Improvement >> Benchmark Mode Cnt Score Error Units Score Error Units >> Conv2BRules.testEquals0 avgt 10 47.566 ? 0.346 ns/op / 34.130 ? 0.177 ns/op + 28.2% >> Conv2BRules.testNotEquals0 avgt 10 37.167 ? 0.211 ns/op / 34.185 ? 0.258 ns/op + 8.0% >> Conv2BRules.testEquals1 avgt 10 35.059 ? 0.280 ns/op / 34.847 ? 0.160 ns/op (unchanged) >> Conv2BRules.testEqualsNull avgt 10 56.768 ? 2.600 ns/op / 34.330 ? 0.625 ns/op + 39.5% >> Conv2BRules.testNotEqualsNull avgt 10 47.447 ? 1.193 ns/op / 34.142 ? 0.303 ns/op + 28.0% >> >> Reviews would be greatly appreciated! 
>> >> Testing: tier1-2 on linux x64, GHA > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into conv2b-x86-lowering > - Fix assertion from not checking int type > - Cleanup from code review > - Changes from code review > - Merge branch 'master' into conv2b-x86-lowering > - Whitespace tweak > - Make transform conditional > - Remove Conv2B from backend as it's macro expanded now > - Re-work transform to happen in macro expansion > - Fix whitespace and add bug tag to IR test > - ... and 5 more: https://git.openjdk.org/jdk/compare/31683722...65e841f3 Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13345#pullrequestreview-1446590751 From cslucas at openjdk.org Fri May 26 17:30:58 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 26 May 2023 17:30:58 GMT Subject: RFR: 8306625 - Missing instructions on IR-based test framework ALLOC Regex In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 01:18:39 GMT, Cesar Soares Lucas wrote: > On AArch64 with -XX:-UseTLAB, C2 can add an `add`, `mulw` or `addw` around the method call to allocate an object/array. When this happens the current Regex of the IR-based test framework will NOT recognize the instruction sequence as an allocation and the result will be a false-negative test results. > > This PR is to adjust the four Regex to account for those possible instructions. I'll keep a hold on this for now because I've not been able to reproduce the failure _again_. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13631#issuecomment-1564705515 From cslucas at openjdk.org Fri May 26 17:46:03 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 26 May 2023 17:46:03 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 23 May 2023 17:19:23 GMT, Vladimir Ivanov wrote: >>> I verified that the new test cases do trigger SR+NSR scenario. >>> >>> How do you test that deoptimization works as expected? >>> >> >> I have a copy of the tests in AllocationMergesTests.java in a separate file (not included in this PR) and I run the tests with a tool that compares the output of the test with RAM enabled and disabled. So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. >> >>> Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information which is essential to make sense of debug information at a safepoint in the code. >>> >> >> I'll take care of that. I was testing only with PrintDebugInfo. >> >>> FTR `_skip_rematerialization` flag is unused now. >>> >> >> yeah, I forgot to remove that. Thanks. >> >>> Speaking of `_only_merge_candidate` flag, I find it easier about the code when the property being tracked is whether the `ObjectValue` is referenced from corresponding JVM state or not. (Maybe call it `is_root()`?) 
So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()` which doesn't need to adjust the flag before returning selected object. Also, it's safer to track the flag status for every `ObjectValues`, even for `ObjectMergeValue`. >>> >> >> Sounds like a good idea. I'll do that. Thanks. >> >>> Are you sure there's no way to end up with nested `ObjectMergeValue`s in presence of iterative EA? >> >> I don't think so. This current patch only handle Phis that don't have NULL as input. As part of the reduction process we set at least one of the reducible Phi inputs to NULL. Therefore, subsequent iterations of EA won't reduce the same Phi. > >> So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. > > Please, enhance `AllocationMergesTests` to cover deoptimization (e.g., using WhiteBox API or additional run w/ -XX:+DeoptimizeALot) and ensure that tests are sensitive enough to fail when wrong state is rematerialized. Hi @iwanowww - I pushed some changes to address your latest feedback. > Please, enhance AllocationMergesTests to cover deoptimization (e.g., using WhiteBox API or additional run w/ -XX:+DeoptimizeALot) and ensure that tests are sensitive enough to fail when wrong state is rematerialized. I added the "+DeoptimizeALot" flag on the tests execution. I also refactored the tests so that each test is executed in the Interpreter and in C2 with the same parameters so that we can confirm that result of the test with RAM enabled is correct. > Please, add asserts to catch such situation and a check which bails out compilation (triggering recompilation w/ ReduceAllocationMerges turned off) if it happens with product binaries. I added a new static method `ConnectionGraph::verify_ram_nodes` that does some verification in the inputs and users of RAM nodes. 
I decided to call the method after each iteration of EA->IGVN->MacroNodeElimination so that we also check that IGVN or `eliminate_macro_nodes` transformations didn't mess with RAM nodes. > I didn't propose exactly that, but I like your idea. I'm not against having it cached on ScopeValue side (and serialized in debug info), but implementing it as a query on ScopeDesc does look like a better alternative. [...] I ended up implementing this a little bit different from what I mentioned earlier. I had some problems with the approach that I described before... In current approach I set the `is_root` flag of ObjectValue's right before the object pool is serialized in output.cpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1564718582 From dnsimon at openjdk.org Fri May 26 19:56:56 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 26 May 2023 19:56:56 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v2] In-Reply-To: References: Message-ID: > This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: [skip ci] replace File with static boolean for communicating between app and compiler threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14173/files - new: https://git.openjdk.org/jdk/pull/14173/files/5bea1aae..8faf2b2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14173&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14173&range=00-01 Stats: 18 lines in 1 file changed: 0 ins; 13 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14173.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14173/head:pull/14173 PR: https://git.openjdk.org/jdk/pull/14173 From never at openjdk.org Fri May 26 20:07:56 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 26 May 2023 20:07:56 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v2] In-Reply-To: References: Message-ID: <27fwvDjD7qATK1c63EdjtyQfjTECKQhwALhj4XUFxWA=.75fd2d17-8b01-4ff5-8d41-6c8091a9c81b@github.com> On Fri, 26 May 2023 19:56:56 GMT, Doug Simon wrote: >> This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] replace File with static boolean for communicating between app and compiler threads Marked as reviewed by never (Reviewer). test/hotspot/jtreg/compiler/jvmci/TestUncaughtErrorInCompileMethod.java line 70: > 68: int total = 0; > 69: while (!compilerCreationErrorOccurred) { > 70: total += getTime(); This summing is weird since it's adding currentTimeMillis each time. Aren't you just trying to report how long it waited, which would just be end - start. Also having a sleep here so it isn't simply spinning wouldn't hurt. 
------------- PR Review: https://git.openjdk.org/jdk/pull/14173#pullrequestreview-1446872122 PR Review Comment: https://git.openjdk.org/jdk/pull/14173#discussion_r1207279983 From dnsimon at openjdk.org Fri May 26 20:07:58 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 26 May 2023 20:07:58 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v2] In-Reply-To: <27fwvDjD7qATK1c63EdjtyQfjTECKQhwALhj4XUFxWA=.75fd2d17-8b01-4ff5-8d41-6c8091a9c81b@github.com> References: <27fwvDjD7qATK1c63EdjtyQfjTECKQhwALhj4XUFxWA=.75fd2d17-8b01-4ff5-8d41-6c8091a9c81b@github.com> Message-ID: On Fri, 26 May 2023 20:01:17 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> [skip ci] replace File with static boolean for communicating between app and compiler threads > > test/hotspot/jtreg/compiler/jvmci/TestUncaughtErrorInCompileMethod.java line 70: > >> 68: int total = 0; >> 69: while (!compilerCreationErrorOccurred) { >> 70: total += getTime(); > > This summing is weird since it's adding currentTimeMillis each time. Aren't you just trying to report how long it waited, which would just be end - start. Also having a sleep here so it isn't simply spinning wouldn't hurt. I'm not really trying to report anything related to timing. I just need this loop to do some work that triggers JVMCI compilation. I'm open to better suggestions. 
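The two points in this exchange (measure a wait as end minus start, and keep a loop busy only so that it gets hot) can be sketched in Java as follows. This is a minimal illustration, not the code of TestUncaughtErrorInCompileMethod: the class `BusyWaitSketch`, the `flag` and `sink` fields, and the `busyWork` and `waitForFlag` methods are hypothetical names, and the timeout is an added assumption so that the loop cannot spin forever.

```java
import java.util.concurrent.TimeUnit;

class BusyWaitSketch {
    // Hypothetical stand-in for the test's static boolean. It is volatile so
    // that the spinning thread observes a write made by another thread.
    static volatile boolean flag;

    // Keeps the result of the throwaway work reachable.
    static int sink;

    // Throwaway work; its only purpose is to make the loop body execute often.
    static int busyWork(int x) {
        return x * 31 + 7;
    }

    // Spins until the flag is set or the timeout elapses.
    // Returns the elapsed time in milliseconds, computed as end - start.
    static long waitForFlag(long timeoutMillis) {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int local = 0;
        while (!flag && System.nanoTime() - deadline < 0) {
            local = busyWork(local); // the value itself is irrelevant
        }
        sink = local;
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        // Another thread sets the flag after a short delay.
        Thread setter = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flag = true;
        });
        setter.start();
        long waited = waitForFlag(10_000);
        setter.join();
        System.out.println("flag=" + flag + ", waited about " + waited + " ms");
    }
}
```

Adding a `Thread.sleep` inside the loop would reduce CPU use, but it would also reduce how often the loop body runs, which matters if the loop exists to trigger a compilation.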
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14173#discussion_r1207285520 From never at openjdk.org Fri May 26 20:27:57 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 26 May 2023 20:27:57 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v2] In-Reply-To: References: <27fwvDjD7qATK1c63EdjtyQfjTECKQhwALhj4XUFxWA=.75fd2d17-8b01-4ff5-8d41-6c8091a9c81b@github.com> Message-ID: On Fri, 26 May 2023 20:05:22 GMT, Doug Simon wrote: >> test/hotspot/jtreg/compiler/jvmci/TestUncaughtErrorInCompileMethod.java line 70: >> >>> 68: int total = 0; >>> 69: while (!compilerCreationErrorOccurred) { >>> 70: total += getTime(); >> >> This summing is weird since it's adding currentTimeMillis each time. Aren't you just trying to report how long it waited, which would just be end - start. Also having a sleep here so it isn't simply spinning wouldn't hurt. > > I'm not really trying to report anything related to timing. I just need this loop to do some work that triggers JVMCI compilation. I'm open to better suggestions. A comment explaining that it's useless work to trigger a compilation would help then. The work doesn't really matter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14173#discussion_r1207306633 From dnsimon at openjdk.org Fri May 26 21:04:23 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 26 May 2023 21:04:23 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v3] In-Reply-To: References: Message-ID: <2RPJQMxEzv0mbq422rH8K1wh27P1a28I6NkQZ-EikXg=.7d57221c-ea0a-4788-854c-698f938c2e36@github.com> > This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: [skip ci] clarify that main loop is doing busy work just to trigger compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14173/files - new: https://git.openjdk.org/jdk/pull/14173/files/8faf2b2e..73c065e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14173&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14173&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14173.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14173/head:pull/14173 PR: https://git.openjdk.org/jdk/pull/14173 From tonyp at openjdk.org Fri May 26 22:38:13 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 26 May 2023 22:38:13 GMT Subject: RFR: 8308977: gtest:codestrings fails on riscv Message-ID: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com> 8308977: gtest:codestrings fails on riscv ------------- Commit messages: - 8308977: gtest:codestrings fails on riscv Changes: https://git.openjdk.org/jdk/pull/14189/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14189&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308977 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14189.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14189/head:pull/14189 PR: https://git.openjdk.org/jdk/pull/14189 From tonyp at openjdk.org Fri May 26 22:40:53 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 26 May 2023 22:40:53 GMT Subject: RFR: 8308977: gtest:codestrings fails on riscv In-Reply-To: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com> References: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com> Message-ID: On Fri, 26 May 2023 22:30:11 GMT, Antonios Printezis wrote: > 8308977: 
gtest:codestrings fails on riscv There's some arch-specific code to trim trailing entries that needed to be extended for RISC V. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14189#issuecomment-1565034042 From chagedorn at openjdk.org Fri May 26 23:34:26 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 May 2023 23:34:26 GMT Subject: RFR: 8307683: Loop Predication is wrongly applied to non-RangeCheckNodes without a LoadRangeNode [v2] In-Reply-To: References: Message-ID: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. > > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - fix assertion - new fix with bailout for "if iv References: Message-ID: <2W94BG3PEoyOu8dZ8DqUMxEGr-QQR2MV3gW38FEVH4w=.09e910a5-0fc1-4eef-b878-a415b547dd74@github.com> On Thu, 25 May 2023 16:48:35 GMT, Christian Hagedorn wrote: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. > > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. > > Thanks, > Christian Thanks Tobias and Roland for the initial reviews. 
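The concern quoted above, that a check passing for the first and last iteration says nothing about the iterations in between when the comparison is not a real array range check, can be illustrated with plain Java. This is a sketch only and not HotSpot code: the class `UnsignedBoundsSketch` and its methods are hypothetical names, and the values (a loop from -1 to 999 compared against 100) are illustrative.

```java
class UnsignedBoundsSketch {
    // True when i >=u limit, i.e. the unsigned comparison "i <u limit" fails.
    static boolean aboveOrEqualUnsigned(int i, int limit) {
        return Integer.compareUnsigned(i, limit) >= 0;
    }

    // Counts the iterations in [first, last] for which i <u limit holds.
    static int countBelowUnsigned(int first, int last, int limit) {
        int count = 0;
        for (int i = first; i <= last; i++) {
            if (Integer.compareUnsigned(i, limit) < 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int first = -1, last = 999, limit = 100;
        // Both end points agree on the outcome of the comparison ...
        System.out.println(aboveOrEqualUnsigned(first, limit)); // true: -1 is 0xFFFFFFFF when read as unsigned
        System.out.println(aboveOrEqualUnsigned(last, limit));  // true
        // ... yet 100 iterations in between (i = 0..99) take the other branch.
        System.out.println(countBelowUnsigned(first, last, limit)); // 100
    }
}
```

A check evaluated only at the two end points would therefore conclude that the comparison always goes one way, which is wrong for the iterations from 0 to 99.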
After thinking about the fix again and discussing it with Roland, I updated the fix: We disallow this pattern for an `IfNode`: if (iv =u limit) { } else { trap(); } to have the trap to the false projection. However, this does not match the range check pattern of a `RangeCheckNode`: if (iv =u limit && iv_last_iteration >=u limit <=> // -1 >=u 100 && 999 >= u 100 for (int i = -1; i < 1000; i++) { if (Integer.compareUnsigned(i, 100) < 0) { iFld2++; Float.isNaN(34); // Float class is unloaded with -Xcomp -> inserts trap } else { iFld++; } } However, if `0 <= i < 100`, then the hoisted check `Integer.compareUnsigned(i, 100) < 0` would be false. We then wrongly skip the branch with the `Float.isNan(34)` trap (was removed when creating the Hoisted Range Check Predicate) and miss to execute `iFld2` leading to a wrong execution (or when splitting this loop, we halt because of an initialized Assertion Predicate which will fail). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1565066702 From chagedorn at openjdk.org Fri May 26 23:34:26 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 May 2023 23:34:26 GMT Subject: RFR: 8307683: Loop Predication is wrongly applied to non-RangeCheckNodes without a LoadRangeNode [v2] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 08:26:27 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - fix assertion >> - new fix with bailout for "if iv > src/hotspot/share/opto/loopPredicate.cpp line 856: > >> 854: if (range->Opcode() != Op_LoadRange) { >> 855: const TypeInteger* tinteger = phase->_igvn.type(range)->isa_integer(bt); >> 856: if (!iff->is_RangeCheck() || tinteger == nullptr || tinteger->empty() || tinteger->lo_as_long() < 0) { > > Can the problem not happen with a LoadRange as second input? Couldn't the LoadRange constant fold and then the predicates could constant fold as well? 
You're right. It's also a problem with a `LoadRange`, even if it does not constant fold. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14156#discussion_r1207476796 From roland at openjdk.org Sat May 27 09:52:55 2023 From: roland at openjdk.org (Roland Westrelin) Date: Sat, 27 May 2023 09:52:55 GMT Subject: RFR: 8307683: Loop Predication wrongly hoists IfNodes without a range check pattern as range check [v2] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 23:34:26 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. 
no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - fix assertion > - new fix with bailout for "if iv This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451). It is a trivial patch that fixes a misleading code comment at method entry printed by `-XX:+PrintAssembly`. For example, 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;*synchronization entry will become 0x0000ffffa409da88: stp x29, x30, [sp, #16] ; * invocation entry (also synchronization entry if synchronized) ------------- Commit messages: - Update the output of PrintAssembly in a jtreg test - Fix the misleading code comment at method entry Changes: https://git.openjdk.org/jdk/pull/14192/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14192&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303451 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14192/head:pull/14192 PR: https://git.openjdk.org/jdk/pull/14192 From duke at openjdk.org Sat May 27 16:34:15 2023 From: duke at openjdk.org (Daohan Qu) Date: Sat, 27 May 2023 16:34:15 GMT Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v2] In-Reply-To: References: Message-ID: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com> > This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451). > > It is a trivial patch that fixes a misleading code comment at method entry printed by `-XX:+PrintAssembly`. 
> > For example, > > 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;*synchronization entry > > will become > > 0x0000ffffa409da88: stp x29, x30, [sp, #16] ; * invocation entry (also synchronization entry if synchronized) Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: Update output again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14192/files - new: https://git.openjdk.org/jdk/pull/14192/files/2f21f37b..362fc750 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14192&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14192&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14192/head:pull/14192 PR: https://git.openjdk.org/jdk/pull/14192 From chagedorn at openjdk.org Sun May 28 22:13:22 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Sun, 28 May 2023 22:13:22 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 Message-ID: The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not yet cleaned up. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order-insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). 
This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 The fix is straightforward: make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. Thanks, Christian ------------- Commit messages: - 8308892: Bad graph detected in build_loop_late after JDK-8305635 Changes: https://git.openjdk.org/jdk/pull/14196/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308892 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14196.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14196/head:pull/14196 PR: https://git.openjdk.org/jdk/pull/14196 From dzhang at openjdk.org Mon May 29 00:55:42 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 29 May 2023 00:55:42 GMT Subject: RFR: 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 [v2] In-Reply-To: References: Message-ID: > We have some macro assembler functions that use v0 hardcoded as a temporary > register currently. > > However, the mask value used to control execution of a masked vector > instruction is always supplied by vector register v0 in RVV1.0[1]. If v0 is > alive holding a mask value at the same time, this will cause spilling of > this vector register. So it is better to replace v0 with other vector registers to > improve code execution efficiency. > > In addition, this PR also adds several missing spaces in the format of the > instructions, and fixes several pipeline classes. 
> > [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc > > ## Testing: > QEMU w/ UseRVV: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14166/files - new: https://git.openjdk.org/jdk/pull/14166/files/05446b0a..9fea08dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14166&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14166&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14166/head:pull/14166 PR: https://git.openjdk.org/jdk/pull/14166 From dzhang at openjdk.org Mon May 29 01:01:02 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 29 May 2023 01:01:02 GMT Subject: RFR: 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 [v2] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 09:41:19 GMT, Fei Yang wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment > > Marked as reviewed by fyang (Reviewer). @RealFYang @yhzhu20 Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14166#issuecomment-1566339436 From dzhang at openjdk.org Mon May 29 01:06:05 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 29 May 2023 01:06:05 GMT Subject: Integrated: 8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0 In-Reply-To: References: Message-ID: On Fri, 26 May 2023 02:36:42 GMT, Dingli Zhang wrote: > We have some macro assembler functions that use v0 hardcoded as a temporary > register currently. 
>
> However, the mask value used to control execution of a masked vector
> instruction is always supplied by vector register v0 in RVV1.0[1]. If v0 is
> alive holding a mask value at the same time, this will cause spilling of
> this vector register. So it is better to replace v0 with other vector registers to
> improve code execution efficiency.
>
> In addition, this PR also adds several missing spaces in the format of the
> instructions, and fixes several pipeline classes.
>
> [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc
>
> ## Testing:
> QEMU w/ UseRVV:
> - [x] Tier1 tests (release)
> - [x] Tier2 tests (release)
> - [x] Tier3 tests (release)
> - [x] test/jdk/jdk/incubator/vector (fastdebug)

This pull request has now been integrated.

Changeset: e21f865d
Author: Dingli Zhang
Committer: Fei Yang
URL: https://git.openjdk.org/jdk/commit/e21f865d84c7c861843ff568019e1ad11d280a50
Stats: 125 lines in 3 files changed: 61 ins; 0 del; 64 mod

8308915: RISC-V: Improve temporary vector register usage avoiding the use of v0

Reviewed-by: yzhu, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/14166

From fyang at openjdk.org Mon May 29 01:32:54 2023
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 29 May 2023 01:32:54 GMT
Subject: RFR: 8308977: gtest:codestrings fails on riscv
In-Reply-To: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com>
References: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com>
Message-ID:

On Fri, 26 May 2023 22:30:11 GMT, Antonios Printezis wrote:

> 8308977: gtest:codestrings fails on riscv

Looks good to me. I missed this failure as I forgot to prepare a hsdis-riscv64.so when running the gtest. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14189#pullrequestreview-1448595419

From jjshanwei at gmail.com Mon May 29 02:04:50 2023
From: jjshanwei at gmail.com (Zhang Li)
Date: Mon, 29 May 2023 10:04:50 +0800
Subject: unsubscribe
In-Reply-To:
References:
Message-ID:
From duke at openjdk.org Mon May 29 02:20:07 2023
From: duke at openjdk.org (Chang Peng)
Date: Mon, 29 May 2023 02:20:07 GMT
Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4]
In-Reply-To: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com>
References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com>
Message-ID:

> In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element.
>
> For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations.
>
> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input.
>
> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations.
>
> For example,
>
>
> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
> m.not().trueCount();
>
>
> will produce following assembly on a Neon machine before this patch:
>
>
> ...
> mvn  v16.16b, v16.16b   // VectorMask.not()
> xtn  v16.4h, v16.4s
> xtn  v16.8b, v16.8h
> neg  v16.8b, v16.8b     // VectorStoreMask
> addv b17, v16.8b
> umov w0, v17.b[0]       // VectorMask.trueCount()
> ...
>
>
> After this patch:
>
>
> ...
> mvn  v16.16b, v16.16b   // VectorMask.not()
> addv s17, v16.4s
> smov x0, v17.b[0]
> neg  x0, x0             // Optimized VectorMask.trueCount()
> ...
>
>
> In this case, we can save two xtn insns.
>
> Performance:
>
> Benchmark   Before            After               Unit
> testInt     723.822 ± 1.029   1182.375 ± 12.363   ops/ms
> testLong    632.154 ± 0.197   1382.74 ± 2.188     ops/ms
> testShort   788.665 ± 1.852   1152.38 ± 3.77      ops/ms
>
> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740
> [2]: https://github.com/openjdk/jdk/b...

Chang Peng has updated the pull request incrementally with one additional commit since the last revision:

  Update aarch64_vector.ad

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13974/files
  - new: https://git.openjdk.org/jdk/pull/13974/files/567f69a2..e8762c03

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13974&range=02-03

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13974.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13974/head:pull/13974

PR: https://git.openjdk.org/jdk/pull/13974

From Pengfei.Li at arm.com Mon May 29 03:12:52 2023
From: Pengfei.Li at arm.com (Pengfei Li)
Date: Mon, 29 May 2023 03:12:52 +0000
Subject: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization
Message-ID:

Hi,

I'm writing to let you know that I just filed "JDK-8308994: C2: Re-implement experimental post loop vectorization".

[BACKGROUND]
Current post loop vectorization in the C2 compiler has a long history. It was first implemented in JDK-8153998 in 2016 as an experimental feature to support x86 AVX-512 vector masks.
Due to insufficient maintenance, it had been broken for a very long time. Last year, I took over JDK-8183390 to fix and re-enable this feature. Several issues were fixed and AArch64 SVE vector mask support was added in the meantime. We (Arm) proposed to make post loop vectorization non-experimental in future JDK releases. So earlier this year (2023), we ran a lot of tests on it but found more problems.

[PROBLEMS]
Problems include stability, maintainability and performance.

1) Stability issues
Multiple C2 crash or mis-compilation issues were filed on JBS, including JDK-8301657, JDK-8301904, JDK-8301944, JDK-8304774, JDK-8308949 and perhaps more.

2) Maintainability issue
The original implementation was based on multi-versioned post loops and the logic was mixed into SuperWord. But the algorithm for post loop vectorization is actually *not* SLP. As more and more new features were added to SuperWord, the legacy code for post loop vectorization has become more and more difficult to maintain.

3) Performance issue
Post loop vectorization was expected to bring performance improvement for small-iteration vectorizable loops. But JMH tests show it doesn't. A main reason is that the vector masked post loop is skipped (not executed) if the loop trip count is small, due to the zero-trip guard of the main loop. That's a major defect of the current multi-versioning framework. (See JDK-8307084 for more details.)

[ACTIONS]
For better stability, maintainability and performance, we now propose to deprecate the current multi-versioning framework and completely re-implement the experimental post loop vectorization, for both x86 AVX-512 and AArch64 SVE. Our new proposal is to add a standalone ideal loop phase (outside SuperWord) to do the vector mask transformation directly on the original scalar post loop. We have been working on this internally for a while. So far we have finished a draft patch. I will push the patch for review soon after it passes all tests and becomes polished enough.
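
[Editorial sketch] To make the idea of a vector-masked post loop concrete, here is a minimal plain-Java model. It is only an illustration and not the C2 transformation described above: the class name, method names and the lane count `LANES` are hypothetical, and the inner `for` loops over lanes merely stand in for single vector instructions. The main loop handles full chunks, and the remaining iterations are covered by one masked step in which a lane is active only while its index is in range.

```java
import java.util.Arrays;

// Plain-Java model of a masked tail; all names here are hypothetical.
final class MaskedPostLoopSketch {
    // Hypothetical lane count standing in for the hardware vector length.
    static final int LANES = 8;

    // Reference: the original scalar loop.
    static void addScalar(int[] a, int[] b, int[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Main loop over full LANES-wide chunks, then ONE masked step for the tail.
    static void addWithMaskedTail(int[] a, int[] b, int[] c) {
        int n = c.length;
        int i = 0;
        for (; i + LANES <= n; i += LANES) {
            // Models one unmasked vector operation.
            for (int lane = 0; lane < LANES; lane++) {
                c[i + lane] = a[i + lane] + b[i + lane];
            }
        }
        if (i < n) {
            // Build the mask: a lane is active only while its index is in range.
            boolean[] mask = new boolean[LANES];
            for (int lane = 0; lane < LANES; lane++) {
                mask[lane] = i + lane < n;
            }
            // Models one masked vector operation; inactive lanes touch no memory.
            for (int lane = 0; lane < LANES; lane++) {
                if (mask[lane]) {
                    c[i + lane] = a[i + lane] + b[i + lane];
                }
            }
        }
    }

    public static void main(String[] args) {
        int n = 11; // one full chunk of 8 plus a tail of 3
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 10 * i;
        }
        int[] expected = new int[n];
        int[] actual = new int[n];
        addScalar(a, b, expected);
        addWithMaskedTail(a, b, actual);
        System.out.println(Arrays.equals(expected, actual)); // prints "true"
    }
}
```

Note that for an array shorter than `LANES` the main loop runs zero times and only the masked step executes, which is the small-trip-count case the message says the current framework fails to benefit.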
--
Thanks,
Pengfei

From rcastanedalo at openjdk.org Mon May 29 07:11:01 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 29 May 2023 07:11:01 GMT
Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635
In-Reply-To:
References:
Message-ID:

On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote:

> The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order-insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug):
>
> ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52)
>
> We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates:
> https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543
>
> The fix is straightforward: make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in.
>
> Thanks,
> Christian

Looks good! Windows build failure in GHA testing is unrelated.
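
[Editorial sketch] The difference between an order-insensitive and an order-aware walk, as described in the quoted text, can be modelled in a few lines of Java. This is a hypothetical illustration and not the HotSpot code or the actual patch: the class, enum and method names are invented, and placing the Loop Limit Check kind closest to the loop is an assumption (the quoted text only states that Loop Predicates sit above Profiled Loop Predicates).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of walking up from a loop entry over Parse Predicates.
final class ParsePredicateWalkSketch {
    // Declared in the order expected when walking UP from the loop entry.
    // LOOP above PROFILED_LOOP comes from the quoted text; the position of
    // LOOP_LIMIT_CHECK is an assumption made for this sketch.
    enum Kind { LOOP_LIMIT_CHECK, PROFILED_LOOP, LOOP }

    // Order-insensitive walk: accept each kind once, stop at the first repeat.
    static List<Kind> walkAnyOrder(List<Kind> upward) {
        List<Kind> found = new ArrayList<>();
        for (Kind k : upward) {
            if (found.contains(k)) {
                break;
            }
            found.add(k);
        }
        return found;
    }

    // Order-aware walk: additionally stop as soon as a kind appears that
    // should have been seen earlier (it belongs to an unrelated loop).
    static List<Kind> walkInOrder(List<Kind> upward) {
        List<Kind> found = new ArrayList<>();
        int lastRank = -1;
        for (Kind k : upward) {
            if (k.ordinal() <= lastRank) {
                break;
            }
            found.add(k);
            lastRank = k.ordinal();
        }
        return found;
    }

    public static void main(String[] args) {
        // Mirrors the reported graph: 116 (Loop), 84 (Profiled Loop), 71 (Loop).
        List<Kind> upward = List.of(Kind.LOOP, Kind.PROFILED_LOOP, Kind.LOOP);
        System.out.println(walkAnyOrder(upward)); // prints "[LOOP, PROFILED_LOOP]"
        System.out.println(walkInOrder(upward));  // prints "[LOOP]"
    }
}
```

In this model the order-insensitive walk wrongly attributes the unrelated Profiled Loop entry to the loop, while the order-aware walk stops before it.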
-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14196#pullrequestreview-1448985802

From chagedorn at openjdk.org Mon May 29 09:57:53 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 29 May 2023 09:57:53 GMT
Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635
In-Reply-To:
References:
Message-ID:

On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote:

> The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order-insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug):
>
> ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52)
>
> We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates:
> https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543
>
> The fix is straightforward: make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in.
> > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian Thanks Roberto for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1566876652 From dnsimon at openjdk.org Mon May 29 10:08:07 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 29 May 2023 10:08:07 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v3] In-Reply-To: <2RPJQMxEzv0mbq422rH8K1wh27P1a28I6NkQZ-EikXg=.7d57221c-ea0a-4788-854c-698f938c2e36@github.com> References: <2RPJQMxEzv0mbq422rH8K1wh27P1a28I6NkQZ-EikXg=.7d57221c-ea0a-4788-854c-698f938c2e36@github.com> Message-ID: On Fri, 26 May 2023 21:04:23 GMT, Doug Simon wrote: >> This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] clarify that main loop is doing busy work just to trigger compilation Thanks for the review Tom. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14173#issuecomment-1566887209 From dnsimon at openjdk.org Mon May 29 10:08:08 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 29 May 2023 10:08:08 GMT Subject: Integrated: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:35:58 GMT, Doug Simon wrote: > This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. This pull request has now been integrated. 
Changeset: a5d8d594 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/a5d8d594694c0e863dd30780a691a3a5ad9c6ee8 Stats: 65 lines in 2 files changed: 35 ins; 22 del; 8 mod 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/14173 From iklam at openjdk.org Mon May 29 20:29:04 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 29 May 2023 20:29:04 GMT Subject: RFR: 8308906: Make CIPrintCompilerName a diagnostic flag In-Reply-To: References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: <1lBcQc3ppvLPzQOlQZbJ4-uf3Hj1jSW_GkVUgj0a2CM=.f1804b2b-3e62-4f9c-8b8e-52c4b3020b03@github.com> On Thu, 25 May 2023 23:29:45 GMT, Vladimir Kozlov wrote: >> Please review a very simple change. This makes it easy to see which JIT compiler is used to compile each method: >> >> >> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java >> >> >> CSR is not needed because this is a diagnostic VM option. > > UL output is correct (no duplication): > > [0.055s][debug][jit,compilation] C1: 1 3 java.lang.Object:: (1 bytes) Thanks @vnkozlov and @tstuefe for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14161#issuecomment-1567488550 From iklam at openjdk.org Mon May 29 20:29:05 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 29 May 2023 20:29:05 GMT Subject: Integrated: 8308906: Make CIPrintCompilerName a diagnostic flag In-Reply-To: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> References: <0x30il3amCGtFE9ybdt1BujGMYaBJNgjL1F9xiIIWUo=.b723c7cd-59cc-4cf1-8ad1-d24bc991f6c0@github.com> Message-ID: On Thu, 25 May 2023 20:36:42 GMT, Ioi Lam wrote: > Please review a very simple change. 
This makes it easy to see which JIT compiler is used to compile each method: > > > java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+CIPrintCompilerName -jar MyApp.java > > > CSR is not needed because this is a diagnostic VM option. This pull request has now been integrated. Changeset: 7508d9f9 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/7508d9f9e0cea84d2be8d974215daae3c75140c3 Stats: 5 lines in 2 files changed: 0 ins; 4 del; 1 mod 8308906: Make CIPrintCompilerName a diagnostic flag Reviewed-by: kvn, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/14161 From gcao at openjdk.org Tue May 30 00:05:56 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 May 2023 00:05:56 GMT Subject: RFR: 8308817: RISC-V: Support VectorTest node for Vector API In-Reply-To: <4LCdEWiPy5S_XKUWnRnufZHXUG_Jl2DRQKx3ivPBV7I=.c158359f-4e6d-46f5-b4f0-c5d6695d77ed@github.com> References: <4LCdEWiPy5S_XKUWnRnufZHXUG_Jl2DRQKx3ivPBV7I=.c158359f-4e6d-46f5-b4f0-c5d6695d77ed@github.com> Message-ID: On Fri, 26 May 2023 09:40:08 GMT, Fei Yang wrote: >> Hi, >> >> we have added VectorTest node, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. >> We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
>> >> For example, we can use the following command to print the compilation log of a jtreg test case: >> >> /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ >> -v:default \ >> -concurrency:16 -timeout:50 \ >> -javaoption:-XX:+UnlockExperimentalVMOptions \ >> -javaoption:-XX:+UseRVV \ >> -javaoption:-XX:+PrintOptoAssembly \ >> -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/Int256VectorTests_PrintOptoAssembly_20230525.log \ >> -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ >> -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ >> /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java >> >> >> Also here's a more concise test case, VectorTestDemo: >> >> >> import jdk.incubator.vector.ByteVector; >> import jdk.incubator.vector.VectorMask; >> >> public class VectorTestDemo { >> static boolean[] d = new boolean[]{true, false, false, false, false, false, false, false}; >> static VectorMask avmask = VectorMask.fromArray(ByteVector.SPECIES_64, d, 0); >> >> public static void main(String[] args) { >> for (int i = 0; i < 300000; i++) { >> >> final boolean alltrue = alltrue(); >> if (alltrue != false) { >> throw new RuntimeException("alltrue"); >> } >> final boolean anytrue = anytrue(); >> if (anytrue != true) { >> throw new RuntimeException("anytrue"); >> } >> } >> } >> >> public static boolean anytrue() { >> return avmask.anyTrue(); >> } >> >> public static boolean alltrue() { >> return avmask.allTrue(); >> } >> } >> >> >> We can compile `VectorTestDemo.java` using `javac --add-modules jdk.incubator.vector VectorTestDemo.java`, and use `./java -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+UseRVV -XX:+PrintOptoAssembly -XX:+LogCompilation -XX:LogFile=compile.log VectorTestDemo > aaa.log` to start the test case, we can observe the specified compilation log `compile.log`, which contains the VectorTest node for the PR implementation. 
>> Some of the compilation logs of VectorTestDemo#anytrue method are as follows. >> >> 05e lwu R28, [R7, #12] # loadN, compressed ptr, #@loadN ! Field: jdk/i... > > Looks good. @RealFYang @feilongjiang Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14138#issuecomment-1567607373 From gcao at openjdk.org Tue May 30 00:43:19 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 May 2023 00:43:19 GMT Subject: Integrated: 8308817: RISC-V: Support VectorTest node for Vector API In-Reply-To: References: Message-ID: On Thu, 25 May 2023 03:22:18 GMT, Gui Cao wrote: > Hi, > > we have added VectorTest node, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. > We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. > > For example, we can use the following command to print the compilation log of a jtreg test case: > > /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ > -v:default \ > -concurrency:16 -timeout:50 \ > -javaoption:-XX:+UnlockExperimentalVMOptions \ > -javaoption:-XX:+UseRVV \ > -javaoption:-XX:+PrintOptoAssembly \ > -javaoption:-XX:LogFile=/home/zifeihan/jdk-rvv/Int256VectorTests_PrintOptoAssembly_20230525.log \ > -jdk:/home/zifeihan/jdk-rvv/build/linux-riscv64-server-fastdebug/jdk \ > -compilejdk:/home/zifeihan/jdk-rvv/build/linux-x86_64-server-release/images/jdk \ > /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java > > > Also here's a more concise test case, VectorTestDemo: > > > import jdk.incubator.vector.ByteVector; > import jdk.incubator.vector.VectorMask; > > public class VectorTestDemo { > static boolean[] d = new boolean[]{true, false, false, false, false, false, false, false}; > static VectorMask avmask = VectorMask.fromArray(ByteVector.SPECIES_64, d, 0); > > public static void main(String[] args) { > for (int i = 0; i < 300000; i++) { > > final boolean alltrue = alltrue(); > if (alltrue != 
false) { > throw new RuntimeException("alltrue"); > } > final boolean anytrue = anytrue(); > if (anytrue != true) { > throw new RuntimeException("anytrue"); > } > } > } > > public static boolean anytrue() { > return avmask.anyTrue(); > } > > public static boolean alltrue() { > return avmask.allTrue(); > } > } > > > We can compile `VectorTestDemo.java` using `javac --add-modules jdk.incubator.vector VectorTestDemo.java`, and use `./java -XX:-TieredCompilation -XX:+UnlockExperimentalVMOptions -XX:+UseRVV -XX:+PrintOptoAssembly -XX:+LogCompilation -XX:LogFile=compile.log VectorTestDemo > aaa.log` to start the test case, we can observe the specified compilation log `compile.log`, which contains the VectorTest node for the PR implementation. > Some of the compilation logs of VectorTestDemo#anytrue method are as follows. > > 05e lwu R28, [R7, #12] # loadN, compressed ptr, #@loadN ! Field: jdk/internal/vm/vector/VectorSupport$VectorPayload.payload (con... This pull request has now been integrated. Changeset: 457e1cb8 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/457e1cb827f4d0a28da2fb76bff760401d677bef Stats: 108 lines in 2 files changed: 107 ins; 0 del; 1 mod 8308817: RISC-V: Support VectorTest node for Vector API Co-authored-by: Dingli Zhang Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14138 From duke at openjdk.org Tue May 30 02:54:09 2023 From: duke at openjdk.org (duke) Date: Tue, 30 May 2023 02:54:09 GMT Subject: Withdrawn: JDK-8304684: Memory leak in DirectivesParser::set_option_flag In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 16:53:18 GMT, Justin King wrote: > Update `DirectivesSet` to take ownership of string options in some cases, to not leak memory. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/13125 From eliu at openjdk.org Tue May 30 03:44:54 2023 From: eliu at openjdk.org (Eric Liu) Date: Tue, 30 May 2023 03:44:54 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4] In-Reply-To: References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: On Mon, 29 May 2023 02:20:07 GMT, Chang Peng wrote: >> In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. >> >> For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations. >> >> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input. >> >> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations. 
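
[Editorial sketch] The claim in the quoted description, that narrowing does not change the number of active lanes, can be checked with a small plain-Java model. This is only an illustration with hypothetical names, not the `.ad` rule from the patch: in-register lanes are modelled as `int` values 0 or -1, the first method mimics narrowing plus negation followed by a byte-wise sum, and the second sums the wide lanes directly and negates once.

```java
// Hypothetical model of counting true lanes in a 0/-1 in-register mask.
final class TrueCountSketch {
    // Models VectorStoreMask followed by a byte-wise add:
    // narrow each 0/-1 lane to a byte, negate it to 0/1, then sum.
    static int trueCountViaStoreMask(int[] lanes) {
        int count = 0;
        for (int lane : lanes) {
            byte narrowed = (byte) lane;       // narrowing keeps 0 as 0 and -1 as -1
            byte inMemory = (byte) -narrowed;  // negate: 0x00 or 0x01
            count += inMemory;
        }
        return count;
    }

    // Models the optimized form: add the wide lanes directly, negate once.
    static int trueCountDirect(int[] lanes) {
        int sum = 0;
        for (int lane : lanes) {
            sum += lane;
        }
        return -sum;
    }

    public static void main(String[] args) {
        int[] lanes = {-1, 0, -1, -1}; // three active lanes out of four
        System.out.println(trueCountViaStoreMask(lanes)); // prints "3"
        System.out.println(trueCountDirect(lanes));       // prints "3"
    }
}
```

Both methods agree for any input whose lanes are 0 or -1, which is why the narrowing steps can be dropped for this operation.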
>> >> For example, >> >> >> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0); >> m.not().trueCount(); >> >> >> will produce following assembly on a Neon machine before this patch: >> >> >> ... >> mvn v16.16b, v16.16b // VectorMask.not() >> xtn v16.4h, v16.4s >> xtn v16.8b, v16.8h >> neg v16.8b, v16.8b // VectorStoreMask >> addv b17, v16.8b >> umov w0, v17.b[0] // VectorMask.trueCount() >> ... >> >> >> After this patch: >> >> >> ... >> mvn v16.16b, v16.16b // VectorMask.not() >> addv s17, v16.4s >> smov x0, v17.b[0] >> neg x0, x0 // Optimized VectorMask.trueCount() >> ... >> >> >> In this case, we can save two xtn insns. >> >> Performance: >> >> Benchmark Before After Unit >> testInt 723.822 ? 1.029 1182.375 ? 12.363 ops/ms >> testLong 632.154 ? 0.197 1382.74 ? 2.188 ops/ms >> testShort 788.665 ? 1.852 1152.38 ? 3.77 ops/ms >> >> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vect... > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update aarch64_vector.ad Still looks good to me. ------------- Marked as reviewed by eliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/13974#pullrequestreview-1450026196 From thartmann at openjdk.org Tue May 30 06:08:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 30 May 2023 06:08:08 GMT Subject: RFR: JDK-8304684: Memory leak in DirectivesParser::set_option_flag [v4] In-Reply-To: References: <9XO5we9RK8MKNE5HpGWLFySNOr6Y_TB6gXl13ksg0Yo=.dec7763e-9483-4c8c-ba79-7b6d47148d81@github.com> Message-ID: On Fri, 31 Mar 2023 14:27:58 GMT, Justin King wrote: >> Justin King has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjust logic based on review >> >> Signed-off-by: Justin King > > Going to run this through ASan/LSan to double check. @jcking any plans to still integrate this? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13125#issuecomment-1567793932 From thartmann at openjdk.org Tue May 30 06:07:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 30 May 2023 06:07:59 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v9] In-Reply-To: References: Message-ID: <9-SUY6vmhq6WGWjSSte4gd7boTIejVCjbLTIojmPhbI=.1fa63b23-92ef-412a-87be-db1f3257b11c@github.com> On Fri, 26 May 2023 05:49:15 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine: >> >> >> Baseline Patch Improvement >> Benchmark Mode Cnt Score Error Units Score Error Units >> Conv2BRules.testEquals0 avgt 10 47.566 ± 0.346 ns/op / 34.130 ± 0.177 ns/op + 28.2% >> Conv2BRules.testNotEquals0 avgt 10 37.167 ± 0.211 ns/op / 34.185 ± 0.258 ns/op + 8.0% >> Conv2BRules.testEquals1 avgt 10 35.059 ± 0.280 ns/op / 34.847 ± 0.160 ns/op (unchanged) >> Conv2BRules.testEqualsNull avgt 10 56.768 ± 2.600 ns/op / 34.330 ± 0.625 ns/op + 39.5% >> Conv2BRules.testNotEqualsNull avgt 10 47.447 ± 1.193 ns/op / 34.142 ± 0.303 ns/op + 28.0% >> >> Reviews would be greatly appreciated!
>> >> Testing: tier1-2 on linux x64, GHA > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into conv2b-x86-lowering > - Fix assertion from not checking int type > - Cleanup from code review > - Changes from code review > - Merge branch 'master' into conv2b-x86-lowering > - Whitespace tweak > - Make transform conditional > - Remove Conv2B from backend as it's macro expanded now > - Re-work transform to happen in macro expansion > - Fix whitespace and add bug tag to IR test > - ... and 5 more: https://git.openjdk.org/jdk/compare/31683722...65e841f3 All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1567788214 From thartmann at openjdk.org Tue May 30 06:08:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 30 May 2023 06:08:01 GMT Subject: RFR: 8308930: [JVMCI] TestUncaughtErrorInCompileMethod times out [v3] In-Reply-To: <2RPJQMxEzv0mbq422rH8K1wh27P1a28I6NkQZ-EikXg=.7d57221c-ea0a-4788-854c-698f938c2e36@github.com> References: <2RPJQMxEzv0mbq422rH8K1wh27P1a28I6NkQZ-EikXg=.7d57221c-ea0a-4788-854c-698f938c2e36@github.com> Message-ID: <4MAuo34be60JGRFV5koQT18YGrLcVwFJCI0JVa3iADM=.ffc79ec5-54a1-4e1e-8716-b97f9b300614@github.com> On Fri, 26 May 2023 21:04:23 GMT, Doug Simon wrote: >> This PR makes TestUncaughtErrorInCompileMethod more robust against HotSpot compilation scheduling variability which should prevent timeouts in this test. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > [skip ci] clarify that main loop is doing busy work just to trigger compilation Unfortunately, the test still times out, see [JDK-8309073](https://bugs.openjdk.org/browse/JDK-8309073). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14173#issuecomment-1567793000 From duke at openjdk.org Tue May 30 06:24:28 2023 From: duke at openjdk.org (Daohan Qu) Date: Tue, 30 May 2023 06:24:28 GMT Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v2] In-Reply-To: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com> References: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com> Message-ID: On Sat, 27 May 2023 16:34:15 GMT, Daohan Qu wrote: >> This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451). >> >> It is a trivial patch that fixes a misleading code comment at method entry printed by `-XX:+PrintAssembly`. >> >> For example, >> >> 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;*synchronization entry >> >> will become >> >> 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;* invocation entry (also synchronization entry if synchronized) > > Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: > > Update output again Hi @TobiHartmann, could you please review this change? Thanks in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14192#issuecomment-1567833444 From epeter at openjdk.org Tue May 30 07:16:28 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 May 2023 07:16:28 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required Message-ID: This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior.
This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmarks below show, we get a good speedup from vectorizing unaligned memory accesses. Note: this reduces the `CompileCommand Option Vectorize` flag to only controlling whether we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). **Changes to Tests** I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped.
But I think it is ok to leave them as "negative" tests for now, until we add such optimizations. **Testing** Passes tier6 and stress testing. No significant change in performance testing. You can find some `x64` and `aarch64` benchmarks below, together with analysis and explanations. **There is a lot of information below. Feel free to read as little or as much as you want and find helpful.** --------- **Benchmark Data** Machine: 11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16. With `AVX512` support. Executed like this: make test TEST="micro:vm.compiler.VectorAlignment" CONF=linux-x64 I have 4 flag combinations: NoSuperWord: -XX:-UseSuperWord (expect no vectorization) SuperWord: -XX:+UseSuperWord (normal mode) SuperWordAlignVector: -XX:+UseSuperWord -XX:+AlignVector (normal mode on machine with strict alignment) SuperWordWithVectorize: -XX:+UseSuperWord -XX:CompileCommand=Option,*::*,Vectorize (Vectorize flag enabled) With patch: VectorAlignment.VectorAlignmentNoSuperWord.bench000_control 2048 0 avgt 2465.937 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench001_control 2048 0 avgt 2509.747 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench100_misaligned_load 2048 0 avgt 2484.883 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 2489.044 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2463.388 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2464.048 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2476.954 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2592.562 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2563.649 ns/op VectorAlignment.VectorAlignmentSuperWord.bench000_control 2048 0 avgt 315.926 ns/op VectorAlignment.VectorAlignmentSuperWord.bench001_control 2048 0 avgt
327.533 ns/op VectorAlignment.VectorAlignmentSuperWord.bench100_misaligned_load 2048 0 avgt 319.991 ns/op VectorAlignment.VectorAlignmentSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 318.550 ns/op VectorAlignment.VectorAlignmentSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2504.033 ns/op VectorAlignment.VectorAlignmentSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2455.425 ns/op VectorAlignment.VectorAlignmentSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2545.703 ns/op VectorAlignment.VectorAlignmentSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2499.617 ns/op VectorAlignment.VectorAlignmentSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2473.191 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench000_control 2048 0 avgt 313.877 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench001_control 2048 0 avgt 341.554 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench100_misaligned_load 2048 0 avgt 2465.338 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench200_hand_unrolled_aligned 2048 0 avgt 312.662 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench300_multiple_misaligned_loads 2048 0 avgt 2455.039 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench301_multiple_misaligned_loads 2048 0 avgt 2456.872 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2604.665 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench400_hand_unrolled_misaligned 2048 0 avgt 2456.425 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench401_hand_unrolled_misaligned 2048 0 avgt 2507.887 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench000_control 2048 0 avgt 312.670 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench001_control 2048 0 avgt 328.561 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench100_misaligned_load 2048 0 avgt 
314.785 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench200_hand_unrolled_aligned 2048 0 avgt 2454.712 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench300_multiple_misaligned_loads 2048 0 avgt 320.622 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench301_multiple_misaligned_loads 2048 0 avgt 341.595 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 516.716 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench400_hand_unrolled_misaligned 2048 0 avgt 2469.011 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench401_hand_unrolled_misaligned 2048 0 avgt 2542.513 ns/op On master: VectorAlignment.VectorAlignmentNoSuperWord.bench000_control 2048 0 avgt 2467.072 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench001_control 2048 0 avgt 2476.239 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench100_misaligned_load 2048 0 avgt 2467.182 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 2460.985 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2564.807 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2568.871 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2498.102 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2492.498 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2473.459 ns/op VectorAlignment.VectorAlignmentSuperWord.bench000_control 2048 0 avgt 320.142 ns/op VectorAlignment.VectorAlignmentSuperWord.bench001_control 2048 0 avgt 328.415 ns/op VectorAlignment.VectorAlignmentSuperWord.bench100_misaligned_load 2048 0 avgt 2464.787 ns/op VectorAlignment.VectorAlignmentSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 313.505 ns/op 
VectorAlignment.VectorAlignmentSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2459.245 ns/op VectorAlignment.VectorAlignmentSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2500.698 ns/op VectorAlignment.VectorAlignmentSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2579.449 ns/op VectorAlignment.VectorAlignmentSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2465.709 ns/op VectorAlignment.VectorAlignmentSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2470.722 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench000_control 2048 0 avgt 312.058 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench001_control 2048 0 avgt 329.024 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench100_misaligned_load 2048 0 avgt 2472.375 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench200_hand_unrolled_aligned 2048 0 avgt 309.370 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench300_multiple_misaligned_loads 2048 0 avgt 2468.434 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench301_multiple_misaligned_loads 2048 0 avgt 2477.122 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2561.528 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench400_hand_unrolled_misaligned 2048 0 avgt 2478.820 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench401_hand_unrolled_misaligned 2048 0 avgt 2462.620 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench000_control 2048 0 avgt 313.276 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench001_control 2048 0 avgt 331.348 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench100_misaligned_load 2048 0 avgt 314.130 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench200_hand_unrolled_aligned 2048 0 avgt 2465.140 ns/op 
VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench300_multiple_misaligned_loads 2048 0 avgt 335.176 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench301_multiple_misaligned_loads 2048 0 avgt 335.492 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 550.598 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench400_hand_unrolled_misaligned 2048 0 avgt 2511.170 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench401_hand_unrolled_misaligned 2048 0 avgt 2468.112 ns/op Generally: we can see the difference between vectorization and non-vectorization easily: without vectorization the runtime is over `2000 ns/op`, with vectorization it is under `600 ns/op`. In comparison on a `aarch64` machine with `asimd` support: With the patch: VectorAlignment.VectorAlignmentNoSuperWord.bench000_control 2048 0 avgt 2058.132 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench001_control 2048 0 avgt 2071.570 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench100_misaligned_load 2048 0 avgt 2063.994 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 2051.104 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2058.493 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2060.856 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2213.880 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2060.412 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2055.939 ns/op VectorAlignment.VectorAlignmentSuperWord.bench000_control 2048 0 avgt 1032.666 ns/op VectorAlignment.VectorAlignmentSuperWord.bench001_control 2048 0 avgt 1034.138 ns/op VectorAlignment.VectorAlignmentSuperWord.bench100_misaligned_load 2048 0 
avgt 1031.412 ns/op VectorAlignment.VectorAlignmentSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 1030.791 ns/op VectorAlignment.VectorAlignmentSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2057.689 ns/op VectorAlignment.VectorAlignmentSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2057.009 ns/op VectorAlignment.VectorAlignmentSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 1465.270 ns/op VectorAlignment.VectorAlignmentSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2053.011 ns/op VectorAlignment.VectorAlignmentSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2055.820 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench000_control 2048 0 avgt 1032.645 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench001_control 2048 0 avgt 1034.199 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench100_misaligned_load 2048 0 avgt 2064.206 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench200_hand_unrolled_aligned 2048 0 avgt 1026.581 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench300_multiple_misaligned_loads 2048 0 avgt 2057.236 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench301_multiple_misaligned_loads 2048 0 avgt 2057.276 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 1465.736 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench400_hand_unrolled_misaligned 2048 0 avgt 2056.355 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench401_hand_unrolled_misaligned 2048 0 avgt 2064.056 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench000_control 2048 0 avgt 1033.816 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench001_control 2048 0 avgt 1034.002 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench100_misaligned_load 2048 0 avgt 1032.607 ns/op 
VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench200_hand_unrolled_aligned 2048 0 avgt 2052.119 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench300_multiple_misaligned_loads 2048 0 avgt 1026.828 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench301_multiple_misaligned_loads 2048 0 avgt 1027.582 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 1034.751 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench400_hand_unrolled_misaligned 2048 0 avgt 2052.453 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench401_hand_unrolled_misaligned 2048 0 avgt 2058.007 ns/op On master: VectorAlignment.VectorAlignmentNoSuperWord.bench000_control 2048 0 avgt 2058.009 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench001_control 2048 0 avgt 2070.553 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench100_misaligned_load 2048 0 avgt 2064.553 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 2053.390 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2058.187 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2060.125 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 2208.483 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2058.145 ns/op VectorAlignment.VectorAlignmentNoSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2056.145 ns/op VectorAlignment.VectorAlignmentSuperWord.bench000_control 2048 0 avgt 1032.566 ns/op VectorAlignment.VectorAlignmentSuperWord.bench001_control 2048 0 avgt 1033.856 ns/op VectorAlignment.VectorAlignmentSuperWord.bench100_misaligned_load 2048 0 avgt 2065.720 ns/op VectorAlignment.VectorAlignmentSuperWord.bench200_hand_unrolled_aligned 2048 0 avgt 1026.648 ns/op 
VectorAlignment.VectorAlignmentSuperWord.bench300_multiple_misaligned_loads 2048 0 avgt 2057.476 ns/op VectorAlignment.VectorAlignmentSuperWord.bench301_multiple_misaligned_loads 2048 0 avgt 2058.508 ns/op VectorAlignment.VectorAlignmentSuperWord.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 1465.702 ns/op VectorAlignment.VectorAlignmentSuperWord.bench400_hand_unrolled_misaligned 2048 0 avgt 2053.303 ns/op VectorAlignment.VectorAlignmentSuperWord.bench401_hand_unrolled_misaligned 2048 0 avgt 2052.170 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench000_control 2048 0 avgt 1032.788 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench001_control 2048 0 avgt 1033.912 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench100_misaligned_load 2048 0 avgt 2064.447 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench200_hand_unrolled_aligned 2048 0 avgt 1027.305 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench300_multiple_misaligned_loads 2048 0 avgt 2058.339 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench301_multiple_misaligned_loads 2048 0 avgt 2057.675 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 1465.643 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench400_hand_unrolled_misaligned 2048 0 avgt 2055.289 ns/op VectorAlignment.VectorAlignmentSuperWordAlignVector.bench401_hand_unrolled_misaligned 2048 0 avgt 2052.978 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench000_control 2048 0 avgt 1032.738 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench001_control 2048 0 avgt 1034.188 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench100_misaligned_load 2048 0 avgt 1031.948 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench200_hand_unrolled_aligned 2048 0 avgt 2051.954 ns/op 
VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench300_multiple_misaligned_loads 2048 0 avgt 1027.746 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench301_multiple_misaligned_loads 2048 0 avgt 1028.121 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench302_multiple_misaligned_loads_and_stores 2048 0 avgt 1035.034 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench400_hand_unrolled_misaligned 2048 0 avgt 2054.449 ns/op VectorAlignment.VectorAlignmentSuperWordWithVectorize.bench401_hand_unrolled_misaligned 2048 0 avgt 2053.339 ns/op Also with `aarch64` we can see a clear difference between vectorization and non-vectorization. The pattern is the same, even though the concrete numbers are a bit different. **Benchmark Discussion: 0xx control** These are simple examples that vectorize unless `SuperWord` is disabled. Just to make sure the benchmark works. https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/test/micro/org/openjdk/bench/vm/compiler/VectorAlignment.java#L70-L77 **Benchmark Discussion: 1xx load and store misaligned** This vectorizes with the patch, but does not vectorize on master. It does not vectorize with `AlignVector` because of the misalignment (misaligned by `1 int = 4 byte`). On master, we require all vectors to align with all other vectors of the same memory slice. https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/test/micro/org/openjdk/bench/vm/compiler/VectorAlignment.java#L79-L85 **Benchmark Discussion: 2xx vectorizes only without Vectorize** Hand-unrolling confuses SuperWord with `Vectorize` flag. The issue is that adjacent memops are not from the same original same-iteration node - rather they are from two different lines. 
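To make the hand-unrolled case concrete, here is a minimal sketch of what such a loop can look like next to its rolled equivalent. The class name, method names and loop bodies are assumptions chosen for illustration; they are not copied from `VectorAlignment.java`. In the hand-unrolled variant, the two adjacent stores come from two separate source statements, which is the situation described above.

```java
// Hypothetical illustration only: names and loop bodies are assumptions,
// not the benchmark source.
final class HandUnrollSketch {
    // Rolled loop: a single statement per iteration. After C2 unrolls it,
    // all copies of the store originate from the same single-iteration node.
    static void rolled(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            b[i] = a[i] + 1;
        }
    }

    // Hand-unrolled by two (assumes an even array length). The stores to
    // b[i] and b[i + 1] are adjacent in memory but come from two different
    // source statements.
    static void handUnrolled(int[] a, int[] b) {
        for (int i = 0; i < a.length; i += 2) {
            b[i]     = a[i]     + 1; // statement A
            b[i + 1] = a[i + 1] + 1; // statement B
        }
    }
}
```

Both methods compute the same result for even-length arrays; the sketch only shows the source shape and says nothing about what C2 actually emits for it.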
https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/test/micro/org/openjdk/bench/vm/compiler/VectorAlignment.java#L87-L94 Here are the relevant checks in `SuperWord::find_adjacent_refs`: https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/src/hotspot/share/opto/superword.cpp#L717-L718 https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/src/hotspot/share/opto/superword.cpp#L743 **Benchmark Discussion: 3xx vectorizes only with Vectorize** Regular SuperWord fails in these cases for 2 reasons: - 300 fails because of the modulo computation of the alignment - 301 fails because we can confuse multiple loads (`aI[5]` with `aI[4+1]`). https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/test/micro/org/openjdk/bench/vm/compiler/VectorAlignment.java#L96-L110 In `SuperWord::memory_alignment` we compute the alignment, modulo the `vw` (vector width). https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/src/hotspot/share/opto/superword.cpp#L3717-L3720 Now assume we have two load vectors, with `offsets` `[0,4,8,12]` and `[4,8,12,16]`. If we have `vw=16`, then we get the `off_mod` to be `[0,4,8,12]` and `[4,8,12,0]`. The second vector thus has the last element wrap in the modulo space, and it does not pass the alignment checks (`align1 + data_size == align2`): https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/src/hotspot/share/opto/superword.cpp#L1302-L1303 The consequence is that we only pack 3 of the 4 memops. And then the pack gets filtered out here, and vectorization fails: https://github.com/openjdk/jdk/blob/e0658a4ff1bb5c746599f44192281cb959c47f5b/src/hotspot/share/opto/superword.cpp#L1855 One solution for this is to compute the alignment without modulo.
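The wrap-around described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the code only mirrors the `off_mod` computation and the `align1 + data_size == align2` check quoted in the text, not the actual logic in `superword.cpp`.

```java
import java.util.Arrays;

// Hypothetical illustration only: mirrors the arithmetic described in the
// text, not the HotSpot implementation.
final class ModuloAlignmentSketch {
    // Reduce each byte offset modulo the vector width vw.
    static int[] offMod(int[] offsets, int vw) {
        int[] result = new int[offsets.length];
        for (int k = 0; k < offsets.length; k++) {
            result[k] = Math.floorMod(offsets[k], vw);
        }
        return result;
    }

    // Count neighbouring memops that satisfy align1 + dataSize == align2.
    static int adjacentPairs(int[] align, int dataSize) {
        int pairs = 0;
        for (int k = 0; k + 1 < align.length; k++) {
            if (align[k] + dataSize == align[k + 1]) {
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int vw = 16;       // vector width in bytes
        int dataSize = 4;  // size of an int in bytes
        int[] first = {0, 4, 8, 12};
        int[] second = {4, 8, 12, 16};
        for (int[] offsets : new int[][] {first, second}) {
            int[] mod = offMod(offsets, vw);
            System.out.println("offsets=" + Arrays.toString(offsets)
                + " off_mod=" + Arrays.toString(mod)
                + " pairs with modulo=" + adjacentPairs(mod, dataSize)
                + " pairs without modulo=" + adjacentPairs(offsets, dataSize));
        }
    }
}
```

For the second vector the modulo variant links only two pairs, which chains 3 of the 4 memops, whereas the raw offsets link all three pairs. That matches the "3 of the 4 memops" observation and suggests why a non-modulo alignment would help in this particular case.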
The modulo computation of alignment comes from a time when we could only have strictly aligned memory accesses, and so we would not want to ever pack pairs that cross an alignment boundary, where the modulo wraps around. We could address this in a **future RFE**. The second issue for vectorization is the confusion of multiple loads that look identical, e.g. `aI[5]` and `aI[4+1]`. A loop like `b[i] = a[i] + a[i+1]` that was unrolled 4 times will have these loads: `a[i], a[i+1], a[i+1], a[i+2], a[i+2], a[i+3], a[i+3], a[i+4]`. The SuperWord (SLP) algorithm just greedily picks pairs that are adjacent, and has no mechanism to deal with multiple packing options: we can pair `a[i]` with either of the two `a[i+1]`. If we do not perfectly pack them, then the packs will not line up with other packs, and vectorization fails. In the literature I have seen people who solve this problem with integer linear programming, but that would most likely be too expensive for our JIT C2. We just have to accept that SuperWord (SLP) is greedy and cannot pack things optimally in all cases. Luckily, the `Vectorize` approach does solve most of these cases, as it can separate the two loads in `b[i] = a[i] + a[i+1]`, since they come from two different nodes in the single-iteration loop. **Benchmark Discussion: 4xx does not vectorize at all, even though it should be possible in principle** We combine the issues from 2xx and 3xx: hand-unrolling prevents `Vectorize` from working, and confusion of multiple loads or the modulo alignment computation prevents non-Vectorize from working. ------- **Unifying multiple SuperWord Strategies and beyond** We have now seen examples where sometimes it is better to go with the `Vectorize` flag, and sometimes it is better without it. Would it not be great if we could try out both strategies, and then pick the better one? A very **naive solution**: Just try without `Vectorize` first. If we get a non-empty packset go with it.
If it is empty, then try with `Vectorize`. That may work in many cases, but there will also be a few cases where without `Vectorize` we do create a non-empty packset, which is just very suboptimal. Plus: in the future we may consider expanding the approaches to non-adjacent memory refs, such as strided accesses or even gather/scatter (as long as the dependency checks pass). Then it will be even more likely that both strategies create a non-empty packset, but one of the two strategies creates a much better packset than the other. A **better solution**: try out both approaches, and evaluate them with a cost-model. Also compute the cost of the non-vectorized loop. Then pick the best option. This cost-model will also be helpful to decide if we should vectorize when we have `Reduction` nodes (they can be very expensive) or when introducing vector-shuffles (we probably want to introduce them to allow reverse-order loops, where we need to reverse the vector lane order with a shuffle). My suggestion is this: run both SuperWord approaches until we have the `PacksetGraph`. At this point we know if we could vectorize, including if any dependency-checks fail (independence checks, cycle check). Then, we evaluate the cost if we were to apply this `PacksetGraph`. We pick the cheapest `PacksetGraph` and apply it. This approach is also extensible (I got a bit inspired by LLVM talks about VPlan). We can rename the `PacksetGraph` to a more general `VectorTransformGraph` in the following steps: 1. Create multiple `VectorTransformGraph` through **multiple SuperWord strategies** (with and without `Vectorize`). With a cost model pick the best one. 2. Sometimes, there are too many nodes in the loop and we cannot unroll enough times to ensure there are enough parallel operations to fill all elements in the vector registers. That way, we lose a lot of performance.
We could consider **widening** the operations in the `VectorTransformGraph`, so that we can indeed make use of the whole vector.
3. We could even create a `VectorTransformGraph` from a **single iteration loop**, and try to **widen** the instructions there. If this succeeds we do not have to unroll before vectorizing. This is essentially a traditional loop vectorizer. Except that we can also run the SuperWord algorithm over it first to see if we already have any parallelism in the single iteration loop. And then widen that. This makes it a hybrid vectorizer. Not having to unroll means direct time savings, but also that we could vectorize larger loops in the first place, since we would not hit the node limit for unrolling.
4. Later, we can also incorporate **if-conversion** into this approach. Let the previous points all allow packing/widening control flow. Now, we do if-conversion: either flatten the CFG with the use of `VectorMaskCmp` and `VectorBlend`, or if the branch is highly likely to take one side for all vector elements, we can also use `test_all_zeros` / `test_all_ones` to still branch.

Maybe there are even more vectorization approaches that could fit into this `VectorTransformGraph` scheme. The advantage is that it is modular, and we do not affect the C2-graph until we have decided on the best vectorization option via a cost-model.

One item I have to spend more time learning and integrating into this plan is `PostLoopMultiversioning`. It seems to use the widening approach. Maybe we can just extend the widening to a vector-masked version.

---------

**Example of large loop that is not vectorized**

We have a limit of about 50 or 60 nodes for unrolling (`LoopUnrollLimit`). The loop below only vectorizes if we raise the limit. Vectorizing before unrolling could help here. Or partially unroll, SuperWord, and widen more.

    java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:LoopUnrollLimit=1000 Test.java

    class Test {
        static final int RANGE = 1024*2;
        static final int ITER = 10_000;

        static void init(int[] data) {
            for (int i = 0; i < RANGE; i++) {
                data[i] = i + 1;
            }
        }

        static void test(int[] a, int[] b) {
            for (int i = 10; i < RANGE-10; i++) {
                int aa = a[i];
                aa = aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa;
                aa = aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa;
                aa = aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa;
                aa = aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa * aa;
                b[i] = aa;
            }
        }

        public static void main(String[] args) {
            int[] a = new int[RANGE];
            int[] b = new int[RANGE];
            init(a);
            init(b);
            for (int i = 0; i < ITER; i++) {
                test(a, b);
            }
        }
    }

---------

**Re-reviewing TestPickLastMemoryState.java**

We once had some "collateral damage" in `TestPickLastMemoryState.java`, where we had to accept some cases that would not vectorize anymore to ensure correctness in all other cases (https://github.com/openjdk/jdk/pull/12350#issuecomment-1469539789).
Let's re-assess how many of them now vectorize:

- `f` has a cyclic dependency in the graph (because we do not know that `a != b`):

    class Test {
        static final int RANGE = 1024;
        static final int ITER = 10_000;

        static void init(int[] data) {
            for (int i = 0; i < RANGE; i++) {
                data[i] = i + 1;
            }
        }

        static void test(int[] a, int[] b) {
            for (int i = 10; i < RANGE-10; i++) {
                a[i] = b[i - 1]--; // store a[i] -> load b[i]
                b[i]--;            // store b[i] must happen before load b[i - 1] of next iteration
            }
        }

        public static void main(String[] args) {
            int[] a = new int[RANGE];
            int[] b = new int[RANGE];
            init(a);
            init(b);
            for (int i = 0; i < ITER; i++) {
                test(a, b);
            }
        }
    }

Run it with either:

    java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=Option,Test::test,Vectorize -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+Verbose Test.java
    java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+TraceNewVectors -XX:+TraceSuperWord -XX:+Verbose Test.java

In either case, we do not vectorize. Actually, we already do not create the pair packs for any memops except the `b[i-1]` store, since we detect that we do not have `independent(s1,s2)` for adjacent memop pairs. If we change the loop to:

    static void test(int[] a, int[] b) {
        for (int i = 10; i < RANGE-10; i++) {
            a[i] = b[i - 2]--; // store a[i] -> load b[i]
            b[i]--;            // store b[i] must happen before load b[i - 2] two iterations later
        }
    }

Now we do not detect the dependence at distance 1, but only later when we check for dependence at further distances. We see lots of warnings because of pack removal: `WARNING: Found dependency at distance greater than 1.`. Without the `Vectorize` flag we somehow still manage to vectorize a vector with 2 elements, but that is hardly a success as my machine would allow packing `16` ints in a `512` bit register. That just seems to be an artefact of there being no dependence at distance 1. It is not very interesting to add IR verification for that kind of vectorization.
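
The distance argument above can be made concrete with a toy simulation. This is only a sketch and not how C2 models packs: `DependenceDemo`, `packed` and `matchesScalar` are hypothetical names, and `packed` simply assumes that each statement is executed for all lanes of a chunk (all loads, then all stores) before the next statement starts. It compares that lane-parallel order against the scalar loop for a given load offset and vector width, assuming `a != b`.

```java
import java.util.Arrays;

class DependenceDemo {
    static final int RANGE = 1024;

    static void init(int[] data) {
        for (int i = 0; i < RANGE; i++) {
            data[i] = i + 1;
        }
    }

    // Scalar reference: the loop from the mail, with the load offset as a parameter.
    static void scalar(int[] a, int[] b, int dist) {
        for (int i = 10; i < RANGE - 10; i++) {
            a[i] = b[i - dist]--;
            b[i]--;
        }
    }

    // Hypothetical model of a packed loop (an assumption for illustration):
    // per chunk of `width` iterations, statement 1 runs for all lanes, then statement 2.
    static void packed(int[] a, int[] b, int dist, int width) {
        int[] tmp = new int[width];
        for (int i = 10; i < RANGE - 10; i += width) {
            int lanes = Math.min(width, RANGE - 10 - i);
            // Statement 1: a[i] = b[i - dist]--  (load all lanes, then store all lanes)
            for (int l = 0; l < lanes; l++) {
                tmp[l] = b[i + l - dist];
            }
            for (int l = 0; l < lanes; l++) {
                a[i + l] = tmp[l];
                b[i + l - dist] = tmp[l] - 1;
            }
            // Statement 2: b[i]--  (load all lanes, then store all lanes)
            for (int l = 0; l < lanes; l++) {
                tmp[l] = b[i + l];
            }
            for (int l = 0; l < lanes; l++) {
                b[i + l] = tmp[l] - 1;
            }
        }
    }

    // True if the lane-parallel order produces the same arrays as the scalar loop.
    static boolean matchesScalar(int dist, int width) {
        int[] a1 = new int[RANGE], b1 = new int[RANGE];
        int[] a2 = new int[RANGE], b2 = new int[RANGE];
        init(a1); init(b1); init(a2); init(b2);
        scalar(a1, b1, dist);
        packed(a2, b2, dist, width);
        return Arrays.equals(a1, a2) && Arrays.equals(b1, b2);
    }

    public static void main(String[] args) {
        System.out.println("dist=1 width=2: " + matchesScalar(1, 2));
        System.out.println("dist=2 width=2: " + matchesScalar(2, 2));
        System.out.println("dist=2 width=4: " + matchesScalar(2, 4));
    }
}
```

Under these assumptions, offset 1 already breaks with 2 lanes, while offset 2 is still correct with 2 lanes and only breaks once the chunk is wide enough for a later lane to read what an earlier lane's `b[i]--` should have written. That matches the observation that a 2-element vector survives for `b[i - 2]`.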
- `test1-6` are also relatively complex, and have cyclic dependencies of different kinds. I think we should just keep them as correctness tests for correct results, but not extend them to IR verification tests. ------------- Commit messages: - Merge branch 'master' into JDK-8308606 - remove some outdated comments - Benchmark VectorAlignment - Merge branch 'master' into JDK-8308606 - remove dead code and add offset printing - fix typo - 8308606: C2 SuperWord: remove alignment checks where not required Changes: https://git.openjdk.org/jdk/pull/14096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308606 Stats: 434 lines in 5 files changed: 267 ins; 75 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/14096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096 PR: https://git.openjdk.org/jdk/pull/14096 From roland at openjdk.org Tue May 30 07:24:55 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 30 May 2023 07:24:55 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. So which of the 3 predicates will get cleaned up? Are some of the 3 predicates for another loop that doesn't exist anymore? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1567900789 From emanuel.peter at oracle.com Tue May 30 07:45:23 2023 From: emanuel.peter at oracle.com (Emanuel Peter) Date: Tue, 30 May 2023 07:45:23 +0000 Subject: AW: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: Hi Pengfei, great to hear that you are spending time on SuperWord / the auto-vectorization in HotSpot. 
I agree with your assessment that currently SuperWord is unnecessarily convoluted and has a good bit of legacy code. It would be nice if we could make the code more modular and extensible for future improvements. Is there a chance that we could see the draft already? I am also thinking about extending SuperWord in the future. I am currently trying to clean up as much dead code and as many bugs as possible to clear the way. I have to see how much time I get to spend on extensions. Here you can find some of my ideas (towards the end of the PR description): https://github.com/openjdk/jdk/pull/14096 It would be good to coordinate a bit so that we can ensure our plans fit together. Best regards, Emanuel ________________________________ From: Pengfei Li Sent: Monday, 29 May 2023 05:12 To: hotspot-compiler-dev at openjdk.java.net Cc: epeter at openjdk.org ; Bhateja, Jatin ; nd Subject: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization Hi, I'm writing to let you know that I just filed "JDK-8308994: C2: Re-implement experimental post loop vectorization". [BACKGROUND] Current post loop vectorization in the C2 compiler has a long history. It was first implemented in JDK-8153998 in 2016 as an experimental feature to support x86 AVX-512 vector masks. Due to insufficient maintenance, it had been broken for a very long time. Last year, I took over JDK-8183390 to fix and re-enable this feature. Several issues were fixed and AArch64 SVE vector mask support was added in the meantime. We (Arm) proposed to make post loop vectorization non-experimental in future JDK releases. So early this year (2023), we did a lot of tests on this but found more problems inside. [PROBLEMS] Problems include stability, maintainability and performance. 1) Stability issues Multiple C2 crash or mis-compilation issues were filed on JBS, including JDK-8301657, JDK-8301904, JDK-8301944, JDK-8304774, JDK-8308949 and perhaps more.
2) Maintainability issue The original implementation was based on multi-versioned post loops and the logic was mixed in SuperWord. But the algorithm for post loop vectorization is actually *not* SLP. As more and more new features were added in SuperWord, legacy code for post loop vectorization is becoming more and more difficult to maintain. 3) Performance issue Post loop vectorization was expected to bring performance improvement for small-iteration vectorizable loops. But JMH tests show it doesn't. A main reason is that the vector masked post loop is skipped (not executed) if the loop trip count is small due to zero-trip guard of the main loop. That's a major defect of current multi-versioning framework. (See JDK-8307084 for more details.) [ACTIONS] For better stability, maintainability and performance, we now propose to deprecate current multi-versioning framework and completely re-implement the experimental post loop vectorization, for both x86 AVX-512 and AArch64 SVE. Our new proposal is to add a standalone ideal loop phase (outside SuperWord) to do vector mask transformation directly on the original scalar post loop. We have been working on this internally for a while. So far we have finished a draft patch. I will push the patch for review soon after it passes all tests and becomes polished enough. -- Thanks, Pengfei -------------- next part -------------- An HTML attachment was scrubbed... URL: From epeter at openjdk.org Tue May 30 07:54:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 May 2023 07:54:10 GMT Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) Message-ID: I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain java. 
I added the code above the assert, the comments explain why: https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763 Here is the graph just before the assert: ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01) `120 Loop` -> need it to kick off `beautify_loop` and a second `build_loop_tree` `71 Region` -> infinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`. `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`. Testing up to tier6 and stress testing. TODO ------------- Commit messages: - 8308749: failed: regular loops only (counted loop inside infinite loop Changes: https://git.openjdk.org/jdk/pull/14178/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14178&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308749 Stats: 114 lines in 3 files changed: 114 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14178.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14178/head:pull/14178 PR: https://git.openjdk.org/jdk/pull/14178 From epeter at openjdk.org Tue May 30 08:01:57 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 May 2023 08:01:57 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3] In-Reply-To: References: <_Pm4QfF6KyuXOqiFqcxuHuBwdrY6RZggOIC7GlSNkRs=.20e78329-a7b0-4b2d-a8a5-2c9afc63067a@github.com> Message-ID: On Thu, 25 May 2023 17:27:50 GMT, Sandhya Viswanathan wrote: >> @sviswa7 Thanks for taking care of this. Looks good, but let me run testing at commit 4. I will report back. > > Thanks a lot @eme64. @sviswa7 testing to tier5 and stress testing looks good. Out of curiosity: do you have a benchmark that shows a speedup with this change? Would be nice to add it.
Maybe we could start with a benchmark from https://git.openjdk.org/jdk/pull/13056 and add some more compute-instructions to outweigh the latency of the reduction? Not sure if that is very easy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1567951783 From chagedorn at openjdk.org Tue May 30 08:19:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 May 2023 08:19:58 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Tue, 30 May 2023 07:22:01 GMT, Roland Westrelin wrote: > > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. > > So which of the 3 predicates will get cleaned up? Are some of the 3 predicates for another loop that doesn't exist anymore? Sorry, the explanation of the bug was not precise enough - there is a missing piece: It should have cleaned up `84 Parse Predicate` and `71 Parse Predicate` in `eliminate_useless_predicates()` but it does not because we also use the broken `ParsePredicates` class to collect them: https://github.com/openjdk/jdk/blob/78aac241b8a3f29111e2901e8b7fbadd502a31a9/src/hotspot/share/opto/loopnode.cpp#L4072-L4093 So, we keep `116 Parse Predicate` and `84 Parse Predicate` as useful predicates while `71 Parse Predicate` is removed before Loop Predication. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1567978589 From roland at openjdk.org Tue May 30 08:30:54 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 30 May 2023 08:30:54 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Tue, 30 May 2023 08:17:11 GMT, Christian Hagedorn wrote: > The useless predicates were added normally for the loop while the additional `116 Parse Predicate` was added with `maybe_add_predicate_after_if()` (the profiled loop predicate was skipped due to `too_may_trap(reason)` being true). Do I read that correctly that parsing inserts useless/duplicate predicates? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1567994536 From chagedorn at openjdk.org Tue May 30 08:34:54 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 May 2023 08:34:54 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. 
We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. > > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian After the first IGVN round, after parsing, they become useless because the `If` that triggered the insertion of the predicates with `maybe_add_predicate_after_if()` is folded away. I think it's unfortunate, though, that we eliminate `71 Parse Predicate` and `84 Parse Predicate` instead of `116 Parse Predicate` (so, we cannot add profiled loop predicates anymore). But that might be an edge case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1568000839 From roland at openjdk.org Tue May 30 08:38:00 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 30 May 2023 08:38:00 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. 
This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. > > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian Looks good to me. Thanks for the explanation. Can a test case be added? ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14196#pullrequestreview-1450405388 PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1568005183 From aph at openjdk.org Tue May 30 08:45:56 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 30 May 2023 08:45:56 GMT Subject: RFR: 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file In-Reply-To: References: Message-ID: On Wed, 24 May 2023 02:16:16 GMT, Fei Gao wrote: > If a match rule belongs to one of the following situations, we can remove extra `UseSVE` Predicate: > > 1. 
If any src operand type is `pReg`, which is SVE specific, we can remove `Predicate(UseSVE > 0)`. But if only dst operand type is `pReg`, we can't remove `Predicate(UseSVE > 0)`, since the DFA of matcher selects by src operands and instruction cost, not involving dst operand. > > 2. If matcher can use src operand type, i.e., `pReg` or `vReg`, to distinguish sve from neon, we can remove > `Predicate(UseSVE == 0)` for rules on neon. > > 3. When the condition in `Predicate()` is false on current platform, it's definitely impossible to generate the corresponding node pattern from C2. Then we can remove `Predicate()`, like removing `predicate(UseSVE > 0)` for all `PopulateIndex` rules. > > After the patch, the code size of libjvm.so decreased from 25.42M to 25.39M, by 25.3K. > > Testing: > No new failures found on tier 1 - 3. > No significant performance regression compared with master. src/hotspot/cpu/aarch64/aarch64_vector.ad line 450: > 448: > 449: instruct loadV_masked(vReg dst, vmemA mem, pRegGov pg) %{ > 450: predicate(UseSVE > 0); How about // This predicate is unneeded because only SVE has pRegs. // predicate(UseSVE > 0); I still don't like it much, but at least it's clear. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14112#discussion_r1209938305 From sgehwolf at openjdk.org Tue May 30 08:46:55 2023 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 30 May 2023 08:46:55 GMT Subject: RFR: 8307683: Loop Predication wrongly hoists IfNodes without a range check pattern as range check [v2] In-Reply-To: References: Message-ID: <3HqZ6E-ix4NKyN1n08HHUb7Qt5DeHlkW-h_c1yNAjec=.353fd10b-8e7f-47c5-8281-3d8c18453dc2@github.com> On Fri, 26 May 2023 23:34:26 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. 
no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - fix assertion > - new fix with bailout for "if iv This fails the `[compiler/loopopts/TestSkeletonPredicateNegation` test with: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/opto/loopPredicate.cpp:848), pid=32813, tid=32828 # assert(!iff->is_RangeCheck()) failed: can only be IfNode because RangeCheckNodes always have trap on false projection # # JRE version: OpenJDK Runtime Environment (21.0) (fastdebug build 21-internal-chhagedorn-229583b7613867127e42baca158773bcf9c08c73) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 21-internal-chhagedorn-229583b7613867127e42baca158773bcf9c08c73, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x1298438] IdealLoopTree::is_range_check_if(IfProjNode*, PhaseIdealLoop*, BasicType, Node*, Node*&, Node*&, long&) const+0x278 # # CreateCoredumpOnCrash turned off, no core file dumped # # An error report file with more information is saved as: # /home/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_compiler/scratch/hs_err_pid32813.log # # Compiler replay data is saved as: # /home/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_compiler/scratch/replay_pid32813.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # result: Error. Agent communication error: java.io.EOFException; check console log for any additional details which seems related?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1568019076 From chagedorn at openjdk.org Tue May 30 08:58:56 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 May 2023 08:58:56 GMT Subject: RFR: 8307683: Loop Predication wrongly hoists IfNodes without a range check pattern as range check [v2] In-Reply-To: <3HqZ6E-ix4NKyN1n08HHUb7Qt5DeHlkW-h_c1yNAjec=.353fd10b-8e7f-47c5-8281-3d8c18453dc2@github.com> References: <3HqZ6E-ix4NKyN1n08HHUb7Qt5DeHlkW-h_c1yNAjec=.353fd10b-8e7f-47c5-8281-3d8c18453dc2@github.com> Message-ID: On Tue, 30 May 2023 08:44:23 GMT, Severin Gehwolf wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - fix assertion >> - new fix with bailout for "if iv > This fails the `[compiler/loopopts/TestSkeletonPredicateNegation` test with: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/runner/work/jdk/jdk/src/hotspot/share/opto/loopPredicate.cpp:848), pid=32813, tid=32828 > # assert(!iff->is_RangeCheck()) failed: can only be IfNode because RangeCheckNodes always have trap on false projection > # > # JRE version: OpenJDK Runtime Environment (21.0) (fastdebug build 21-internal-chhagedorn-229583b7613867127e42baca158773bcf9c08c73) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 21-internal-chhagedorn-229583b7613867127e42baca158773bcf9c08c73, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x1298438] IdealLoopTree::is_range_check_if(IfProjNode*, PhaseIdealLoop*, BasicType, Node*, Node*&, Node*&, long&) const+0x278 > # > # CreateCoredumpOnCrash turned off, no core file dumped > # > # An error report file with more information is saved as: > # /home/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_compiler/scratch/hs_err_pid32813.log > # > # Compiler replay data is saved 
as: > # /home/runner/work/jdk/jdk/build/run-test-prebuilt/test-support/jtreg_test_hotspot_jtreg_tier1_compiler/scratch/replay_pid32813.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > result: Error. Agent communication error: java.io.EOFException; check console log for any additional details > > > which seems related? Thanks @jerboaa for reporting that. I've seen that in my testing over the weekend as well. I'm looking into it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1568040332 From epeter at openjdk.org Tue May 30 09:19:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 May 2023 09:19:55 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3] In-Reply-To: References: Message-ID: On Tue, 23 May 2023 22:35:04 GMT, Sandhya Viswanathan wrote: >> This PR fixes the problem with double reduction on x86_64. >> >> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: >> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java >> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. >> >> This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. >> >> With this PR the vector_reduction_double node is generated. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > change to superword_max_vector_size Thanks for the fix! Looks good. ------------- Marked as reviewed by epeter (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/14065#pullrequestreview-1450493207 From Pengfei.Li at arm.com Tue May 30 09:35:19 2023 From: Pengfei.Li at arm.com (Pengfei Li) Date: Tue, 30 May 2023 09:35:19 +0000 Subject: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: Hi Emanuel, Thanks for your great work on refactoring and improving SuperWord. I have seen that you have already had good cooperation with Fei Gao (she is sitting next to me and involved in this task as well) in recent patches. I'm currently doing code cleanups and adding necessary comments on the draft patch. I don't think it will take too much time. So, I tend to push the patch for review a bit later so people can review the code more easily. In general, our patch refactors post loop related logic out from superword.[cpp|hpp]. It won't have too much conflict with your ongoing SuperWord improvements. We will keep you informed once the patch is ready. -- Thanks, Pengfei From: Emanuel Peter Date: Tuesday, May 30, 2023 at 15:45 To: Pengfei Li , hotspot-compiler-dev at openjdk.java.net Cc: epeter at openjdk.org , Bhateja, Jatin , nd Subject: AW: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization Hi Pengfei, great to hear that you are spending time on SuperWord / the auto-vectorization in HotSpot. I agree with your assessment that currently SuperWord is unnecessarily convoluted and has a good bit of legacy code. It would be nice if we could make the code more modular and extensible for future improvements. Is there a chance that we could see the draft already? I am also thinking about extending SuperWord in the future. I am currently trying to clean up as much dead code and as many bugs as possible to clear the way. I have to see how much time I get to spend on extensions.
Here you can find some of my ideas (towards the end of the PR description): https://github.com/openjdk/jdk/pull/14096 It would be good to coordinate a bit so that we can ensure our plans fit together. Best regards, Emanuel ________________________________ From: Pengfei Li Sent: Monday, 29 May 2023 05:12 To: hotspot-compiler-dev at openjdk.java.net Cc: epeter at openjdk.org ; Bhateja, Jatin ; nd Subject: [Heads-up] JDK-8308994: C2: Re-implement experimental post loop vectorization Hi, I'm writing to let you know that I just filed "JDK-8308994: C2: Re-implement experimental post loop vectorization". [BACKGROUND] Current post loop vectorization in the C2 compiler has a long history. It was first implemented in JDK-8153998 in 2016 as an experimental feature to support x86 AVX-512 vector masks. Due to insufficient maintenance, it had been broken for a very long time. Last year, I took over JDK-8183390 to fix and re-enable this feature. Several issues were fixed and AArch64 SVE vector mask support was added in the meantime. We (Arm) proposed to make post loop vectorization non-experimental in future JDK releases. So early this year (2023), we did a lot of tests on this but found more problems inside. [PROBLEMS] Problems include stability, maintainability and performance. 1) Stability issues Multiple C2 crash or mis-compilation issues were filed on JBS, including JDK-8301657, JDK-8301904, JDK-8301944, JDK-8304774, JDK-8308949 and perhaps more. 2) Maintainability issue The original implementation was based on multi-versioned post loops and the logic was mixed in SuperWord. But the algorithm for post loop vectorization is actually *not* SLP. As more and more new features were added in SuperWord, legacy code for post loop vectorization is becoming more and more difficult to maintain. 3) Performance issue Post loop vectorization was expected to bring performance improvement for small-iteration vectorizable loops. But JMH tests show it doesn't.
A main reason is that the vector masked post loop is skipped (not executed) if the loop trip count is small due to zero-trip guard of the main loop. That's a major defect of current multi-versioning framework. (See JDK-8307084 for more details.) [ACTIONS] For better stability, maintainability and performance, we now propose to deprecate current multi-versioning framework and completely re-implement the experimental post loop vectorization, for both x86 AVX-512 and AArch64 SVE. Our new proposal is to add a standalone ideal loop phase (outside SuperWord) to do vector mask transformation directly on the original scalar post loop. We have been working on this internally for a while. So far we have finished a draft patch. I will push the patch for review soon after it passes all tests and becomes polished enough. -- Thanks, Pengfei From chagedorn at openjdk.org Tue May 30 10:30:34 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 May 2023 10:30:34 GMT Subject: RFR: 8307683: Loop Predication wrongly hoists IfNodes without a range check pattern as range check [v3] In-Reply-To: References: Message-ID: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. 
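As a rough illustration of why checking only the first and last iteration is enough for a real range check, here is a minimal Java sketch. It assumes an index that is linear in the loop variable (`scale * i + offset`) and no integer overflow; the class and method names are hypothetical and this is not HotSpot code. With those assumptions the hoisted check and the per-iteration check agree:

```java
// Hypothetical model of a hoisted range check; not taken from HotSpot.
final class HoistedRangeCheckSketch {
    // Index accessed in iteration i: linear in i (assumes no int overflow).
    static int index(int scale, int i, int offset) {
        return scale * i + offset;
    }

    static boolean inBounds(int idx, int length) {
        return idx >= 0 && idx < length;
    }

    // Hoisted form: only the first and the last iteration are checked.
    static boolean hoistedCheck(int scale, int offset, int first, int last, int length) {
        return inBounds(index(scale, first, offset), length)
            && inBounds(index(scale, last, offset), length);
    }

    // Reference form: every iteration in [first, last] is checked.
    static boolean perIterationCheck(int scale, int offset, int first, int last, int length) {
        for (int i = first; i <= last; i++) {
            if (!inBounds(index(scale, i, offset), length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Indices 1, 3, 5, 7, 9 against an array of length 10.
        System.out.println(hoistedCheck(2, 1, 0, 4, 10));
        System.out.println(perIterationCheck(2, 1, 0, 4, 10));
    }
}
```

The two forms agree here because a linear index is monotonic in `i`, so its extreme values occur at the first and last iteration.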
> > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: remove negation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14156/files - new: https://git.openjdk.org/jdk/pull/14156/files/229583b7..48ee1e40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14156&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14156&range=01-02 Stats: 82 lines in 4 files changed: 37 ins; 13 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/14156.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14156/head:pull/14156 PR: https://git.openjdk.org/jdk/pull/14156 From chagedorn at openjdk.org Tue May 30 10:30:34 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 May 2023 10:30:34 GMT Subject: RFR: 8307683: Loop Predication wrongly hoists IfNodes without a range check pattern as range check [v2] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 23:34:26 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> 
https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - fix assertion > - new fix with bailout for "if iv References: Message-ID: On Tue, 30 May 2023 10:30:34 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. 
The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove negation Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14156#pullrequestreview-1450727768 From yzhu at openjdk.org Tue May 30 12:23:23 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Tue, 30 May 2023 12:23:23 GMT Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules Message-ID: Merge vector instructs with similar match rules in riscv_v.ad. Tier 1~3 passed on QEMU with RVV supported. 
------------- Commit messages: - merge vector instructs with same match rule Changes: https://git.openjdk.org/jdk/pull/14214/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14214&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303417 Stats: 504 lines in 1 file changed: 33 ins; 379 del; 92 mod Patch: https://git.openjdk.org/jdk/pull/14214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14214/head:pull/14214 PR: https://git.openjdk.org/jdk/pull/14214 From duke at openjdk.org Tue May 30 12:49:11 2023 From: duke at openjdk.org (Chang Peng) Date: Tue, 30 May 2023 12:49:11 GMT Subject: Integrated: 8307795: AArch64: Optimize VectorMask.truecount() on Neon In-Reply-To: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: On Mon, 15 May 2023 02:58:46 GMT, Chang Peng wrote: > In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. > > For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations. 
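The two mask formats described above can be modelled with plain arrays in place of vector registers. This is a scalar sketch for illustration only: the class and method names are hypothetical, and it is not the code C2 generates. Since each true lane holds -1 in the in-register format, summing the lanes and negating the result gives the number of true lanes without first converting back to 0x00/0x01 bytes:

```java
// Hypothetical scalar model of the mask formats; not HotSpot or Vector API code.
final class MaskTrueCountSketch {
    // In-memory format: one byte per element, 0x00 (false) or 0x01 (true).
    static int trueCountInMemory(byte[] mask) {
        int count = 0;
        for (byte b : mask) {
            count += b;
        }
        return count;
    }

    // Models the load conversion: 0x00/0x01 bytes become 0/-1 int lanes.
    static int[] toInRegister(byte[] mask) {
        int[] lanes = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            lanes[i] = -mask[i];
        }
        return lanes;
    }

    // Counts true lanes directly on the 0/-1 format: add across lanes, then negate.
    static int trueCountInRegister(int[] lanes) {
        int sum = 0;
        for (int lane : lanes) {
            sum += lane;
        }
        return -sum;
    }

    public static void main(String[] args) {
        byte[] mask = {1, 0, 1, 1};
        System.out.println(trueCountInMemory(mask));
        System.out.println(trueCountInRegister(toInRegister(mask)));
    }
}
```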
> > However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input. > > This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations. > > For example, > > > var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0); > m.not().trueCount(); > > > will produce following assembly on a Neon machine before this patch: > > > ... > mvn v16.16b, v16.16b // VectorMask.not() > xtn v16.4h, v16.4s > xtn v16.8b, v16.8h > neg v16.8b, v16.8b // VectorStoreMask > addv b17, v16.8b > umov w0, v17.b[0] // VectorMask.trueCount() > ... > > After this patch: > > > ... > mvn v16.16b, v16.16b // VectorMask.not() > addv s17, v16.4s > smov x0, v17.b[0] > neg x0, x0 // Optimized VectorMask.trueCount() > ... > > > In this case, we can save two xtn insns. > > Performance: > > Benchmark Before After Unit > testInt 723.822 ± 1.029 1182.375 ± 12.363 ops/ms > testLong 632.154 ± 0.197 1382.74 ± 2.188 ops/ms > testShort 788.665 ± 1.852 1152.38 ± 3.77 ops/ms > > [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4740 > [2]: https://github.com/openjdk/jdk/blo... This pull request has now been integrated. 
Changeset: f600d036 Author: changpeng1997 Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/f600d0369a1f9ac78e62a328be4bbb598ffef62b Stats: 235 lines in 5 files changed: 235 ins; 0 del; 0 mod 8307795: AArch64: Optimize VectorMask.truecount() on Neon Reviewed-by: aph, eliu ------------- PR: https://git.openjdk.org/jdk/pull/13974 From tonyp at openjdk.org Tue May 30 13:12:09 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Tue, 30 May 2023 13:12:09 GMT Subject: RFR: 8308977: gtest:codestrings fails on riscv In-Reply-To: References: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com> Message-ID: <5Te2kDZ_R8j1rbrqSQg9ZLUwF2ivqCuQ2fevX0SxaPY=.0c358441-b533-4d0b-bcce-7d3980a35024@github.com> On Mon, 29 May 2023 01:29:57 GMT, Fei Yang wrote: > Looks good to me. I missed this failure as I forgot to prepare a hsdis-riscv64.so when running the gtest. Thanks. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14189#issuecomment-1568404509 From tonyp at openjdk.org Tue May 30 13:12:12 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Tue, 30 May 2023 13:12:12 GMT Subject: Integrated: 8308977: gtest:codestrings fails on riscv In-Reply-To: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com> References: <_BzlsfiIMHWW0xgsuJ33fBcBDe0J_bjW9J0L_hOcg_w=.d034bd2e-ad75-446c-b0bb-39ff1f421e01@github.com> Message-ID: On Fri, 26 May 2023 22:30:11 GMT, Antonios Printezis wrote: > 8308977: gtest:codestrings fails on riscv This pull request has now been integrated. 
Changeset: 45262822 Author: Antonios Printezis URL: https://git.openjdk.org/jdk/commit/4526282266c5dc6c040c090ef4f3ce791a8c190d Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8308977: gtest:codestrings fails on riscv Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/14189 From jiefu at openjdk.org Tue May 30 13:40:26 2023 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 30 May 2023 13:40:26 GMT Subject: RFR: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java Message-ID: <0i0eIxqD5VUaxhQZMrvWAD1yko7iITRyhzTgdCXS6xA=.f8f176c2-6ba6-4dfb-b957-08c1e75cbaf7@github.com> Just remove the `static` to fix the build failure. Thanks. ------------- Commit messages: - 8309110: Build failure after JDK-8307795 due to warnings in micro-bechmark StoreMaskTrueCount.java Changes: https://git.openjdk.org/jdk/pull/14218/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14218&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309110 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14218.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14218/head:pull/14218 PR: https://git.openjdk.org/jdk/pull/14218 From thartmann at openjdk.org Tue May 30 13:41:21 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 30 May 2023 13:41:21 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4] In-Reply-To: References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: On Mon, 29 May 2023 02:20:07 GMT, Chang Peng wrote: >> In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. 
Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. >> >> For most of the vector mask operations, the input mask is in-register format. And a vector mask also works in-register format all through the compilation. But for some operations like VectorMask.trueCount()[4] which counts the elements of true value, the expected input mask is in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations. >> >> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lane (value of 0x01) of its input. >> >> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations. >> >> For example, >> >> >> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0); >> m.not().trueCount(); >> >> >> will produce following assembly on a Neon machine before this patch: >> >> >> ... >> mvn v16.16b, v16.16b // VectorMask.not() >> xtn v16.4h, v16.4s >> xtn v16.8b, v16.8h >> neg v16.8b, v16.8b // VectorStoreMask >> addv b17, v16.8b >> umov w0, v17.b[0] // VectorMask.trueCount() >> ... >> >> After this patch: >> >> >> ... >> mvn v16.16b, v16.16b // VectorMask.not() >> addv s17, v16.4s >> smov x0, v17.b[0] >> neg x0, x0 // Optimized VectorMask.trueCount() >> ... >> >> >> In this case, we can save two xtn insns. >> >> Performance: >> >> Benchmark Before After Unit >> testInt 723.822 ± 1.029 1182.375 ± 12.363 ops/ms >> testLong 632.154 ± 0.197 1382.74 ± 2.188 ops/ms >> testShort 788.665 ± 
1.852 1152.38 ± 3.77 ops/ms >> >> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector.... > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update aarch64_vector.ad This change broke the builds: [JDK-8309110](https://bugs.openjdk.org/browse/JDK-8309110). ------------- PR Comment: https://git.openjdk.org/jdk/pull/13974#issuecomment-1568452973 From thartmann at openjdk.org Tue May 30 13:46:09 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 30 May 2023 13:46:09 GMT Subject: RFR: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java In-Reply-To: <0i0eIxqD5VUaxhQZMrvWAD1yko7iITRyhzTgdCXS6xA=.f8f176c2-6ba6-4dfb-b957-08c1e75cbaf7@github.com> References: <0i0eIxqD5VUaxhQZMrvWAD1yko7iITRyhzTgdCXS6xA=.f8f176c2-6ba6-4dfb-b957-08c1e75cbaf7@github.com> Message-ID: On Tue, 30 May 2023 13:31:16 GMT, Jie Fu wrote: > Just remove the `static` to fix the build failure. > Thanks. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14218#pullrequestreview-1450983424 From jiefu at openjdk.org Tue May 30 13:46:11 2023 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 30 May 2023 13:46:11 GMT Subject: RFR: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java In-Reply-To: References: <0i0eIxqD5VUaxhQZMrvWAD1yko7iITRyhzTgdCXS6xA=.f8f176c2-6ba6-4dfb-b957-08c1e75cbaf7@github.com> Message-ID: On Tue, 30 May 2023 13:40:42 GMT, Tobias Hartmann wrote: > Looks good and trivial. Thanks @TobiHartmann . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14218#issuecomment-1568458529 From jiefu at openjdk.org Tue May 30 13:46:12 2023 From: jiefu at openjdk.org (Jie Fu) Date: Tue, 30 May 2023 13:46:12 GMT Subject: Integrated: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java In-Reply-To: <0i0eIxqD5VUaxhQZMrvWAD1yko7iITRyhzTgdCXS6xA=.f8f176c2-6ba6-4dfb-b957-08c1e75cbaf7@github.com> References: <0i0eIxqD5VUaxhQZMrvWAD1yko7iITRyhzTgdCXS6xA=.f8f176c2-6ba6-4dfb-b957-08c1e75cbaf7@github.com> Message-ID: On Tue, 30 May 2023 13:31:16 GMT, Jie Fu wrote: > Just remove the `static` to fix the build failure. > Thanks. This pull request has now been integrated. Changeset: 15e02853 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/15e028530ad6408693e9f21fb94daa705b951897 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14218 From jkarthikeyan at openjdk.org Tue May 30 14:02:04 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 30 May 2023 14:02:04 GMT Subject: RFR: 8051725: Improve expansion of Conv2B nodes in the middle-end [v9] In-Reply-To: References: Message-ID: <7sxbpiBMJv6GHkOB3fuhOcuIzcbz1zjL3RvtFJL0e9g=.efe23da4-4714-4e8e-a74d-3fe95b02a9ac@github.com> On Fri, 26 May 2023 05:49:15 GMT, Jasmine Karthikeyan wrote: >> Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). 
This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine: >> >> >> Baseline Patch Improvement >> Benchmark Mode Cnt Score Error Units Score Error Units >> Conv2BRules.testEquals0 avgt 10 47.566 ± 0.346 ns/op / 34.130 ± 0.177 ns/op + 28.2% >> Conv2BRules.testNotEquals0 avgt 10 37.167 ± 0.211 ns/op / 34.185 ± 0.258 ns/op + 8.0% >> Conv2BRules.testEquals1 avgt 10 35.059 ± 0.280 ns/op / 34.847 ± 0.160 ns/op (unchanged) >> Conv2BRules.testEqualsNull avgt 10 56.768 ± 2.600 ns/op / 34.330 ± 0.625 ns/op + 39.5% >> Conv2BRules.testNotEqualsNull avgt 10 47.447 ± 1.193 ns/op / 34.142 ± 0.303 ns/op + 28.0% >> >> Reviews would be greatly appreciated! >> >> Testing: tier1-2 on linux x64, GHA > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge branch 'master' into conv2b-x86-lowering > - Fix assertion from not checking int type > - Cleanup from code review > - Changes from code review > - Merge branch 'master' into conv2b-x86-lowering > - Whitespace tweak > - Make transform conditional > - Remove Conv2B from backend as it's macro expanded now > - Re-work transform to happen in macro expansion > - Fix whitespace and add bug tag to IR test > - ... and 5 more: https://git.openjdk.org/jdk/compare/31683722...65e841f3 Thanks a lot for testing, and thanks all for reviews and feedback with this change! 
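The `Conv2B`-followed-by-`xor` pattern from the quoted description can be modelled at the Java level as follows. This is a sketch of the semantics only: the method names are hypothetical, `Conv2B` is modelled as "1 if the input is non-zero, else 0", and nothing here reflects the actual C2 node graph. It shows why flipping the result with `xor 1` gives the same value as a single conditional move on the inverted comparison:

```java
// Hypothetical semantic model of Conv2B followed by xor 1; not C2 code.
final class Conv2BSketch {
    // Conv2B modelled as: 1 if the input is non-zero, otherwise 0.
    static int conv2B(int x) {
        return x != 0 ? 1 : 0;
    }

    // The pattern before the change: Conv2B followed by a bit flip.
    static int conv2BThenXor(int x) {
        return conv2B(x) ^ 1;
    }

    // The equivalent form: one conditional select on the inverted comparison.
    static int invertedCompare(int x) {
        return x == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        for (int x : new int[] {0, 1, -1, 42}) {
            System.out.println(x + ": " + conv2BThenXor(x) + " " + invertedCompare(x));
        }
    }
}
```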
------------- PR Comment: https://git.openjdk.org/jdk/pull/13345#issuecomment-1568486876 From jkarthikeyan at openjdk.org Tue May 30 14:14:16 2023 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 30 May 2023 14:14:16 GMT Subject: Integrated: 8051725: Improve expansion of Conv2B nodes in the middle-end In-Reply-To: References: Message-ID: On Wed, 5 Apr 2023 04:52:14 GMT, Jasmine Karthikeyan wrote: > Hi, I've created optimizations for the expansion of `Conv2B` nodes, especially when followed immediately by an xor of 1. This pattern is fairly common, and can arise from both [cmov idealization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/movenode.cpp#L241) and [diamond-phi optimization](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/cfgnode.cpp#L1571). This change replaces `Conv2B` nodes in the middle-end during post loop opts IGVN with conditional moves on supported platforms (x86_64, aarch64, arm32), allowing the bit flip with `xor` to be subsumed with an inversion of the comparison instead. This change also reduces the overhead of the matcher in the backends, as fewer rules need to be traversed in order to match an ideal node. Performance results from my (Zen 2) machine: > > > Baseline Patch Improvement > Benchmark Mode Cnt Score Error Units Score Error Units > Conv2BRules.testEquals0 avgt 10 47.566 ± 0.346 ns/op / 34.130 ± 0.177 ns/op + 28.2% > Conv2BRules.testNotEquals0 avgt 10 37.167 ± 0.211 ns/op / 34.185 ± 0.258 ns/op + 8.0% > Conv2BRules.testEquals1 avgt 10 35.059 ± 0.280 ns/op / 34.847 ± 0.160 ns/op (unchanged) > Conv2BRules.testEqualsNull avgt 10 56.768 ± 2.600 ns/op / 34.330 ± 0.625 ns/op + 39.5% > Conv2BRules.testNotEqualsNull avgt 10 47.447 ± 1.193 ns/op / 34.142 ± 0.303 ns/op + 28.0% > > Reviews would be greatly appreciated! > > Testing: tier1-2 on linux x64, GHA This pull request has now been integrated. 
Changeset: fb0b1f0c Author: Jasmine Karthikeyan Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/fb0b1f0c23403020969c968bb916d3cb2df3301a Stats: 410 lines in 13 files changed: 258 ins; 133 del; 19 mod 8051725: Improve expansion of Conv2B nodes in the middle-end Reviewed-by: thartmann, qamai, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/13345 From chagedorn at openjdk.org Tue May 30 14:43:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 May 2023 14:43:08 GMT Subject: RFR: 8307683: Loop Predication wrongly hoists IfNodes without a range check pattern as range check [v3] In-Reply-To: References: Message-ID: <1wTYIPBe-JYK07oC8eq9ZtBfrxcRIeVeHs8jUtrUvIE=.97ce0995-985e-4645-8161-caefb2af5a3f@github.com> On Tue, 30 May 2023 10:30:34 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. 
We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove negation Thanks Roland for re-reviewing it again and the offline discussion! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1568555772 From xuelei at openjdk.org Tue May 30 17:30:58 2023 From: xuelei at openjdk.org (Xue-Lei Andrew Fan) Date: Tue, 30 May 2023 17:30:58 GMT Subject: RFR: 8308071: [REDO] update for deprecated sprintf for src/utils [v3] In-Reply-To: <24zfRJ2Ir1egB-U5XJd37qJZUliAqAXkKIaHqE8gG-8=.3e0fc1f7-5eae-4fc9-8135-42a40f917a1e@github.com> References: <0Yayi6b8NFU7LzVm-3KP8PgtsI-xkcOOzIMTEt6_vMI=.5fcad730-2f76-40eb-b6e4-2668729e1ba8@github.com> <-qvQkvH8SylX3unheSpOdsjz-mhrnyvqgxtNLKiOmGg=.41f065ea-f856-4436-88d3-8c7b8b01726d@github.com> <24zfRJ2Ir1egB-U5XJd37qJZUliAqAXkKIaHqE8gG-8=.3e0fc1f7-5eae-4fc9-8135-42a40f917a1e@github.com> Message-ID: On Thu, 18 May 2023 15:46:46 GMT, Kim Barrett wrote: >> Updated to use `int` to replace `size_t.`. Thank you for the catching. > > bufsize is size_t, so that's a comparison between signed and unsigned values, which I think some compilers > will warn about. Maybe the preceding check for negative is getting rid of that? But will that still occur in > a slowdebug build, or will the lack of optimization lead to a warning? @kimbarrett Did you have a chance to have another look? 
Please let me know if you prefer the update in which the returned value of snprintf() is not checked, because the memory size has been checked previously. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13995#discussion_r1210598423 From duke at openjdk.org Tue May 30 19:02:42 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 30 May 2023 19:02:42 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) Message-ID: The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. ------------- Commit messages: - 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) Changes: https://git.openjdk.org/jdk/pull/14227/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309130 Stats: 2907 lines in 18 files changed: 2898 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Tue May 30 20:02:15 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 30 May 2023 20:02:15 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v2] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
> > **Arrays.sort performance data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove libstdc++ ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/e98e5ef4..923a7cae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Tue May 30 20:10:09 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 30 May 2023 20:10:09 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, 
float and double arrays) [v3] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort - remove libstdc++ - 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/923a7cae..6d140d5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=01-02 Stats: 2769 lines in 26 files changed: 2529 ins; 147 del; 93 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From dholmes at openjdk.org Tue May 30 22:23:08 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 30 May 2023 22:23:08 GMT Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4] In-Reply-To: References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> Message-ID: <0qKKe4tsPcFoKhIyryX9PZTbdLdv1kCpmCkQ8r9Zrpc=.904a042e-dd3f-4cd9-916e-69b56447c2ab@github.com> On Mon, 29 May 2023 02:20:07 GMT, Chang Peng wrote: >> In Vector API Java level, vector mask is represented as a boolean array with 0x00/0x01 (8 bits of each element) as values, aka in-memory format. When it is loaded into vector register, e.g. Neon, the in-memory format will be converted to in-register format with 0/-1 value for each lane (lane width aligned to its type) by VectorLoadMask [1] operation, and convert back to in-memory format by VectorStoreMask[2]. In Neon, a typical VectorStoreMask operation will first narrow given vector registers by xtn insn [3] into byte element type, and then do a vector negate to convert to 0x00/0x01 value for each element. >> >> For most of the vector mask operations, the input mask is in-register format. 
And a vector mask also stays in in-register format throughout compilation. But for some operations like VectorMask.trueCount()[4], which counts the elements of true value, the expected input mask is in in-memory format. So a VectorStoreMask is generated to convert the mask from in-register format to in-memory format before those operations.
>>
>> However, for trueCount() these xtn instructions in VectorStoreMask can be saved, since the narrowing operations will not influence the number of active lanes (value of 0x01) of its input.
>>
>> This patch adds an optimized rule `VectorMaskTrueCount (VectorStoreMask mask)` to save the unnecessary narrowing operations.
>>
>> For example,
>>
>>
>> var m = VectorMask.fromArray(IntVector.SPECIES_PREFERRED, ba, 0);
>> m.not().trueCount();
>>
>>
>> will produce the following assembly on a Neon machine before this patch:
>>
>>
>> ...
>> mvn v16.16b, v16.16b // VectorMask.not()
>> xtn v16.4h, v16.4s
>> xtn v16.8b, v16.8h
>> neg v16.8b, v16.8b // VectorStoreMask
>> addv b17, v16.8b
>> umov w0, v17.b[0] // VectorMask.trueCount()
>> ...
>>
>> After this patch:
>>
>>
>> ...
>> mvn v16.16b, v16.16b // VectorMask.not()
>> addv s17, v16.4s
>> smov x0, v17.b[0]
>> neg x0, x0 // Optimized VectorMask.trueCount()
>> ...
>>
>>
>> In this case, we can save two xtn insns.
>>
>> Performance:
>>
>> Benchmark   Before             After                Unit
>> testInt     723.822 ± 1.029    1182.375 ± 12.363    ops/ms
>> testLong    632.154 ± 0.197    1382.74 ± 2.188      ops/ms
>> testShort   788.665 ± 1.852    1152.38 ± 3.77       ops/ms
>>
>> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd2f0213ab0ca3c9/src/hotspot/cpu/aarch64/aarch64_vector....
>
> Chang Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update aarch64_vector.ad

What testing was done on this fix before integration? I don't even see GitHub Actions being run.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13974#issuecomment-1569199589 From sviswanathan at openjdk.org Wed May 31 00:47:56 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 May 2023 00:47:56 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4] In-Reply-To: References: Message-ID: > This PR fixes the problem with double reduction on x86_64. > > In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: > jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java > The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. > > This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. > > With this PR the vector_reduction_double node is generated. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Add jmh test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14065/files - new: https://git.openjdk.org/jdk/pull/14065/files/ba3b5dfa..1c29051c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14065&range=02-03 Stats: 19 lines in 1 file changed: 19 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14065/head:pull/14065 PR: https://git.openjdk.org/jdk/pull/14065 From sviswanathan at openjdk.org Wed May 31 00:48:59 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 May 2023 00:48:59 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v3] In-Reply-To: References: <_Pm4QfF6KyuXOqiFqcxuHuBwdrY6RZggOIC7GlSNkRs=.20e78329-a7b0-4b2d-a8a5-2c9afc63067a@github.com> Message-ID: On Tue, 30 May 2023 07:59:37 GMT, Emanuel Peter wrote: >> Thanks a lot @eme64. > > @sviswa7 testing to tier5 and stress testing looks good. > > Out of curiosity: do you have a benchmark that shows a speedup with this change? Would be nice to add it. > Maybe we could start with a benchmark from https://git.openjdk.org/jdk/pull/13056 and add some more compute-instructions to outweigh the latency of the reduction? Not sure if that is very easy. Thanks a lot @eme64. There was an existing jmh benchmark for vector reduction. I have updated it to add double reduction case. 
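For readers without the test source at hand, the kind of loop under discussion can be sketched roughly as follows. This is a minimal illustration only, not the code of `ProdRed_Double` or of the JMH benchmark: the class name `ProdRedSketch`, the method `prodReduction` and the loop body are assumptions made for the example. Whether C2 actually vectorizes such a loop depends on the platform and the flags in use.

```java
final class ProdRedSketch {
    // Hypothetical scalar product reduction: every iteration multiplies the
    // running total by a value computed from the two input arrays.
    // The dependency on 'total' from one iteration to the next is what makes
    // this a reduction rather than an element-wise loop.
    static double prodReduction(double[] a, double[] b, double total) {
        for (int i = 0; i < a.length; i++) {
            total *= a[i] * b[i];
        }
        return total;
    }
}
```

Calling `ProdRedSketch.prodReduction(new double[]{1, 2, 3}, new double[]{1, 1, 1}, 1.0)` multiplies 1, 2 and 3 into the running total.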
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1569329896

From sviswanathan at openjdk.org Wed May 31 00:57:56 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Wed, 31 May 2023 00:57:56 GMT
Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 00:47:56 GMT, Sandhya Viswanathan wrote:

>> This PR fixes the problem with double reduction on x86_64.
>>
>> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows:
>> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java
>> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node.
>>
>> This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements.
>>
>> With this PR the vector_reduction_double node is generated.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add jmh test case

The performance numbers on my desktop are:

Base runs, no vectorization happens with superword:

Benchmark                              (COUNT)  (seed)  Mode  Cnt    Score   Error  Units
VectorReduction.NoSuperword.mulRedD        512       0  avgt    4  435.795 ± 0.082  ns/op
VectorReduction.WithSuperword.mulRedD      512       0  avgt    4  434.154 ± 0.042  ns/op

With the PR, reduction succeeds and vectorization of the loop happens when superword is enabled:

Benchmark                              (COUNT)  (seed)  Mode  Cnt    Score   Error  Units
VectorReduction.NoSuperword.mulRedD        512       0  avgt    4  435.897 ± 0.137  ns/op
VectorReduction.WithSuperword.mulRedD      512       0  avgt    4  405.479 ± 1.896  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1569336052

From duke at openjdk.org Wed May 31 03:01:09 2023
From: duke at openjdk.org (Chang Peng)
Date: Wed, 31 May 2023 03:01:09 GMT
Subject: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4]
In-Reply-To: <0qKKe4tsPcFoKhIyryX9PZTbdLdv1kCpmCkQ8r9Zrpc=.904a042e-dd3f-4cd9-916e-69b56447c2ab@github.com>
References: <3PN4wszSZLJa16IvdtCU4c21AomxisfiSzVKDS4mvHs=.c270f6e0-95f4-4cda-b79f-4df24de1e3db@github.com> <0qKKe4tsPcFoKhIyryX9PZTbdLdv1kCpmCkQ8r9Zrpc=.904a042e-dd3f-4cd9-916e-69b56447c2ab@github.com>
Message-ID:

On Tue, 30 May 2023 22:20:23 GMT, David Holmes wrote:

> What testing was done on this fix before integration? I don't even see GitHub Actions being run.

@dholmes-ora I did see earlier that GitHub Actions ran (in the 'Checks' tab) and finished, and I believed the Windows failure was not related to my patch. Perhaps GHA does not cover the jmh build. I ran the full jtreg tests in our internal CI with my first draft patch, but I realized that I forgot to run the full jtreg tests again after updating my patch to the current version. I only ran my new jtreg test before integration. Lesson learnt! Thanks @DamonFool for the quick fix. I am working on JDK-8309129.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13974#issuecomment-1569424140

From fgao at openjdk.org Wed May 31 04:16:06 2023
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 31 May 2023 04:16:06 GMT
Subject: RFR: 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file
In-Reply-To:
References:
Message-ID:

On Tue, 30 May 2023 08:43:08 GMT, Andrew Haley wrote:

> How about
>
> ```
> // This predicate is unneeded because only SVE has pRegs.
> // predicate(UseSVE > 0);
> ```
>
> I still don't like it much, but at least it's clear.

Hi @theRealAph, thanks for your kind suggestion. I agree with what you said. Compared with the current simple rules, the changed rules may not be easy to follow.
Since code size is not a big problem now, maintaining the status quo seems better. I'll close the PR. Thanks again for your review! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14112#discussion_r1211066892 From fgao at openjdk.org Wed May 31 04:16:08 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 31 May 2023 04:16:08 GMT Subject: Withdrawn: 8308339: AArch64: Remove extra `UseSVE` Predicate in ad file In-Reply-To: References: Message-ID: <805r51CXKwfCirPR-TzmkZMo04Ys6cSKrJ_JV-Ut1T4=.39e2ac35-7ae9-490b-b7ff-54cb9350b484@github.com> On Wed, 24 May 2023 02:16:16 GMT, Fei Gao wrote: > If a match rule belongs to one of the following situations, we can remove extra `UseSVE` Predicate: > > 1. If any src operand type is `pReg`, which is SVE specific, we can remove `Predicate(UseSVE > 0)`. But if only dst operand type is `pReg`, we can't remove `Predicate(UseSVE > 0)`, since the DFA of matcher selects by src operands and instruction cost, not involving dst operand. > > 2. If matcher can use src operand type, i.e., `pReg` or `vReg`, to distinguish sve from neon, we can remove > `Predicate(UseSVE == 0)` for rules on neon. > > 3. When the condition in `Predicate()` is false on current platform, it's definitely impossible to generate the corresponding node pattern from C2. Then we can remove `Predicate()`, like removing `predicate(UseSVE > 0)` for all `PopulateIndex` rules. > > After the patch, the code size of libjvm.so decreased from 25.42M to 25.39M, by 25.3K. > > Testing: > No new failures found on tier 1 - 3. > No significant performance regression compared with master. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/14112 From fyang at openjdk.org Wed May 31 06:09:54 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 31 May 2023 06:09:54 GMT Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules In-Reply-To: References: Message-ID: On Tue, 30 May 2023 12:11:43 GMT, Yanhong Zhu wrote: > Merge vector instructs with similar match rules in riscv_v.ad. > > Tier 1~3 passed on QEMU with RVV supported. Thanks for the cleanup. One minor comment. src/hotspot/cpu/riscv/riscv_v.ad line 245: > 243: ins_cost(VEC_COST); > 244: effect(TEMP tmp); > 245: format %{ "vrsub.vi $tmp, 0, $src\t#@vabs\n\t" Suggestion: `format %{ "vrsub.vi $tmp, $src, 0\t#@vabs\n\t"` ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14214#pullrequestreview-1452262058 PR Review Comment: https://git.openjdk.org/jdk/pull/14214#discussion_r1211125319 From epeter at openjdk.org Wed May 31 06:33:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 31 May 2023 06:33:55 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 00:47:56 GMT, Sandhya Viswanathan wrote: >> This PR fixes the problem with double reduction on x86_64. >> >> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: >> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java >> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. >> >> This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. 
>>
>> With this PR the vector_reduction_double node is generated.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add jmh test case

Marked as reviewed by epeter (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/14065#pullrequestreview-1452300376

From epeter at openjdk.org Wed May 31 06:33:57 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 May 2023 06:33:57 GMT
Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 00:55:22 GMT, Sandhya Viswanathan wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add jmh test case
>
> The performance numbers on my desktop are:
> Base runs, no vectorization happens with superword:
> Benchmark                              (COUNT)  (seed)  Mode  Cnt    Score   Error  Units
> VectorReduction.NoSuperword.mulRedD        512       0  avgt    4  435.795 ± 0.082  ns/op
> VectorReduction.WithSuperword.mulRedD      512       0  avgt    4  434.154 ± 0.042  ns/op
>
> With the PR reduction succeeds and vectorization of the loop happens when superword is enabled:
> Benchmark                              (COUNT)  (seed)  Mode  Cnt    Score   Error  Units
> VectorReduction.NoSuperword.mulRedD        512       0  avgt    4  435.897 ± 0.137  ns/op
> VectorReduction.WithSuperword.mulRedD      512       0  avgt    4  405.479 ± 1.896  ns/op

@sviswa7 Thanks for adding the benchmark. The win is small, but that was to be expected given that the double reduction has to be performed in a linear order, and hence has quite a large latency.
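The ordering constraint mentioned here follows from floating-point multiplication not being associative. The sketch below is an illustration only; the class and method names are hypothetical and do not come from HotSpot or from the benchmark. It shows that regrouping a product into two independent partial products can change the result, which is why a strict-order double reduction cannot simply be split across lanes and combined at the end.

```java
final class ReductionOrderSketch {
    // Multiplies strictly left to right, as the scalar loop does.
    static double sequentialProduct(double[] v) {
        double acc = 1.0;
        for (double x : v) {
            acc *= x;
        }
        return acc;
    }

    // Multiplies even-indexed and odd-indexed elements separately and
    // combines the two partial products at the end (a regrouped order).
    static double twoLaneProduct(double[] v) {
        double even = 1.0;
        double odd = 1.0;
        for (int i = 0; i < v.length; i++) {
            if ((i & 1) == 0) {
                even *= v[i];
            } else {
                odd *= v[i];
            }
        }
        return even * odd;
    }
}
```

With the input `{1e308, 10.0, 0.1, 1.0}`, the sequential order overflows to infinity at the second multiplication, while the regrouped order stays finite throughout.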
------------- PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1569572955 From jwaters at openjdk.org Wed May 31 07:08:55 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 31 May 2023 07:08:55 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Wed, 24 May 2023 13:56:05 GMT, Julian Waters wrote: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Bumping :( ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1569611758 From rcastanedalo at openjdk.org Wed May 31 07:14:18 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 31 May 2023 07:14:18 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2] In-Reply-To: References: Message-ID: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. 
This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. > > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. 
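The first rewrite can be checked at the Java level with a small sketch. This is not HotSpot code: `MaxFoldSketch` and its methods are hypothetical names used only to compare `max(x + c0, max(x + c1, z))` with `max(x + MAX(c0, c1), z)` over a range of `x` values. It also illustrates why the rewrite relies on the additions not overflowing.

```java
final class MaxFoldSketch {
    // Original shape: max(x + c0, max(x + c1, z)).
    static int original(int x, int c0, int c1, int z) {
        return Math.max(x + c0, Math.max(x + c1, z));
    }

    // Rewritten shape: max(x + MAX(c0, c1), z), with the constants folded.
    static int folded(int x, int c0, int c1, int z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // Returns true if both shapes agree for every x in [lo, hi].
    // The loop variable is a long so that hi == Integer.MAX_VALUE terminates.
    static boolean agreeOnRange(int lo, int hi, int c0, int c1, int z) {
        for (long x = lo; x <= hi; x++) {
            if (original((int) x, c0, c1, z) != folded((int) x, c0, c1, z)) {
                return false;
            }
        }
        return true;
    }
}
```

For `c0 = 100`, `c1 = 150`, `z = 200` the two shapes agree on `[-1000, 1000]`. With `x = Integer.MAX_VALUE - 120` and `z = 0`, `x + c1` wraps around while `x + c0` does not, and the two shapes give different results.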
Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:

 - Merge branch 'master' into JDK-8302673
 - Defer op(x, x) to constant/identity propagation early
 - Merge branch 'master' into JDK-8302673
 - Refactor idealization and extracted Identity transformation for clarity
 - Make auxiliary add operand extraction function return a tuple
 - Randomize array values in min/max test computation
 - Merge branch 'master' into JDK-8302673
 - Merge branch 'master' into JDK-8302673
 - Refine comments
 - Update copyright header
 - ... and 12 more: https://git.openjdk.org/jdk/compare/acde5e39...a6db3cc4

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13924/files
  - new: https://git.openjdk.org/jdk/pull/13924/files/9fd482b5..a6db3cc4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13924&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13924&range=00-01

  Stats: 134197 lines in 2266 files changed: 102246 ins; 15645 del; 16306 mod
  Patch: https://git.openjdk.org/jdk/pull/13924.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13924/head:pull/13924

PR: https://git.openjdk.org/jdk/pull/13924

From rcastanedalo at openjdk.org Wed May 31 07:16:57 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 31 May 2023 07:16:57 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 17 May 2023 13:25:06 GMT, Emanuel Peter wrote:

>> Add a comment that explains that for `top` we will bail out - for that we can check `nullptr`.
>> In the other cases, we know that `n == AddI(x, int_con)`.
>
> You could also consider having a custom "pair" class, so that the "second-output" is more explicit.
But maybe just more useful / explicit variable naming would do the trick. Maybe like `add_var` and `add_con`? Thanks, I went with the tuple return option as suggested and also simplified the semantics of the function. Hope it is clearer now! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211195949 From rcastanedalo at openjdk.org Wed May 31 07:17:00 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 31 May 2023 07:17:00 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 10:56:36 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java line 82: >> >>> 80: for (int i = 0; i < a.length; i++) { >>> 81: a[i] = -i; >>> 82: b[i] = i; >> >> That means that `a[i] * b[i] == -i*i`, and get increasingly smaller. I think it would be better if this was a bit more random, and not biased to the maximum always being at the beginning and the minimum at the end. > > Plus, we should try to cover the whole int range, or at least as much as possible. > One solution: just pick two random ints, and then add/subtract them before min/max. Thanks for the suggestion, I randomized the input values now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211194254 From rcastanedalo at openjdk.org Wed May 31 07:26:57 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 31 May 2023 07:26:57 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Wed, 17 May 2023 14:30:39 GMT, Emanuel Peter wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. 
This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. >> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. 
>
> Nice work with the tests, it's good to have some specific IR tests there!
>
> I hope we can also generalize this for `MaxL/MinL` (once we do this [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513)) - I think that is now also going to be easier with your refactoring towards `MaxNode::IdealI`.

@eme64 Sorry for the delay, I have addressed your feedback now! Please let me know if you find the new version more readable.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1569638291

From rcastanedalo at openjdk.org Wed May 31 07:27:04 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 31 May 2023 07:27:04 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 17 May 2023 14:15:09 GMT, Emanuel Peter wrote:

>> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
>>
>> - Merge branch 'master' into JDK-8302673
>> - Defer op(x, x) to constant/identity propagation early
>> - Merge branch 'master' into JDK-8302673
>> - Refactor idealization and extracted Identity transformation for clarity
>> - Make auxiliary add operand extraction function return a tuple
>> - Randomize array values in min/max test computation
>> - Merge branch 'master' into JDK-8302673
>> - Merge branch 'master' into JDK-8302673
>> - Refine comments
>> - Update copyright header
>> - ... and 12 more: https://git.openjdk.org/jdk/compare/93a4b339...a6db3cc4
>
> src/hotspot/share/opto/addnode.cpp line 1132:
>
>> 1130: // four possible permutations given by opcode's commutativity) into
>> 1131: // opcode(x + opcode(x_off, y_off), z), where opcode is either MinI or MaxI,
>> 1132: // if x == y and the additions can't overflow.
>
> Ok, effectively we have 5, not just 4 cases here:
>
> opcode(x + x_off, opcode(y + y_off, z))
> opcode(x + x_off, opcode(z, y + y_off))
> opcode(opcode(y + y_off, z), x + x_off)
> opcode(opcode(z, y + y_off), x + x_off)
> opcode(x + x_off, y + y_off)
>
>
> I find the nested for-loop quite confusing. Maybe packing the inner stuff into a separate function could work?
>
>
> // Check for opcode(x + x_con, y + y_con), no z
> if (in(1)->Opcode() == Op_AddI && in(2)->Opcode() == Op_AddI) {
>   Node* ret = try_fold(opcode, in(1), in(2), nullptr);
>   if (ret != nullptr) { return ret; }
> }
>
> // Check for these 4 cases, equivalent to opcode3(addx, addy, z)
> // opcode(x + x_con, opcode(y + y_con, z))
> // opcode(x + x_con, opcode(z, y + y_con))
> // opcode(opcode(y + y_con, z), x + x_con)
> // opcode(opcode(z, y + y_con), x + x_con)
> for (uint i = 1; i <= 2; i++) {
>   Node* addx = in(i);
>   Node* other = in(i == 1 ? 2 : 1); // or just "3-i"
>   if (addx->Opcode() != Op_AddI || other->Opcode() != opcode) { continue; }
>   for (uint j = 1; j <= 2; j++) {
>     Node* addy = other->in(j);
>     Node* z = other->in(j == 1 ? 2 : 1);
>     if (addy->Opcode() != Op_AddI) { continue; }
>     // We have opcode3(addx, addy, z)
>     Node* ret = try_fold(opcode, addx, addy, z);
>     if (ret != nullptr) { return ret; }
>   }
> }
>
> Where we have
>
> Node* try_fold(int opcode, Node* addx, Node* addy, Node* z = nullptr) {
>   jint addx_con = 0;
>   jint addy_con = 0;
>   Node* addx_var = as_add_constant(addx, &addx_con);
>   Node* addy_var = as_add_constant(addy, &addy_con);
>   if (addx_var == nullptr || addy_var == nullptr) {
>     // found a top
>     return nullptr;
>   }
>   // could even check addx_var != addy_var, then we don't have to do that inside...
>   Node* folded = extract_addition(phase, addx_var, addx_con, addy_var, addy_con, opcode);
>   if (z != nullptr) {
>     folded = opcode(folded, z);
>   }
>   return folded;
> }
>
>
> Maybe this does a few more calls to `as_add_constant` than strictly necessary, but it is a bit easier to understand, right?

Thanks for the feedback, I agree the first refactor was not as clear as it should be, possibly because I derived it quite mechanically from the existing code. I have reworked `MaxNode::IdealI()` using more explicit naming, a functional and simplified version of `as_add_with_constant()`, and a clearer separation of the different optimizations that are applied. Hope it is more readable now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211204760

From davleopo at openjdk.org Wed May 31 09:06:18 2023
From: davleopo at openjdk.org (David Leopoldseder)
Date: Wed, 31 May 2023 09:06:18 GMT
Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal
Message-ID:

This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler.

In the past this test also failed with Graal because it was checking for C1/C2 semantics. JDK-8264135 changed the UnsafeGetStableArrayElement test to account for the scenario where the test is run with the Graal compiler: if Graal is used, the test asserts that constants are folded by asserting a match instead of a mismatch. However, Graal has changed since then: since JDK-8275645 Graal no longer constant-folds unaligned reads. This makes the test fail again for the unaligned cases, because it asserts that Graal folds them. The fix is to assert a mismatch on unaligned accesses.
------------- Commit messages: - 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong Changes: https://git.openjdk.org/jdk/pull/14242/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14242&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309104 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/14242.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14242/head:pull/14242 PR: https://git.openjdk.org/jdk/pull/14242 From duke at openjdk.org Wed May 31 10:31:15 2023 From: duke at openjdk.org (Chang Peng) Date: Wed, 31 May 2023 10:31:15 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 Message-ID: This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. TEST passed on AArch64: hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- ------------- Commit messages: - 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 Changes: https://git.openjdk.org/jdk/pull/14245/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309129 Stats: 63 lines in 3 files changed: 41 ins; 6 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/14245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14245/head:pull/14245 PR: https://git.openjdk.org/jdk/pull/14245 From duke at openjdk.org Wed May 31 10:31:15 2023 From: duke at openjdk.org (Chang Peng) Date: Wed, 31 May 2023 10:31:15 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: On Wed, 31 May 
2023 10:25:07 GMT, Chang Peng wrote: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. > > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- @TobiHartmann Hi, could you please help to test and review this patch? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1569927091 From chagedorn at openjdk.org Wed May 31 10:49:59 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 31 May 2023 10:49:59 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v2] In-Reply-To: <3tF5Nlnp_YCBvjfj38WxdiRrYeOyfA9Wz4wJ8tm5z2o=.4719cce4-0d72-485b-9494-cd97594c42fa@github.com> References: <3tF5Nlnp_YCBvjfj38WxdiRrYeOyfA9Wz4wJ8tm5z2o=.4719cce4-0d72-485b-9494-cd97594c42fa@github.com> Message-ID: On Tue, 23 May 2023 09:08:20 GMT, Tobias Holenstein wrote: >> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
>>
>> ### Old CompileOnly format
>> - matching a **method name** with **class name** and **package name**:
>> `-XX:CompileOnly=package/path/Class.method`
>> `-XX:CompileOnly=package/path/Class::method`
>> `-XX:CompileOnly=package.path.Class::method`
>> BUT NOT `-XX:CompileOnly=package.path.Class.method`
>>
>> - just matching a **single method name**:
>> `-XX:CompileOnly=.hashCode`
>> `-XX:CompileOnly=::hashCode`
>> BUT NOT `-XX:CompileOnly=hashCode`
>>
>> - Matching **all method names** in a **class name** with **package name**
>> `-XX:CompileOnly=java/lang/String`
>> BUT NOT `-XX:CompileOnly=java/lang/String.`
>> BUT NOT `-XX:CompileOnly=java.lang.String`
>> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug)
>> BUT NOT `-XX:CompileOnly=String`
>> BUT NOT `-XX:CompileOnly=String.`
>> BUT NOT `-XX:CompileOnly=String::`
>>
>> - Matching **all method names** in a **class name** with **NO package name**
>> `-XX:CompileOnly=String`
>> BUT NOT `-XX:CompileOnly=String.`
>> BUT NOT `-XX:CompileOnly=String::`
>>
>> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored
>> e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command
>>
>> ### CompileCommand=compileonly format
>> `CompileCommand` allows two different forms for paths:
>> - `package/path/Class.method`
>> - `package.path.Class::method`
>>
>> In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name.
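The wildcard rule quoted above (a `*` only at the beginning and/or end of a name, never embedded) can be pictured with a small Java sketch. This is not HotSpot's matcher: the class `WildcardNameMatcher` and its methods are hypothetical names chosen for illustration, and the sketch handles a single name (class or method part) rather than a full `CompileCommand` pattern.

```java
// Hypothetical illustration only; not the HotSpot implementation.
final class WildcardNameMatcher {
    private final boolean leadingStar;
    private final boolean trailingStar;
    private final String core;

    private WildcardNameMatcher(boolean leadingStar, boolean trailingStar, String core) {
        this.leadingStar = leadingStar;
        this.trailingStar = trailingStar;
        this.core = core;
    }

    // Splits a pattern into optional leading '*', the literal core, and optional trailing '*'.
    static WildcardNameMatcher parse(String pattern) {
        boolean leading = pattern.startsWith("*");
        // A lone "*" is treated as a leading star with an empty core.
        boolean trailing = pattern.length() > 1 && pattern.endsWith("*");
        String core = pattern.substring(leading ? 1 : 0,
                                        pattern.length() - (trailing ? 1 : 0));
        if (core.contains("*")) {
            throw new IllegalArgumentException("Embedded * not allowed: " + pattern);
        }
        return new WildcardNameMatcher(leading, trailing, core);
    }

    boolean matches(String name) {
        if (leadingStar && trailingStar) {
            return name.contains(core);   // *core*
        }
        if (leadingStar) {
            return name.endsWith(core);   // *core
        }
        if (trailingStar) {
            return name.startsWith(core); // core*
        }
        return name.equals(core);         // exact match
    }

    public static void main(String[] args) {
        System.out.println(parse("*shCo*").matches("hashCode"));
        System.out.println(parse("*ng.String").matches("java.lang.String"));
        System.out.println(parse("hashC*").matches("hashCode"));
        try {
            parse("has*Code");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Reducing every pattern to "prefix, suffix, substring, or exact" keeps matching to a single string operation per name, which is one plausible reason for disallowing embedded wildcards.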
>> >> Valid forms: >> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` >> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` >> `-XX:CompileCommand=compileonly,java.lang.String::*` >> `-XX:CompileCommand=compileonly,*::hashCode` >> `-XX:CompileCommand=compileonly,*ng.String::hashC*` >> `-XX:CompileCommand=compileonly,*String::hash*` >> >> Invalid forms (Error: Embedded * not allowed): >> `-XX:CompileCommand=compileonly,java.*.String::has*Code` >> >> ### Use CompileCommand syntax for CompileOnly >> At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. >> >> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. >> >> W... > > Tobias Holenstein has updated the pull request incrementally with three additional commits since the last revision: > > - Update Test8211698.java > - Update src/hotspot/share/compiler/compilerOracle.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/compiler/compilerOracle.cpp > > Co-authored-by: Christian Hagedorn Update looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1452864244 From thartmann at openjdk.org Wed May 31 10:51:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 31 May 2023 10:51:57 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 10:25:07 GMT, Chang Peng wrote: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. 
>
> TEST passed on AArch64:
> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3
>
> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector-
> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector--

Sure, I submitted testing and will report back once it passed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1569959391

From epeter at openjdk.org Wed May 31 10:55:02 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 May 2023 10:55:02 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 07:14:18 GMT, Roberto Castañeda Lozano wrote:

>> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation:
>> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`.
>> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
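The two rewrites quoted above can be sanity-checked on plain Java ints. The sketch below is only an arithmetic illustration, not the C2 code: the class and method names are hypothetical, and it assumes the additions do not overflow. With wrapping `int` arithmetic the two sides can differ, so a real implementation presumably needs an overflow guard that the quoted description does not spell out.

```java
// Hypothetical illustration of the two algebraic rewrites; assumes no int overflow.
final class MinMaxFoldSketch {
    // Rewrite 1, left-hand side: max(x + c0, max(x + c1, z)).
    static int twoLevelOriginal(int x, int c0, int c1, int z) {
        return Math.max(x + c0, Math.max(x + c1, z));
    }

    // Rewrite 1, right-hand side: max(x + MAX(c0, c1), z).
    static int twoLevelFolded(int x, int c0, int c1, int z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // Rewrite 2, left-hand side: max(x + c0, x + c1).
    static int oneLevelOriginal(int x, int c0, int c1) {
        return Math.max(x + c0, x + c1);
    }

    // Rewrite 2, right-hand side: x + MAX(c0, c1).
    static int oneLevelFolded(int x, int c0, int c1) {
        return x + Math.max(c0, c1);
    }

    public static void main(String[] args) {
        int[] xs = {-1000, -1, 0, 7, 1000};
        for (int x : xs) {
            // Constants taken from the examples in the quoted description.
            if (twoLevelOriginal(x, 100, 150, 200) != twoLevelFolded(x, 100, 150, 200)) {
                throw new AssertionError("two-level rewrite differs for x = " + x);
            }
            if (oneLevelOriginal(x, 10, 11) != oneLevelFolded(x, 10, 11)) {
                throw new AssertionError("one-level rewrite differs for x = " + x);
            }
        }
        System.out.println("rewrites agree on all sampled inputs");
    }
}
```

As a counterexample for the overflow caveat: with `x = Integer.MAX_VALUE - 120`, `c0 = 100` and `c1 = 150`, the sum `x + c1` wraps to a negative value, so `oneLevelOriginal` and `oneLevelFolded` return different results.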
>> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: > > - Merge branch 'master' into JDK-8302673 > - Defer op(x, x) to constant/identity propagation early > - Merge branch 'master' into JDK-8302673 > - Refactor idealization and extracted Identity transformation for clarity > - Make auxiliary add operand extraction function return a tuple > - Randomize array values in min/max test computation > - Merge branch 'master' into JDK-8302673 > - Merge branch 'master' into JDK-8302673 > - Refine comments > - Update copyright header > - ... 
and 12 more: https://git.openjdk.org/jdk/compare/ec8ac687...a6db3cc4

src/hotspot/share/opto/addnode.cpp line 1182:

> 1180: jint inner_off = inner_add_operands.second;
> 1181: // Try to extract the inner add.
> 1182: Node* add_extracted = extract_add(phase, inner, inner_off, outer, outer_off);

Optional: You could also leave `outer_add_operands` and `inner_add_operands` packed in the `Pair`, and pass it as such into `extract_add`. Could reduce the number of lines a bit here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211516982

From rehn at openjdk.org Wed May 31 10:55:58 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 31 May 2023 10:55:58 GMT
Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules
In-Reply-To:
References:
Message-ID:

On Tue, 30 May 2023 12:11:43 GMT, Yanhong Zhu wrote:

> Merge vector instructs with similar match rules in riscv_v.ad.
>
> Tier 1~3 passed on QEMU with RVV supported.

Looks good, thanks!
(minus arg order in format as pointed out)

-------------

Marked as reviewed by rehn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14214#pullrequestreview-1452874967

From epeter at openjdk.org Wed May 31 10:58:08 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 May 2023 10:58:08 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 07:14:18 GMT, Roberto Castañeda Lozano wrote:

>> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly.
The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation:
>> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`.
>> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
>>
>> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`:
>>
>> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c)
>>
>> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`:
>>
>> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c)
>>
>> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized.
>>
>> #### Testing
>>
>> ##### Functionality
>>
>> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>>
>> ##### Performance
>>
>> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed.
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 22 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8302673
> - Defer op(x, x) to constant/identity propagation early
> - Merge branch 'master' into JDK-8302673
> - Refactor idealization and extracted Identity transformation for clarity
> - Make auxiliary add operand extraction function return a tuple
> - Randomize array values in min/max test computation
> - Merge branch 'master' into JDK-8302673
> - Merge branch 'master' into JDK-8302673
> - Refine comments
> - Update copyright header
> - ... and 12 more: https://git.openjdk.org/jdk/compare/3ac984c4...a6db3cc4

src/hotspot/share/opto/addnode.cpp line 1192:

> 1190: } else {
> 1191: return new MaxINode(add_transformed, inner_other);
> 1192: }

Could you make use of `MaxNode::build_min_max`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211520962

From epeter at openjdk.org Wed May 31 11:10:57 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 May 2023 11:10:57 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 07:23:51 GMT, Roberto Castañeda Lozano wrote:

>> Nice work with the tests, it's good to have some specific IR tests there!
>>
>> I hope we can also generalize this for `MaxL/MinL` (once we do this [JDK-8307513](https://bugs.openjdk.org/browse/JDK-8307513)) - I think that is now also going to be easier with your refactoring towards `MaxNode::IdealI`.
>
> @eme64 Sorry for the delay, I have addressed your feedback now! Please let me know if you find the new version more readable.

@robcasloz it looks much better, thanks for refactoring :)
I have left a few more comments.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1569986804

From epeter at openjdk.org Wed May 31 11:11:03 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 May 2023 11:11:03 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 07:14:18 GMT, Roberto Castañeda Lozano wrote:

>> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation:
>> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`.
>> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
>>
>> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`:
>>
>> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c)
>>
>> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`:
>>
>> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c)
>>
>> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types.
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized.
>>
>> #### Testing
>>
>> ##### Functionality
>>
>> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>>
>> ##### Performance
>>
>> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed.
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8302673
> - Defer op(x, x) to constant/identity propagation early
> - Merge branch 'master' into JDK-8302673
> - Refactor idealization and extracted Identity transformation for clarity
> - Make auxiliary add operand extraction function return a tuple
> - Randomize array values in min/max test computation
> - Merge branch 'master' into JDK-8302673
> - Merge branch 'master' into JDK-8302673
> - Refine comments
> - Update copyright header
> - ... and 12 more: https://git.openjdk.org/jdk/compare/36eaed9c...a6db3cc4

src/hotspot/share/opto/addnode.cpp line 1141:

> 1139: }
> 1140: return ConstAddOperands(x, c_type->is_int()->get_con());
> 1141: }

This is what it was on my last review:

    // Return:
    // , if n is of the form x + C, where 'C' is a non-TOP constant;
    // , if n is of the form x + C, where 'C' is a TOP constant;
    // otherwise.
    static Node* constant_add_input(Node* n, jint* con) {
      if (n->Opcode() == Op_AddI && n->in(2)->is_Con()) {
        const Type* t = n->in(2)->bottom_type();
        if (t == Type::TOP) {
          return nullptr;
        }
        *con = t->is_int()->get_con();
        n = n->in(1);
      }
      return n;
    }

Here, you used to also allow packing just a single `n`, and leave the constant as `zero`. Did you remove this possibility on purpose? Now `n` must be an `AddI`.

This used to allow cases like this to be folded:
`max(max(a, b), a + 1) -> max(a + max(0, 1), b)`

Or am I missing something? Do you have tests for this case?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1211536265

From eastigeevich at openjdk.org Wed May 31 12:00:56 2023
From: eastigeevich at openjdk.org (Evgeny Astigeevich)
Date: Wed, 31 May 2023 12:00:56 GMT
Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795
In-Reply-To:
References:
Message-ID: <52Szrc2FBF-UQ2fFYL9YSdqQK2XwIDYEl5xdW2SOLJk=.2c108ff0-4b5f-4866-8499-b4f92dded098@github.com>

On Wed, 31 May 2023 10:25:07 GMT, Chang Peng wrote:

> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case.
>
> TEST passed on AArch64:
> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3
>
> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector-
> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector--

src/hotspot/cpu/aarch64/aarch64_vector.ad line 5528:

> 5526: Assembler::SIMD_Arrangement arrangement = Assembler::esize2arrangement(esize,
> 5527: /* isQ */ length_in_bytes == 16);
> 5528: if (arrangement == __ T2D || arrangement == __ T2S) {

I see `jdk/incubator/vector/Float64VectorTests.java` covers the case `arrangement == __ T2S`. Is there a test covering the case `arrangement == __ T2D`?
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14245#discussion_r1211596675

From dnsimon at openjdk.org Wed May 31 12:15:55 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 31 May 2023 12:15:55 GMT
Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 08:46:19 GMT, David Leopoldseder wrote:

> This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler.
>
> In the past this test also failed with Graal because it was checking for C1/C2 semantics.
> JDK-8264135 introduced changes to the UnsafeGetStableArrayElement test to account for the scenario when the test is run with the Graal compiler. If Graal is used, the test asserts that constants are folded by asserting a match instead of a mismatch.
>
> However, Graal has changed since then: as of JDK-8275645, Graal no longer constant-folds unaligned reads.
> This makes the test fail again for the unaligned cases because it asserts that Graal folds them.
>
> The fix is to actually assert mismatch on unaligned accesses.

Marked as reviewed by dnsimon (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/14242#pullrequestreview-1453043470 From thartmann at openjdk.org Wed May 31 12:59:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 31 May 2023 12:59:01 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v3] In-Reply-To: References: Message-ID: On Tue, 30 May 2023 10:30:34 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. 
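One way to picture the argument quoted above is the following Java sketch. It is not the PR's test case and not C2 code: the class and helper names are hypothetical, the loop is assumed to have stride 1, and the limit is assumed to be non-negative (as an array length would be). For a real range check, passing at the first and last iteration implies passing everywhere in between; for some other condition, such as the negation of that check, it does not.

```java
import java.util.function.IntPredicate;

// Hypothetical illustration of why endpoint checks only suffice for real range checks.
final class EndpointCheckSketch {
    // Range-check shape: 0 <= index < limit, written as a single unsigned comparison.
    static boolean inRange(int index, int limit) {
        return Integer.compareUnsigned(index, limit) < 0;
    }

    // What a hoisted predicate effectively tests: only the first and last iteration.
    static boolean holdsAtEndpoints(IntPredicate check, int first, int last) {
        return check.test(first) && check.test(last);
    }

    // What the un-hoisted loop tests: every iteration from first to last (stride 1).
    static boolean holdsEverywhere(IntPredicate check, int first, int last) {
        for (int i = first; i <= last; i++) {
            if (!check.test(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int limit = 10; // assumed non-negative, like an array length

        // Real range check: endpoints inside [0, limit) imply all iterations are inside.
        IntPredicate rangeCheck = i -> inRange(i, limit);
        System.out.println(holdsAtEndpoints(rangeCheck, 2, 8) + " " + holdsEverywhere(rangeCheck, 2, 8));

        // Negated check: both endpoints are outside [0, limit), yet iterations 0..9 are inside.
        IntPredicate negatedCheck = i -> !inRange(i, limit);
        System.out.println(holdsAtEndpoints(negatedCheck, -5, 15) + " " + holdsEverywhere(negatedCheck, -5, 15));
    }
}
```

In the second case a predicate built only from the first and last iteration would wrongly replace a check that fails for the iterations in between, which matches the kind of miscompilation the quoted description warns about.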
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove negation Nice analysis. Looks good to me. As we discussed offline, let's run some sanity performance testing (we can already integrate this). src/hotspot/share/opto/loopPredicate.cpp line 870: > 868: // Example: > 869: // Loop: "for (int i = -1; i < 1000; i++)" > 870: // init = "scale*iv + offset" in first loop iteration = 1*-1 + 0 = -1 Suggestion: // init = "scale*iv + offset" in the first loop iteration = 1*-1 + 0 = -1 test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java line 143: > 141: Math.ceil(34); // Never taken and unloaded -> trap > 142: } catch (Exception e) { > 143: // False Proj of RangeCheckNod Suggestion: // False Proj of RangeCheckNode ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14156#pullrequestreview-1453133922 PR Review Comment: https://git.openjdk.org/jdk/pull/14156#discussion_r1211674413 PR Review Comment: https://git.openjdk.org/jdk/pull/14156#discussion_r1211670902 From thartmann at openjdk.org Wed May 31 13:09:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 31 May 2023 13:09:54 GMT Subject: RFR: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit [v2] In-Reply-To: References: Message-ID: On Fri, 26 May 2023 13:57:04 GMT, Emanuel Peter wrote: >> In SuperWord::output we create a CountedLoopReserveKit, so that we can reverse edits to the loop, in case something goes wrong. As far as I understand all of these conditions should never occur, prior condition checking in SuperWord should have already verified that. We should at least add asserts so that we can catch such failures and fix them, and do not just silently bail out of SuperWord (reverse the graph to before SuperWord and continue compilation). >> >> `DoReserveCopyInSuperWord` enables `do_reserve_copy()`. 
It is a product flag and default true. If it is disabled, and there is such a failure we just hit a `ShouldNotReachHere()`.
>>
>> There was one occurrence I could not assert for: `vmask = create_post_loop_vmask();`. Read more below, there is actually a bug there.
>>
>> **Testing**
>>
>> TODO testing up to tier6 plus stress testing.
>> (it already passed tier3 and stress testing)
>>
>> **Discussion**
>>
>> Do we really want to keep the `DoReserveCopyInSuperWord` flag (product, always true), which enables the use of `CountedLoopReserveKit`? It means that we always duplicate the loop (and the loops can be rather large because they were unrolled before SuperWord). It seems a bit of an edge case to want to bail out of SuperWord, but not of the whole compilation.
>> We can later decide if it makes sense to clone the whole loop via CountedLoopReserveKit (the loops can be large!), or if we should just have a regular compilation bailout instead (could simplify the code and reduce overhead of loop cloning).
>>
>> Plus: it seems the checks and bailouts are very selectively applied. I don't see why we would nullptr check some "vector_opd" but not all of them. So if we decide to keep it, we should probably apply it more consistently.
>>
>> What do you think?
>>
>> ------
>>
>> **Bug: bad combination of -XX:+PostLoopMultiversioning -XX:-DoReserveCopyInSuperWord**
>>
>> I filed it here: [JDK-8308949](https://bugs.openjdk.org/browse/JDK-8308949)
>>
>> `PostLoopMultiversioning` unrolls the post-loop with the use of a vmask. Read more about post-loop vectorization here https://github.com/openjdk/jdk/pull/6828. But in `create_post_loop_vmask` we have some conditions which have to hold, and if they fail we get a `nullptr`, and bail out of SuperWord, via `CountedLoopReserveKit`.
>>
>> But if we turn off `DoReserveCopyInSuperWord`, this is not caught, and we hit an assert.
>>
>> Generally, this looks a bit unclean, what we have now: we should do the checks of `create_post_loop_vm...
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > remove rce post loop vmask assert -> it is legit there Looks good to me. Please file a follow-up RFE to investigate if we should remove `DoReserveCopyInSuperWord` completely. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14168#pullrequestreview-1453166331 From epeter at openjdk.org Wed May 31 13:17:57 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 31 May 2023 13:17:57 GMT Subject: RFR: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit [v2] In-Reply-To: <83U6cThtLr0jnVlOEkgBy367XJAm2oOvHFPX5BbPxD0=.ab8c74b2-26a1-40c3-9382-c7ef7dd1db29@github.com> References: <83U6cThtLr0jnVlOEkgBy367XJAm2oOvHFPX5BbPxD0=.ab8c74b2-26a1-40c3-9382-c7ef7dd1db29@github.com> Message-ID: <0tjGtVzgPiBKhvdlsWWk7axpxt7SqaZTNkBfHfUzo5M=.4c85c0f5-e792-4a79-b5fc-7653ed816951@github.com> On Fri, 26 May 2023 15:55:27 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> remove rce post loop vmask assert -> it is legit there > > Marked as reviewed by kvn (Reviewer). @vnkozlov @TobiHartmann Thanks for the reviews! I filed the follow-up RFE https://bugs.openjdk.org/browse/JDK-8309204 It depends on [JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994). 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14168#issuecomment-1570215458

From thartmann at openjdk.org Wed May 31 13:19:00 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 31 May 2023 13:19:00 GMT
Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v2]
In-Reply-To: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com>
References: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com>
Message-ID:

On Sat, 27 May 2023 16:34:15 GMT, Daohan Qu wrote:

>> This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451).
>>
>> It is a trivial patch that fixes a misleading code comment at method entry printed by `-XX:+PrintAssembly`.
>>
>> For example,
>>
>> 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;*synchronization entry
>>
>> will become
>>
>> 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;* invocation entry (also synchronization entry if synchronized)
>
> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision:
>
> Update output again

Couldn't we detect if the method is synchronized and adjust the comment accordingly?

-------------

PR Review: https://git.openjdk.org/jdk/pull/14192#pullrequestreview-1453192284

From epeter at openjdk.org Wed May 31 13:21:04 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 May 2023 13:21:04 GMT
Subject: Integrated: 8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit
In-Reply-To:
References:
Message-ID:

On Fri, 26 May 2023 04:43:20 GMT, Emanuel Peter wrote:

> In SuperWord::output we create a CountedLoopReserveKit, so that we can reverse edits to the loop, in case something goes wrong. As far as I understand all of these conditions should never occur, prior condition checking in SuperWord should have already verified that.
We should at least add asserts so that we can catch such failures and fix them, and do not just silently bail out of SuperWord (reverse the graph to before SuperWord and continue compilation).
>
> `DoReserveCopyInSuperWord` enables `do_reserve_copy()`. It is a product flag and default true. If it is disabled, and there is such a failure we just hit a `ShouldNotReachHere()`.
>
> There was one occurrence I could not assert for: `vmask = create_post_loop_vmask();`. Read more below, there is actually a bug there.
>
> **Testing**
>
> TODO testing up to tier6 plus stress testing.
> (it already passed tier3 and stress testing)
>
> **Discussion**
>
> Do we really want to keep the `DoReserveCopyInSuperWord` flag (product, always true), which enables the use of `CountedLoopReserveKit`? It means that we always duplicate the loop (and the loops can be rather large because they were unrolled before SuperWord). It seems a bit of an edge case to want to bail out of SuperWord, but not of the whole compilation.
> We can later decide if it makes sense to clone the whole loop via CountedLoopReserveKit (the loops can be large!), or if we should just have a regular compilation bailout instead (could simplify the code and reduce overhead of loop cloning).
>
> Plus: it seems the checks and bailouts are very selectively applied. I don't see why we would nullptr check some "vector_opd" but not all of them. So if we decide to keep it, we should probably apply it more consistently.
>
> What do you think?
>
> ------
>
> **Bug: bad combination of -XX:+PostLoopMultiversioning -XX:-DoReserveCopyInSuperWord**
>
> I filed it here: [JDK-8308949](https://bugs.openjdk.org/browse/JDK-8308949)
>
> `PostLoopMultiversioning` unrolls the post-loop with the use of a vmask. Read more about post-loop vectorization here https://github.com/openjdk/jdk/pull/6828.
But in `create_post_loop_vmask` we have some conditions which have to hold, and if they fail we get a `nullptr`, and bail out of SuperWord, via `CountedLoopReserveKit`.
>
> But if we turn off `DoReserveCopyInSuperWord`, this is not caught, and we hit an assert.
>
> Generally, this looks a bit unclean, what we have now: we should do the checks of `create_post_loop_vmask` before `SuperWord::output`,...

This pull request has now been integrated.

Changeset: 25b98030
Author: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/25b98030569d863e605f398d5f97211008c58ca3
Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod

8308917: C2 SuperWord::output: assert before bailout with CountedLoopReserveKit

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/14168

From thartmann at openjdk.org Wed May 31 13:21:58 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 31 May 2023 13:21:58 GMT
Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v2]
In-Reply-To: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com>
References: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com>
Message-ID:

On Sat, 27 May 2023 16:34:15 GMT, Daohan Qu wrote:

>> This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451).
>>
>> It is a trivial patch that fixes a misleading code comment at method entry printed by `-XX:+PrintAssembly`.
>>
>> For example,
>>
>> 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;*synchronization entry
>>
>> will become
>>
>> 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;* invocation entry (also synchronization entry if synchronized)
>
> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision:
>
> Update output again

@dean-long also had some suggestions in [JDK-8201516](https://bugs.openjdk.org/browse/JDK-8201516) and might want to have a look.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14192#issuecomment-1570221483 From thartmann at openjdk.org Wed May 31 13:24:00 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 31 May 2023 13:24:00 GMT Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal In-Reply-To: References: Message-ID: On Wed, 31 May 2023 08:46:19 GMT, David Leopoldseder wrote: > This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler. > > In the past this test also failed with graal because it was checking for c1/c2 semantics. > JDK-8264135 introduced changes to the UnsafeGetStableArrayElement test to account for the scenario when the test is run with the Graal compiler. If Graal is used it will assert that constants are folded by asserting matching instead of mismatch. > > However, we had changes in Graal since then, since JDK-8275645 Graal no longer constant folds unaligned reads. > This lets the test fail again for the unaligned cases because it asserts graal folds them. > > The fix is to actually assert mismatch on unaligned accesses. test/hotspot/jtreg/compiler/unsafe/UnsafeGetStableArrayElement.java line 204: > 202: > 203: // Trigger compilation, give (jar) graal enough time to warmup. > 204: for (int i = 0; i < 20_000 * 500; i++) { The number of iterations seems rather excessive. Is it really needed since we run with `-Xbatch`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14242#discussion_r1211710213 From dzhang at openjdk.org Wed May 31 15:13:55 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 31 May 2023 15:13:55 GMT Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules In-Reply-To: References: Message-ID: On Tue, 30 May 2023 12:11:43 GMT, Yanhong Zhu wrote: > Merge vector instructs with similar match rules in riscv_v.ad. > > Tier 1~3 passed on QEMU with RVV supported. LGTM, thanks! 
------------- Marked as reviewed by dzhang (Author). PR Review: https://git.openjdk.org/jdk/pull/14214#pullrequestreview-1453494850 From aph at openjdk.org Wed May 31 15:27:57 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 31 May 2023 15:27:57 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v3] In-Reply-To: References: Message-ID: On Tue, 30 May 2023 20:10:09 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> **Arrays.sort performance data** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 
4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort > - remove libstdc++ > - 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) What happens to really short arrays? Your patch should include macro benchmarks for e.g. 50 and 10. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1570447615 From chagedorn at openjdk.org Wed May 31 15:55:18 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 31 May 2023 15:55:18 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v2] In-Reply-To: References: Message-ID: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order-insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). 
This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straightforward: make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. > > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - typos - add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14196/files - new: https://git.openjdk.org/jdk/pull/14196/files/806bdba5..68964d7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=00-01 Stats: 91 lines in 1 file changed: 91 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14196.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14196/head:pull/14196 PR: https://git.openjdk.org/jdk/pull/14196 From chagedorn at openjdk.org Wed May 31 15:55:19 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 31 May 2023 15:55:19 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. 
This order-insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straightforward: make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. > > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian Thanks Roland for your review! I was able to come up with a test that fails with the same graph pattern. I've pushed an update. 
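For readers following the description quoted above, here is a minimal Java sketch of the difference between an order-insensitive and an order-aware upward walk. It is only an illustration and not the C2 code: the class, enum, record and method names are hypothetical, the node ids are taken from the example in the description, and placing the Loop Limit Check kind closest to the loop is an assumption of this sketch (the description only states that Loop Predicates sit above Profiled Loop Predicates).

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not HotSpot code: Parse Predicates are listed in the
// order in which they are met when walking UP from the loop entry.
public class ParsePredicateWalkSketch {

    // Declared in the order expected on the way up. That LOOP_LIMIT_CHECK comes
    // first is an assumption of this sketch; PROFILED_LOOP below LOOP follows
    // from "Loop Predicates are above Profiled Loop Predicates".
    enum Kind { LOOP_LIMIT_CHECK, PROFILED_LOOP, LOOP }

    record ParsePredicate(int id, Kind kind) {}

    // Old behaviour as described: accept each kind once, in any order,
    // and stop only when a kind is seen a second time.
    static Map<Kind, ParsePredicate> orderInsensitive(List<ParsePredicate> upward) {
        Map<Kind, ParsePredicate> found = new EnumMap<>(Kind.class);
        for (ParsePredicate p : upward) {
            if (found.containsKey(p.kind())) {
                break;
            }
            found.put(p.kind(), p);
        }
        return found;
    }

    // Order-aware variant: a predicate is accepted only if its kind comes
    // strictly later in the expected upward order than the last accepted one.
    static Map<Kind, ParsePredicate> orderAware(List<ParsePredicate> upward) {
        Map<Kind, ParsePredicate> found = new EnumMap<>(Kind.class);
        int lastOrdinal = -1;
        for (ParsePredicate p : upward) {
            if (p.kind().ordinal() <= lastOrdinal) {
                break; // belongs to an unrelated, outer set of Parse Predicates
            }
            found.put(p.kind(), p);
            lastOrdinal = p.kind().ordinal();
        }
        return found;
    }

    public static void main(String[] args) {
        // The shape from the description: 116 (Loop), 84 (Profiled Loop), 71 (Loop).
        List<ParsePredicate> upward = List.of(
                new ParsePredicate(116, Kind.LOOP),
                new ParsePredicate(84, Kind.PROFILED_LOOP),
                new ParsePredicate(71, Kind.LOOP));
        System.out.println("order-insensitive: " + orderInsensitive(upward));
        System.out.println("order-aware:       " + orderAware(upward));
    }
}
```

In this model the order-insensitive walk pairs `116` with `84`, which is the wrong combination described above, while the order-aware walk stops at `84` and keeps only `116`.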
------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1570488711 From chagedorn at openjdk.org Wed May 31 16:04:25 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 31 May 2023 16:04:25 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v3] In-Reply-To: References: Message-ID: <7-jhMuUY2Gwwer1msBJlmu5OBzq9EvHmYsfQu4qLAMQ=.9b80a7cf-63de-4b44-8f41-797534f724a9@github.com> > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order-insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straightforward: make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. 
> > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: remove line breaks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14196/files - new: https://git.openjdk.org/jdk/pull/14196/files/68964d7a..13c7c6d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14196.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14196/head:pull/14196 PR: https://git.openjdk.org/jdk/pull/14196 From dnsimon at openjdk.org Wed May 31 16:13:55 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 31 May 2023 16:13:55 GMT Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal In-Reply-To: References: Message-ID: On Wed, 31 May 2023 13:21:02 GMT, Tobias Hartmann wrote: >> This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler. >> >> In the past this test also failed with graal because it was checking for c1/c2 semantics. >> JDK-8264135 introduced changes to the UnsafeGetStableArrayElement test to account for the scenario when the test is run with the Graal compiler. If Graal is used it will assert that constants are folded by asserting matching instead of mismatch. >> >> However, we had changes in Graal since then, since JDK-8275645 Graal no longer constant folds unaligned reads. >> This lets the test fail again for the unaligned cases because it asserts graal folds them. >> >> The fix is to actually assert mismatch on unaligned accesses. > > test/hotspot/jtreg/compiler/unsafe/UnsafeGetStableArrayElement.java line 204: > >> 202: >> 203: // Trigger compilation, give (jar) graal enough time to warmup. 
>> 204: for (int i = 0; i < 20_000 * 500; i++) { > > The number of iterations seems rather excessive. Is it really needed since we run with `-Xbatch`? Good point. David, if this test passes on libgraal then there's no need to increase the test time (I assume that with `-Xbatch`, the `* 500` does noticeably increase the test time). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14242#discussion_r1211972529 From duke at openjdk.org Wed May 31 16:37:56 2023 From: duke at openjdk.org (Daohan Qu) Date: Wed, 31 May 2023 16:37:56 GMT Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v2] In-Reply-To: References: <6wyhvO-8OHzCY3yTorzYtIjR4Xvu2cVNX9pqvY9CZGQ=.c257dc76-7544-44b0-94ee-5c89a60c82ed@github.com> Message-ID: <_CmMZzwORIiMgCzTRkd8Sh2SigiPyVR7L8GIb15XU6M=.dd3880a5-ddfc-4ebd-8a4a-cf1b23da26d8@github.com> On Wed, 31 May 2023 13:18:45 GMT, Tobias Hartmann wrote: >> Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: >> >> Update output again > > @dean-long also had some suggestions in [JDK-8201516](https://bugs.openjdk.org/browse/JDK-8201516) and might want to have a look. Hi, @TobiHartmann thanks for your advice. It seems that this PR should do more than a simple change of code comment. I will investigate it further. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14192#issuecomment-1570556656 From dcubed at openjdk.org Wed May 31 16:54:32 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 31 May 2023 16:54:32 GMT Subject: Integrated: 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 16:48:06 GMT, Joe Darcy wrote: >> A couple of trivial ProblemListings: >> [JDK-8309230](https://bugs.openjdk.org/browse/JDK-8309230) ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 >> [JDK-8309231](https://bugs.openjdk.org/browse/JDK-8309231) ProblemList vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java > > Marked as reviewed by darcy (Reviewer). @jddarcy - Thanks for the fast review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14250#issuecomment-1570578322 From dcubed at openjdk.org Wed May 31 16:54:32 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 31 May 2023 16:54:32 GMT Subject: Integrated: 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 Message-ID: A couple of trivial ProblemListings: [JDK-8309230](https://bugs.openjdk.org/browse/JDK-8309230) ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 [JDK-8309231](https://bugs.openjdk.org/browse/JDK-8309231) ProblemList vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java ------------- Commit messages: - 8309231: ProblemList vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java - 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 Changes: https://git.openjdk.org/jdk/pull/14250/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14250&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309230 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14250.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14250/head:pull/14250 PR: https://git.openjdk.org/jdk/pull/14250 From darcy at openjdk.org Wed May 31 16:54:32 2023 From: darcy at openjdk.org (Joe Darcy) 
Date: Wed, 31 May 2023 16:54:32 GMT Subject: Integrated: 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 16:40:54 GMT, Daniel D. Daugherty wrote: > A couple of trivial ProblemListings: > [JDK-8309230](https://bugs.openjdk.org/browse/JDK-8309230) ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 > [JDK-8309231](https://bugs.openjdk.org/browse/JDK-8309231) ProblemList vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java Marked as reviewed by darcy (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14250#pullrequestreview-1453713285 From dcubed at openjdk.org Wed May 31 16:54:32 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 31 May 2023 16:54:32 GMT Subject: Integrated: 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 In-Reply-To: References: Message-ID: <2OqqZel510pZtVD9yrFJWq0bp9QG6gtye9d9_gIr_gw=.de9d9ad5-004e-43c2-9b67-121a5b0c50c4@github.com> On Wed, 31 May 2023 16:40:54 GMT, Daniel D. Daugherty wrote: > A couple of trivial ProblemListings: > [JDK-8309230](https://bugs.openjdk.org/browse/JDK-8309230) ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 > [JDK-8309231](https://bugs.openjdk.org/browse/JDK-8309231) ProblemList vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java This pull request has now been integrated. Changeset: 45473ef2 Author: Daniel D. 
Daugherty URL: https://git.openjdk.org/jdk/commit/45473ef23520271954fa7196a5be588f88337aaf Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod 8309230: ProblemList jdk/incubator/vector/Float64VectorTests.java on aarch64 8309231: ProblemList vmTestbase/nsk/jvmti/scenarios/jni_interception/JI05/ji05t001/TestDescription.java Reviewed-by: darcy ------------- PR: https://git.openjdk.org/jdk/pull/14250 From davleopo at openjdk.org Wed May 31 16:59:55 2023 From: davleopo at openjdk.org (David Leopoldseder) Date: Wed, 31 May 2023 16:59:55 GMT Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal In-Reply-To: References: Message-ID: On Wed, 31 May 2023 16:11:12 GMT, Doug Simon wrote: >> test/hotspot/jtreg/compiler/unsafe/UnsafeGetStableArrayElement.java line 204: >> >>> 202: >>> 203: // Trigger compilation, give (jar) graal enough time to warmup. >>> 204: for (int i = 0; i < 20_000 * 500; i++) { >> >> The number of iterations seems rather excessive. Is it really needed since we run with `-Xbatch`? > > Good point. David, if this test passes on libgraal then there's no need to increase the test time (I assume that with `-Xbatch`, the `* 500` does noticeably increase the test time). Yeah, I used it for local testing. I'll remove it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14242#discussion_r1212028885 From sviswanathan at openjdk.org Wed May 31 17:01:01 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 May 2023 17:01:01 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 00:47:56 GMT, Sandhya Viswanathan wrote: >> This PR fixes the problem with double reduction on x86_64. 
>> >> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: >> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java >> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. >> >> This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. >> >> With this PR the vector_reduction_double node is generated. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Add jmh test case @vnkozlov Could you also please review this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1570589226 From azvegint at openjdk.org Wed May 31 17:00:59 2023 From: azvegint at openjdk.org (Alexander Zvegintsev) Date: Wed, 31 May 2023 17:00:59 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 10:25:07 GMT, Chang Peng wrote: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. 
> > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Please note that the associated test is now in the problem list, see #14250 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1570589862 From dcubed at openjdk.org Wed May 31 17:01:00 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 31 May 2023 17:01:00 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: <0gjW5e4qh5vqDqMV6W5sqw0-BxnMvRW0HOhlSBhJ4yc=.87f8884b-3dc5-4852-a482-6f8dd0609249@github.com> On Wed, 31 May 2023 10:25:07 GMT, Chang Peng wrote: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. > > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- jdk/incubator/vector/Float64VectorTests.java has been ProblemListed. Please update this PR and remove the ProblemListing update before integrating this fix. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1570590106 From kvn at openjdk.org Wed May 31 19:33:46 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 May 2023 19:33:46 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 00:47:56 GMT, Sandhya Viswanathan wrote: >> This PR fixes the problem with double reduction on x86_64. >> >> In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: >> jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java >> The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. >> >> This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. >> >> With this PR the vector_reduction_double node is generated. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Add jmh test case Looks good. You may regress for next case on KNL? when using `max vector size` (code is in x86.ad) which I think is fine. case Op_MinReductionV: case Op_MaxReductionV: if (UseAVX > 2 && (!VM_Version::supports_avx512dq() && size_in_bits == 512)) { return false; } ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14065#pullrequestreview-1454013018 From kvn at openjdk.org Wed May 31 20:00:05 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 May 2023 20:00:05 GMT Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) In-Reply-To: References: Message-ID: On Fri, 26 May 2023 13:45:23 GMT, Emanuel Peter wrote: > I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain java. > > I added the code above the assert, the comments explain why: > > https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763 > > Here the graph just before the assert: > ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01) > > `120 Loop` -> need it to kick off `beautify_loop` and a second `build_loop_tree` > `71 Region` -> infinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`. > `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`. > > Testing up to tier6 and stress testing. TODO test/hotspot/jtreg/compiler/loopopts/TestCountedLoopInsideInfiniteLoop.jasm line 24: > 22: */ > 23: > 24: super public class TestCountedLoopInsideInfiniteLoop Maybe add a comment explaining why you put this into a separate file rather than making it an inner class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14178#discussion_r1212246056 From sviswanathan at openjdk.org Wed May 31 22:43:18 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 May 2023 22:43:18 GMT Subject: RFR: 8300865: C2: product reduction in ProdRed_Double is not vectorized [v4] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 19:31:07 GMT, Vladimir Kozlov wrote: > Looks good. You may regress for next case on KNL? 
when using `max vector size` (code is in x86.ad) which I think is fine. > > ``` > case Op_MinReductionV: > case Op_MaxReductionV: > if (UseAVX > 2 && (!VM_Version::supports_avx512dq() && size_in_bits == 512)) { > return false; > } > ``` Yes, that should be ok. Thanks a lot for the review @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14065#issuecomment-1571056501 From sviswanathan at openjdk.org Wed May 31 22:43:21 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 May 2023 22:43:21 GMT Subject: Integrated: 8300865: C2: product reduction in ProdRed_Double is not vectorized In-Reply-To: References: Message-ID: On Fri, 19 May 2023 23:27:32 GMT, Sandhya Viswanathan wrote: > This PR fixes the problem with double reduction on x86_64. > > In the test compiler.loopopts.superword.ProdRed_Double, the product reduction loop in prodReductionImplement() was not getting vectorized when run as follows: > jtreg -XX:CompileCommand=PrintAssembly,compiler.loopopts.superword.ProdRed_Double::prodReductionImplement compiler/loopopts/superword/ProdRed_Double.java > The print assembly generated in the pid-xxx.log output in JTwork/scratch directory was not showing any vector_reduction_double node. > > This was happening as the ReductionNode::implemented was passed a vector size of one element. For the vector reduction implemented we need to check with at least vector size of two elements. > > With this PR the vector_reduction_double node is generated. > > Please review. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: f9ad7df4 Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/f9ad7df4dafa0a2da38e8cbb4150049fb04f4327 Stats: 25 lines in 3 files changed: 22 ins; 2 del; 1 mod 8300865: C2: product reduction in ProdRed_Double is not vectorized Reviewed-by: fgao, epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14065
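
As a reading aid for the 8300865 thread above, the following is a minimal Java sketch of the kind of loop under discussion: a scalar product reduction over `double` values. It is an assumption-laden illustration and not the source of `compiler/loopopts/superword/ProdRed_Double.java`; the class and method names are hypothetical. Whether C2 actually emits a vector reduction for it depends on the platform and flags, which can be checked with the `-XX:CompileCommand=PrintAssembly,...` approach quoted in the PR description.

```java
// Hypothetical stand-in for the loop shape discussed in the thread,
// not the actual jtreg test.
public class ProdRedDoubleSketch {

    // Product reduction: every iteration multiplies into the same accumulator.
    // This is the pattern that SuperWord may turn into a vector reduction.
    static double prodReduction(double[] a, double[] b, double total) {
        for (int i = 0; i < a.length; i++) {
            total *= a[i] - b[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] a = new double[1024];
        double[] b = new double[1024];
        for (int i = 0; i < a.length; i++) {
            // Differences stay close to 1.0 so the running product remains finite.
            a[i] = 1.0 + 1.0 / (i + 2);
            b[i] = 1.0 / (i + 3);
        }
        double result = 1.0;
        // Call the method repeatedly so that the JIT gets a chance to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            result = prodReduction(a, b, 1.0);
        }
        System.out.println("product = " + result);
    }
}
```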