From sviswanathan at openjdk.org Thu Jun 1 01:10:16 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 1 Jun 2023 01:10:16 GMT
Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2]
In-Reply-To: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com>
References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com>
Message-ID:

On Tue, 30 May 2023 23:31:09 GMT, Scott Gibbons wrote:

>> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem.
>>
>> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).
>>
>> Old:
>> gcc-12.2.1-4.fc36.x86_64
>> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
>> JVM version: 21-internal
>> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
>> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
>> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
>> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
>> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
>> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
>> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
>> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
>> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
>> New:
>> JVM version: 21-internal (float)
>> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
>> Iteration 1 regression case Took : 35 noMod
case took: 22 noPower case took: 27
>> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
>> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
>> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
>> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Address review comments

src/hotspot/cpu/x86/assembler_x86.cpp line 3559:

> 3557:
> 3558: void Assembler::vmovsd(XMMRegister dst, XMMRegister src, XMMRegister src2) {
> 3559: assert(UseAVX > 0, "Requires some form ov AVX");

Typo "Requires some form **of** AVX"

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 125:

> 123: __ vmovsd(xmm5, xmm18, xmm20);
> 124: __ movq(xmm17, rax);
> 125: __ vandpd(xmm0, xmm5, xmm17, Assembler::AVX_512bit);

This and others below all should be Assembler::AVX_128bit. No need for AVX_512bit here.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 134:

> 132:
> 133: // q = DP_DIV_RZ(a, b);
> 134: __ vmovsd(xmm5, xmm18, xmm1);

This and other usage of vmovsd with blending two registers could be avoided.
src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 164:

> 162: __ mov64(rax, 0x7FEFFFFFFFFFFFFF);
> 163: __ movq(Address(rsp, 0x20), rax);
> 164: __ movsd(xmm2, Address(rsp, 0x20));

You could directly do:
  __ movsd(xmm2, ExternalAddress((address)CONST_MAX), rax);

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 171:

> 169: __ mov64(rax, 0x7FF0000000000000);
> 170: __ movq(Address(rsp, 0x20), rax);
> 171: __ movsd(xmm2, Address(rsp, 0x20));

You could directly do:
  __ movsd(xmm2, ExternalAddress((address)CONST_INF), rax);

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 179:

> 177: __ mov64(rax, 0x7FE0000000000000);
> 178: __ movq(Address(rsp, 0x20), rax);
> 179: __ movsd(xmm21, Address(rsp, 0x20));

You could directly do:
  __ movsd(xmm2, ExternalAddress((address)CONST_e307), rax);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1210963286
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212455393
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212458387
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212457127
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212457314
PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212457651

From duke at openjdk.org Thu Jun 1 01:15:57 2023
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 1 Jun 2023 01:15:57 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v4]
In-Reply-To:
References:
Message-ID:

> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  fix license

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14227/files
  - new: https://git.openjdk.org/jdk/pull/14227/files/6d140d5b..30a50d99

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=02-03

  Stats: 12 lines in 4 files changed: 8 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/14227.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227

PR: https://git.openjdk.org/jdk/pull/14227

From duke at openjdk.org Thu Jun 1 01:19:09 2023
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 1 Jun 2023 01:19:09 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort
methods (int, long, float and double arrays) [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 15:25:35 GMT, Andrew Haley wrote:

> What happens to really short arrays? Your patch should include macro benchmarks for e.g. 50 and 10.

Thanks for the suggestion. Please see the performance for small array sizes below:

| Arrays.sort benchmark | Array Size | Baseline | AVX512 Sort | Speedup |
| --- | --- | --- | --- | --- |
| ArraysSort.intSort | 10 | 0.029 | 0.018 | 1.6 |
| ArraysSort.intSort | 25 | 0.086 | 0.032 | 2.7 |
| ArraysSort.intSort | 50 | 0.236 | 0.056 | 4.2 |
| ArraysSort.intSort | 75 | 0.409 | 0.111 | 3.7 |
| ArraysSort.longSort | 10 | 0.031 | 0.033 | 0.9 |
| ArraysSort.longSort | 25 | 0.09 | 0.061 | 1.5 |
| ArraysSort.longSort | 50 | 0.228 | 0.127 | 1.8 |
| ArraysSort.longSort | 75 | 0.382 | 0.28 | 1.4 |
| ArraysSort.doubleSort | 10 | 0.037 | 0.043 | 0.9 |
| ArraysSort.doubleSort | 25 | 0.129 | 0.066 | 2.0 |
| ArraysSort.doubleSort | 50 | 0.267 | 0.115 | 2.3 |
| ArraysSort.doubleSort | 75 | 0.549 | 0.219 | 2.5 |
| ArraysSort.floatSort | 10 | 0.034 | 0.034 | 1.0 |
| ArraysSort.floatSort | 25 | 0.088 | 0.053 | 1.7 |
| ArraysSort.floatSort | 50 | 0.284 | 0.077 | 3.7 |
| ArraysSort.floatSort | 75 | 0.484 | 0.126 | 3.8 |

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1571178035

From yzhu at openjdk.org Thu Jun 1 01:52:30 2023
From: yzhu at openjdk.org (Yanhong Zhu)
Date: Thu, 1 Jun 2023 01:52:30 GMT
Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules [v2]
In-Reply-To:
References:
Message-ID:

> Merge vector instructs with similar match rules in riscv_v.ad.
>
> Tier 1~3 passed on QEMU with RVV supported.
Yanhong Zhu has updated the pull request incrementally with one additional commit since the last revision: modify format in vabs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14214/files - new: https://git.openjdk.org/jdk/pull/14214/files/712b3594..996d94a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14214&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14214&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14214.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14214/head:pull/14214 PR: https://git.openjdk.org/jdk/pull/14214 From yzhu at openjdk.org Thu Jun 1 01:52:30 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Thu, 1 Jun 2023 01:52:30 GMT Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules [v2] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 06:00:13 GMT, Fei Yang wrote: >> Yanhong Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> modify format in vabs > > src/hotspot/cpu/riscv/riscv_v.ad line 245: > >> 243: ins_cost(VEC_COST); >> 244: effect(TEMP tmp); >> 245: format %{ "vrsub.vi $tmp, 0, $src\t#@vabs\n\t" > > Suggestion: `format %{ "vrsub.vi $tmp, $src, 0\t#@vabs\n\t"` Thanks for the review. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14214#discussion_r1212477072 From duke at openjdk.org Thu Jun 1 01:56:11 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 1 Jun 2023 01:56:11 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 16:58:11 GMT, Alexander Zvegintsev wrote: >> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. 
>> >> TEST passed on AArch64: >> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 >> >> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- >> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- > > Please note that the associated test is now in the problem list, see #14250 @azvegint @dcubed-ojdk Thanks, I will remove it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1571205982 From duke at openjdk.org Thu Jun 1 02:07:05 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 1 Jun 2023 02:07:05 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: <52Szrc2FBF-UQ2fFYL9YSdqQK2XwIDYEl5xdW2SOLJk=.2c108ff0-4b5f-4866-8499-b4f92dded098@github.com> References: <52Szrc2FBF-UQ2fFYL9YSdqQK2XwIDYEl5xdW2SOLJk=.2c108ff0-4b5f-4866-8499-b4f92dded098@github.com> Message-ID: On Wed, 31 May 2023 11:58:21 GMT, Evgeny Astigeevich wrote: > I see `jdk/incubator/vector/Float64VectorTests.java` covers the case `arrangement == __ T2S`. Is there a test covering the case `arrangement == __ T2D`? Thanks for your review. I don't find such a case. I will add a test case in TestVectorMaskTrueCount.java to cover `arrangement == __ T2D`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14245#discussion_r1212486286 From fyang at openjdk.org Thu Jun 1 02:40:08 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 1 Jun 2023 02:40:08 GMT Subject: RFR: 8303417: RISC-V: Merge vector instructs with similar match rules [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 01:52:30 GMT, Yanhong Zhu wrote: >> Merge vector instructs with similar match rules in riscv_v.ad. >> >> Tier 1~3 passed on QEMU with RVV supported. 
> > Yanhong Zhu has updated the pull request incrementally with one additional commit since the last revision: > > modify format in vabs Updated change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14214#pullrequestreview-1454449118 From yzhu at openjdk.org Thu Jun 1 02:47:12 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Thu, 1 Jun 2023 02:47:12 GMT Subject: Integrated: 8303417: RISC-V: Merge vector instructs with similar match rules In-Reply-To: References: Message-ID: On Tue, 30 May 2023 12:11:43 GMT, Yanhong Zhu wrote: > Merge vector instructs with similar match rules in riscv_v.ad. > > Tier 1~3 passed on QEMU with RVV supported. This pull request has now been integrated. Changeset: 6c7225f8 Author: Yanhong Zhu Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/6c7225f819a729b1ef6f8b2769da4b50d879455d Stats: 504 lines in 1 file changed: 33 ins; 379 del; 92 mod 8303417: RISC-V: Merge vector instructs with similar match rules Reviewed-by: fyang, rehn, dzhang ------------- PR: https://git.openjdk.org/jdk/pull/14214 From qamai at openjdk.org Thu Jun 1 02:49:04 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 Jun 2023 02:49:04 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v4] In-Reply-To: References: Message-ID: <4yGUKJv-LfIcklHot992876QlCgsr_jzP5km9JMRwOc=.fbfd57e1-97a0-4023-89db-ef2db5d92559@github.com> On Thu, 1 Jun 2023 01:15:57 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>>
>> **Arrays.sort performance data using JMH benchmarks**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
>> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
>> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
>> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
>> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
>> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
>> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
>> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
>> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
>> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
>> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
>> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
>> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
>> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
>> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
>> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> fix license

I notice that

    zmm_t ymm_vector::max(zmm_t x, zmm_t y)
    {
        return _mm256_max_ps(x, y);
    }

This is not quite right: `Arrays.sort` uses the total order imposed by `Double.compare` to sort the array, while `_mm256_max_ps(x, y)` does `x > y ? x : y`, which is different.
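
To make the difference concrete, here is a minimal Java sketch. It is not code from the PR: the class and method names are made up for illustration, and `naiveMax` is only a scalar model of the `x > y ? x : y` behaviour described above, assuming that description of the intrinsic holds. The two functions disagree for signed zeros and for NaN.

```java
public class TotalOrderMaxDemo {
    // Hypothetical scalar model of the comparison described above: x > y ? x : y.
    static double naiveMax(double x, double y) {
        return x > y ? x : y;
    }

    // Maximum under the total order defined by Double.compare,
    // where -0.0 is less than 0.0 and NaN is greater than every other value.
    static double totalOrderMax(double x, double y) {
        return Double.compare(x, y) >= 0 ? x : y;
    }

    public static void main(String[] args) {
        double[][] pairs = { {0.0, -0.0}, {Double.NaN, 1.0}, {2.0, 1.0} };
        for (double[] p : pairs) {
            System.out.println(p[0] + ", " + p[1]
                    + " -> naive=" + naiveMax(p[0], p[1])
                    + " totalOrder=" + totalOrderMax(p[0], p[1]));
        }
    }
}
```

For the pair (0.0, -0.0) the naive version yields -0.0 because `0.0 > -0.0` is false, and for (NaN, 1.0) it yields 1.0 because any comparison with NaN is false, whereas the `Double.compare` ordering picks 0.0 and NaN respectively.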
------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1571243839 From thartmann at openjdk.org Thu Jun 1 05:12:14 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 1 Jun 2023 05:12:14 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 10:25:07 GMT, Chang Peng wrote: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. > > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Testing of v00 passed. I'll re-run testing once you updated the PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1571347687 From thartmann at openjdk.org Thu Jun 1 05:16:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 1 Jun 2023 05:16:06 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v3] In-Reply-To: <7-jhMuUY2Gwwer1msBJlmu5OBzq9EvHmYsfQu4qLAMQ=.9b80a7cf-63de-4b44-8f41-797534f724a9@github.com> References: <7-jhMuUY2Gwwer1msBJlmu5OBzq9EvHmYsfQu4qLAMQ=.9b80a7cf-63de-4b44-8f41-797534f724a9@github.com> Message-ID: On Wed, 31 May 2023 16:04:25 GMT, Christian Hagedorn wrote: >> The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. 
This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): >> >> ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) >> >> We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: >> https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 >> >> The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. >> >> Testing: tier1-4, hs-precheckin-comp, hs-stress-comp >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove line breaks Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14196#pullrequestreview-1454569847 From dzhang at openjdk.org Thu Jun 1 05:51:31 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 1 Jun 2023 05:51:31 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V Message-ID: [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work also for ASCII-compatible encodings, which helps speeding up various CharsetEncoders. Implementing a similar intrinsic should be considered on RISC-V as well. The instruct log with -XX:+PrintOptoAssembly output looks like: 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 ## Testing: qemu w/ UseRVV: - [x] Tier1 tests (release) - [x] Tier2 tests (release) - [x] Tier3 tests (release) - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java ------------- Commit messages: - 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V Changes: https://git.openjdk.org/jdk/pull/14256/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14256&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309254 Stats: 48 lines in 4 files changed: 30 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/14256.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14256/head:pull/14256 PR: https://git.openjdk.org/jdk/pull/14256 From duke at openjdk.org Thu Jun 1 06:22:43 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 1 Jun 2023 06:22:43 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v2] In-Reply-To: References: Message-ID: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. 
> > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into fix_truecount - 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. TEST passed on AArch64: hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Change-Id: I2a224a24b83bbbb9289648d88351de6adb24b760 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14245/files - new: https://git.openjdk.org/jdk/pull/14245/files/8fc40208..7736013f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=00-01 Stats: 2107 lines in 159 files changed: 774 ins; 840 del; 493 mod Patch: https://git.openjdk.org/jdk/pull/14245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14245/head:pull/14245 PR: https://git.openjdk.org/jdk/pull/14245 From duke at openjdk.org Thu Jun 1 06:33:56 2023 From: duke at openjdk.org (Chang Peng) Date: Thu, 1 Jun 2023 06:33:56 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v3] In-Reply-To: 
References:
Message-ID:

> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case.
>
> TEST passed on AArch64:
> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3
>
> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector-
> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector--

Chang Peng has updated the pull request incrementally with one additional commit since the last revision:

  Update ProblemList and test case

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14245/files
  - new: https://git.openjdk.org/jdk/pull/14245/files/7736013f..c15e7b8e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=01-02

  Stats: 35 lines in 2 files changed: 14 ins; 10 del; 11 mod
  Patch: https://git.openjdk.org/jdk/pull/14245.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14245/head:pull/14245

PR: https://git.openjdk.org/jdk/pull/14245

From epeter at openjdk.org Thu Jun 1 06:39:08 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 Jun 2023 06:39:08 GMT
Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop)
In-Reply-To:
References:
Message-ID:

On Wed, 31 May 2023 19:57:36 GMT, Vladimir Kozlov wrote:

>> I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain java.
>>
>> I added the code above the assert; the comments explain why:
>>
>> https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763
>>
>> Here is the graph just before the assert:
>> ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01)
>>
>> `120 Loop` -> need it to kick off `beautify_loop` and a second `build_loop_tree`
>> `71 Region` -> infinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`.
>> `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`.
>>
>> Testing up to tier6 and stress testing. TODO
>
> test/hotspot/jtreg/compiler/loopopts/TestCountedLoopInsideInfiniteLoop.jasm line 24:
>
>> 22: */
>> 23:
>> 24: super public class TestCountedLoopInsideInfiniteLoop
>
> Maybe add a comment on why you put this into a separate file and did not make it an inner class.

@vnkozlov Is that even possible, to make jasm code inside a java file? I just repeated the pattern of other jasm tests in the same directory.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14178#discussion_r1212655566

From fyang at openjdk.org Thu Jun 1 06:51:04 2023
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 1 Jun 2023 06:51:04 GMT
Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jun 2023 05:45:14 GMT, Dingli Zhang wrote:

> [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work
> also for ASCII-compatible encodings, which helps speeding up various
> CharsetEncoders. Implementing a similar intrinsic should be considered on
> RISC-V as well.
>
> The instruct log with -XX:+PrintOptoAssembly output looks like:
>
> 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3
>
>
> ## Testing:
> qemu w/ UseRVV:
> - [x] Tier1 tests (release)
> - [x] Tier2 tests (release)
> - [x] Tier3 tests (release)
> - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java

src/hotspot/cpu/riscv/riscv_v.ad line 2847:

> 2845: TEMP v0, TEMP v1, TEMP v2, TEMP v3, TEMP tmp);
> 2846:
> 2847: format %{ "Encode ISO array $src, $dst, $len -> $result # KILL $src, $dst, $len, $tmp, V0-V3"%}

Nit: missing space on LHS of `%}`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14256#discussion_r1212667154

From luhenry at openjdk.org Thu Jun 1 07:08:07 2023
From: luhenry at openjdk.org (Ludovic Henry)
Date: Thu, 1 Jun 2023 07:08:07 GMT
Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jun 2023 05:45:14 GMT, Dingli Zhang wrote:

> [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work
> also for ASCII-compatible encodings, which helps speeding up various
> CharsetEncoders. Implementing a similar intrinsic should be considered on
> RISC-V as well.
>
> The instruct log with -XX:+PrintOptoAssembly output looks like:
>
> 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3
>
>
> ## Testing:
> qemu w/ UseRVV:
> - [x] Tier1 tests (release)
> - [x] Tier2 tests (release)
> - [x] Tier3 tests (release)
> - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java

src/hotspot/cpu/riscv/riscv_v.ad line 2863:

> 2861: TEMP v0, TEMP v1, TEMP v2, TEMP v3, TEMP tmp);
> 2862:
> 2863: format %{ "Encode ASCII array $src, $dst, $len -> $result # KILL $src, $dst, $len, $tmp, V0-V3"%}

Nit: same as https://github.com/openjdk/jdk/pull/14256/files#r1212667154, missing a space.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14256#discussion_r1212686480 From luhenry at openjdk.org Thu Jun 1 07:12:07 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 1 Jun 2023 07:12:07 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 05:45:14 GMT, Dingli Zhang wrote: > [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work > also for ASCII-compatible encodings, which helps speeding up various > CharsetEncoders. Implementing a similar intrinsic should be considered on > RISC-V as well. > > The instruct log with -XX:+PrintOptoAssembly output looks like: > > 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 > > > ## Testing: > qemu w/ UseRVV: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14256#pullrequestreview-1454739659 From jwaters at openjdk.org Thu Jun 1 07:12:39 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 1 Jun 2023 07:12:39 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v2] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. 
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert changes to jaccesswalker and add proper cast to offending callsite ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/b925f5ee..628be6b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=00-01 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Thu Jun 1 07:14:55 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 1 Jun 2023 07:14:55 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v3] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. 
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Nevermind ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/628be6b2..29b93688 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From davleopo at openjdk.org Thu Jun 1 07:17:14 2023 From: davleopo at openjdk.org (David Leopoldseder) Date: Thu, 1 Jun 2023 07:17:14 GMT Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal [v2] In-Reply-To: References: Message-ID: > This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler. > > In the past this test also failed with graal because it was checking for c1/c2 semantics. > JDK-8264135 introduced changes to the UnsafeGetStableArrayElement test to account for the scenario when the test is run with the Graal compiler. If Graal is used it will assert that constants are folded by asserting matching instead of mismatch. > > However, we had changes in Graal since then, since JDK-8275645 Graal no longer constant folds unaligned reads. > This lets the test fail again for the unaligned cases because it asserts graal folds them. 
> > The fix is to actually assert mismatch on unaligned accesses. David Leopoldseder has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal ------------- Changes: https://git.openjdk.org/jdk/pull/14242/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14242&range=01 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14242.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14242/head:pull/14242 PR: https://git.openjdk.org/jdk/pull/14242 From davleopo at openjdk.org Thu Jun 1 07:17:16 2023 From: davleopo at openjdk.org (David Leopoldseder) Date: Thu, 1 Jun 2023 07:17:16 GMT Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal [v2] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 16:57:25 GMT, David Leopoldseder wrote: >> Good point. David, if this test passes on libgraal then there's no need to increase the test time (I assume that with `-Xbatch`, the `* 500` does noticeably increase the test time). > > yeah, I used it for local testing. I'll remove it. removed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14242#discussion_r1212693578 From dzhang at openjdk.org Thu Jun 1 07:24:09 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 1 Jun 2023 07:24:09 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: > [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work > also for ASCII-compatible encodings, which helps speeding up various > CharsetEncoders. Implementing a similar intrinsic should be considered on > RISC-V as well.
> > The instruct log with -XX:+PrintOptoAssembly output looks like: > > 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 > > > ## Testing: > qemu w/ UseRVV: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Add missing space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14256/files - new: https://git.openjdk.org/jdk/pull/14256/files/00eb80b6..a42fa50b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14256&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14256&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14256.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14256/head:pull/14256 PR: https://git.openjdk.org/jdk/pull/14256 From dzhang at openjdk.org Thu Jun 1 07:24:10 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 1 Jun 2023 07:24:10 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 06:47:48 GMT, Fei Yang wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing space > > src/hotspot/cpu/riscv/riscv_v.ad line 2847: > >> 2845: TEMP v0, TEMP v1, TEMP v2, TEMP v3, TEMP tmp); >> 2846: >> 2847: format %{ "Encode ISO array $src, $dst, $len -> $result # KILL $src, $dst, $len, $tmp, V0-V3"%} > > Nit: missing space on LHS of `%}` Thanks! Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14256#discussion_r1212699989 From dzhang at openjdk.org Thu Jun 1 07:24:13 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Thu, 1 Jun 2023 07:24:13 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:05:47 GMT, Ludovic Henry wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing space > > src/hotspot/cpu/riscv/riscv_v.ad line 2863: > >> 2861: TEMP v0, TEMP v1, TEMP v2, TEMP v3, TEMP tmp); >> 2862: >> 2863: format %{ "Encode ASCII array $src, $dst, $len -> $result # KILL $src, $dst, $len, $tmp, V0-V3"%} > > Nit: same as https://github.com/openjdk/jdk/pull/14256/files#r1212667154, missing a space. Thanks! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14256#discussion_r1212700396 From chagedorn at openjdk.org Thu Jun 1 07:44:16 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 1 Jun 2023 07:44:16 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v4] In-Reply-To: References: Message-ID: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. 
We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. > > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Simplify test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14196/files - new: https://git.openjdk.org/jdk/pull/14196/files/13c7c6d5..25df83dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14196&range=02-03 Stats: 32 lines in 1 file changed: 12 ins; 12 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14196.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14196/head:pull/14196 PR: https://git.openjdk.org/jdk/pull/14196 From chagedorn at openjdk.org Thu Jun 1 07:44:18 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 1 Jun 2023 07:44:18 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v3] In-Reply-To: <7-jhMuUY2Gwwer1msBJlmu5OBzq9EvHmYsfQu4qLAMQ=.9b80a7cf-63de-4b44-8f41-797534f724a9@github.com> References: <7-jhMuUY2Gwwer1msBJlmu5OBzq9EvHmYsfQu4qLAMQ=.9b80a7cf-63de-4b44-8f41-797534f724a9@github.com> Message-ID: On Wed, 31 May 2023 16:04:25 GMT, Christian Hagedorn 
wrote: >> The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): >> >> ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) >> >> We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: >> https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 >> >> The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. >> >> Testing: tier1-4, hs-precheckin-comp, hs-stress-comp >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove line breaks Thanks Tobias for your review! I've simplified the test some more but the fix remains the same. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14196#issuecomment-1571517233 From yzhu at openjdk.org Thu Jun 1 07:58:08 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Thu, 1 Jun 2023 07:58:08 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:24:09 GMT, Dingli Zhang wrote: >> [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work >> also for ASCII-compatible encodings, which helps speeding up various >> CharsetEncoders. Implementing a similar intrinsic should be considered on >> RISC-V as well. >> >> The instruct log with -XX:+PrintOptoAssembly output looks like: >> >> 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 >> >> >> ## Testing: >> qemu w/ UseRVV: >> - [x] Tier1 tests (release) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) >> - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add missing space Looks good. ------------- Marked as reviewed by yzhu (Author). 
PR Review: https://git.openjdk.org/jdk/pull/14256#pullrequestreview-1454827974 From chagedorn at openjdk.org Thu Jun 1 08:04:20 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 1 Jun 2023 08:04:20 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v4] In-Reply-To: References: Message-ID: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. > > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/loopPredicate.cpp Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14156/files - new: https://git.openjdk.org/jdk/pull/14156/files/48ee1e40..30386691 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14156&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14156&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14156.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14156/head:pull/14156 PR: https://git.openjdk.org/jdk/pull/14156 From chagedorn at openjdk.org Thu Jun 1 08:04:22 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 1 Jun 2023 08:04:22 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v3] In-Reply-To: References: Message-ID: On Tue, 30 May 2023 10:30:34 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. 
>> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove negation Thanks Tobias for reviewing the new fix again! That's a good idea. I've run some standard benchmarks and could not find any regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1571550252 From fyang at openjdk.org Thu Jun 1 08:06:07 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 1 Jun 2023 08:06:07 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:24:09 GMT, Dingli Zhang wrote: >> [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work >> also for ASCII-compatible encodings, which helps speeding up various >> CharsetEncoders. Implementing a similar intrinsic should be considered on >> RISC-V as well. 
>> >> The instruct log with -XX:+PrintOptoAssembly output looks like: >> >> 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 >> >> >> ## Testing: >> qemu w/ UseRVV: >> - [x] Tier1 tests (release) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) >> - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add missing space Updated change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14256#pullrequestreview-1454841963 From chagedorn at openjdk.org Thu Jun 1 08:08:20 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 1 Jun 2023 08:08:20 GMT Subject: Integrated: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition In-Reply-To: References: Message-ID: On Thu, 25 May 2023 16:48:35 GMT, Christian Hagedorn wrote: > [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: > https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 > > This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. > > But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. 
An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. > > Thanks, > Christian This pull request has now been integrated. Changeset: dfd3da3f Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/dfd3da3f52480f68f653beb1e720691f8232ace7 Stats: 261 lines in 5 files changed: 224 ins; 9 del; 28 mod 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition Reviewed-by: thartmann, roland ------------- PR: https://git.openjdk.org/jdk/pull/14156 From duke at openjdk.org Thu Jun 1 08:56:41 2023 From: duke at openjdk.org (Daohan Qu) Date: Thu, 1 Jun 2023 08:56:41 GMT Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v3] In-Reply-To: References: Message-ID: > This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451). > > It is a trivial patch that fixes a misleading code comment at method entry printed by `-XX:+PrintAssembly`. 
> > For example, > > 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;*synchronization entry > > will become > > 0x0000ffffa409da88: stp x29, x30, [sp, #16] ;* invocation entry (also synchronization entry if synchronized) Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: Use more precise info and revert insignificant changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14192/files - new: https://git.openjdk.org/jdk/pull/14192/files/362fc750..af3c98be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14192&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14192&range=01-02 Stats: 8 lines in 4 files changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14192/head:pull/14192 PR: https://git.openjdk.org/jdk/pull/14192 From duke at openjdk.org Thu Jun 1 09:05:05 2023 From: duke at openjdk.org (Daohan Qu) Date: Thu, 1 Jun 2023 09:05:05 GMT Subject: RFR: 8303451: Synchronization entry in C2 debug info is misleading [v3] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 08:56:41 GMT, Daohan Qu wrote: >> This should fix [JDK-8303451](https://bugs.openjdk.org/browse/JDK-8303451). >> >> It tries to correct some misleading code comments printed by `-XX:+PrintAssembly`. > > Daohan Qu has updated the pull request incrementally with one additional commit since the last revision: > > Use more precise info and revert insignificant changes Thanks to your helpful discussions, the current change looks better but is still far from perfect. I think `UnwindBci` is used in an exception-throwing case, but there seem to be no more appropriate bci used for the state "at method exit while not unlocked yet". I will continue looking into this.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14192#issuecomment-1571642533 From rcastanedalo at openjdk.org Thu Jun 1 09:15:45 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 1 Jun 2023 09:15:45 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v3] In-Reply-To: References: Message-ID: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. > > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. 
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: - Complete test battery with remaining no-add cases - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state - Add tests to exercise the case without inner additions - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13924/files - new: https://git.openjdk.org/jdk/pull/13924/files/a6db3cc4..3b4d7993 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13924&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13924&range=01-02 Stats: 113 lines in 3 files changed: 61 ins; 38 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13924.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13924/head:pull/13924 PR: https://git.openjdk.org/jdk/pull/13924 From rcastanedalo at openjdk.org Thu Jun 1 09:22:12 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 1 Jun 2023 09:22:12 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 10:55:07 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 22 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8302673 >> - Defer op(x, x) to constant/identity propagation early >> - Merge branch 'master' into JDK-8302673 >> - Refactor idealization and extracted Identity transformation for clarity >> - Make auxiliary add operand extraction function return a tuple >> - Randomize array values in min/max test computation >> - Merge branch 'master' into JDK-8302673 >> - Merge branch 'master' into JDK-8302673 >> - Refine comments >> - Update copyright header >> - ... and 12 more: https://git.openjdk.org/jdk/compare/f05dea97...a6db3cc4 > > src/hotspot/share/opto/addnode.cpp line 1192: > >> 1190: } else { >> 1191: return new MaxINode(add_transformed, inner_other); >> 1192: } > > Could you make use of `MaxNode::build_min_max`? Done (extracting common functionality into `build_min_max_int()`), thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1212855100 From rcastanedalo at openjdk.org Thu Jun 1 09:27:14 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 1 Jun 2023 09:27:14 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v2] In-Reply-To: References: Message-ID: On Wed, 31 May 2023 11:06:26 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 22 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8302673 >> - Defer op(x, x) to constant/identity propagation early >> - Merge branch 'master' into JDK-8302673 >> - Refactor idealization and extracted Identity transformation for clarity >> - Make auxiliary add operand extraction function return a tuple >> - Randomize array values in min/max test computation >> - Merge branch 'master' into JDK-8302673 >> - Merge branch 'master' into JDK-8302673 >> - Refine comments >> - Update copyright header >> - ... and 12 more: https://git.openjdk.org/jdk/compare/d7cb4210...a6db3cc4 > > src/hotspot/share/opto/addnode.cpp line 1141: > >> 1139: } >> 1140: return ConstAddOperands(x, c_type->is_int()->get_con()); >> 1141: } > > This is what it was on my last review: > > > // Return: > // , if n is of the form x + C, where 'C' is a non-TOP constant; > // , if n is of the form x + C, where 'C' is a TOP constant; > // otherwise. > static Node* constant_add_input(Node* n, jint* con) { > if (n->Opcode() == Op_AddI && n->in(2)->is_Con()) { > const Type* t = n->in(2)->bottom_type(); > if (t == Type::TOP) { > return nullptr; > } > *con = t->is_int()->get_con(); > n = n->in(1); > } > return n; > } > > Here, you used to also allow packing just a single `n`, and leave the constant as `zero`. Did you remove this possibility on purpose? Now `n` must be an `AddI`. > > This used to allow cases like this to be folded: > `max(max(a, b), a + 1) -> max(a + max(0, 1), b)` > > Or am I missing something? Do you have tests for this case? Good catch, thanks Emanuel! The simplification of `as_add_with_constant()` was too aggressive, and the lost optimizations were not caught by any test or noticeable regression on the standard Java benchmark suites. I added more test cases to catch them now and reverted `as_add_with_constant()` to return `` for non-additions. 
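
For readers following the thread, the two rewrites from the PR description and the "no-add" case raised in the review can be checked on plain Java `int` values. This is only a sketch of the algebra, not the C2 code: the class and method names below are hypothetical, and the loop deliberately stays within small values because the identities can fail once `x + c` wraps around (for example `x = Integer.MAX_VALUE`, `c0 = 0`, `c1 = 1`). It makes no claim about how `MaxNode::IdealI()` guards that case.

```java
// Hypothetical illustration of the MaxI rewrites discussed in this thread.
// Names are made up for this sketch; they do not exist in HotSpot.
class MaxAddFolding {
    // Rewrite 2: max(x + c0, x + c1) -> x + MAX(c0, c1)
    static int foldOneLevel(int x, int c0, int c1) {
        return x + Math.max(c0, c1);
    }

    // Rewrite 1: max(x + c0, max(x + c1, z)) -> max(x + MAX(c0, c1), z)
    static int foldTwoLevel(int x, int c0, int c1, int z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // No-add case from the review: a plain 'a' is treated as 'a + 0', so
    // max(max(a, b), a + 1) -> max(a + MAX(0, 1), b)
    static int foldNoAdd(int a, int b) {
        return Math.max(a + Math.max(0, 1), b);
    }

    public static void main(String[] args) {
        // Exhaustively compare the unfolded and folded forms on a small,
        // overflow-free range of inputs.
        for (int x = -20; x <= 20; x++) {
            for (int c0 = -5; c0 <= 5; c0++) {
                for (int c1 = -5; c1 <= 5; c1++) {
                    int unfolded = Math.max(x + c0, x + c1);
                    if (unfolded != foldOneLevel(x, c0, c1)) {
                        throw new AssertionError("one-level mismatch");
                    }
                    for (int z = -30; z <= 30; z += 3) {
                        int nested = Math.max(x + c0, Math.max(x + c1, z));
                        if (nested != foldTwoLevel(x, c0, c1, z)) {
                            throw new AssertionError("two-level mismatch");
                        }
                    }
                }
            }
            for (int b = -20; b <= 20; b++) {
                int nested = Math.max(Math.max(x, b), x + 1);
                if (nested != foldNoAdd(x, b)) {
                    throw new AssertionError("no-add mismatch");
                }
            }
        }
        System.out.println("all rewrites agree on the tested range");
    }
}
```

The no-add method mirrors the example `max(max(a, b), a + 1)` quoted above, which is the kind of case the restored `as_add_with_constant()` behaviour is meant to cover.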
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1212860862 From rcastanedalo at openjdk.org Thu Jun 1 09:31:08 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 1 Jun 2023 09:31:08 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: <1HLZtK1lXOwrix38E1xl53pnEQ7oPH_Huzp3rKJ3YNQ=.7ee35d8f-d00d-493d-8fb5-fe9fdbade009@github.com> On Wed, 31 May 2023 11:07:42 GMT, Emanuel Peter wrote: >> @eme64 Sorry for the delay, I have addressed your feedback now! Please let me know if you find the new version more readable. > > @robcasloz it looks much better, thanks for refactoring :) > I have left a few more comments. @eme64 Thanks for your thorough review, I addressed your new comments and suggestions. Please let me know if there is anything else you would like to change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1571682387 From epeter at openjdk.org Thu Jun 1 10:33:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 Jun 2023 10:33:10 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v3] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 09:15:45 GMT, Roberto Castañeda Lozano wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly.
The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. >> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. 
> > Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: > > - Complete test battery with remaining no-add cases > - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state > - Add tests to exercise the case without inner additions > - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components Now it looks much cleaner, thanks for the new changes! Thanks for the extra tests. src/hotspot/share/opto/addnode.cpp line 1186: > 1184: Node* add_transformed = phase->transform(add_extracted); > 1185: Node* inner_other = inner_op->in(inner_add_index == 1 ? 2 : 1); > 1186: return build_min_max_int(add_transformed, inner_other, opcode == Op_MaxI); Did something prevent you from directly using `MaxNode::build_min_max`? ------------- Marked as reviewed by epeter (Committer). PR Review: https://git.openjdk.org/jdk/pull/13924#pullrequestreview-1455132142 PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1212945144 From thartmann at openjdk.org Thu Jun 1 11:01:11 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 1 Jun 2023 11:01:11 GMT Subject: RFR: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:17:14 GMT, David Leopoldseder wrote: >> This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler. >> >> In the past this test also failed with graal because it was checking for c1/c2 semantics. >> JDK-8264135 introduced changes to the UnsafeGetStableArrayElement test to account for the scenario when the test is run with the Graal compiler. If Graal is used it will assert that constants are folded by asserting matching instead of mismatch. >> >> However, we had changes in Graal since then, since JDK-8275645 Graal no longer constant folds unaligned reads. 
>> This lets the test fail again for the unaligned cases because it asserts graal folds them. >> >> The fix is to actually assert mismatch on unaligned accesses. > > David Leopoldseder has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong > values with Graal Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14242#pullrequestreview-1455184613 From jbhateja at openjdk.org Thu Jun 1 11:43:12 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 1 Jun 2023 11:43:12 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2] In-Reply-To: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> Message-ID: On Tue, 30 May 2023 23:31:09 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
>> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Hi @asgibbons , Kindly also 
include the results for following benchmark test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java Best Regards, Jatin src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 234: > 232: // { > 233: // q = DP_DIV_RZ(a, bs); > 234: __ bind(L_1237); It should be OK to add loop alignment padding here, though this may be a low trip count loop. src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 306: > 304: > 305: Label L_104a, L_11bd, L_10c1, L_1090, L_11b9, L_10e7, L_11af, L_111c, L_10f3, L_116e, L_112a; > 306: Label L_1173, L_1157, L_117f, L_11a0; For the sake of clarity, can we segregate the AVX2 functionality into a separate routine and indent the block? src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 321: > 319: __ movl(rcx, rax); > 320: __ orl(rcx, 0x7f80); > 321: __ movl(Address(rsp, 0x04), rcx); The scope of an MXCSR change rarely extends beyond a couple of instructions, hence we simply load the needed settings and later re-load the standard MXCSR settings from the default location: `ldmxcsr(ExternalAddress(StubRoutines::x86::addr_mxcsr_std()));` src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 396: > 394: __ vdivpd(xmm0, xmm4, xmm3, Assembler::AVX_128bit); > 395: // q = DP_TRUNC(q); > 396: __ vroundsd(xmm0, xmm0, xmm0, 3); `vroundsd` can be removed if we defer the MXCSR reinitialization beyond it. 
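The stub comments quoted above (`q = DP_DIV_RZ(a, bs)`, `q = DP_TRUNC(q)`) reflect that Java's floating-point `%` is defined in terms of a truncated quotient, so the result takes the sign of the dividend. A minimal Java sketch of that semantics follows, contrasted with `Math.IEEEremainder`, which rounds the quotient to nearest. The class `RemSketch` and the helper `naiveRem` are hypothetical names for illustration; the helper is only reliable for small, exactly representable quotients and is not how the stub computes the result.

```java
public class RemSketch {
    // Naive remainder: truncate the quotient toward zero, then subtract.
    // Illustrative only; accurate just for small, exactly representable cases.
    static double naiveRem(double a, double b) {
        double q = a / b;
        q = (q < 0) ? Math.ceil(q) : Math.floor(q); // round toward zero
        return a - q * b;
    }

    public static void main(String[] args) {
        System.out.println(5.5 % 2.0);                     // truncated quotient 2
        System.out.println(-5.5 % 2.0);                    // sign follows the dividend
        System.out.println(naiveRem(-5.5, 2.0));           // same value as the line above
        System.out.println(Math.IEEEremainder(5.5, 2.0));  // quotient rounded to nearest (3)
    }
}
```

For these inputs `%` yields `1.5` and `-1.5`, while `Math.IEEEremainder(5.5, 2.0)` yields `-0.5` because the quotient 2.75 is rounded to 3 rather than truncated to 2.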
------------- PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1455184476 PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1455256066 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212979736 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212982077 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1212995674 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1213009324 From jwaters at openjdk.org Thu Jun 1 11:49:24 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 1 Jun 2023 11:49:24 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Fix the code that is actually warning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/29b93688..5fa2d3eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Thu Jun 1 11:59:05 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 1 Jun 2023 11:59:05 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning While looking through the code for this I've come to realize that a staggering amount of code in the accessibility binaries specify longs where unsigned longs would be much more appropriate (see the one example in this PR for instance), wonder if this should also be fixed in the long term too ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1571912910 From dnsimon at openjdk.org Thu Jun 1 12:03:35 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 1 Jun 2023 12:03:35 GMT Subject: RFR: 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures Message-ID: This PR fixes JVMCI code installation such that if a `Dependencies::call_site_target_value` failure is detected (e.g. because the target of a `MutableCallSite` was changed concurrently with the JVMCI compilation), the decompilation count for the method is not incremented. That is, this PR does for JVMCI what [JDK-8173338](https://bugs.openjdk.org/browse/JDK-8173338) did for CI. 
------------- Commit messages: - do not update decompile count when an invalid dependency at code installation is a call site dependency Changes: https://git.openjdk.org/jdk/pull/14222/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14222&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308954 Stats: 18 lines in 2 files changed: 13 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14222.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14222/head:pull/14222 PR: https://git.openjdk.org/jdk/pull/14222 From aph at openjdk.org Thu Jun 1 12:34:21 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 1 Jun 2023 12:34:21 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v4] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 01:15:57 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix license test/micro/org/openjdk/bench/java/util/ArraysSort.java line 59: > 57: > 58: > 59: @Param({"100", "1000", "10000", "100000"}) Suggestion: @Param({"10","25","50","75","100", "1000", "10000", "100000"}) Short arrays are important. 
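To make the point about short arrays concrete, here is a hedged, JMH-free sketch that exercises `Arrays.sort` over the sizes from the suggested `@Param` list and checks the result. It is not the `ArraysSort` benchmark from the PR; the class `ShortArraySortSketch`, its methods, and the fixed seed are assumptions made for illustration, and the `System.nanoTime` timing is only a rough indication, not a substitute for a proper JMH run.

```java
import java.util.Arrays;
import java.util.Random;

public class ShortArraySortSketch {
    // Sizes taken from the suggested @Param list, including the short ones.
    static final int[] SIZES = {10, 25, 50, 75, 100, 1000, 10000, 100000};

    // Returns true if the array is in non-decreasing order.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                return false;
            }
        }
        return true;
    }

    // Fills an array of the given size with pseudo-random ints, sorts it,
    // and reports whether the result is ordered.
    static boolean sortsCorrectly(int size, long seed) {
        int[] data = new Random(seed).ints(size).toArray();
        Arrays.sort(data);
        return isSorted(data);
    }

    public static void main(String[] args) {
        for (int size : SIZES) {
            long start = System.nanoTime();
            boolean ok = sortsCorrectly(size, 42L);
            long elapsed = System.nanoTime() - start;
            // Single-shot timing includes array generation and JIT warm-up noise.
            System.out.println(size + " elements: sorted=" + ok + ", ~" + elapsed + " ns");
        }
    }
}
```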
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1213079759 From sgibbons at openjdk.org Thu Jun 1 13:43:11 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 1 Jun 2023 13:43:11 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2] In-Reply-To: References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> Message-ID: On Thu, 1 Jun 2023 01:05:52 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 134: > >> 132: >> 133: // q = DP_DIV_RZ(a, b); >> 134: __ vmovsd(xmm5, xmm18, xmm1); > > This and other usage of vmovsd with blending two registers could be avoided. I don't know what you mean. Can you elaborate please? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1213174220 From fjiang at openjdk.org Thu Jun 1 13:49:12 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 1 Jun 2023 13:49:12 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:24:09 GMT, Dingli Zhang wrote: >> [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work >> also for ASCII-compatible encodings, which helps speeding up various >> CharsetEncoders. Implementing a similar intrinsic should be considered on >> RISC-V as well. 
>> >> The instruct log with -XX:+PrintOptoAssembly output looks like: >> >> 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 >> >> >> ## Testing: >> qemu w/ UseRVV: >> - [x] Tier1 tests (release) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) >> - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Add missing space Looks good, thanks. ------------- Marked as reviewed by fjiang (Author). PR Review: https://git.openjdk.org/jdk/pull/14256#pullrequestreview-1455519931 From eastigeevich at openjdk.org Thu Jun 1 13:53:07 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 1 Jun 2023 13:53:07 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v3] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 06:33:56 GMT, Chang Peng wrote: >> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. >> >> TEST passed on AArch64: >> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 >> >> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- >> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update ProblemList and test case test/hotspot/jtreg/compiler/vectorapi/TestVectorMaskTrueCount.java line 62: > 60: } > 61: > 62: static int maskAndTrueCount(boolean[] a, boolean[] b, int idx, int SPECIES_length) { `SPECIES_length` confuses me. The first confusing thing is `SPECIES` in capital letters. The second is the name itself. From the code I see it has meaning of `count`. 
Maybe using `count` instead would be better? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14245#discussion_r1213188813 From duke at openjdk.org Thu Jun 1 14:01:06 2023 From: duke at openjdk.org (Francesco Nigro) Date: Thu, 1 Jun 2023 14:01:06 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v3] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 01:16:22 GMT, Srinivas Vamsi Parasa wrote: >> What happens to really short arrays? Your patch should include macro benchmarks for e.g. 50 and 10. > >> What happens to really short arrays? Your patch should include macro benchmarks for e.g. 50 and 10. > > Thanks for the suggestion. Please see the performance for small array sizes below: > > | Arrays.sort benchmark | Array Size | Baseline | AVX512 Sort | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.intSort | 10 | 0.029 | 0.018 | 1.6 | > | ArraysSort.intSort | 25 | 0.086 | 0.032 | 2.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.056 | 4.2 | > | ArraysSort.intSort | 75 | 0.409 | 0.111 | 3.7 | > | ArraysSort.longSort | 10 | 0.031 | 0.033 | 0.9 | > | ArraysSort.longSort | 25 | 0.09 | 0.061 | 1.5 | > | ArraysSort.longSort | 50 | 0.228 | 0.127 | 1.8 | > | ArraysSort.longSort | 75 | 0.382 | 0.28 | 1.4 | > | ArraysSort.doubleSort | 10 | 0.037 | 0.043 | 0.9 | > | ArraysSort.doubleSort | 25 | 0.129 | 0.066 | 2.0 | > | ArraysSort.doubleSort | 50 | 0.267 | 0.115 | 2.3 | > | ArraysSort.doubleSort | 75 | 0.549 | 0.219 | 2.5 | > | ArraysSort.floatSort | 10 | 0.034 | 0.034 | 1.0 | > | ArraysSort.floatSort | 25 | 0.088 | 0.053 | 1.7 | > | ArraysSort.floatSort | 50 | 0.284 | 0.077 | 3.7 | > | ArraysSort.floatSort | 75 | 0.484 | 0.126 | 3.8 | Hi @vamsi-parasa ! Given https://bugs.openjdk.org/browse/JDK-8295496 I have noticed how much important is to add benchmark cases where offset and length parameters change and/or differ from the usual 0 and the whole array length. 
Equally important is to warm up with different combinations of them in order to "pollute" the JIT's existing decisions, making the compiled method (and stubs) appear more similar to what users would observe in a real-world scenario. Playing with the benchmark parameters like this, together with the advice of @theRealAph to try with small inputs (that matters a lot), would unveil any perf difference with the current impl. In addition, I understand from https://github.com/openjdk/jdk/pull/14227/files#diff-1929ace9ae6df116e2fa2a718ed3924d9dae9a2daea454ca9a78177c21477aa3R5237 that this is not yet the case at this implementation stage, hence mine is a wish for the final round impl for this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1572108986 From sgibbons at openjdk.org Thu Jun 1 14:40:10 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 1 Jun 2023 14:40:10 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2] In-Reply-To: References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> Message-ID: On Thu, 1 Jun 2023 11:40:21 GMT, Jatin Bhateja wrote: > Hi @asgibbons , Kindly also include the results for following benchmark test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java > > Best Regards, Jatin Benchmark Mode Cnt Score Error Units DremFrem.calcDoubleJava avgt 25 16.551 ± 0.025 ns/op DremFrem.calcFloatJava avgt 25 17.197 ± 0.166 ns/op DremFrem.cornercaseDoubleJava avgt 25 5.469 ± 0.005 ns/op DremFrem.cornercaseFloatJava avgt 25 5.472 ± 
0.004 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1572179085 From kvn at openjdk.org Thu Jun 1 15:51:06 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Jun 2023 15:51:06 GMT Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) In-Reply-To: References: Message-ID: On Fri, 26 May 2023 13:45:23 GMT, Emanuel Peter wrote: > I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain java. > > I added the code above the assert, the comments explain why: > > https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763 > > Here the graph just before the assert: > ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01) > > `120 Loop` -> need it to kick of `beautify_loop` and a second `build_loop_tree` > `71 Region` -> inifinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`. > `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`. > > Testing up to tier6 and stress testing. TODO Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14178#pullrequestreview-1455798453 From kvn at openjdk.org Thu Jun 1 15:51:09 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Jun 2023 15:51:09 GMT Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 06:36:27 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/TestCountedLoopInsideInfiniteLoop.jasm line 24: >> >>> 22: */ >>> 23: >>> 24: super public class TestCountedLoopInsideInfiniteLoop >> >> May be add comment why you put this into separate file and not make inner class. 
> > @vnkozlov Is that even possible, to make jasm code inside a java file? I just repeated the pattern of other jasm tests in the same directory. My bad. Missed that it is jasm. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14178#discussion_r1213358292 From duke at openjdk.org Thu Jun 1 15:58:42 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 1 Jun 2023 15:58:42 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v5] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | 
ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update test/micro/org/openjdk/bench/java/util/ArraysSort.java Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/30a50d99..a7c2b6e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From sgibbons at openjdk.org Thu Jun 1 16:03:13 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 1 Jun 2023 16:03:13 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2] In-Reply-To: References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> Message-ID: <2m8KCrlZkRTg4pBAwASL4FKc_MFtL-POWmhG0ebAiwQ=.36cdb946-abb1-4938-aaa3-327775b26d65@github.com> On Thu, 1 Jun 2023 11:40:21 GMT, Jatin Bhateja wrote: > Hi @asgibbons , Kindly also include the results for following benchmark test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java > > Best Regards, Jatin Current top-of-tree results: Benchmark Mode Cnt Score Error Units DremFrem.calcDoubleJava avgt 25 7.034 ± 0.001 ns/op DremFrem.calcFloatJava avgt 25 7.011 ± 0.001 ns/op DremFrem.cornercaseDoubleJava avgt 25 5.514 ± 0.006 ns/op DremFrem.cornercaseFloatJava avgt 25 5.510 ± 0.003 ns/op My changes: Benchmark Mode Cnt Score Error Units DremFrem.calcDoubleJava avgt 25 3.165 ± 0.001 ns/op DremFrem.calcFloatJava avgt 25 4.381 ± 0.001 ns/op DremFrem.cornercaseDoubleJava avgt 25 5.512 ± 
0.002 ns/op DremFrem.cornercaseFloatJava avgt 25 5.524 ± 0.009 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1572324290 From kvn at openjdk.org Thu Jun 1 16:05:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 Jun 2023 16:05:13 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v3] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 09:15:45 GMT, Roberto Castañeda Lozano wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. 
>> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: > > - Complete test battery with remaining no-add cases > - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state > - Add tests to exercise the case without inner additions > - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13924#pullrequestreview-1455826554

From duke at openjdk.org Thu Jun 1 17:22:32 2023
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Thu, 1 Jun 2023 17:22:32 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6]
In-Reply-To:
References:
Message-ID:

> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |

Srinivas Vamsi Parasa has updated the pull request incrementally with one
additional commit since the last revision: fix license in one file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/a7c2b6e9..1dc9589e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=04-05 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From prr at openjdk.org Thu Jun 1 17:46:06 2023 From: prr at openjdk.org (Phil Race) Date: Thu, 1 Jun 2023 17:46:06 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning I'm not sure I understand the logic here. I would not want to move to using Java typedefs in places where the windows APIs specify the types they are expecting. If something comes in from a JNI down-call we should convert it to the type expected by Windows using the name expected by Windows. Also "compilation" isn't nearly good enough. How is this being tested ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1572518377 From duke at openjdk.org Thu Jun 1 17:58:07 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 1 Jun 2023 17:58:07 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v4] In-Reply-To: <4yGUKJv-LfIcklHot992876QlCgsr_jzP5km9JMRwOc=.fbfd57e1-97a0-4023-89db-ef2db5d92559@github.com> References: <4yGUKJv-LfIcklHot992876QlCgsr_jzP5km9JMRwOc=.fbfd57e1-97a0-4023-89db-ef2db5d92559@github.com> Message-ID: On Thu, 1 Jun 2023 02:46:20 GMT, Quan Anh Mai wrote: > I notice that > > ``` > zmm_t ymm_vector::max(zmm_t x, zmm_t y) { > return _mm256_max_ps(x, y); > } > ``` > > This is not quite right, `Arrays.sort` uses the total order imposed by `Double.compare` to sort the array, while `_mm256_max_ps(x, y)` does `x > y ? x : y` which is different. Hi @merykitty The algorithm is working for double as expected (i.e. implementing the total order). 
For example, for the input below: ` double[] arrayUnsorted = {-0.0, Double.NaN, 15.75, Double.POSITIVE_INFINITY, -234.4869, Double.NEGATIVE_INFINITY, +0.0, 100.045}; ` It's showing the correct output after sorting as expected: `[-Infinity, -234.4869, -0.0, 0.0, 15.75, 100.045, Infinity, NaN]` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1572531920 From never at openjdk.org Thu Jun 1 19:27:14 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 1 Jun 2023 19:27:14 GMT Subject: RFR: 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures In-Reply-To: References: Message-ID: On Tue, 30 May 2023 14:23:23 GMT, Doug Simon wrote: > This PR fixes JVMCI code installation such that if a `Dependencies::call_site_target_value` failure is detected (e.g. because the target of a `MutableCallSite` was changed concurrently with the JVMCI compilation), the decompilation count for the method is not incremented. > > That is, this PR does for JVMCI what [JDK-8173338](https://bugs.openjdk.org/browse/JDK-8173338) did for CI. Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14222#pullrequestreview-1456155397 From dnsimon at openjdk.org Thu Jun 1 19:27:15 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 1 Jun 2023 19:27:15 GMT Subject: RFR: 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures In-Reply-To: References: Message-ID: On Tue, 30 May 2023 14:23:23 GMT, Doug Simon wrote: > This PR fixes JVMCI code installation such that if a `Dependencies::call_site_target_value` failure is detected (e.g. because the target of a `MutableCallSite` was changed concurrently with the JVMCI compilation), the decompilation count for the method is not incremented. > > That is, this PR does for JVMCI what [JDK-8173338](https://bugs.openjdk.org/browse/JDK-8173338) did for CI. Thanks for the review Tom. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14222#issuecomment-1572648664 From dnsimon at openjdk.org Thu Jun 1 19:27:16 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 1 Jun 2023 19:27:16 GMT Subject: Integrated: 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures In-Reply-To: References: Message-ID: <3xGCCVS6-iWBm1JJFpUWwHg0dGXy8FDapOAtiXrl4TE=.0a3d9aee-42f6-4c45-bba5-9ab5b2a2080e@github.com> On Tue, 30 May 2023 14:23:23 GMT, Doug Simon wrote: > This PR fixes JVMCI code installation such that if a `Dependencies::call_site_target_value` failure is detected (e.g. because the target of a `MutableCallSite` was changed concurrently with the JVMCI compilation), the decompilation count for the method is not incremented. > > That is, this PR does for JVMCI what [JDK-8173338](https://bugs.openjdk.org/browse/JDK-8173338) did for CI. This pull request has now been integrated. Changeset: 2bb19724 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/2bb1972483abadaf7957fff1654a1c141fc48109 Stats: 18 lines in 2 files changed: 13 ins; 0 del; 5 mod 8308954: [JVMCI] code installation increments decompile_count for call_site_target_value failures Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/14222 From sgibbons at openjdk.org Thu Jun 1 21:18:52 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 1 Jun 2023 21:18:52 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v3] In-Reply-To: References: Message-ID: <3-C-x5eRi42jFZRHfL4euEAMoZowoqGtkU7E1DOIc2Q=.007976ae-2010-4ae7-aadb-ec14d1d0930a@github.com> > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. 
> > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). > > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 Scott Gibbons has updated the pull request 
incrementally with one additional commit since the last revision: Change to more efficient algorithm for AVX512 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14224/files - new: https://git.openjdk.org/jdk/pull/14224/files/351afa38..e1131955 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=01-02 Stats: 257 lines in 2 files changed: 240 ins; 6 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From gcao at openjdk.org Fri Jun 2 02:15:26 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 2 Jun 2023 02:15:26 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes Message-ID: Hi, Currently in the vector node implementation, some of the instruction formats are part of the rvv assembly instructions, for which the assembly instructions are also incomplete, such as the missing vsetvli assembly instruction(vsetvli is used to set the element width, number, etc), This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. And if every assembly instruction is reflected, it will be redundant and increase the maintenance work in the future. And referring to other CPUs, such as the ARM64, it is straightforward to use the instruct function name to simplify [1]. While this won't affect release build, we should fix this for debug build. We can use the -XX:+PrintOptoAssembly parameter to print the compiler assembly output by -XX:+PrintOptoAssembly and view the assembly logic in conjunction with the source code of the specific vector node. 
[1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 ### Testing: qemu with UseRVV: - [ ] Tier1 tests (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) ------------- Commit messages: - RISC-V: Improve PrintOptoAssembly output of vector nodes Changes: https://git.openjdk.org/jdk/pull/14279/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14279&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309332 Stats: 91 lines in 1 file changed: 0 ins; 21 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/14279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14279/head:pull/14279 PR: https://git.openjdk.org/jdk/pull/14279 From yzhu at openjdk.org Fri Jun 2 02:43:11 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Fri, 2 Jun 2023 02:43:11 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 02:08:29 GMT, Gui Cao wrote: > Hi, Currently in the vector node implementation, some of the instruction formats are part of the rvv assembly instructions, for which the assembly instructions are also incomplete, such as the missing vsetvli assembly instruction(vsetvli is used to set the element width, number, etc), This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. And if every assembly instruction is reflected, it will be redundant and increase the maintenance work in the future. And referring to other CPUs, such as the ARM64, it is straightforward to use the instruct function name to simplify [1]. > > While this won't affect release build, we should fix this for debug build. 
We can use the -XX:+PrintOptoAssembly parameter to print the compiler assembly output by -XX:+PrintOptoAssembly and view the assembly logic in conjunction with the source code of the specific vector node. > > [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 > > ### Testing: > qemu with UseRVV: > > - [ ] Tier1 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) > - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) src/hotspot/cpu/riscv/riscv_v.ad line 1074: > 1072: match(Set dst (NegVL src)); > 1073: ins_cost(VEC_COST); > 1074: format %{ "vneg $dst, $src, $src" %} Hi, for vneg and vfneg, I think `vneg/vfneg $dst, $src` is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1213856837 From gcao at openjdk.org Fri Jun 2 02:58:06 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 2 Jun 2023 02:58:06 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v2] In-Reply-To: References: Message-ID: > Hi, Currently in the vector node implementation, some of the instruction formats are part of the rvv assembly instructions, for which the assembly instructions are also incomplete, such as the missing vsetvli assembly instruction(vsetvli is used to set the element width, number, etc), This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. And if every assembly instruction is reflected, it will be redundant and increase the maintenance work in the future. And referring to other CPUs, such as the ARM64, it is straightforward to use the instruct function name to simplify [1]. > > While this won't affect release build, we should fix this for debug build. 
We can use the -XX:+PrintOptoAssembly parameter to print the compiler assembly output by -XX:+PrintOptoAssembly and view the assembly logic in conjunction with the source code of the specific vector node. > > [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 > > ### Testing: > qemu with UseRVV: > > - [ ] Tier1 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) > - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Update vneg/vfneg instruct format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14279/files - new: https://git.openjdk.org/jdk/pull/14279/files/abfd6ca1..9e0430c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14279&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14279&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14279/head:pull/14279 PR: https://git.openjdk.org/jdk/pull/14279 From gcao at openjdk.org Fri Jun 2 02:58:08 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 2 Jun 2023 02:58:08 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v2] In-Reply-To: References: Message-ID: <4NcydBkV2qqXq7Gci7YEGbTLcI3q_CXLd-CJj4wXxYU=.8af453e4-e2bf-42fe-99c8-72606a5d8e20@github.com> On Fri, 2 Jun 2023 02:35:32 GMT, Yanhong Zhu wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Update vneg/vfneg instruct format > > src/hotspot/cpu/riscv/riscv_v.ad line 1074: > >> 1072: match(Set dst (NegVL src)); >> 1073: ins_cost(VEC_COST); >> 1074: format %{ "vneg $dst, $src, $src" %} > > Hi, for vneg and vfneg, I think `vneg/vfneg $dst, $src` is better. 
Hi, Thanks for your review. and has been fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1213863377 From yzhu at openjdk.org Fri Jun 2 02:58:30 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Fri, 2 Jun 2023 02:58:30 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v2] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 02:58:06 GMT, Gui Cao wrote: >> Hi, Currently in the vector node implementation, some of the instruction formats are part of the rvv assembly instructions, for which the assembly instructions are also incomplete, such as the missing vsetvli assembly instruction(vsetvli is used to set the element width, number, etc), This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. And if every assembly instruction is reflected, it will be redundant and increase the maintenance work in the future. And referring to other CPUs, such as the ARM64, it is straightforward to use the instruct function name to simplify [1]. >> >> While this won't affect release build, we should fix this for debug build. We can use the -XX:+PrintOptoAssembly parameter to print the compiler assembly output by -XX:+PrintOptoAssembly and view the assembly logic in conjunction with the source code of the specific vector node. >> >> [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 >> >> ### Testing: >> qemu with UseRVV: >> >> - [ ] Tier1 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) >> - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update vneg/vfneg instruct format Marked as reviewed by yzhu (Author). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/14279#pullrequestreview-1456576260

From rcastanedalo at openjdk.org Fri Jun 2 04:04:07 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 2 Jun 2023 04:04:07 GMT
Subject: RFR: 8309295: C2: MaxNode::signed_min() returns nullptr for int operands
Message-ID:

This *trivial* changeset ensures that `MaxNode::signed_min()` returns a new `MinI` node, instead of `nullptr`, when receiving int arguments. This path is currently dead in the VM, but fixing it makes the functionality reusable (for example by JDK-8302673, currently [under review](https://github.com/openjdk/jdk/pull/13924)), more readable, and less error-prone.

#### Testing

- tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).

-------------

Commit messages:
 - Fix signed int min case in MaxNode::build_min_max()

Changes: https://git.openjdk.org/jdk/pull/14272/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14272&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8309295
  Stats: 7 lines in 1 file changed: 3 ins; 2 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/14272.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14272/head:pull/14272

PR: https://git.openjdk.org/jdk/pull/14272

From duke at openjdk.org Fri Jun 2 04:18:08 2023
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Fri, 2 Jun 2023 04:18:08 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v4]
In-Reply-To:
References: <4yGUKJv-LfIcklHot992876QlCgsr_jzP5km9JMRwOc=.fbfd57e1-97a0-4023-89db-ef2db5d92559@github.com>
Message-ID:

On Thu, 1 Jun 2023 17:55:19 GMT, Srinivas Vamsi Parasa wrote:

>> I notice that
>>
>> zmm_t ymm_vector::max(zmm_t x, zmm_t y) {
>>     return _mm256_max_ps(x, y);
>> }
>>
>> This is not quite right, `Arrays.sort` uses the total order imposed by `Double.compare` to sort the array, while `_mm256_max_ps(x, y)`
does `x > y ? x : y` which is different.
>
>> I notice that
>>
>> ```
>> zmm_t ymm_vector::max(zmm_t x, zmm_t y) {
>>     return _mm256_max_ps(x, y);
>> }
>> ```
>>
>> This is not quite right, `Arrays.sort` uses the total order imposed by `Double.compare` to sort the array, while `_mm256_max_ps(x, y)` does `x > y ? x : y` which is different.
>
> Hi @merykitty
> The algorithm is working for double as expected (i.e. implementing the total order). For example, for the input below:
> ` double[] arrayUnsorted = {-0.0, Double.NaN, 15.75, Double.POSITIVE_INFINITY, -234.4869, Double.NEGATIVE_INFINITY, +0.0, 100.045};
> `
> It's showing the correct output after sorting as expected:
> `[-Infinity, -234.4869, -0.0, 0.0, 15.75, 100.045, Infinity, NaN]`

> Hi @vamsi-parasa ! Given https://bugs.openjdk.org/browse/JDK-8295496 I have noticed how important it is to add benchmark cases where offset and length parameters change and/or differ from the usual 0 and the whole array length. Equally important is to warm up with different combinations of them in order to "pollute" the JIT existing decisions, making the compiled method (and stubs) to appear more similar to what users would observe in a real world scenario. Playing with the benchmark parameters like this, together with the advice of @theRealAph to try with small inputs (that matters a lot) would unveil any perf difference with the current impl. In addition, I understand by https://github.com/openjdk/jdk/pull/14227/files#diff-1929ace9ae6df116e2fa2a718ed3924d9dae9a2daea454ca9a78177c21477aa3R5237 that's still not the case for such, at this implementation stage, hence mine is a wish for the final round impl for this PR.

Hi @franz1981, thank you for the suggestions! The algorithm was tested to sort only a part of the array with non-zero offsets and length. I will upstream those benchmarks/tests as well.
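
For readers following the ordering question raised in this thread, here is a minimal Java sketch of the difference between the `x > y ? x : y` selection quoted above and the total order of `Double.compare`, together with a sub-range sort of the kind mentioned in the reply. It is not code from the PR: the class name `SortOrderSketch` and its helper methods are made up for illustration, and it only exercises the public `Arrays.sort` and `Double.compare` APIs.

```java
import java.util.Arrays;

public class SortOrderSketch {
    // Mirrors the "x > y ? x : y" selection quoted in the thread (illustrative helper).
    static double naiveMax(double x, double y) {
        return x > y ? x : y;
    }

    // Maximum under the total order of Double.compare:
    // -0.0 is ordered before 0.0, and NaN is ordered after everything else.
    static double totalOrderMax(double x, double y) {
        return Double.compare(x, y) >= 0 ? x : y;
    }

    // Sorts a copy of the whole array with Arrays.sort.
    static double[] sortedCopy(double[] input) {
        double[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts only the range [from, to) of a copy, leaving the rest untouched.
    static double[] sortedRangeCopy(double[] input, int from, int to) {
        double[] copy = input.clone();
        Arrays.sort(copy, from, to);
        return copy;
    }

    public static void main(String[] args) {
        // Same input as in the message above.
        double[] unsorted = {-0.0, Double.NaN, 15.75, Double.POSITIVE_INFINITY,
                             -234.4869, Double.NEGATIVE_INFINITY, +0.0, 100.045};
        System.out.println(Arrays.toString(sortedCopy(unsorted)));

        // The two selections disagree on NaN and on signed zeros.
        System.out.println(naiveMax(Double.NaN, 1.0));      // 1.0
        System.out.println(totalOrderMax(Double.NaN, 1.0)); // NaN
        System.out.println(naiveMax(0.0, -0.0));            // -0.0
        System.out.println(totalOrderMax(0.0, -0.0));       // 0.0

        // Non-zero offset and partial length: only indices 1..4 are sorted.
        double[] partial = {9.0, 3.0, Double.NaN, -0.0, 0.0, 1.0};
        System.out.println(Arrays.toString(sortedRangeCopy(partial, 1, 5)));
    }
}
```

Whether a vectorized sort built on a plain maximum still yields this order depends on how NaN and signed zeros are handled around it, which is the point under discussion; the sketch only shows what the expected order is.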
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1573121022

From thartmann at openjdk.org Fri Jun 2 05:06:05 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 2 Jun 2023 05:06:05 GMT
Subject: RFR: 8309295: C2: MaxNode::signed_min() returns nullptr for int operands
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jun 2023 19:18:04 GMT, Roberto Castañeda Lozano wrote:

> This *trivial* changeset ensures that `MaxNode::signed_min()` returns a new `MinI` node, instead of `nullptr`, when receiving int arguments. This path is currently dead in the VM, but fixing it makes the functionality reusable (for example by JDK-8302673, currently [under review](https://github.com/openjdk/jdk/pull/13924)), more readable, and less error-prone.
>
> #### Testing
>
> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).

Looks good and trivial.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14272#pullrequestreview-1456651693

From rcastanedalo at openjdk.org Fri Jun 2 06:37:13 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 2 Jun 2023 06:37:13 GMT
Subject: RFR: 8309295: C2: MaxNode::signed_min() returns nullptr for int operands
In-Reply-To:
References:
Message-ID:

On Fri, 2 Jun 2023 05:03:20 GMT, Tobias Hartmann wrote:

> Looks good and trivial.

Thanks for reviewing, Tobias!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14272#issuecomment-1573225050

From rcastanedalo at openjdk.org Fri Jun 2 06:37:15 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 2 Jun 2023 06:37:15 GMT
Subject: Integrated: 8309295: C2: MaxNode::signed_min() returns nullptr for int operands
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jun 2023 19:18:04 GMT, Roberto Castañeda Lozano wrote:

> This *trivial* changeset ensures that `MaxNode::signed_min()` returns a new `MinI` node, instead of `nullptr`, when receiving int arguments. This path is currently dead in the VM, but fixing it makes the functionality reusable (for example by JDK-8302673, currently [under review](https://github.com/openjdk/jdk/pull/13924)), more readable, and less error-prone.
>
> #### Testing
>
> - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).

This pull request has now been integrated.

Changeset: 60f3b87d
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/60f3b87d96bcb827a217ea74a53bbcb9c0a51892
Stats: 7 lines in 1 file changed: 3 ins; 2 del; 2 mod

8309295: C2: MaxNode::signed_min() returns nullptr for int operands

Reviewed-by: thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/14272

From epeter at openjdk.org Fri Jun 2 07:02:41 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 Jun 2023 07:02:41 GMT
Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302
Message-ID:

This is the fix to a regression caused in the CMoveV fix JDK-8306302. I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert.

**Solution**
However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop.
So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`.

**Testing**
I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes.

I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting.
**Running... but so far all good**

-------------

Commit messages:
 - 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302

Changes: https://git.openjdk.org/jdk/pull/14268/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14268&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8309268
  Stats: 135 lines in 3 files changed: 132 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/14268.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14268/head:pull/14268

PR: https://git.openjdk.org/jdk/pull/14268

From rcastanedalo at openjdk.org Fri Jun 2 07:10:08 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 2 Jun 2023 07:10:08 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jun 2023 09:15:45 GMT, Roberto Castañeda Lozano wrote:

>> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly.
The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation:
>> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`.
>> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`.
>>
>> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`:
>>
>> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c)
>>
>> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`:
>>
>> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c)
>>
>> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized.
>>
>> #### Testing
>>
>> ##### Functionality
>>
>> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>>
>> ##### Performance
>>
>> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed.
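
The two rewrites in the quoted description rest on a simple arithmetic identity, which the following Java sketch checks by brute force over a few sampled inputs. This is only an illustration at the Java level, not the C2 implementation: the class name `MaxRewriteSketch` and its methods are made up, and the sampled values are deliberately small so that `x + c0` and `x + c1` cannot overflow (with `int` wrap-around the identity need not hold, so the sketch assumes the no-overflow case).

```java
public class MaxRewriteSketch {
    // Right-hand side of rewrite 1: max(x + MAX(c0, c1), z).
    static int twoLevelRewritten(int x, int c0, int c1, int z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // Right-hand side of rewrite 2: x + MAX(c0, c1).
    static int oneLevelRewritten(int x, int c0, int c1) {
        return x + Math.max(c0, c1);
    }

    // Checks the operand orderings that commutativity of max allows:
    // four for the two-level shape, two for the one-level shape.
    static boolean holdsFor(int x, int c0, int c1, int z) {
        int two = twoLevelRewritten(x, c0, c1, z);
        int one = oneLevelRewritten(x, c0, c1);
        return Math.max(x + c0, Math.max(x + c1, z)) == two
            && Math.max(x + c0, Math.max(z, x + c1)) == two
            && Math.max(Math.max(x + c1, z), x + c0) == two
            && Math.max(Math.max(z, x + c1), x + c0) == two
            && Math.max(x + c0, x + c1) == one
            && Math.max(x + c1, x + c0) == one;
    }

    // Samples x in a small range with the constants used in the examples above.
    static boolean holdsOnSamples() {
        boolean ok = true;
        for (int x = -1000; x <= 1000; x += 37) {
            ok &= holdsFor(x, 100, 150, 200);
            ok &= holdsFor(x, 150, 100, 200);
            ok &= holdsFor(x, 10, 11, 200);
            ok &= holdsFor(x, 11, 10, 200);
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(holdsOnSamples()
            ? "identity holds on all sampled inputs"
            : "mismatch found");
    }
}
```

The MinI case is symmetric: replacing `Math.max` with `Math.min` throughout gives the analogous check.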
>
> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:
>
> - Complete test battery with remaining no-add cases
> - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state
> - Add tests to exercise the case without inner additions
> - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components

Thanks for reviewing, Emanuel and Vladimir! I noticed that commit 29922ea in this PR accidentally fixes a latent issue in `MaxNode::build_min_max()`. I have reported and fixed the issue separately in [JDK-8309295](https://bugs.openjdk.org/browse/JDK-8309295). Will rebase this PR and submit a new version after re-testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1573258700

From rcastanedalo at openjdk.org Fri Jun 2 07:18:09 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 2 Jun 2023 07:18:09 GMT
Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 1 Jun 2023 10:29:30 GMT, Emanuel Peter wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:
>>
>> - Complete test battery with remaining no-add cases
>> - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state
>> - Add tests to exercise the case without inner additions
>> - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components
>
> src/hotspot/share/opto/addnode.cpp line 1186:
>
>> 1184: Node* add_transformed = phase->transform(add_extracted);
>> 1185: Node* inner_other = inner_op->in(inner_add_index == 1 ? 2 : 1);
>> 1186: return build_min_max_int(add_transformed, inner_other, opcode == Op_MaxI);
>
> Did something prevent you from directly using `MaxNode::build_min_max`?
Technically I think that should be possible, but I find that extracting the core logic into a separate function is more readable (makes it straightforward to understand the effect of the call in `MaxNode::IdealI()`) and efficient (avoids a redundant application of `PhaseGVN::transform()` to the newly created `MinI`/`MaxI` nodes). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13924#discussion_r1214010963 From fyang at openjdk.org Fri Jun 2 09:42:21 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 2 Jun 2023 09:42:21 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v2] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 02:58:06 GMT, Gui Cao wrote: >> Hi, Currently in the vector node implementation, some of the instruction formats are part of the rvv assembly instructions, for which the assembly instructions are also incomplete, such as the missing vsetvli assembly instruction(vsetvli is used to set the element width, number, etc), This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. And if every assembly instruction is reflected, it will be redundant and increase the maintenance work in the future. And referring to other CPUs, such as the ARM64, it is straightforward to use the instruct function name to simplify [1]. >> >> While this won't affect release build, we should fix this for debug build. We can use the -XX:+PrintOptoAssembly parameter to print the compiler assembly output by -XX:+PrintOptoAssembly and view the assembly logic in conjunction with the source code of the specific vector node. 
>> >> [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 >> >> ### Testing: >> qemu with UseRVV: >> >> - [ ] Tier1 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) >> - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update vneg/vfneg instruct format Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/riscv_v.ad line 344: > 342: match(Set dst_src1 (AddVL (Binary dst_src1 src2) v0)); > 343: ins_cost(VEC_COST); > 344: format %{ "vadd_masked $dst_src1, $src2, $v0" %} Suggestion: `format %{ "vadd_masked $dst_src1, $dst_src1, $src2, $v0" %}` src/hotspot/cpu/riscv/riscv_v.ad line 359: > 357: match(Set dst_src1 (AddVD (Binary dst_src1 src2) v0)); > 358: ins_cost(VEC_COST); > 359: format %{ "vadd_fp_masked $dst_src1, $src2, $v0" %} Suggestion: `format %{ "vadd_fp_masked $dst_src1, $dst_src1, $src2, $v0" %}` src/hotspot/cpu/riscv/riscv_v.ad line 410: > 408: match(Set dst_src1 (SubVL (Binary dst_src1 src2) v0)); > 409: ins_cost(VEC_COST); > 410: format %{ "vsub_masked $dst_src1, $src2, $v0" %} Suggestion: `format %{ "vsub_masked $dst_src1, $dst_src1, $src2, $v0" %}` src/hotspot/cpu/riscv/riscv_v.ad line 424: > 422: match(Set dst_src1 (SubVD (Binary dst_src1 src2) v0)); > 423: ins_cost(VEC_COST); > 424: format %{ "vsub_fp_masked $dst_src1, $src2, $v0" %} Suggestion: `format %{ "vsub_fp_masked $dst_src1, $dst_src1, $src2, $v0" %}` src/hotspot/cpu/riscv/riscv_v.ad line 553: > 551: match(Set dst_src1 (DivVD (Binary dst_src1 src2) v0)); > 552: ins_cost(VEC_COST); > 553: format %{ "vdiv_fp_masked $dst_src1, $src2, $v0" %} Suggestion: `format %{ "vdiv_fp_masked $dst_src1, $dst_src1, $src2, $v0" %}` src/hotspot/cpu/riscv/riscv_v.ad line 792: > 790: match(Set dst_src1 (FmaVF dst_src1 (Binary src2 (NegVF src3)))); > 
791: ins_cost(VEC_COST); > 792: format %{ "vfmlsF $dst_src1, $src2, $src3" %} Suggestion: `format %{ "vfmlsF $dst_src1, $dst_src1, $src2, $src3" %}` src/hotspot/cpu/riscv/riscv_v.ad line 808: > 806: match(Set dst_src1 (FmaVD dst_src1 (Binary src2 (NegVD src3)))); > 807: ins_cost(VEC_COST); > 808: format %{ "vfmlsD $dst_src1, $src2, $src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 843: > 841: match(Set dst_src1 (FmaVF (NegVF dst_src1) (Binary src2 (NegVF src3)))); > 842: ins_cost(VEC_COST); > 843: format %{ "vfnmlaF $dst_src1, $src2, $src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 859: > 857: match(Set dst_src1 (FmaVD (NegVD dst_src1) (Binary src2 (NegVD src3)))); > 858: ins_cost(VEC_COST); > 859: format %{ "vfnmlaD $dst_src1, $src2, $src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 892: > 890: match(Set dst_src1 (FmaVF (NegVF dst_src1) (Binary src2 src3))); > 891: ins_cost(VEC_COST); > 892: format %{ "vfnmlsF $dst_src1, $src2, $src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 906: > 904: match(Set dst_src1 (FmaVD (NegVD dst_src1) (Binary src2 src3))); > 905: ins_cost(VEC_COST); > 906: format %{ "vfnmlsD $dst_src1, $src2, $src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 941: > 939: match(Set dst_src1 (AddVL dst_src1 (MulVL src2 src3))); > 940: ins_cost(VEC_COST); > 941: format %{ "vmla $dst_src1, src2, src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 977: > 975: match(Set dst_src1 (SubVL dst_src1 (MulVL src2 src3))); > 976: ins_cost(VEC_COST); > 977: format %{ "vmls $dst_src1, src2, src3" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 1044: > 1042: match(Set dst_src1 (MulVL (Binary dst_src1 src2) v0)); > 1043: ins_cost(VEC_COST); > 1044: format %{ "vmul_masked $dst_src1, $src2, $v0" %} Similar here. 
src/hotspot/cpu/riscv/riscv_v.ad line 1058: > 1056: match(Set dst_src1 (MulVD (Binary dst_src1 src2) v0)); > 1057: ins_cost(VEC_COST); > 1058: format %{ "vmul_fp_masked $dst_src1, $src2, $v0" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 1378: > 1376: effect(TEMP tmp); > 1377: ins_cost(VEC_COST); > 1378: format %{ "reduce_addF $src1_dst, $src2\t# KILL $tmp" %} Similar here. src/hotspot/cpu/riscv/riscv_v.ad line 1393: > 1391: effect(TEMP tmp); > 1392: ins_cost(VEC_COST); > 1393: format %{ "reduce_addD $src1_dst, $src2\t# KILL $tmp" %} Similar here. ------------- PR Review: https://git.openjdk.org/jdk/pull/14279#pullrequestreview-1456996124 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214146356 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214146867 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214147354 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214147929 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214148773 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214149484 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214150847 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214150931 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151243 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151369 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151467 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151581 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151646 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151780 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214151919 PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1214153194 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14279#discussion_r1214153344 From thartmann at openjdk.org Fri Jun 2 11:38:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 2 Jun 2023 11:38:08 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v4] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:44:16 GMT, Christian Hagedorn wrote: >> The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): >> >> ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) >> >> We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: >> https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 >> >> The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. 
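To make the ordering problem described above concrete, here is a minimal sketch of one way an upward walk could respect the constraint that Loop Predicates sit above Profiled Loop Predicates. It is not the actual change in this PR: `ParsePredicateWalk`, `Kind`, `Pred`, `orderInsensitive` and `orderAware` are hypothetical names, and the third Parse Predicate kind mentioned in the description is left out because its relative position is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of these names come from HotSpot.
final class ParsePredicateWalk {
    // Kinds are declared bottom-to-top, i.e. in the order a walk up from the
    // loop entry is expected to meet them (Loop is above Profiled Loop).
    enum Kind { PROFILED_LOOP, LOOP }

    static final class Pred {
        final Kind kind;
        final int id;
        Pred(Kind kind, int id) { this.kind = kind; this.id = id; }
    }

    // Order-insensitive walk: accept the first predicate of each kind and
    // stop only when a kind repeats.
    static List<Integer> orderInsensitive(List<Pred> walkUp) {
        List<Integer> accepted = new ArrayList<>();
        boolean[] seen = new boolean[Kind.values().length];
        for (Pred p : walkUp) {
            if (seen[p.kind.ordinal()]) break;
            seen[p.kind.ordinal()] = true;
            accepted.add(p.id);
        }
        return accepted;
    }

    // Order-aware walk: also stop when a kind shows up that would have to be
    // below a kind that was already accepted.
    static List<Integer> orderAware(List<Pred> walkUp) {
        List<Integer> accepted = new ArrayList<>();
        int lastRank = -1;
        for (Pred p : walkUp) {
            int rank = p.kind.ordinal();
            if (rank <= lastRank) break;
            lastRank = rank;
            accepted.add(p.id);
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Node ids taken from the graph discussed above: 116, then 84, then 71.
        List<Pred> walkUp = List.of(
            new Pred(Kind.LOOP, 116),
            new Pred(Kind.PROFILED_LOOP, 84),
            new Pred(Kind.LOOP, 71));
        System.out.println("order-insensitive: " + orderInsensitive(walkUp));
        System.out.println("order-aware:       " + orderAware(walkUp));
    }
}
```

On the walk from the description, the order-insensitive variant accepts both 116 and 84, while the order-aware variant stops after 116 because 84 belongs to a kind that should have appeared below it.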
>> >> Testing: tier1-4, hs-precheckin-comp, hs-stress-comp >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test Still looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14196#pullrequestreview-1457179410 From jwaters at openjdk.org Fri Jun 2 11:55:05 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 2 Jun 2023 11:55:05 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 17:43:02 GMT, Phil Race wrote: > I'm not sure I understand the logic here. I would not want to move to using Java typedefs in places where the windows APIs specify the types they are expecting. If something comes in from a JNI down-call we should convert it to the type expected by Windows using the name expected by Windows. I can change the jints in this PR to regular ints if required. As listed above, the native Windows API routines that the java.desktop code calls are actually expecting ints, so our existing declarations of passing longs to them are also wrong regardless, even without the Java typedefs Actually, now that I revisit this issue (shown in the list above), the only actual calls in this change that _don't_ take Java typedefs are the calls to ::Arc and ::Pie, so this is less of a problem than initially expected > Also "compilation" isn't nearly good enough. How is this being tested? 
`-permissive-` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1573611091 From roland at openjdk.org Fri Jun 2 12:03:14 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 2 Jun 2023 12:03:14 GMT Subject: RFR: 8308892: Bad graph detected in build_loop_late after JDK-8305635 [v4] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:44:16 GMT, Christian Hagedorn wrote: >> The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): >> >> ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) >> >> We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: >> https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 >> >> The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. 
>> >> Testing: tier1-4, hs-precheckin-comp, hs-stress-comp >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14196#pullrequestreview-1457219195 From chagedorn at openjdk.org Fri Jun 2 12:03:16 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 2 Jun 2023 12:03:16 GMT Subject: Integrated: 8308892: Bad graph detected in build_loop_late after JDK-8305635 In-Reply-To: References: Message-ID: On Sun, 28 May 2023 22:06:42 GMT, Christian Hagedorn wrote: > The cleanup done in [JDK-8305635](https://bugs.openjdk.org/browse/JDK-8305635) wrongly identifies unrelated Parse Predicates which are not cleaned up, yet. It just walks from the entry of the loop up and tries to find each of the three Parse Predicates once but in no particular order. This order insensitive walk is wrong as seen in the following graph (from the attached replay file of this bug): > > ![image](https://github.com/openjdk/jdk/assets/17833009/32f73fcb-1d36-40d6-938c-2d282a98ea52) > > We first find `116 Parse Predicate` for Loop Predicates, then `84 Parse Predicate` for Profiled Loop Predicates and then stop when finding `71 Parse Predicate` for Loop Predicates because we've already found a Parse Predicate for Loop Predicates already. We then wrongly create Loop Predicates (above `116 Parse Predicate`) which are below newly created Profiled Loop Predicates (above `84 Parse Predicate`). 
This could lead to a bad graph because of data dependencies that rely on the fact that Loop Predicates are above Profiled Loop Predicates: > https://github.com/openjdk/jdk/blob/547a8b40b324917e66c71409b31421feacce79d7/src/hotspot/share/opto/loopPredicate.cpp#L1529-L1543 > > The fix is straight forward to make the assignment of Parse Predicate projections in `ParsePredicates` aware of the relative ordering constraint. Note that this class will be refactored again in [JDK-8305636](https://bugs.openjdk.org/browse/JDK-8305636). But I think properly fixing this first is better than waiting for JDK-8305636 to go in. > > Testing: tier1-4, hs-precheckin-comp, hs-stress-comp > > Thanks, > Christian This pull request has now been integrated. Changeset: 7dbdad50 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/7dbdad50a616abb01d60ddd53c1bc464cf8c5eb3 Stats: 97 lines in 2 files changed: 95 ins; 0 del; 2 mod 8308892: Bad graph detected in build_loop_late after JDK-8305635 Reviewed-by: rcastanedalo, roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14196 From rcastanedalo at openjdk.org Fri Jun 2 12:27:33 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 2 Jun 2023 12:27:33 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v4] In-Reply-To: References: Message-ID: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. 
The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. > > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: - Re-apply extraction of min/max building after JDK-8309295 - Merge branch 'master' into JDK-8302673 - Abort idealization if any of the adds has a TOP input - Revert extraction of min/max building - Complete test battery with remaining no-add cases - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state - Add tests to exercise the case without inner additions - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components - Merge branch 'master' into JDK-8302673 - Defer op(x, x) to constant/identity propagation early - ... and 20 more: https://git.openjdk.org/jdk/compare/7b0a3360...cfcc16fd ------------- Changes: https://git.openjdk.org/jdk/pull/13924/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13924&range=03 Stats: 513 lines in 5 files changed: 339 ins; 107 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/13924.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13924/head:pull/13924 PR: https://git.openjdk.org/jdk/pull/13924 From rcastanedalo at openjdk.org Fri Jun 2 12:37:09 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 2 Jun 2023 12:37:09 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Wed, 31 May 2023 11:07:42 GMT, Emanuel Peter wrote: >> @eme64 Sorry for the delay, I have addressed your feedback now! Please let me know if you find the new version more readable. > > @robcasloz it looks much better, thanks for refactoring :) > I have left a few more comments. The latest version (v4) merges the fix from [JDK-8309295](https://bugs.openjdk.org/browse/JDK-8309295) and prevents unnecessary idealization of `MinI`/`MaxI` nodes with transitive TOP constant inputs, which matches the original logic more closely. @eme64 @vnkozlov please re-review. 
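For readers following the thread, the two MaxI rewrites from the PR description can be sanity-checked with a small standalone sketch. This is only a numeric check of the algebra, not the C2 implementation: `MaxRewriteCheck` and its method names are hypothetical, and the code deliberately uses `long` arithmetic so that the `int` overflow cases a real implementation has to consider are out of scope here.

```java
// Illustrative check only; names are hypothetical and do not mirror HotSpot code.
final class MaxRewriteCheck {
    // Step 1, before the rewrite: max(x + c0, max(x + c1, z)).
    static long twoLevel(long x, long c0, long c1, long z) {
        return Math.max(x + c0, Math.max(x + c1, z));
    }

    // Step 1, after the rewrite: max(x + MAX(c0, c1), z).
    static long twoLevelRewritten(long x, long c0, long c1, long z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // Step 2, before the rewrite: max(x + c0, x + c1).
    static long oneLevel(long x, long c0, long c1) {
        return Math.max(x + c0, x + c1);
    }

    // Step 2, after the rewrite: x + MAX(c0, c1).
    static long oneLevelRewritten(long x, long c0, long c1) {
        return x + Math.max(c0, c1);
    }

    public static void main(String[] args) {
        // Constants borrowed from the examples in the description, plus a negative one.
        long[] constants = {100, 150, 10, 11, -7};
        long z = 200;
        for (long x = -300; x <= 300; x++) {
            for (long c0 : constants) {
                for (long c1 : constants) {
                    if (oneLevel(x, c0, c1) != oneLevelRewritten(x, c0, c1)) {
                        throw new AssertionError("one-level mismatch at x=" + x);
                    }
                    if (twoLevel(x, c0, c1, z) != twoLevelRewritten(x, c0, c1, z)) {
                        throw new AssertionError("two-level mismatch at x=" + x);
                    }
                }
            }
        }
        System.out.println("both rewrites agree on all sampled inputs");
    }
}
```

Swapping `Math.max` for `Math.min` throughout gives the analogous check for the MinI case.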
------------- PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1573665663 From rcastanedalo at openjdk.org Fri Jun 2 13:12:10 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 2 Jun 2023 13:12:10 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 In-Reply-To: References: Message-ID: <3Y7bRv1-7l5AsQx4Bg4I68y-7I-5-tebyU1q3AHoMpE=.59260b07-0d3e-4e10-a2eb-979b89a7d65a@github.com> On Thu, 1 Jun 2023 16:19:41 GMT, Emanuel Peter wrote: > This is the fix to a regression caused in the CMoveV fix JDK-8306302. > > I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. > > **Solution** > > However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. > > **Testing** > I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. > > I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. > **Running... but so far all good** Looks good, I just have a few minor suggestions. src/hotspot/share/opto/superword.cpp line 3710: > 3708: } > 3709: if (nn->is_Cmp() && nn->in(0) == nullptr) { > 3710: // One of the inputs must be in_bb, pick that velt_type Could you turn this comment into an assertion? test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 1: > 1: /* Please add an Oracle copyright entry to the header of this file, see e.g. 
https://github.com/openjdk/jdk/blob/cb1e5e3f0fb499ce3420a57a08fb9ec434809d13/test/hotspot/jtreg/compiler/c2/irTests/TestSuperwordFailsUnrolling.java#L2-L3 test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 873: > 871: Asserts.assertEquals(rD[i], cmoveDGTforD(aD[i], bD[i], cD[i], dD[i])); > 872: } > 873: Nit: remove one of the empty lines. test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 875: > 873: > 874: > 875: // Use some constaints in the comparison Suggestion: // Use some constants in the comparison ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14268#pullrequestreview-1457334808 PR Review Comment: https://git.openjdk.org/jdk/pull/14268#discussion_r1214331074 PR Review Comment: https://git.openjdk.org/jdk/pull/14268#discussion_r1214337644 PR Review Comment: https://git.openjdk.org/jdk/pull/14268#discussion_r1214333569 PR Review Comment: https://git.openjdk.org/jdk/pull/14268#discussion_r1214333159 From epeter at openjdk.org Fri Jun 2 13:23:34 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 Jun 2023 13:23:34 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 [v2] In-Reply-To: References: Message-ID: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> > This is the fix to a regression caused in the CMoveV fix JDK-8306302. > > I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. > > **Solution** > > However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. 
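The selection rule in the Solution paragraph can be written down as a tiny sketch. It is a stand-in for the idea only and not the code in `superword.cpp`: `CmpInputChoice`, `pickInputIndex` and the string node names are hypothetical, and loop membership is modelled as a plain set rather than `in_bb`.

```java
import java.util.Set;

// Hypothetical stand-in for the decision described above; not SuperWord code.
final class CmpInputChoice {
    // Returns 1 or 2: the index of the Cmp input whose element type should be used.
    static int pickInputIndex(String in1, String in2, Set<String> loopBody) {
        if (loopBody.contains(in1)) {
            return 1; // the common case: in(1) is inside the loop
        }
        // At least one input is expected to be in the loop, so in(2) must be.
        if (!loopBody.contains(in2)) {
            throw new IllegalStateException("neither Cmp input is in the loop");
        }
        return 2;
    }

    public static void main(String[] args) {
        Set<String> loopBody = Set.of("LoadI", "AddI");
        System.out.println(pickInputIndex("LoadI", "ConI", loopBody));
        System.out.println(pickInputIndex("Invariant", "AddI", loopBody));
    }
}
```

The exception plays the role of an assertion on the stated invariant: a `Cmp` with both inputs outside the loop is not expected to be in the loop at all.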
> > **Testing** > I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. > > I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. > **Running... but so far all good** Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Roberto's suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14268/files - new: https://git.openjdk.org/jdk/pull/14268/files/61eae9c1..775d7ca1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14268&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14268&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14268.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14268/head:pull/14268 PR: https://git.openjdk.org/jdk/pull/14268 From rcastanedalo at openjdk.org Fri Jun 2 13:42:05 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 2 Jun 2023 13:42:05 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 [v2] In-Reply-To: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> References: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> Message-ID: On Fri, 2 Jun 2023 13:23:34 GMT, Emanuel Peter wrote: >> This is the fix to a regression caused in the CMoveV fix JDK-8306302. >> >> I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. 
>> >> **Solution** >> >> However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. >> >> **Testing** >> I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. >> >> I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. >> **Running... but so far all good** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Roberto's suggestions Thanks for addressing my suggestions, looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14268#pullrequestreview-1457486307 From kvn at openjdk.org Fri Jun 2 16:33:09 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Jun 2023 16:33:09 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 [v2] In-Reply-To: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> References: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> Message-ID: <02JmGVNojF-VU6DS9dqfDASSqoeSl7JT-ww2Ewl3hUk=.7fb78055-7d96-4bbc-a5f0-03ab6e581ed1@github.com> On Fri, 2 Jun 2023 13:23:34 GMT, Emanuel Peter wrote: >> This is the fix to a regression caused in the CMoveV fix JDK-8306302. >> >> I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. 
>> >> **Solution** >> >> However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. >> >> **Testing** >> I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. >> >> I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. >> **Running... but so far all good** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Roberto's suggestions Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14268#pullrequestreview-1457971472 From kvn at openjdk.org Fri Jun 2 16:45:17 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Jun 2023 16:45:17 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v4] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 12:27:33 GMT, Roberto Castañeda Lozano wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. 
`max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. >> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: > > - Re-apply extraction of min/max building after JDK-8309295 > - Merge branch 'master' into JDK-8302673 > - Abort idealization if any of the adds has a TOP input > - Revert extraction of min/max building > - Complete test battery with remaining no-add cases > - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state > - Add tests to exercise the case without inner additions > - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components > - Merge branch 'master' into JDK-8302673 > - Defer op(x, x) to constant/identity propagation early > - ... and 20 more: https://git.openjdk.org/jdk/compare/7b0a3360...cfcc16fd Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13924#pullrequestreview-1457997987 From kvn at openjdk.org Fri Jun 2 16:47:09 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 Jun 2023 16:47:09 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 [v2] In-Reply-To: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> References: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> Message-ID: On Fri, 2 Jun 2023 13:23:34 GMT, Emanuel Peter wrote: >> This is the fix to a regression caused in the CMoveV fix JDK-8306302. >> >> I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. >> >> **Solution** >> >> However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. 
>> >> **Testing** >> I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. >> >> I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. >> **Running... but so far all good** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Roberto's suggestions Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14268#pullrequestreview-1458004375 From sgibbons at openjdk.org Fri Jun 2 17:20:48 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 2 Jun 2023 17:20:48 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v4] In-Reply-To: References: Message-ID: > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
> > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Code cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14224/files - 
new: https://git.openjdk.org/jdk/pull/14224/files/e1131955..904d6d94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=02-03 Stats: 427 lines in 1 file changed: 92 ins; 326 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From tonyp at openjdk.org Fri Jun 2 18:01:09 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Fri, 2 Jun 2023 18:01:09 GMT Subject: RFR: 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes Message-ID: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes ------------- Commit messages: - 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes Changes: https://git.openjdk.org/jdk/pull/14288/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14288&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308726 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14288/head:pull/14288 PR: https://git.openjdk.org/jdk/pull/14288 From sviswanathan at openjdk.org Fri Jun 2 18:34:13 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 Jun 2023 18:34:13 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v4] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 17:20:48 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. 
>> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). >> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has 
updated the pull request incrementally with one additional commit since the last revision: > > Code cleanup src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 430: > 428: __ jcc(Assembler::aboveEqual, L_112a); > 429: // res = y + y; > 430: __ vaddsd(xmm0, xmm0, xmm1); Should this be: __ vaddsd(xmm0, xmm1, xmm1); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1214687840 From sgibbons at openjdk.org Fri Jun 2 18:43:10 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 2 Jun 2023 18:43:10 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v4] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 18:31:24 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Code cleanup > > src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 430: > >> 428: __ jcc(Assembler::aboveEqual, L_112a); >> 429: // res = y + y; >> 430: __ vaddsd(xmm0, xmm0, xmm1); > > Should this be: __ vaddsd(xmm0, xmm1, xmm1); This is correct as written according to the disassembly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1214694297 From sgibbons at openjdk.org Fri Jun 2 19:27:58 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 2 Jun 2023 19:27:58 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v5] In-Reply-To: References: Message-ID: > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
> > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Correct transliteration issue ------------- Changes: - all: 
https://git.openjdk.org/jdk/pull/14224/files - new: https://git.openjdk.org/jdk/pull/14224/files/904d6d94..26a821f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From sgibbons at openjdk.org Fri Jun 2 19:28:00 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 2 Jun 2023 19:28:00 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v4] In-Reply-To: References: Message-ID: <4LrL31I4nHlqJ1Ab3tWxYveKMx-eQJm_nyymf3Kz_f0=.b4f9ad46-1781-4cf2-afc2-00810b9aa630@github.com> On Fri, 2 Jun 2023 18:39:59 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 430: >> >>> 428: __ jcc(Assembler::aboveEqual, L_112a); >>> 429: // res = y + y; >>> 430: __ vaddsd(xmm0, xmm0, xmm1); >> >> Should this be: __ vaddsd(xmm0, xmm1, xmm1); > > This is correct as written according to the disassembly. After further investigation, you are correct. Changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1214738035 From sgibbons at openjdk.org Fri Jun 2 22:52:48 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 2 Jun 2023 22:52:48 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6] In-Reply-To: References: Message-ID: > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
> > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Indentation; spread source into assembly ------------- Changes: - all: 
https://git.openjdk.org/jdk/pull/14224/files - new: https://git.openjdk.org/jdk/pull/14224/files/26a821f9..85999cd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=04-05 Stats: 447 lines in 1 file changed: 0 ins; 26 del; 421 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From sviswanathan at openjdk.org Fri Jun 2 23:25:10 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 Jun 2023 23:25:10 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 22:52:48 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
>> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Indentation; spread source into assembly @asgibbons Thanks 
for taking care of all the review comments. The PR looks good to me now. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1458591764 From dzhang at openjdk.org Sat Jun 3 01:47:05 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Sat, 3 Jun 2023 01:47:05 GMT Subject: RFR: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V [v2] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 07:09:42 GMT, Ludovic Henry wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing space > > Marked as reviewed by luhenry (Committer). @luhenry @yhzhu20 @RealFYang @feilongjiang Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14256#issuecomment-1574542879 From dzhang at openjdk.org Sat Jun 3 02:30:18 2023 From: dzhang at openjdk.org (Dingli Zhang) Date: Sat, 3 Jun 2023 02:30:18 GMT Subject: Integrated: 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V In-Reply-To: References: Message-ID: <13mA-K_rjXZ9eMwWau_KJPvoVzCHSuXw_T3HJukv1dg=.b00e11e8-750b-4290-853e-6a115403215e@github.com> On Thu, 1 Jun 2023 05:45:14 GMT, Dingli Zhang wrote: > [JDK-8274242](https://bugs.openjdk.org/browse/JDK-8274242) propose to extend the x86 ISO-8859-1 encoding intrinsic to work > also for ASCII-compatible encodings, which helps speeding up various > CharsetEncoders. Implementing a similar intrinsic should be considered on > RISC-V as well. > > The instruct log with -XX:+PrintOptoAssembly output looks like: > > 06e + Encode ISO array R12, R11, R13 -> R10 # KILL R12, R11, R13, R7, V0-V3 > > > ## Testing: > qemu w/ UseRVV: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/hotspot/jtreg/compiler/intrinsics/string/TestEncodeIntrinsics.java This pull request has now been integrated. 
Changeset: 61bb014a Author: Dingli Zhang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/61bb014a8692305c705a4cf0361e319275c35ca3 Stats: 48 lines in 4 files changed: 30 ins; 0 del; 18 mod 8309254: Implement fast-path for ASCII-compatible CharsetEncoders on RISC-V Reviewed-by: luhenry, yzhu, fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/14256 From gcao at openjdk.org Sat Jun 3 02:40:03 2023 From: gcao at openjdk.org (Gui Cao) Date: Sat, 3 Jun 2023 02:40:03 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v3] In-Reply-To: References: Message-ID: > Hi. Currently, in the vector node implementation, some of the instruction formats reproduce part of the RVV assembly instructions, but incompletely; for example, the vsetvli instruction (which sets the element width, element count, etc.) is missing. This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. Reflecting every assembly instruction in the format would be redundant and would increase future maintenance work. Other CPU ports, such as ARM64, simply use the instruct function name in the format [1]. > > While this does not affect release builds, we should fix it for debug builds. The -XX:+PrintOptoAssembly output can then be read in conjunction with the source code of the specific vector node. > > [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 > > ### Testing: > qemu with UseRVV: > > - [ ] Tier1 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) > - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Update dst_src1/dst_src in format - Merge branch 'master' into JDK-8309332 - Update vneg/vfneg instruct format - RISC-V: Improve PrintOptoAssembly output of vector nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14279/files - new: https://git.openjdk.org/jdk/pull/14279/files/9e0430c0..94a2a21b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14279&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14279&range=01-02 Stats: 7063 lines in 134 files changed: 5813 ins; 771 del; 479 mod Patch: https://git.openjdk.org/jdk/pull/14279.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14279/head:pull/14279 PR: https://git.openjdk.org/jdk/pull/14279 From gcao at openjdk.org Sat Jun 3 02:40:06 2023 From: gcao at openjdk.org (Gui Cao) Date: Sat, 3 Jun 2023 02:40:06 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v2] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 09:31:35 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Update vneg/vfneg instruct format > > src/hotspot/cpu/riscv/riscv_v.ad line 344: > >> 342: match(Set dst_src1 (AddVL (Binary dst_src1 src2) v0)); >> 343: ins_cost(VEC_COST); >> 344: format %{ "vadd_masked $dst_src1, $src2, $v0" %} > > Suggestion: `format %{ "vadd_masked $dst_src1, $dst_src1, $src2, $v0" %}` Thanks for the review; this has been fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14279#discussion_r1215137062 From fyang at openjdk.org Sat Jun 3 02:52:07 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 3 Jun 2023 02:52:07 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v3] In-Reply-To: References: Message-ID: On Sat, 3 Jun 2023 02:40:03 GMT, Gui Cao wrote: >> Hi. Currently, in the vector node implementation, some of the instruction formats reproduce part of the RVV assembly instructions, but incompletely; for example, the vsetvli instruction (which sets the element width, element count, etc.) is missing. This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. Reflecting every assembly instruction in the format would be redundant and would increase future maintenance work. Other CPU ports, such as ARM64, simply use the instruct function name in the format [1]. >> >> While this does not affect release builds, we should fix it for debug builds. The -XX:+PrintOptoAssembly output can then be read in conjunction with the source code of the specific vector node. >> >> [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 >> >> ### Testing: >> qemu with UseRVV: >> >> - [ ] Tier1 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) >> - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Update dst_src1/dst_src in format > - Merge branch 'master' into JDK-8309332 > - Update vneg/vfneg instruct format > - RISC-V: Improve PrintOptoAssembly output of vector nodes Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14279#pullrequestreview-1458806375 From epeter at openjdk.org Sat Jun 3 12:20:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 3 Jun 2023 12:20:10 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v4] In-Reply-To: References: Message-ID: <9yK0qFsfTPc6LuQVVlSJzuEJ3P2TF9LT7i5DIyXn4kg=.fa8a11c5-e869-434a-91c0-bfc37bd20fcb@github.com> On Fri, 2 Jun 2023 12:27:33 GMT, Roberto Castañeda Lozano wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. 
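The two rewrites listed above rest on simple integer identities, which can be checked in plain Java. The following is only a minimal sketch and not C2 code: the class and method names are hypothetical, and it assumes that neither addition overflows. `Math.addExact` is used so that an overflowing input fails loudly instead of silently breaking the identity; how C2 itself guards against overflow is not shown in this thread.

```java
// Illustrative only: plain-Java statement of the identities behind the two
// Ideal rewrites. Names are hypothetical and not taken from HotSpot.
final class MaxRewriteSketch {
    // Step 1, original shape: max(x + c0, max(x + c1, z)).
    static int twoLevelOriginal(int x, int c0, int c1, int z) {
        return Math.max(Math.addExact(x, c0), Math.max(Math.addExact(x, c1), z));
    }

    // Step 1, rewritten shape: max(x + MAX(c0, c1), z).
    static int twoLevelRewritten(int x, int c0, int c1, int z) {
        return Math.max(Math.addExact(x, Math.max(c0, c1)), z);
    }

    // Step 2, original shape: max(x + c0, x + c1).
    static int oneLevelOriginal(int x, int c0, int c1) {
        return Math.max(Math.addExact(x, c0), Math.addExact(x, c1));
    }

    // Step 2, rewritten shape: x + MAX(c0, c1).
    static int oneLevelRewritten(int x, int c0, int c1) {
        return Math.addExact(x, Math.max(c0, c1));
    }

    public static void main(String[] args) {
        // Constants mirror the examples in the description (100/150/200 and 10/11).
        for (int x = -1000; x <= 1000; x++) {
            if (twoLevelOriginal(x, 100, 150, 200) != twoLevelRewritten(x, 100, 150, 200)
                    || twoLevelOriginal(x, 150, 100, 200) != twoLevelRewritten(x, 150, 100, 200)
                    || oneLevelOriginal(x, 10, 11) != oneLevelRewritten(x, 10, 11)
                    || oneLevelOriginal(x, 11, 10) != oneLevelRewritten(x, 11, 10)) {
                throw new AssertionError("identity violated for x = " + x);
            }
        }
        System.out.println("identities hold on the sampled range");
    }
}
```

Because `max` is commutative, swapping `c0` and `c1` or the operand order gives the same results, which is why the permutations can be handled directly rather than through canonicalization.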
>> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types. The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 30 commits: > > - Re-apply extraction of min/max building after JDK-8309295 > - Merge branch 'master' into JDK-8302673 > - Abort idealization if any of the adds has a TOP input > - Revert extraction of min/max building > - Complete test battery with remaining no-add cases > - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state > - Add tests to exercise the case without inner additions > - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components > - Merge branch 'master' into JDK-8302673 > - Defer op(x, x) to constant/identity propagation early > - ... and 20 more: https://git.openjdk.org/jdk/compare/7b0a3360...cfcc16fd Looks good, thanks for all the refactoring! ------------- Marked as reviewed by epeter (Committer). PR Review: https://git.openjdk.org/jdk/pull/13924#pullrequestreview-1459278087 From vkempik at openjdk.org Sat Jun 3 12:42:08 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 3 Jun 2023 12:42:08 GMT Subject: RFR: 8309405: RISC-V: is_deopt may produce unaligned memory read Message-ID: Please review this simple fix, a continuation of JDK-8291550. While profiling the trp_lam event (misaligned load emulation) on an FPGA, I found some more misaligned loads; they are rare but still happen. Here, is_deopt directly dereferences a memory address, but with RVC enabled a single 4-byte instruction may be only 2-byte aligned rather than 4-byte aligned. So is_deopt should use ld_instr to be on the safe side. Testing: TBD; I am going to run tier1 and update this PR. 
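The alignment problem described above can be modelled outside the JVM. The sketch below is an assumption about the general technique (assembling a 32-bit word from two 16-bit reads so that no 4-byte access is needed at a 2-byte-aligned address); it is not a description of how `ld_instr` is actually implemented, and the class and method names are hypothetical. A little-endian `ByteBuffer` stands in for the code stream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: models reading a 4-byte instruction that is 2-byte
// aligned but not 4-byte aligned. Names are hypothetical.
final class HalfwordReadSketch {
    // Combines two 16-bit little-endian reads into one 32-bit word, so the
    // read never requires 4-byte alignment of the starting offset.
    static int readInstruction(ByteBuffer code, int offset) {
        if ((offset & 1) != 0) {
            throw new IllegalArgumentException("offset must be 2-byte aligned");
        }
        int low = code.getShort(offset) & 0xFFFF;       // bits 0..15
        int high = code.getShort(offset + 2) & 0xFFFF;  // bits 16..31
        return (high << 16) | low;
    }

    public static void main(String[] args) {
        ByteBuffer code = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        code.putShort(0, (short) 0x0001); // 2-byte compressed instruction (c.nop)
        code.putInt(2, 0x00000013);       // 4-byte instruction (nop) at offset 2
        System.out.println(Integer.toHexString(readInstruction(code, 2)));
    }
}
```

The 2-byte compressed instruction at offset 0 is what pushes the following 4-byte instruction off a 4-byte boundary, which is the situation the message describes for RVC.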
------------- Commit messages: - fix is_deopt to prevent misaligned access Changes: https://git.openjdk.org/jdk/pull/14299/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14299&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309405 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14299.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14299/head:pull/14299 PR: https://git.openjdk.org/jdk/pull/14299 From tonyp at openjdk.org Sat Jun 3 12:57:28 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Sat, 3 Jun 2023 12:57:28 GMT Subject: RFR: 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes [v2] In-Reply-To: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> References: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> Message-ID: > 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes Antonios Printezis has updated the pull request incrementally with one additional commit since the last revision: fixed minor typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14288/files - new: https://git.openjdk.org/jdk/pull/14288/files/11657978..58b0e798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14288&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14288&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14288.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14288/head:pull/14288 PR: https://git.openjdk.org/jdk/pull/14288 From jbhateja at openjdk.org Sun Jun 4 15:36:21 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 4 Jun 2023 15:36:21 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 22:52:48 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 
AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). >> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 
regression case Took : 17 noMod case took: 3 noPower case took: 16
>> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Indentation; spread source into assembly

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3941:

> 3939: generate_libm_stubs();
> 3940:
> 3941: if ((UseAVX >= 1) && (VM_Version::supports_avx512vlbwdq() || VM_Version::supports_fma())) {

We can relax this to supports_evex instead of vlbwdq.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 79:

> 77: __ enter(); // required for proper stackwalking of RuntimeStub frame
> 78:
> 79: if (VM_Version::supports_avx512vlbwdq()) { // AVX512 version

We can relax this to supports_evex().

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 121:

> 119: // // |x|, |y|
> 120: // a = DP_AND(x, DP_CONST(7fffffffffffffff));
> 121: __ movq(xmm0, xmm0);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 122:

> 120: // a = DP_AND(x, DP_CONST(7fffffffffffffff));
> 121: __ movq(xmm0, xmm0);
> 122: __ mov64(rax, 0x7FFFFFFFFFFFFFFF);

ULL suffix missing on long constant.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 123:

> 121: __ movq(xmm0, xmm0);
> 122: __ mov64(rax, 0x7FFFFFFFFFFFFFFF);
> 123: __ evpbroadcastq(xmm3, rax, Assembler::AVX_128bit);

Replace broadcast with cheaper PINSRQ.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 134:

> 132: __ evdivsd(xmm0, xmm6, xmm5, Assembler::EVEX_RZ);
> 133: // q = DP_ROUND_RZ(q);
> 134: __ movq(xmm0, xmm0);

Redundant movq.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 145:

> 143: __ jcc(Assembler::equal, L_5280);
> 144: // if (eq >= 0x7fefffffu) goto SPECIAL_FMOD;
> 145: __ cmpl(rax, 0x7feffffe);

Comment mentions comparison against 0x7fefffff.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 160:

> 158: __ jcc(Assembler::below, L_5300);
> 159: __ movsd(xmm0, ExternalAddress((address)CONST_INF), rax);
> 160: // return DP_FNMA(b, q, a); // NaN

Misplaced comment for NaN, already present at 204.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 168:

> 166: __ jmp(L_exit);
> 167: // if (!eq) return x + sgn_a;
> 168: __ align32();

Redundant alignment?

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 192:

> 190: __ evdivsd(xmm2, xmm6, xmm5, Assembler::EVEX_RZ);
> 191: // q = DP_ROUND_RZ(q);
> 192: __ movq(xmm2, xmm2);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 215:

> 213: __ evdivsd(xmm0, xmm6, xmm2, Assembler::EVEX_RZ);
> 214: // q = DP_ROUND_RZ(q);
> 215: __ movq(xmm0, xmm0);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 264:

> 262: __ evdivsd(xmm0, xmm7, xmm2, Assembler::EVEX_RZ);
> 263: // q = DP_ROUND_RZ(q);
> 264: __ movq(xmm0, xmm0);

Redundant move.

src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 305:

> 303: // // sign(x)
> 304: // sgn_a = DP_XOR(x, a);
> 305: __ mov64(rcx, 0x8000000000000000);

ULL suffix missing on long constant.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216815105 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216810945 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216613005 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216612764 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216616520 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216617108 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216622400 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216733715 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216736764 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216756042 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216767150 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216779440 PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216630301 From jbhateja at openjdk.org Sun Jun 4 15:36:22 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 4 Jun 2023 15:36:22 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2] In-Reply-To: References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> Message-ID: On Thu, 1 Jun 2023 11:00:43 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 306: > >> 304: >> 305: Label L_104a, L_11bd, L_10c1, L_1090, L_11b9, L_10e7, L_11af, L_111c, L_10f3, L_116e, L_112a; >> 306: Label L_1173, L_1157, L_117f, L_11a0; > > For the sake of clarity, can we segregate AVX2 functionality into a separate routine and indent the block. 
It would be good to have this part in a separate routine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216632754 From stuefe at openjdk.org Sun Jun 4 15:45:39 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 4 Jun 2023 15:45:39 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates Message-ID: We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former. I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself still lives in the same place, mostly unchanged. Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this, to avoid overloading this RFE.
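The space remark above can be checked with a small C++ sketch. The type and function names are hypothetical (the mail does not show the real table's declaration); the sketch only assumes what the text states, namely a 64-bit immediate paired with a 32-bit encoding, and compares that layout with two separate arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical entry layout: 64-bit immediate plus 32-bit encoding.
// On common 64-bit ABIs the struct is padded from 12 to 16 bytes so that
// the uint64_t member stays 8-byte aligned in an array.
struct ImmEncPair {
  uint64_t immediate;
  uint32_t encoding;
};

// Bytes used by n entries stored as an array of pairs.
constexpr std::size_t paired_bytes(std::size_t n) {
  return n * sizeof(ImmEncPair);
}

// Bytes used by n entries stored as two parallel arrays (no padding).
constexpr std::size_t split_bytes(std::size_t n) {
  return n * (sizeof(uint64_t) + sizeof(uint32_t));
}

// Same, with the encoding narrowed to 16 bits.
constexpr std::size_t split16_bytes(std::size_t n) {
  return n * (sizeof(uint64_t) + sizeof(uint16_t));
}
```

For a hypothetical 1000 entries this gives 16000, 12000 and 10000 bytes respectively on a typical 64-bit target, at the cost of keeping two arrays in sync.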
Tests: manually ran tests on linux aarch64, OsX aarch64 ------------- Commit messages: - fix build errors - no need for LITable - hardcode-aarch64-logical-imms Changes: https://git.openjdk.org/jdk/pull/14304/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14304&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309410 Stats: 1504 lines in 5 files changed: 1463 ins; 33 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14304.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14304/head:pull/14304 PR: https://git.openjdk.org/jdk/pull/14304 From jsjolen at openjdk.org Sun Jun 4 15:45:39 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Sun, 4 Jun 2023 15:45:39 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: Message-ID: On Sun, 4 Jun 2023 11:32:55 GMT, Thomas Stuefe wrote: > We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. > > If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former. > > I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself lives still in the same place, mostly unchanged. > > Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this to not overload this RFE. > > Tests: manually ran tests on linux aarch64, OsX aarch64 Hi Thomas, The idea makes sense, but can't we do this with `constexpr` only? Looking at the initialization code, this seems doable. The only issue would be sorting, as that would have to be reimplemented as a constexpr function.
No big table, no test that ensures that the table hasn't drifted away from code that's actually dead, seems like a big win! Here's a sketch in constexpr which I think serves as a PoC that this could work: https://godbolt.org/z/o3PTbanYY And for completeness here's the full source code replicated in this comment:

```c++
#include <stdio.h>
struct LITableStruct {
  // couldn't bother with imports for sized (u)ints
  static const unsigned int tbl_sz = 1 << 13;
  int LITable[tbl_sz];
  unsigned long InvLITable[tbl_sz];

  // We can dereference pointers, compare and set them in constexpr
  // (in other words, we can write a sorting routine)
  constexpr void comp_and_swap(int* a, int* b) {
    if (*a < *b) {
      *a = *b;
    }
  }
  // Magic incantation!
  constexpr LITableStruct()
    : LITable(), InvLITable() {
    // comp and swap compiles
    int a[2] = {0, 1};
    comp_and_swap(&a[0], &a[1]);
    comp_and_swap(&LITable[0], &LITable[255]);

    // loop and assign no problem
    for (unsigned int idx = 0; idx < tbl_sz; idx++) {
      LITable[idx] = LITable[0] + 255;
      // just an if statement depending on some other state
      // bit twiddling and other nonsense
      if (LITable[0] == 1337 || LITable[67] & 5 | 37) {
        LITable[2500] = 73;
      }
    }
  }
};

static struct LITableStruct my_table{};
int main() {
  printf("Wow, an int: %d\n", my_table.LITable[0]);
}
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1575578446 From stuefe at openjdk.org Sun Jun 4 15:45:39 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 4 Jun 2023 15:45:39 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: Message-ID: On Sun, 4 Jun 2023 13:53:21 GMT, Johan Sjölen wrote: >> We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. >> >> If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former.
>>
>> I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself lives still in the same place, mostly unchanged.
>>
>> Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this to not overload this RFE.
>>
>> Tests: manually ran tests on linux aarch64, OsX aarch64
>
> Hi Thomas,
>
> The idea makes sense, but can't we do this with `constexpr` only? Looking at the initialization code, this seems doable. Only issue would be sorting, as that'd have to be reimplemented as a constexpr function. No big table, no test that ensures that the table hasn't drifted away from code that's actually dead, seems like a big win!
>
> Here's a sketch in constexpr which I think serves as a PoC that this could work: https://godbolt.org/z/o3PTbanYY
>
> And for completeness here's the full source code replicated in this comment:
>
> ```c++
> #include <stdio.h>
> struct LITableStruct {
>   // couldn't bother with imports for sized (u)ints
>   static const unsigned int tbl_sz = 1 << 13;
>   int LITable[tbl_sz];
>   unsigned long InvLITable[tbl_sz];
>
>   // We can dereference pointers, compare and set them in constexpr
>   // (in other words, we can write a sorting routine)
>   constexpr void comp_and_swap(int* a, int* b) {
>     if (*a < *b) {
>       *a = *b;
>     }
>   }
>   // Magic incantation!
>   constexpr LITableStruct()
>     : LITable(), InvLITable() {
>     // comp and swap compiles
>     int a[2] = {0, 1};
>     comp_and_swap(&a[0], &a[1]);
>     comp_and_swap(&LITable[0], &LITable[255]);
>
>     // loop and assign no problem
>     for (unsigned int idx = 0; idx < tbl_sz; idx++) {
>       LITable[idx] = LITable[0] + 255;
>       // just an if statement depending on some other state
>       // bit twiddling and other nonsense
>       if (LITable[0] == 1337 || LITable[67] & 5 | 37) {
>         LITable[2500] = 73;
>       }
>     }
>   }
> };
>
> static struct LITableStruct my_table{};
> int main() {
>   printf("Wow, an int: %d\n", my_table.LITable[0]);
> }
> ```

Hi @jdksjolen,

> The idea makes sense, but can't we do this with constexpr only?

Maybe, but

- I don't have much more time to spend on this; this was a little Sunday thing, but it shouldn't spiral out into a C++ research project (since I would have to babysit every compiler to make sure it generates what I assume it generates)
- I don't see why: we'd try to cajole the C++ compiler into generating a table that we just as well may write out plainly. Easier to understand, faster to compile, less reliance on the abilities of the C++ compiler. The end result is identical - a hard-coded table in the text segment.
- My variant is easy to review, since I did not change the old generator code. We could throw the testing code out at some point in the future.
Cheers, Thomas

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1575613640

From jbhateja at openjdk.org Sun Jun 4 15:55:10 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 4 Jun 2023 15:55:10 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v2] In-Reply-To: <2m8KCrlZkRTg4pBAwASL4FKc_MFtL-POWmhG0ebAiwQ=.36cdb946-abb1-4938-aaa3-327775b26d65@github.com> References: <2NOPy1QG4rGLMmXNTv_6E6WCKdRCLg466z_tGqo3xeE=.183282f8-8068-4bc0-941b-81b9a29138be@github.com> <2m8KCrlZkRTg4pBAwASL4FKc_MFtL-POWmhG0ebAiwQ=.36cdb946-abb1-4938-aaa3-327775b26d65@github.com> Message-ID:

On Thu, 1 Jun 2023 16:00:19 GMT, Scott Gibbons wrote:

>> Hi @asgibbons ,
>> Kindly also include the results for following benchmark
>> test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java
>>
>> Best Regards,
>> Jatin
>
>> Hi @asgibbons , Kindly also include the results for following benchmark test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java
>>
>> Best Regards, Jatin
>
> Current top-of-tree results:
>
> Benchmark                      Mode  Cnt  Score   Error  Units
> DremFrem.calcDoubleJava        avgt   25  7.034 ± 0.001  ns/op
> DremFrem.calcFloatJava         avgt   25  7.011 ± 0.001  ns/op
> DremFrem.cornercaseDoubleJava  avgt   25  5.514 ± 0.006  ns/op
> DremFrem.cornercaseFloatJava   avgt   25  5.510 ± 0.003  ns/op
>
>
> My changes:
>
> Benchmark                      Mode  Cnt  Score   Error  Units
> DremFrem.calcDoubleJava        avgt   25  2.916 ± 0.001  ns/op
> DremFrem.calcFloatJava         avgt   25  4.011 ± 0.001  ns/op
> DremFrem.cornercaseDoubleJava  avgt   25  5.518 ± 0.008  ns/op
> DremFrem.cornercaseFloatJava   avgt   25  5.515 ± 0.007  ns/op

Hi @asgibbons , It will be good to back the special case handlings in the patch with a test case.
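As a rough idea of what such special-case coverage could look like, here is a C++ sketch written against `std::fmod`, whose rules Java's floating-point `%` is specified to mirror. This is not the test added to the PR; `check_fmod_special_cases` and `reference_fmod` are names made up for illustration, and the cases listed are only a small sample.

```cpp
#include <cmath>
#include <limits>

// Returns true if fmod_fn agrees with the usual fmod rules on a handful of
// special inputs (NaN, infinities, zero divisor, signed zero, sign of result).
template <typename F>
bool check_fmod_special_cases(F fmod_fn) {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  bool ok = true;
  ok = ok && std::isnan(fmod_fn(nan, 2.0));   // NaN dividend
  ok = ok && std::isnan(fmod_fn(2.0, nan));   // NaN divisor
  ok = ok && std::isnan(fmod_fn(inf, 2.0));   // infinite dividend
  ok = ok && std::isnan(fmod_fn(2.0, 0.0));   // zero divisor
  ok = ok && fmod_fn(5.5, inf) == 5.5;        // infinite divisor returns x
  ok = ok && fmod_fn(-5.5, inf) == -5.5;
  const double z = fmod_fn(-0.0, 3.0);        // signed zero is preserved
  ok = ok && z == 0.0 && std::signbit(z);
  ok = ok && fmod_fn(7.5, 2.0) == 1.5;        // result takes the sign of x
  ok = ok && fmod_fn(-7.5, 2.0) == -1.5;
  ok = ok && fmod_fn(7.5, -2.0) == 1.5;
  return ok;
}

// Reference implementation to run the checks against.
inline double reference_fmod(double x, double y) { return std::fmod(x, y); }
```

A stub under test would be passed in place of `reference_fmod`; the float variant would need the same cases with `float` operands.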
Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1575618276 From sgibbons at openjdk.org Sun Jun 4 17:21:11 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sun, 4 Jun 2023 17:21:11 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6] In-Reply-To: References: Message-ID: <6AEj5wZji9ONqIxU-fTfIn6TF9bEttbQDGUerF79u-U=.b779a37d-45d0-445e-9ab9-b74f412f2038@github.com> On Sun, 4 Jun 2023 12:05:52 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Indentation; spread source into assembly > > src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 145: > >> 143: __ jcc(Assembler::equal, L_5280); >> 144: // if (eq >= 0x7fefffffu) goto SPECIAL_FMOD; >> 145: __ cmpl(rax, 0x7feffffe); > > Comment mention comparison against 0x7feffff. This is an artifact of block reordering by the compiler and should be correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216929285 From sgibbons at openjdk.org Sun Jun 4 17:34:11 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sun, 4 Jun 2023 17:34:11 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6] In-Reply-To: References: Message-ID: On Sun, 4 Jun 2023 11:56:48 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Indentation; spread source into assembly > > src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 121: > >> 119: // // |x|, |y| >> 120: // a = DP_AND(x, DP_CONST(7fffffffffffffff)); >> 121: __ movq(xmm0, xmm0); > > Redundatn move. I do not believe these are redundant, as the upper quadword of the register is cleared as a side-effect of the vmovq. I do not believe the icx compiler would insert random redundant vmovq instructions at this optimization level. 
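The side effect mentioned here can be observed from C++ with SSE2 intrinsics. The sketch below is an illustration only and not part of the stub: `upper_after_movq` and `lower_after_movq` are hypothetical helper names, and `_mm_move_epi64` is used as the intrinsic counterpart of a register-to-register MOVQ. On non-x86-64 targets the helpers fall back to a plain model of the same behaviour so the snippet still compiles.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Upper 64 bits of a 128-bit register holding {hi, lo} after "movq xmm, xmm".
inline uint64_t upper_after_movq(uint64_t lo, uint64_t hi) {
#if defined(__x86_64__) || defined(_M_X64)
  __m128i v = _mm_set_epi64x(static_cast<long long>(hi),
                             static_cast<long long>(lo));
  __m128i moved = _mm_move_epi64(v);  // copies the low qword, zeroes the rest
  __m128i high = _mm_unpackhi_epi64(moved, moved);
  return static_cast<uint64_t>(_mm_cvtsi128_si64(high));
#else
  (void)lo; (void)hi;
  return 0;  // portable model of the documented behaviour
#endif
}

// Lower 64 bits after the same move; these are left unchanged.
inline uint64_t lower_after_movq(uint64_t lo, uint64_t hi) {
#if defined(__x86_64__) || defined(_M_X64)
  __m128i v = _mm_set_epi64x(static_cast<long long>(hi),
                             static_cast<long long>(lo));
  return static_cast<uint64_t>(_mm_cvtsi128_si64(_mm_move_epi64(v)));
#else
  (void)hi;
  return lo;
#endif
}
```

So a `movq(xmm0, xmm0)` is a no-op only if the upper quadword is already known to be zero, which is the point being argued.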
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1216942237 From sgibbons at openjdk.org Sun Jun 4 17:42:43 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sun, 4 Jun 2023 17:42:43 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v7] In-Reply-To: References: Message-ID: > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). > > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower 
case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14224/files - new: https://git.openjdk.org/jdk/pull/14224/files/85999cd1..1b44cd62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From jsjolen at openjdk.org Sun Jun 4 18:51:04 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sun, 4 Jun 2023 18:51:04 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: Message-ID: On Sun, 4 Jun 2023 13:53:21 GMT, Johan Sj?len wrote: >> We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. >> >> If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former. >> >> I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself lives still in the same place, mostly unchanged. 
>> >> Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this to not overload this RFE. >> >> Tests: manually ran tests on linux aarch64, OsX aarch64 > > Hi Thomas, > > The idea makes sense, but can't we do this with `constexpr` only? Looking at the intiailization code, this seems doable. Only issue would be sorting, as that'd have to be reimplemented as a contexpr function. No big table, no test that ensures that the table hasn't drifted away from code that's actually dead, seems like a big win! > > Here's a sketch in constexpr which I think serves as a PoC that this could work: https://godbolt.org/z/o3PTbanYY > > And for completeness here's the full source code replicated in this comment: > > ```c++ > #include > struct LITableStruct { > // couldn't bother with imports for sized (u)ints > static const unsigned int tbl_sz = 1 << 13; > int LITable[tbl_sz]; > unsigned long InvLITable[tbl_sz]; > > // We can dereference pointers, compare and set them in constexpr > // (in other words, we can write a sorting routine) > constexpr void comp_and_swap(int* a, int* b) { > if (*a < *b) { > *a = *b; > } > } > // Magic incantation! 
> constexpr LITableStruct() > : LITable(), InvLITable() { > // comp and swap compiles > int a[2] = {0, 1}; > comp_and_swap(&a[0], &a[1]); > comp_and_swap(&LITable[0], &LITable[255]); > > // loop and assign no problem > for (unsigned int idx = 0; idx < tbl_sz; idx++) { > LITable[idx] = LITable[0] + 255; > // just an if statement depending on some other state > // bit twiddling and other nonsense > if (LITable[0] == 1337 || LITable[67] & 5 | 37) { > LITable[2500] = 73; > } > } > } > }; > > static struct LITableStruct my_table{}; > int main() { > printf("Wow, an int: %d\n", my_table.LITable[0]); > } > Hi @jdksjolen, > > > The idea makes sense, but can't we do this with constexpr only? > > Maybe , but I don't see why: we'd try to cajole the C++ compiler into generating a table that we just as well may write out plainly. Easier to understand, faster to compile, less reliance on the abilities of the C++ compiler. The result is identical - a hard-coded table in the text segment. > > Cheers, Thomas Hi Thomas, Especially good point on ensuring that all compilers do what we expect them to do, I didn't consider that. I'm just Sunday reviewing today, I'll give you a complete review tomorrow. Cheers, Johan ------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1575672057 From gcao at openjdk.org Mon Jun 5 00:19:10 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 5 Jun 2023 00:19:10 GMT Subject: RFR: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes [v3] In-Reply-To: References: Message-ID: On Sat, 3 Jun 2023 02:49:15 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Update dst_src1/dst_src in format >> - Merge branch 'master' into JDK-8309332 >> - Update vneg/vfneg instruct format >> - RISC-V: Improve PrintOptoAssembly output of vector nodes > > Marked as reviewed by fyang (Reviewer). @RealFYang @yhzhu20 Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14279#issuecomment-1575846548 From gcao at openjdk.org Mon Jun 5 00:22:12 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 5 Jun 2023 00:22:12 GMT Subject: Integrated: 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 02:08:29 GMT, Gui Cao wrote: > Hi, Currently in the vector node implementation, some of the instruction formats are part of the rvv assembly instructions, for which the assembly instructions are also incomplete, such as the missing vsetvli assembly instruction(vsetvli is used to set the element width, number, etc), This makes the compiler assembly output by -XX:+PrintOptoAssembly difficult to understand. And if every assembly instruction is reflected, it will be redundant and increase the maintenance work in the future. And referring to other CPUs, such as the ARM64, it is straightforward to use the instruct function name to simplify [1]. > > While this won't affect release build, we should fix this for debug build. We can use the -XX:+PrintOptoAssembly parameter to print the compiler assembly output by -XX:+PrintOptoAssembly and view the assembly logic in conjunction with the source code of the specific vector node. 
> > [1] https://github.com/openjdk/jdk/blob/4460429d7a50b9a7a99058ef4e5ae36fb30b956f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2842-L2846 > > ### Testing: > qemu with UseRVV: > > - [ ] Tier1 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) > - [x] test/jdk/jdk/incubator/vector/Int256VectorTests.java (fastdebug with -XX:+PrintOptoAssembly) This pull request has now been integrated. Changeset: 08c91c22 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/08c91c22126d9fdf06eff4df7a04dcde32003b61 Stats: 106 lines in 1 file changed: 0 ins; 21 del; 85 mod 8309332: RISC-V: Improve PrintOptoAssembly output of vector nodes Reviewed-by: yzhu, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14279 From fyang at openjdk.org Mon Jun 5 00:45:04 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Jun 2023 00:45:04 GMT Subject: RFR: 8309405: RISC-V: is_deopt may produce unaligned memory read In-Reply-To: References: Message-ID: On Sat, 3 Jun 2023 12:34:20 GMT, Vladimir Kempik wrote: > Please review this simple fix, a continuation of JDK-8291550. > > Doing some profiling for trp_lam event (misaligned load emulation) on fpga I've found some more misaligned loads, pretty rare but still happens. > Here, is_deopt directly dereferences memory address, but with RVC enabled, a single 4-byte intruction could be 2-bytes, but not 4-bytes aligned. > So is_deopt should use ld_instr to be on safe side. > > Testing: tier1 on hifive clean. Looks fine. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14299#pullrequestreview-1461471102 From fyang at openjdk.org Mon Jun 5 03:54:03 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Jun 2023 03:54:03 GMT Subject: RFR: 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes [v2] In-Reply-To: References: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> Message-ID: On Sat, 3 Jun 2023 12:57:28 GMT, Antonios Printezis wrote: >> 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes > > Antonios Printezis has updated the pull request incrementally with one additional commit since the last revision: > > fixed minor typo Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14288#pullrequestreview-1461579741 From thartmann at openjdk.org Mon Jun 5 05:39:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jun 2023 05:39:10 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 [v2] In-Reply-To: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> References: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> Message-ID: On Fri, 2 Jun 2023 13:23:34 GMT, Emanuel Peter wrote: >> This is the fix to a regression caused in the CMoveV fix JDK-8306302. >> >> I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. >> >> **Solution** >> >> However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. 
>> >> **Testing** >> I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. >> >> I tested up to tier6 and stress testing. I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. >> **Running... but so far all good** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Roberto's suggestions Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14268#pullrequestreview-1461709992 From gcao at openjdk.org Mon Jun 5 06:17:23 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 5 Jun 2023 06:17:23 GMT Subject: RFR: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes Message-ID: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> Hi, We note that in the C2 AddReductionVF & AddReductionVD nodes, the src1 and dst registers are constrained to be the same register. This is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 nodes. For reference, other CPUs, such as x86 and arm neon, do not need the same registers either[1]. arm64 sve constrains them to be the same register because it uses the FADDA instruction[2], which adds the SIMD&FP scalar source and all active lanes of the vector source in strict order and places the result destructively in the SIMD&FP scalar source register. So for arm64 sve, the two registers are required to be the same.
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907 [2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar- ### AddReductionVF/AddReductionVD We can use Float256VectorTests.java Double256VectorTests.java to emit these nodes and the compilation log is as follows: #### AddReductionVF Before this patch: 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 0f6 # castII of R19, #@castII 0f6 addw R10, R19, zr #@convI2L_reg_reg 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm 0fc add R11, R31, R10 # ptr, #@addP_reg_reg 100 addi R11, R11, #16 # ptr, #@addP_reg_imm 102 loadV V1, [R11] # vector (rvv) 10a spill F0 -> F1 # spill size = 32 10e reduce_addF F1, F1, V1 # KILL V2 11e bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 After this patch(Saving a spill operation): 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 0f6 # castII of R19, #@castII 0f6 addw R10, R19, zr #@convI2L_reg_reg 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm 0fc add R11, R31, R10 # ptr, #@addP_reg_reg 100 addi R11, R11, #16 # ptr, #@addP_reg_imm 102 loadV V1, [R11] # vector (rvv) 10a reduce_addF F1, F0, V1 # KILL V2 11a bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 #### AddReductionVD Before this patch: 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 0f4 # castII of R9, #@castII 0f4 addw R10, R9, zr #@convI2L_reg_reg 0f8 slli R10, R10, (#3 & 0x3f) #@lShiftL_reg_imm 0fa add R11, R30, R10 # ptr, #@addP_reg_reg 0fe addi R11, R11, #16 # ptr, #@addP_reg_imm 100 loadV V1, [R11] # vector (rvv) 108 spill F0 -> F1 # spill size = 64 10c reduce_addD F1, F1, V1 # KILL V2 11c bgeu R9, R31, B61 #@cmpU_branch P=0.000001 C=-1.000000 After this patch(Saving a spill operation): 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 0f4 # castII of R9, #@castII 0f4 addw R10, R9, zr #@convI2L_reg_reg 0f8 slli R10, R10, (#3 & 0x3f) #@lShiftL_reg_imm 
0fa add R11, R30, R10 # ptr, #@addP_reg_reg 0fe addi R11, R11, #16 # ptr, #@addP_reg_imm 100 loadV V1, [R11] # vector (rvv) 108 reduce_addD F1, F0, V1 # KILL V2 118 bgeu R9, R31, B61 #@cmpU_branch P=0.000001 C=-1.000000 - [x] Tier1 tests (release) - [x] Tier2 tests (release) - [ ] Tier3 tests (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes Changes: https://git.openjdk.org/jdk/pull/14308/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14308&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309419 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/14308.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14308/head:pull/14308 PR: https://git.openjdk.org/jdk/pull/14308 From thartmann at openjdk.org Mon Jun 5 06:27:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jun 2023 06:27:05 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v3] In-Reply-To: References: Message-ID: <7VzgzO7gUeBXJ8pho3xgcSpJxfePrS4Msgc419Arou0=.63acce13-a7a1-4bc5-9370-b7f5d0265279@github.com> On Thu, 1 Jun 2023 06:33:56 GMT, Chang Peng wrote: >> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. >> >> TEST passed on AArch64: >> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 >> >> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- >> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update ProblemList and test case Looks good to me. Testing passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14245#pullrequestreview-1461792062 From duke at openjdk.org Mon Jun 5 06:41:04 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 5 Jun 2023 06:41:04 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v4] In-Reply-To: References: Message-ID: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. > > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into fix_truecount - Update ProblemList and test case - Merge branch 'openjdk:master' into fix_truecount - 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. 
TEST passed on AArch64: hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Change-Id: I2a224a24b83bbbb9289648d88351de6adb24b760 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14245/files - new: https://git.openjdk.org/jdk/pull/14245/files/c15e7b8e..a5342348 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=02-03 Stats: 14235 lines in 186 files changed: 12502 ins; 891 del; 842 mod Patch: https://git.openjdk.org/jdk/pull/14245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14245/head:pull/14245 PR: https://git.openjdk.org/jdk/pull/14245 From duke at openjdk.org Mon Jun 5 06:41:06 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 5 Jun 2023 06:41:06 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v4] In-Reply-To: <7VzgzO7gUeBXJ8pho3xgcSpJxfePrS4Msgc419Arou0=.63acce13-a7a1-4bc5-9370-b7f5d0265279@github.com> References: <7VzgzO7gUeBXJ8pho3xgcSpJxfePrS4Msgc419Arou0=.63acce13-a7a1-4bc5-9370-b7f5d0265279@github.com> Message-ID: <1LcKrby8SXGCyzc4YWF8puAH6Mb1ODQc1zwbMWLCvBg=.6ad09804-5a70-43da-9f16-d33db5a552d7@github.com> On Mon, 5 Jun 2023 06:24:44 GMT, Tobias Hartmann wrote: > Looks good to me. Testing passed. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1576128916 From epeter at openjdk.org Mon Jun 5 06:46:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Jun 2023 06:46:18 GMT Subject: RFR: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 [v2] In-Reply-To: References: <5gWeS_m5wf5Sp3nffxRKGVX5vI_pPHTOjBdRXma9X6g=.03d596c3-fa51-4aff-b276-ac085e6ce815@github.com> Message-ID: On Fri, 2 Jun 2023 13:38:59 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Roberto's suggestions > > Thanks for addressing my suggestions, looks good! Thanks @robcasloz for the review and suggestions! Thanks @TobiHartmann @vnkozlov for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14268#issuecomment-1576136499 From epeter at openjdk.org Mon Jun 5 06:46:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 Jun 2023 06:46:20 GMT Subject: Integrated: 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 16:19:41 GMT, Emanuel Peter wrote: > This is the fix to a regression caused in the CMoveV fix JDK-8306302. > > I had implicitly assumed that all `Cmp` in the loop also have their `in(1)` inside the loop (`in_bb`). This is not always true, and hence we hit the assert. > > **Solution** > > However, we know that at least one of the two inputs of a `Cmp` must also be in the loop, else the `Cmp` would float outside the loop. So if `in(1)` is not in the loop then we can just pick `in(2)` for the `velt_type`. > > **Testing** > I added 2 regression tests that were provided in the bug, and also extended `TestVectorConditionalMove.java` (though it is currently problemlisted because of an IR framework bug). This extension also triggered the assert, and now properly vectorizes. > > I tested up to tier6 and stress testing.
I ran the tests both with `TestVectorConditionalMove` problemlisted and without the problemlisting. This pull request has now been integrated. Changeset: 22a9a86b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/22a9a86be088a3e92b231e7180a134f63716cc87 Stats: 135 lines in 3 files changed: 132 ins; 0 del; 3 mod 8309268: C2: "assert(in_bb(n)) failed: must be" after JDK-8306302 Reviewed-by: rcastanedalo, kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14268 From duke at openjdk.org Mon Jun 5 06:52:21 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 5 Jun 2023 06:52:21 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v5] In-Reply-To: References: Message-ID: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. > > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- Chang Peng has updated the pull request incrementally with one additional commit since the last revision: Update test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14245/files - new: https://git.openjdk.org/jdk/pull/14245/files/a5342348..48aa7298 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14245&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14245/head:pull/14245 PR: https://git.openjdk.org/jdk/pull/14245 From duke at openjdk.org Mon Jun 5 06:55:11 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 5 Jun 2023 06:55:11 GMT Subject: RFR: 
8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v3] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 13:50:19 GMT, Evgeny Astigeevich wrote: >> Chang Peng has updated the pull request incrementally with one additional commit since the last revision: >> >> Update ProblemList and test case > > test/hotspot/jtreg/compiler/vectorapi/TestVectorMaskTrueCount.java line 62: > >> 60: } >> 61: >> 62: static int maskAndTrueCount(boolean[] a, boolean[] b, int idx, int SPECIES_length) { > > `SPECIES_length` confuses me. The first confusing thing is `SPECIES` in capital letters. The second is the name itself. From the code I see it has meaning of `count`. Maybe using `count` instead would be better? Thanks, I have modified the name of this parameter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14245#discussion_r1217603756 From luhenry at openjdk.org Mon Jun 5 07:11:04 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 5 Jun 2023 07:11:04 GMT Subject: RFR: 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes [v2] In-Reply-To: References: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> Message-ID: On Sat, 3 Jun 2023 12:57:28 GMT, Antonios Printezis wrote: >> 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes > > Antonios Printezis has updated the pull request incrementally with one additional commit since the last revision: > > fixed minor typo Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14288#pullrequestreview-1461849783 From rcastanedalo at openjdk.org Mon Jun 5 07:12:29 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 5 Jun 2023 07:12:29 GMT Subject: RFR: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int [v4] In-Reply-To: References: Message-ID: On Fri, 2 Jun 2023 12:27:33 GMT, Roberto Castañeda Lozano wrote: >> The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: >> 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. >> 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. >> >> Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: >> >> ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) >> >> Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: >> >> ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) >> >> The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types.
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. >> >> #### Testing >> >> ##### Functionality >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> >> ##### Performance >> >> - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: > > - Re-apply extraction of min/max building after JDK-8309295 > - Merge branch 'master' into JDK-8302673 > - Abort idealization if any of the adds has a TOP input > - Revert extraction of min/max building > - Complete test battery with remaining no-add cases > - Handle case without inner additions by restoring 'as_add_with_constant' to its previous state > - Add tests to exercise the case without inner additions > - Extract MinI/MaxI construction; pass around ConstAddOperands instead of individual components > - Merge branch 'master' into JDK-8302673 > - Defer op(x, x) to constant/identity propagation early > - ... and 20 more: https://git.openjdk.org/jdk/compare/7b0a3360...cfcc16fd Emanuel, Vladimir, thanks again for reviewing, and particularly thanks Emanuel for the useful feedback!
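For readers following the 8302673 thread, the two rewrites listed in the PR description can be checked at the source level with a small Java sketch. This is only an illustration of the algebraic identities, not the C2 code in `MaxNode::IdealI()`; the class and method names are hypothetical, and the sketch assumes that `x + c0` and `x + c1` do not overflow (overflow handling is outside what this sketch shows).

```java
// Hypothetical illustration of the two MaxI rewrites described in the PR text.
// Assumption: x + c0 and x + c1 stay within the int range (no overflow).
final class MaxRewriteSketch {
    // Step 1, before: max(x + c0, max(x + c1, z))
    static int twoLevel(int x, int c0, int c1, int z) {
        return Math.max(x + c0, Math.max(x + c1, z));
    }

    // Step 1, after: max(x + MAX(c0, c1), z); MAX(c0, c1) is foldable at compile time
    static int twoLevelRewritten(int x, int c0, int c1, int z) {
        return Math.max(x + Math.max(c0, c1), z);
    }

    // Step 2, before: max(x + c0, x + c1)
    static int oneLevel(int x, int c0, int c1) {
        return Math.max(x + c0, x + c1);
    }

    // Step 2, after: x + MAX(c0, c1)
    static int oneLevelRewritten(int x, int c0, int c1) {
        return x + Math.max(c0, c1);
    }

    public static void main(String[] args) {
        // Same constants as the examples in the PR description.
        int x = 42 >> 1;
        System.out.println(twoLevel(x, 100, 150, 200) == twoLevelRewritten(x, 100, 150, 200));
        System.out.println(oneLevel(x, 10, 11) == oneLevelRewritten(x, 10, 11));
    }
}
```

The same reasoning applies to the MinI case with `Math.min` in place of `Math.max`.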
------------- PR Comment: https://git.openjdk.org/jdk/pull/13924#issuecomment-1576168760 From rcastanedalo at openjdk.org Mon Jun 5 07:12:31 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 5 Jun 2023 07:12:31 GMT Subject: Integrated: 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:18:21 GMT, Roberto Castañeda Lozano wrote: > The canonicalization of MinI/MaxI chains into right-spline graphs within `MinINode/MaxINode::Ideal()` inhibits the vectorization of reductions of these nodes. This changeset reworks `MinINode/MaxINode::Ideal()` to perform the same algebraic optimizations without the need for canonicalization, re-enabling auto-vectorization of MinI/MaxI reductions. This is achieved by handling all four permutations of the targeted Ideal subgraph induced by the commutativity of MinI/MaxI directly. The algorithm (for the MaxI case, the MinI case is analogous) tries to apply two Ideal graph rewrites in the following order, where `c0` and `c1` are constants and `MAX` is a compile-time operation: > 1. `max(x + c0, max(x + c1, z))` (or a permutation of it) to `max(x + MAX(c0, c1), z)`. > 2. `max(x + c0, x + c1)` (or a permutation of it) to `x + MAX(c0, c1)`. > > Here is an example of the four permutations handled in step 1 with `x = RShiftI`, `c0 = 100` or `150`, `c1 = 150` or `100`, and `z = ConI (#int:200)`: > > ![two-level-idealization](https://github.com/openjdk/jdk/assets/8792647/bf60a2c3-39cd-4f0d-965d-c711723e374c) > > Here is an example of the two permutations handled in step 2 with `x = RShiftI`, `c0 = 10` or `11`, and `c1 = 11` or `10`: > > ![one-level-idealization](https://github.com/openjdk/jdk/assets/8792647/0a1fe85b-3f30-46bc-8817-d90b3eff946c) > > The changeset implements `MinINode/MaxINode::Ideal()` in a common method `MaxNode::IdealI()`, since the algorithm is symmetric for both node types.
The changeset also extends the existing MinI/MaxI Idealization tests with positive tests for all targeted permutations and negative tests, and adds a new test (contributed by @jbhateja) to assert that MinI/MaxI reductions are vectorized. > > #### Testing > > ##### Functionality > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > > ##### Performance > > - Tested performance on a set of standard benchmark suites (DaCapo, SPECjbb2015, SPECjvm2008). No significant change was observed. This pull request has now been integrated. Changeset: 3fa776d6 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/3fa776d66a8eb117410025bca870b2e7f3f00517 Stats: 513 lines in 5 files changed: 339 ins; 107 del; 67 mod 8302673: [SuperWord] MaxReduction and MinReduction should vectorize for int Co-authored-by: Jatin Bhateja Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13924 From luhenry at openjdk.org Mon Jun 5 07:33:14 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 5 Jun 2023 07:33:14 GMT Subject: RFR: 8309405: RISC-V: is_deopt may produce unaligned memory read In-Reply-To: References: Message-ID: On Sat, 3 Jun 2023 12:34:20 GMT, Vladimir Kempik wrote: > Please review this simple fix, a continuation of JDK-8291550. > > Doing some profiling for trp_lam event (misaligned load emulation) on fpga I've found some more misaligned loads; they are pretty rare but still happen. > Here, is_deopt directly dereferences memory address, but with RVC enabled, a single 4-byte instruction could be 2-byte aligned but not 4-byte aligned. > So is_deopt should use ld_instr to be on safe side. > > Testing: tier1 on hifive clean. Marked as reviewed by luhenry (Committer).
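The alignment issue discussed in the 8309405 thread can be illustrated with a short Java sketch: with RVC, a 4-byte instruction may start at an offset that is only 2-byte aligned, so one way to avoid a misaligned 4-byte load is to read two 2-byte halves and combine them. The class and method names below are hypothetical, and this is not a claim about how HotSpot's `ld_instr` is implemented; it only shows the general technique on a little-endian byte buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: assemble a 32-bit little-endian instruction word from
// two 16-bit reads, so each access needs only 2-byte alignment.
final class HalfwordLoadSketch {
    static int loadInstruction(ByteBuffer code, int offset) {
        // Work on a little-endian view without changing the caller's buffer state.
        ByteBuffer le = code.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int lo = le.getShort(offset) & 0xFFFF;      // low halfword
        int hi = le.getShort(offset + 2) & 0xFFFF;  // high halfword
        return (hi << 16) | lo;
    }

    public static void main(String[] args) {
        // A 2-byte compressed instruction (c.nop, 0x0001) followed by a
        // 4-byte instruction (0x00100513) that starts at offset 2.
        byte[] bytes = {0x01, 0x00, 0x13, 0x05, 0x10, 0x00};
        int insn = loadInstruction(ByteBuffer.wrap(bytes), 2);
        System.out.println(Integer.toHexString(insn));
    }
}
```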
------------- PR Review: https://git.openjdk.org/jdk/pull/14299#pullrequestreview-1461878920 From vkempik at openjdk.org Mon Jun 5 07:33:15 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 5 Jun 2023 07:33:15 GMT Subject: Integrated: 8309405: RISC-V: is_deopt may produce unaligned memory read In-Reply-To: References: Message-ID: On Sat, 3 Jun 2023 12:34:20 GMT, Vladimir Kempik wrote: > Please review this simple fix, a continuation of JDK-8291550. > > Doing some profiling for trp_lam event (misaligned load emulation) on fpga I've found some more misaligned loads; they are pretty rare but still happen. > Here, is_deopt directly dereferences memory address, but with RVC enabled, a single 4-byte instruction could be 2-byte aligned but not 4-byte aligned. > So is_deopt should use ld_instr to be on safe side. > > Testing: tier1 on hifive clean. This pull request has now been integrated. Changeset: a02d8001 Author: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/a02d8001fa43b379bee3803cda06a15a64d99ac2 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8309405: RISC-V: is_deopt may produce unaligned memory read Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/14299 From davleopo at openjdk.org Mon Jun 5 07:58:16 2023 From: davleopo at openjdk.org (David Leopoldseder) Date: Mon, 5 Jun 2023 07:58:16 GMT Subject: Integrated: 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal In-Reply-To: References: Message-ID: On Wed, 31 May 2023 08:46:19 GMT, David Leopoldseder wrote: > This PR fixes the UnsafeGetStableArrayElement test when run with the Graal compiler. > > In the past this test also failed with graal because it was checking for c1/c2 semantics. > JDK-8264135 introduced changes to the UnsafeGetStableArrayElement test to account for the scenario when the test is run with the Graal compiler.
If Graal is used it will assert that constants are folded by asserting matching instead of mismatch. > > However, we had changes in Graal since then, since JDK-8275645 Graal no longer constant folds unaligned reads. > This lets the test fail again for the unaligned cases because it asserts graal folds them. > > The fix is to actually assert mismatch on unaligned accesses. This pull request has now been integrated. Changeset: 11fb5b22 Author: David Leopoldseder Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/11fb5b2209124bbf1100657e340ba5aebc3820d7 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod 8309104: [JVMCI] compiler/unsafe/UnsafeGetStableArrayElement test asserts wrong values with Graal Reviewed-by: dnsimon, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14242 From aph at openjdk.org Mon Jun 5 08:45:05 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 5 Jun 2023 08:45:05 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: Message-ID: On Sun, 4 Jun 2023 11:32:55 GMT, Thomas Stuefe wrote: > We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. > > If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former. > > I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself lives still in the same place, mostly unchanged. > > Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this to not overload this RFE. 
> > Tests: manually ran tests on linux aarch64, OsX aarch64 We don't need a lookup table, because the boolean function is invertible. LLVM does it, for example. I never bothered to do it because it doesn't seem worth the effort. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1576377927 From fyang at openjdk.org Mon Jun 5 08:54:05 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 Jun 2023 08:54:05 GMT Subject: RFR: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes In-Reply-To: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> References: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> Message-ID: <8qTi1Ev6rxASDPsGoEAbx0DC1H3AOv3FZgnYpfWsp3E=.a1d3f0a0-82a2-43ce-9d8f-bc83e027b946@github.com> On Mon, 5 Jun 2023 06:09:55 GMT, Gui Cao wrote: > Hi, We note that in the C2 AddReductionVF & AddReductionVD node, the src1 and dst registers are constrained to be the same register, which is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 node. For reference, other CPUs, such as x86 and arm neon, do not need the same registers either[1]. arm64 sve constrains them to be the same registers because of the use of the FADDA instruction[2], which is floating point adding all active channels of SIMD&FP scalar sources and vector sources and placing the result in SIMD&FP scalar source registers. So for arm64 sve, it is required that the two registers be the same.
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907 > [2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar- > > ### AddReductionVF/AddReductionVD > We can use Float256VectorTests.java Double256VectorTests.java to > emit these nodes and the compilation log is as follows: > #### AddReductionVF > Before this patch: > > 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f6 # castII of R19, #@castII > 0f6 addw R10, R19, zr #@convI2L_reg_reg > 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm > 0fc add R11, R31, R10 # ptr, #@addP_reg_reg > 100 addi R11, R11, #16 # ptr, #@addP_reg_imm > 102 loadV V1, [R11] # vector (rvv) > 10a spill F0 -> F1 # spill size = 32 > 10e reduce_addF F1, F1, V1 # KILL V2 > 11e bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 > > After this patch(Saving a spill operation): > > 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f6 # castII of R19, #@castII > 0f6 addw R10, R19, zr #@convI2L_reg_reg > 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm > 0fc add R11, R31, R10 # ptr, #@addP_reg_reg > 100 addi R11, R11, #16 # ptr, #@addP_reg_imm > 102 loadV V1, [R11] # vector (rvv) > 10a reduce_addF F1, F0, V1 # KILL V2 > 11a bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 > > #### AddReductionVD > Before this patch: > > 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f4 # castII of R9, #@castII > 0f4 addw R10, R9, zr #@convI2L_reg_reg > 0f8 slli R10, R10, (#3 & 0x3f) #@lShiftL_reg_imm > 0fa add R11, R30, R10 # ptr, #@addP_reg_reg > 0fe addi R11, R11, #16 # ptr, #@addP_reg_imm > 100 loadV V1, [R11] # ve... Looks reasonable to me. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14308#pullrequestreview-1462046926 From aph at openjdk.org Mon Jun 5 08:59:06 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 5 Jun 2023 08:59:06 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: Message-ID: <8wHNyWgy3gNczZmlNpFwWNGR3sgSIFOlUJTtnIAn5Mw=.98e0d6f1-e3eb-47c3-979b-450efb548a2c@github.com> On Sun, 4 Jun 2023 11:32:55 GMT, Thomas Stuefe wrote: > We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. > > If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former. > > I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself lives still in the same place, mostly unchanged. > > Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this to not overload this RFE. > > Tests: manually ran tests on linux aarch64, OsX aarch64 Whether this is a win depends on a bunch of things that are hard to predict, such as the time taken to page in the table from mass storage. If we're really going to revisit this, and I'm not at all sure we should, we should be thorough. Can we compute the immediates directly? Yes. Can we replace the binary search with a hash table? Yes. If we have a precomputed table, must we include the program used to generate it? Yes. Is the extra code complexity justified? Don't know.
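As background for the "compute the immediates directly" point in the JDK-8309410 thread, the following Java sketch tests, without any table, whether a 64-bit value has the shape of an AArch64 logical immediate: a repeating element of 2, 4, 8, 16, 32 or 64 bits whose bits form a single rotated run of ones. It only decides encodability and does not produce the N:immr:imms fields; the class and method names are hypothetical and this is not the HotSpot or LLVM code.

```java
// Hypothetical sketch: table-free validity check for 64-bit AArch64 logical immediates.
final class LogicalImmediateSketch {
    static boolean isLogicalImmediate(long value) {
        // All-zeros and all-ones are not encodable.
        if (value == 0L || value == -1L) {
            return false;
        }
        // Find the smallest element size whose repetition yields the value.
        int size = 64;
        while (size > 2) {
            int half = size / 2;
            long halfMask = (1L << half) - 1;
            if ((value & halfMask) != ((value >>> half) & halfMask)) {
                break; // the two halves differ, so the element is 'size' bits wide
            }
            size = half;
        }
        long mask = (size == 64) ? -1L : (1L << size) - 1;
        long elem = value & mask;
        // Rotate the element right by one bit within 'size' bits.
        long rotated = ((elem >>> 1) | ((elem & 1L) << (size - 1))) & mask;
        // A single circular run of ones has exactly two 0/1 transitions.
        return Long.bitCount(elem ^ rotated) == 2;
    }

    public static void main(String[] args) {
        long[] samples = {0xFFL, 0x5555555555555555L, 0x5L, 0xFFFF0000FFFF0000L};
        for (long s : samples) {
            System.out.println(Long.toHexString(s) + " -> " + isLogicalImmediate(s));
        }
    }
}
```

Whether a direct computation like this is preferable to a precomputed table is exactly the trade-off under discussion in the thread; the sketch only shows that the validity test itself is a handful of bit operations.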
------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1576398218 From xgong at openjdk.org Mon Jun 5 09:29:10 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 5 Jun 2023 09:29:10 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 06:52:21 GMT, Chang Peng wrote: >> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. >> >> TEST passed on AArch64: >> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 >> >> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- >> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update test case LGTM! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/14245#pullrequestreview-1462117398 From stuefe at openjdk.org Mon Jun 5 09:38:05 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 5 Jun 2023 09:38:05 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: <8wHNyWgy3gNczZmlNpFwWNGR3sgSIFOlUJTtnIAn5Mw=.98e0d6f1-e3eb-47c3-979b-450efb548a2c@github.com> References: <8wHNyWgy3gNczZmlNpFwWNGR3sgSIFOlUJTtnIAn5Mw=.98e0d6f1-e3eb-47c3-979b-450efb548a2c@github.com> Message-ID: On Mon, 5 Jun 2023 08:56:14 GMT, Andrew Haley wrote: > Whether this is a win depends on a bunch of things that are hard to predict, such as the time taken to page in the table from mass storage. If we're really going to revisit this, and I'm not at all sure we should, we should be thorough.
The table is ~64K, and we only use it very sparsely, since we only look up a small subset of possible encodings. So we'd only ever page in a small subset of pages. Computing the table at runtime involves paging in the whole space for both tables to populate them, apart from the bit fiddling. I assumed that would be more expensive. Measuring this would be an effort though. > If we're really going to revisit this, and I'm not at all sure we should, we should be thorough. My proposal was an incremental improvement and it deliberately does not touch the table format and the decoding. If you think it is not worth it, I'll close it - this was just a small improvement and there is no time for more. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1576461199 From aph at openjdk.org Mon Jun 5 09:50:07 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 5 Jun 2023 09:50:07 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: <8wHNyWgy3gNczZmlNpFwWNGR3sgSIFOlUJTtnIAn5Mw=.98e0d6f1-e3eb-47c3-979b-450efb548a2c@github.com> Message-ID: On Mon, 5 Jun 2023 09:35:20 GMT, Thomas Stuefe wrote: > My proposal was an incremental improvement and it deliberately does not touch the table format and the decoding. If you think it is not worth it, I'll close it - this was just a small improvement and there is no time for more. I really don't know one way or another. I am sure, however, that for this patch to be complete it _must_ include the code used to generate the table, and that's a significant bump in complexity. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1576477496 From duke at openjdk.org Mon Jun 5 09:50:07 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 5 Jun 2023 09:50:07 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 06:52:21 GMT, Chang Peng wrote: >> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. >> >> TEST passed on AArch64: >> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 >> >> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- >> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update test case @eastig Hi, Could you please help to review this patch ? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14245#issuecomment-1576477574 From luhenry at openjdk.org Mon Jun 5 09:51:06 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 5 Jun 2023 09:51:06 GMT Subject: RFR: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes In-Reply-To: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> References: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> Message-ID: On Mon, 5 Jun 2023 06:09:55 GMT, Gui Cao wrote: > Hi, We note that in the C2 AddReductionVF & AddReductionVD node, the src1 and dst registers are constrained to be the same register, which is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 node. 
For reference, other CPUs, such as x86 and Arm NEON, do not require the two registers to be the same either[1]. Arm64 SVE does constrain them to be the same register because it uses the FADDA instruction[2], a strictly-ordered floating-point add reduction that adds the SIMD&FP scalar source and all active elements of the vector source and places the result destructively in the SIMD&FP scalar source register. So for Arm64 SVE, the two registers are required to be the same.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907
> [2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar-
>
> ### AddReductionVF/AddReductionVD
> We can use Float256VectorTests.java and Double256VectorTests.java to emit these nodes; the compilation log is as follows:
> #### AddReductionVF
> Before this patch:
>
> 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033
> 0f6 # castII of R19, #@castII
> 0f6 addw R10, R19, zr #@convI2L_reg_reg
> 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm
> 0fc add R11, R31, R10 # ptr, #@addP_reg_reg
> 100 addi R11, R11, #16 # ptr, #@addP_reg_imm
> 102 loadV V1, [R11] # vector (rvv)
> 10a spill F0 -> F1 # spill size = 32
> 10e reduce_addF F1, F1, V1 # KILL V2
> 11e bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000
>
> After this patch (saving a spill operation):
>
> 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033
> 0f6 # castII of R19, #@castII
> 0f6 addw R10, R19, zr #@convI2L_reg_reg
> 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm
> 0fc add R11, R31, R10 # ptr, #@addP_reg_reg
> 100 addi R11, R11, #16 # ptr, #@addP_reg_imm
> 102 loadV V1, [R11] # vector (rvv)
> 10a reduce_addF F1, F0, V1 # KILL V2
> 11a bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000
>
> #### AddReductionVD
> Before this patch:
>
> 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033
> 0f4 # castII of R9, #@castII
> 0f4 addw R10, R9, zr #@convI2L_reg_reg
> 0f8 slli R10, R10, 
(#3 & 0x3f) #@lShiftL_reg_imm > 0fa add R11, R30, R10 # ptr, #@addP_reg_reg > 0fe addi R11, R11, #16 # ptr, #@addP_reg_imm > 100 loadV V1, [R11] # ve... Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14308#pullrequestreview-1462158717 From stuefe at openjdk.org Mon Jun 5 10:00:15 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 5 Jun 2023 10:00:15 GMT Subject: RFR: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: <8wHNyWgy3gNczZmlNpFwWNGR3sgSIFOlUJTtnIAn5Mw=.98e0d6f1-e3eb-47c3-979b-450efb548a2c@github.com> Message-ID: On Mon, 5 Jun 2023 09:47:37 GMT, Andrew Haley wrote: > > My proposal was an incremental improvement and it deliberately does not touch the table format and the decoding. If you think it is not worth it, I'll close it - this was just a small improvement and there is no time for more. > > I really don't know one way or another. I am sure, however, that for this patch to be complete it _must_ include the code used to generate the table, and that's a significant bump in complexity. True. We can add this code separately from hotspot though, e.g. as python script. But since I don't have any more time to spend, I'll shelve it for now. Also, maybe there is a way to compute the encoding from the immediate bit pattern; it cannot be that hard. Maybe that's the better way to go. Anyway, thanks for looking at this! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14304#issuecomment-1576489841 From stuefe at openjdk.org Mon Jun 5 10:00:16 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 5 Jun 2023 10:00:16 GMT Subject: Withdrawn: JDK-8309410: [aarch64] Use pre-computed tables for logical immediates In-Reply-To: References: Message-ID: On Sun, 4 Jun 2023 11:32:55 GMT, Thomas Stuefe wrote: > We compute the lookup tables for logical immediates on startup. We should use pre-computed tables instead. 
> > If we do that, we don't even need the encoding->immediate lookup table, since that is only used during generation of the reverse lookup table. Since we hardcode the latter, we don't need to store the former. > > I kept the old generator code around to test the hard-coded table (see gtest). To keep reviewing simple, the generator code itself lives still in the same place, mostly unchanged. > > Note that we could shave off some more space for the reverse lookup table by storing immediate and encoding in separate arrays, since we now pay 16 bytes per entry due to alignment. We could also reduce the encoding from 32 bits to 16 bits. I did not do this to not overload this RFE. > > Tests: manually ran tests on linux aarch64, OsX aarch64 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14304 From eastigeevich at openjdk.org Mon Jun 5 10:27:09 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 5 Jun 2023 10:27:09 GMT Subject: RFR: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 [v5] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 06:52:21 GMT, Chang Peng wrote: >> This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. >> >> TEST passed on AArch64: >> hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 >> >> [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- >> [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update test case LGTM ------------- Marked as reviewed by eastigeevich (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/14245#pullrequestreview-1462222366 From duke at openjdk.org Mon Jun 5 11:36:16 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 5 Jun 2023 11:36:16 GMT Subject: Integrated: 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 In-Reply-To: References: Message-ID: On Wed, 31 May 2023 10:25:07 GMT, Chang Peng wrote: > This patch fixes the issue introduced by JDK-8307795. Since addv[1] cannot support "T2S" SIMD arrangement, we should use addp[2] in this case. > > TEST passed on AArch64: > hotspot:compiler/vectorapi, jdk:jdk/incubator/vector, tier1-3 > > [1]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDV--Add-across-Vector- > [2]: https://developer.arm.com/documentation/ddi0596/2020-12/SIMD-FP-Instructions/ADDP--vector---Add-Pairwise--vector-- This pull request has now been integrated. Changeset: 6d511f13 Author: changpeng1997 Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/6d511f1376e3a0183a484443d05142678bdaa1c2 Stats: 69 lines in 4 files changed: 46 ins; 7 del; 16 mod 8309129: AArch64: guarantee(T != T2S) failed: "incorrect arrangement" after JDK-8307795 Reviewed-by: thartmann, xgong, eastigeevich ------------- PR: https://git.openjdk.org/jdk/pull/14245 From chagedorn at openjdk.org Mon Jun 5 14:23:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 Jun 2023 14:23:08 GMT Subject: RFR: 8309472: IGV: Add dump_igv(custom_name) for improved debugging Message-ID: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> When debugging, I often add multiple IR dumps throughout the code to capture different states. To do that, I'm just re-using various `PHASE_XYZ` `CompilerPhaseType` enum values: Compile::current()->print_method(PHASE_END, 3); But this becomes confusing when using multiple such enum values and trying to remember what they actually mean. 
To overcome that (and to avoid creating new enum values each time), I suggest introducing a new `dump_igv(custom_name)` method where `custom_name` can be an arbitrary string. Then we can use the following when debugging:

    Compile::current()->dump_igv("foo");
    Compile::current()->dump_igv("bar");

Thanks,
Christian

-------------

Commit messages:
 - 8309472: IGV: Add dump_igv(custom_name) for improved debugging

Changes: https://git.openjdk.org/jdk/pull/14313/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14313&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8309472
  Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/14313.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14313/head:pull/14313

PR: https://git.openjdk.org/jdk/pull/14313

From jcking at openjdk.org Mon Jun 5 14:51:18 2023
From: jcking at openjdk.org (Justin King)
Date: Mon, 5 Jun 2023 14:51:18 GMT
Subject: RFR: JDK-8304684: Memory leak in DirectivesParser::set_option_flag [v4]
In-Reply-To: <9XO5we9RK8MKNE5HpGWLFySNOr6Y_TB6gXl13ksg0Yo=.dec7763e-9483-4c8c-ba79-7b6d47148d81@github.com>
References: <9XO5we9RK8MKNE5HpGWLFySNOr6Y_TB6gXl13ksg0Yo=.dec7763e-9483-4c8c-ba79-7b6d47148d81@github.com>
Message-ID:

On Tue, 28 Mar 2023 14:30:55 GMT, Justin King wrote:

>> Update `DirectivesSet` to take ownership of string options in some cases, to not leak memory.
>
> Justin King has updated the pull request incrementally with one additional commit since the last revision:
>
> Adjust logic based on review
>
> Signed-off-by: Justin King

I'm hoping to get back to my OpenJDK contributions in the future; I have just been insanely busy. If somebody wants to take this over, I won't be annoyed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13125#issuecomment-1576945652 From thartmann at openjdk.org Mon Jun 5 15:09:20 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 Jun 2023 15:09:20 GMT Subject: RFR: JDK-8304684: Memory leak in DirectivesParser::set_option_flag [v4] In-Reply-To: <9XO5we9RK8MKNE5HpGWLFySNOr6Y_TB6gXl13ksg0Yo=.dec7763e-9483-4c8c-ba79-7b6d47148d81@github.com> References: <9XO5we9RK8MKNE5HpGWLFySNOr6Y_TB6gXl13ksg0Yo=.dec7763e-9483-4c8c-ba79-7b6d47148d81@github.com> Message-ID: On Tue, 28 Mar 2023 14:30:55 GMT, Justin King wrote: >> Update `DirectivesSet` to take ownership of string options in some cases, to not leak memory. > > Justin King has updated the pull request incrementally with one additional commit since the last revision: > > Adjust logic based on review > > Signed-off-by: Justin King Okay, thanks for the update! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13125#issuecomment-1576978186 From tonyp at openjdk.org Mon Jun 5 15:43:24 2023 From: tonyp at openjdk.org (Antonios Printezis) Date: Mon, 5 Jun 2023 15:43:24 GMT Subject: Integrated: 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes In-Reply-To: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> References: <7d5aYHV-aS7JSS5mKVejzIfJ4lUcqZIyrsTaF9ojfzM=.4ac2239d-a655-4f0e-9f28-19b254d187f0@github.com> Message-ID: On Fri, 2 Jun 2023 17:53:03 GMT, Antonios Printezis wrote: > 8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes This pull request has now been integrated. 
Changeset: 5cd8af76
Author: Antonios Printezis
URL: https://git.openjdk.org/jdk/commit/5cd8af7622a93afb32f5f3fccdc453096992453c
Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod

8308726: RISC-V: avoid unnecessary slli in the vectorized arraycopy stubs for bytes

Reviewed-by: fyang, luhenry

-------------

PR: https://git.openjdk.org/jdk/pull/14288

From jbhateja at openjdk.org Mon Jun 5 17:06:16 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 5 Jun 2023 17:06:16 GMT
Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v6]
In-Reply-To:
References:
Message-ID:

On Sun, 4 Jun 2023 17:31:06 GMT, Scott Gibbons wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_fmod.cpp line 121:
>>
>>> 119: // // |x|, |y|
>>> 120: // a = DP_AND(x, DP_CONST(7fffffffffffffff));
>>> 121: __ movq(xmm0, xmm0);
>>
>> Redundant move.
>
> I do not believe these are redundant, as the upper quadword of the register is cleared as a side-effect of the vmovq. I do not believe the icx compiler would insert random redundant vmovq instructions at this optimization level.

Subsequent uses of xmm0 operate on the 128-bit vector, and the value eventually feeds into a DIVSD instruction that operates only on the first 64 bits of data. Given that we are clearing the upper 64 bits, the move may be issued to an execution port and consume one cycle.
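
The side effect both comments rely on can be checked with a small SSE2 sketch. This is not taken from the patch: the helper names `movq_self` and `lane` are hypothetical, and it assumes an x86-64 target where `_mm_move_epi64` corresponds to a register-to-register `movq`.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Copy the low 64 bits of `v` and zero the high 64 bits, which is the
// documented behavior of _mm_move_epi64 (MOVQ xmm, xmm).
inline __m128i movq_self(__m128i v) {
  return _mm_move_epi64(v);
}

// Extract one 64-bit lane (0 = low, 1 = high) so the effect can be inspected.
inline uint64_t lane(__m128i v, int index) {
  assert(index == 0 || index == 1);
  alignas(16) uint64_t out[2];
  _mm_store_si128(reinterpret_cast<__m128i*>(out), v);
  return out[index];
}
```

Because the high lane changes whenever it was non-zero, such a move is not a pure no-op at the architectural level, which is consistent with the remark that it may still occupy an execution port.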
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14224#discussion_r1218352040 From never at openjdk.org Mon Jun 5 17:28:11 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 5 Jun 2023 17:28:11 GMT Subject: RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording Message-ID: 8309498: [JVMCI] race in CallSiteTargetValue recording ------------- Commit messages: - 8309498: [JVMCI] race in CallSiteTargetValue recording Changes: https://git.openjdk.org/jdk/pull/14315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309498 Stats: 11 lines in 1 file changed: 5 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14315/head:pull/14315 PR: https://git.openjdk.org/jdk/pull/14315 From sgibbons at openjdk.org Mon Jun 5 17:57:04 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 5 Jun 2023 17:57:04 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v8] In-Reply-To: References: Message-ID: <3V3eleKHO09NJ1RU7cfFK3mPOKa5ngtQYePCt8YAmWY=.9fc4c4bd-962f-445d-8a94-2be5fa654807@github.com> > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
>
> Old:
> gcc-12.2.1-4.fc36.x86_64
> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
> JVM version: 21-internal
> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
> New:
> JVM version: 21-internal (float)
> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Finish review comments; add tests for corner cases

-------------

Changes: - all:
https://git.openjdk.org/jdk/pull/14224/files - new: https://git.openjdk.org/jdk/pull/14224/files/1b44cd62..624d1248 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=06-07 Stats: 238 lines in 4 files changed: 236 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From vlivanov at openjdk.org Mon Jun 5 18:22:43 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 5 Jun 2023 18:22:43 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 25 May 2023 22:54:15 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. 
I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Catching up with master branch. > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. > - Address PR review 5: refactor on rematerialization & add tests. 
> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
> - Address part of PR review 4 & fix a bug setting only_candidate
> - Catching up with master
>
> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
> - Fix tests. Remember previous reducible Phis.
> - Address PR review 3. Some comments and be able to abort compilation.
> - Merge with Master
> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method.
> - ... and 5 more: https://git.openjdk.org/jdk/compare/46c4da7f...8f81a7c8

src/hotspot/share/code/debugInfo.cpp line 251:

> 249: // Set it to true so that the object will get rematerialized
> 250: if (!_selected->is_root()) {
> 251: _selected->set_root(true);

Why do you need `_selected` to be marked as root?

src/hotspot/share/code/debugInfo.cpp line 301:

> 299: void ObjectMergeValue::print_detailed(outputStream* st) const {
> 300: st->print("merge: ID=%d", _id);
> 301: #ifndef PRODUCT

Can you post a sample of the output, please?

Why is it limited to non-product builds? It's valuable irrespective of build flavor.

As I see in `ObjectValue::print_on` and `ScopeDesc::print_on`, you mix `print_on` with `print_fields_on`. Any particular reason for that? You could add an `is_object_merge` case in `ObjectValue::print_on` instead and extend `ObjectValue::print_fields_on` to cover the `ObjectMergeValue` case. I find it hard to reason about `ObjectValue::print_on` vs `ObjectMergeValue::print_on` since it's a non-virtual method.

Also, formatting is broken.

src/hotspot/share/opto/compile.cpp line 2332:

> 2330: }
> 2331:
> 2332: NOT_PRODUCT(ConnectionGraph::verify_ram_nodes(this, root());)

Why do you limit the check to non-product builds only? It won't fail the compilation with product builds.
src/hotspot/share/opto/output.cpp line 1101: > 1099: > 1100: if (!is_root) { > 1101: for (int k = 0; k < monarray->length(); k++) { I suggest to turn the lookup over `monarray` into a helper method and call it along with `locarray` and `exparray` checks: bool is_root = locarray->contains(ov) || exparray->contains(ov) || contains_as_owner(monarray, ov); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1217488199 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218419279 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1217491794 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218431285 From sgibbons at openjdk.org Mon Jun 5 18:36:29 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 5 Jun 2023 18:36:29 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v9] In-Reply-To: References: Message-ID: <9gyMwajVcShejHFe9dDwsiaGubd4z4x8jn67-q3YBQM=.4e27a912-e2e5-48ee-8a12-fdcb52dfdb61@github.com> > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
>
> Old:
> gcc-12.2.1-4.fc36.x86_64
> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix
> JVM version: 21-internal
> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68
> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67
> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70
> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69
> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44
> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40
> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40
> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40
> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41
> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40
> New:
> JVM version: 21-internal (float)
> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42
> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27
> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17
> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16
> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17
> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17
> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17
> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16
> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16
> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Fix test

-------------

Changes: - all: https://git.openjdk.org/jdk/pull/14224/files - new:
https://git.openjdk.org/jdk/pull/14224/files/624d1248..9b2c1db5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=07-08 Stats: 12 lines in 1 file changed: 1 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From cslucas at openjdk.org Mon Jun 5 19:30:07 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 5 Jun 2023 19:30:07 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 5 Jun 2023 18:05:47 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Catching up with master branch. >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address PR review 6: refactoring around rematerialization & improve test cases. >> - Address PR review 5: refactor on rematerialization & add tests. >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - ... 
and 5 more: https://git.openjdk.org/jdk/compare/46c4da7f...8f81a7c8 > > src/hotspot/share/code/debugInfo.cpp line 301: > >> 299: void ObjectMergeValue::print_detailed(outputStream* st) const { >> 300: st->print("merge: ID=%d", _id); >> 301: #ifndef PRODUCT > > Can you post a sample of the output, please? > > Why is it limited to non-product builds? It's valuable irrespective of build flavor. > > As I see in `ObjectValue::print_on` and `ScopeDesc::print_on`, you mix `print_on` with `print_fields_on`. Any particular reason for that? You could add `is_object_merge` case in ObjectValue::print_on` instead and extend `ObjectValue::print_fields_on` to cover `ObjectMergeValue` case. I find it hard to reason about `ObjectValue::print_on` vs `ObjectMergeValue::print_on` since it's a non-virtual method. > > > > Also, formatting is broken. I added a few samples below and there are a few more here: https://gist.github.com/JohnTortugo/913523947e08157def6cfebafa7d5daa Sample 1: Compiled method (c2) 415 24 TestTrapAfterMerge::test (57 bytes) total in heap [0x00007f7b4d03da90,0x00007f7b4d03de18] = 904 relocation [0x00007f7b4d03dc00,0x00007f7b4d03dc18] = 24 main code [0x00007f7b4d03dc20,0x00007f7b4d03dcb8] = 152 stub code [0x00007f7b4d03dcb8,0x00007f7b4d03dcd0] = 24 oops [0x00007f7b4d03dcd0,0x00007f7b4d03dce0] = 16 metadata [0x00007f7b4d03dce0,0x00007f7b4d03dce8] = 8 scopes data [0x00007f7b4d03dce8,0x00007f7b4d03dd50] = 104 scopes pcs [0x00007f7b4d03dd50,0x00007f7b4d03de10] = 192 dependencies [0x00007f7b4d03de10,0x00007f7b4d03de18] = 8 scopes: ScopeDesc(pc=0x00007f7b4d03dc3a offset=1a): TestTrapAfterMerge::test at -1 (line 3) ScopeDesc(pc=0x00007f7b4d03dc41 offset=21): TestTrapAfterMerge::test at 11 (line 5) ScopeDesc(pc=0x00007f7b4d03dc44 offset=24): TestTrapAfterMerge::test at 51 (line 12) ScopeDesc(pc=0x00007f7b4d03dc4a offset=2a): TestTrapAfterMerge::test at 46 (line 8) ScopeDesc(pc=0x00007f7b4d03dc52 offset=32): TestTrapAfterMerge::test at 37 (line 9) 
ScopeDesc(pc=0x00007f7b4d03dc57 offset=37): TestTrapAfterMerge::test at 43 (line 8) ScopeDesc(pc=0x00007f7b4d03dc61 offset=41): TestTrapAfterMerge::test at 46 (line 8) reexecute=true Locals - l0: empty - l1: empty - l2: reg rbx [6],int - l3: empty - l4: merge: ID=26 - l5: reg r11 [22],int Objects - 0: merge: ID=26, selector="reg r10 [20],int", merge_pointer="nullptr", candidate objs=[27, 28] - 1: obj: ID=27, is_root=0, N.Fields=1, klass: Point Fields: reg r8 [16],int - 2: obj: ID=28, is_root=0, N.Fields=1, klass: Point Fields: reg rcx [2],int ScopeDesc(pc=0x00007f7b4d03dc63 offset=43): TestTrapAfterMerge::test at 46 (line 8) ScopeDesc(pc=0x00007f7b4d03dc6c offset=4c): TestTrapAfterMerge::test at 34 (line 8) ScopeDesc(pc=0x00007f7b4d03dc71 offset=51): TestTrapAfterMerge::test at 55 (line 12) - Sample2: Compiled method (c2) 443 24 TestManys::test (41 bytes) total in heap [0x00007f35e9155b90,0x00007f35e9155e78] = 744 relocation [0x00007f35e9155d00,0x00007f35e9155d18] = 24 main code [0x00007f35e9155d20,0x00007f35e9155d88] = 104 stub code [0x00007f35e9155d88,0x00007f35e9155da0] = 24 oops [0x00007f35e9155da0,0x00007f35e9155db0] = 16 metadata [0x00007f35e9155db0,0x00007f35e9155db8] = 8 scopes data [0x00007f35e9155db8,0x00007f35e9155e10] = 88 scopes pcs [0x00007f35e9155e10,0x00007f35e9155e70] = 96 dependencies [0x00007f35e9155e70,0x00007f35e9155e78] = 8 scopes: ScopeDesc(pc=0x00007f35e9155d3a offset=1a): TestManys::test at -1 (line 57) ScopeDesc(pc=0x00007f35e9155d42 offset=22): TestManys::test at 11 (line 59) ScopeDesc(pc=0x00007f35e9155d58 offset=38): TestManys::test at 25 (line 63) Locals - l0: empty - l1: empty - l2: empty - l3: empty - l4: empty - l5: empty - l6: empty - l7: empty - l8: merge: ID=26 Objects - 0: merge: ID=26, selector="reg rbp [10],int", merge_pointer="nullptr", candidate objs=[27, 28] - 1: obj: ID=27, is_root=0, N.Fields=4, klass: Point Fields: stack[36], stack[36], 0, 0 - 2: obj: ID=28, is_root=0, N.Fields=4, klass: Point Fields: 2023, 0, 0, 0 
ScopeDesc(pc=0x00007f35e9155d74 offset=54): TestManys::test at 25 (line 63) - Sample3: Compiled method (c2) 436 24 TestMultiSFO::test (48 bytes) total in heap [0x00007f1df5155590,0x00007f1df5155850] = 704 relocation [0x00007f1df5155700,0x00007f1df5155718] = 24 main code [0x00007f1df5155720,0x00007f1df5155788] = 104 stub code [0x00007f1df5155788,0x00007f1df51557a0] = 24 oops [0x00007f1df51557a0,0x00007f1df51557b0] = 16 metadata [0x00007f1df51557b0,0x00007f1df51557b8] = 8 scopes data [0x00007f1df51557b8,0x00007f1df51557f8] = 64 scopes pcs [0x00007f1df51557f8,0x00007f1df5155848] = 80 dependencies [0x00007f1df5155848,0x00007f1df5155850] = 8 scopes: ScopeDesc(pc=0x00007f1df515573a offset=1a): TestMultiSFO::test at -1 (line 12) ScopeDesc(pc=0x00007f1df515575c offset=3c): TestMultiSFO::test at 28 (line 19) Locals - l0: empty - l1: empty - l2: empty - l3: merge: ID=14 - l4: obj: ID=15, is_root=1, N.Fields=2, klass: TestMultiSFO$Point Fields: stack[12], stack[8] Objects - 0: merge: ID=14, selector="reg rbp [10],int", merge_pointer="nullptr", candidate objs=[15, 16] - 1: obj: ID=15, is_root=1, N.Fields=2, klass: TestMultiSFO$Point Fields: stack[12], stack[8] - 2: obj: ID=16, is_root=0, N.Fields=2, klass: TestMultiSFO$Point Fields: stack[8], stack[12] ScopeDesc(pc=0x00007f1df5155778 offset=58): TestMultiSFO::test at 28 (line 19) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218500009 From cslucas at openjdk.org Mon Jun 5 19:55:09 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 5 Jun 2023 19:55:09 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 5 Jun 2023 19:26:59 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/code/debugInfo.cpp line 301: >> >>> 299: void ObjectMergeValue::print_detailed(outputStream* 
st) const { >>> 300: st->print("merge: ID=%d", _id); >>> 301: #ifndef PRODUCT >> >> Can you post a sample of the output, please? >> >> Why is it limited to non-product builds? It's valuable irrespective of build flavor. >> >> As I see in `ObjectValue::print_on` and `ScopeDesc::print_on`, you mix `print_on` with `print_fields_on`. Any particular reason for that? You could add `is_object_merge` case in ObjectValue::print_on` instead and extend `ObjectValue::print_fields_on` to cover `ObjectMergeValue` case. I find it hard to reason about `ObjectValue::print_on` vs `ObjectMergeValue::print_on` since it's a non-virtual method. >> >> >> >> Also, formatting is broken. > > I added a few samples below and there are a few more here: https://gist.github.com/JohnTortugo/913523947e08157def6cfebafa7d5daa > > Sample 1: > > > Compiled method (c2) 415 24 TestTrapAfterMerge::test (57 bytes) > total in heap [0x00007f7b4d03da90,0x00007f7b4d03de18] = 904 > relocation [0x00007f7b4d03dc00,0x00007f7b4d03dc18] = 24 > main code [0x00007f7b4d03dc20,0x00007f7b4d03dcb8] = 152 > stub code [0x00007f7b4d03dcb8,0x00007f7b4d03dcd0] = 24 > oops [0x00007f7b4d03dcd0,0x00007f7b4d03dce0] = 16 > metadata [0x00007f7b4d03dce0,0x00007f7b4d03dce8] = 8 > scopes data [0x00007f7b4d03dce8,0x00007f7b4d03dd50] = 104 > scopes pcs [0x00007f7b4d03dd50,0x00007f7b4d03de10] = 192 > dependencies [0x00007f7b4d03de10,0x00007f7b4d03de18] = 8 > scopes: > ScopeDesc(pc=0x00007f7b4d03dc3a offset=1a): > TestTrapAfterMerge::test at -1 (line 3) > ScopeDesc(pc=0x00007f7b4d03dc41 offset=21): > TestTrapAfterMerge::test at 11 (line 5) > ScopeDesc(pc=0x00007f7b4d03dc44 offset=24): > TestTrapAfterMerge::test at 51 (line 12) > ScopeDesc(pc=0x00007f7b4d03dc4a offset=2a): > TestTrapAfterMerge::test at 46 (line 8) > ScopeDesc(pc=0x00007f7b4d03dc52 offset=32): > TestTrapAfterMerge::test at 37 (line 9) > ScopeDesc(pc=0x00007f7b4d03dc57 offset=37): > TestTrapAfterMerge::test at 43 (line 8) > ScopeDesc(pc=0x00007f7b4d03dc61 offset=41): > 
TestTrapAfterMerge::test at 46 (line 8) reexecute=true > Locals > - l0: empty > - l1: empty > - l2: reg rbx [6],int > - l3: empty > - l4: merge: ID=26 > - l5: reg r11 [22],int > Objects > - 0: merge: ID=26, selector="reg r10 [20],int", merge_pointer="nullptr", candidate objs=[27, 28] > - 1: obj: ID=27, is_root=0, N.Fields=1, klass: Point > Fields: reg r8 [16],int > - 2: obj: ID=28, is_root=0, N.Fields=1, klass: Point > Fields: reg rcx [2],int > ScopeDesc(pc=0x00007f7b4d03dc63 offset=43): > TestTrapAfterMerge::test at 46 (line 8) > ScopeDesc(pc=0x00007f7b4d03dc6c offset=4c): > TestTrapAfterMerge::test at 34 (line 8) > ScopeDesc(pc=0x00007f7b4d03dc71 offset=51): > TestTrapAfterMerge::test at 55 (line 12) > > > - Sample2: > > > Compiled method (c2) 443 24 TestManys::test (41 bytes) > total in heap [0x00007f35e9155b90,0x00007f35e9155e78] = 744 > relocation [0x00007f35e9155d00,0x00007f35e9155d18] = 24 > main code [0x00007f35e9155d20,0x00007f35e9155d88] = 104 > stub code [0x00007f35e9155d88,0x00007f35e9155da0] = 24 > oops [0x00007f35e9155da0,0x00007f35e9155db0] =... > Why is it limited to non-product builds? It's valuable irrespective of build flavor. This is because `print_on` in `AnyObj` is only defined in non-product builds. I based implementation of `ObjectMergeValue::print_on` on `ObjectValue::print_on`. In `ObjectValue::print_on` fields aren't printed in product builds. > Any particular reason for that? You could add is_object_merge case in ObjectValue::print_oninstead and extendObjectValue::print_fields_onto coverObjectMergeValue case. I'll do that then. > Also, formatting is broken. Can you please share an example? If you mean the tabs on lines 303/304/306/307 I added those because I thought would make the code easier to read, but if you want I can definitely remove that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218523643 From cslucas at openjdk.org Mon Jun 5 19:55:14 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 5 Jun 2023 19:55:14 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 5 Jun 2023 05:10:13 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Catching up with master branch. >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address PR review 6: refactoring around rematerialization & improve test cases. >> - Address PR review 5: refactor on rematerialization & add tests. >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - ... and 5 more: https://git.openjdk.org/jdk/compare/46c4da7f...8f81a7c8 > > src/hotspot/share/opto/compile.cpp line 2332: > >> 2330: } >> 2331: >> 2332: NOT_PRODUCT(ConnectionGraph::verify_ram_nodes(this, root());) > > Why do you limit the check to non-product builds only? It won't fail the compilation with product builds. Duh. I'll fix that. 
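For readers following the exchange above, here is a minimal sketch of why a check wrapped in a NOT_PRODUCT-style macro cannot fail a product-build compilation. The macro is a simplified stand-in for the HotSpot one; `verify_nodes` and `compile` are hypothetical names used only for illustration and are not code from this PR.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for HotSpot's NOT_PRODUCT macro: the argument is
// compiled only when PRODUCT is not defined.
#ifdef PRODUCT
#define NOT_PRODUCT(code)
#else
#define NOT_PRODUCT(code) code
#endif

// Hypothetical verifier; stands in for a check such as
// ConnectionGraph::verify_ram_nodes. Rejects negative node counts.
static bool verify_nodes(int node_count) {
  return node_count >= 0;
}

// Returns a description of what the "compilation" did with the check.
std::string compile(int node_count) {
  std::string result = "compiled";
  // In a product build this whole statement vanishes, so a bad graph is
  // never detected and the compilation never bails out.
  NOT_PRODUCT(if (!verify_nodes(node_count)) { result = "bailed out"; })
  return result;
}
```

Built without `-DPRODUCT`, `compile(-1)` returns "bailed out"; built with `-DPRODUCT`, the same call returns "compiled", which is the behavior the review comment points out.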
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218525026 From vlivanov at openjdk.org Mon Jun 5 20:31:02 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 5 Jun 2023 20:31:02 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 5 Jun 2023 19:50:25 GMT, Cesar Soares Lucas wrote: > If you mean the tabs on lines 303/304/306/307 Yes, it confused me. As an alternative, you could put selector and merge_pointer-related statements on the same line, but I'm not sure how much it improves readability: st->print(", selector=\""); _selector->print_on(st); st->print("\""); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218558213 From vlivanov at openjdk.org Mon Jun 5 20:31:03 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 5 Jun 2023 20:31:03 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <4gr0ARilcuMl1Zfht5_7qYOd-OouT_2rIa8SgQuQWDw=.b55a2bc3-def0-4e29-bfb2-cc940d3493fb@github.com> On Mon, 5 Jun 2023 20:27:42 GMT, Vladimir Ivanov wrote: >>> Why is it limited to non-product builds? It's valuable irrespective of build flavor. >> >> This is because `print_on` in `AnyObj` is only defined in non-product builds. I based implementation of `ObjectMergeValue::print_on` on `ObjectValue::print_on`. In `ObjectValue::print_on` fields aren't printed in product builds. >> >>> Any particular reason for that? You could add `is_object_merge` case in `ObjectValue::print_on` instead and extend `ObjectValue::print_fields_on` to cover `ObjectMergeValue` case. >> >> I'll do that then.
>> >>> Also, formatting is broken. >> >> Can you please share an example? If you mean the tabs on lines 303/304/306/307 I added those because I thought would make the code easier to read, but if you want I can definitely remove that. > >> If you mean the tabs on lines 303/304/306/307 > > Yes, it confused me. As an alternative, you could put selector and merge_pointer-related statements on the same line, but I'm not sure how much it improves readability: > > st->print(", selector=""); _selector->print_on(st); st->print("""); A couple of suggestions about the output: * `merge`: it's clearer to call it `merge_obj` * `obj` vs `merge` output: obj output is duplicated in ScopeDesc entries and Objects sections; before it was a short version printed in Locals/Expressions and all the details were included in Objects; I like to see field locations in the short version, but including everything looks way too much IMO; * it makes sense to include selector and merge_pointer info in short version, but `is_root` can be omitted ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218558295 From vkempik at openjdk.org Mon Jun 5 20:59:25 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 5 Jun 2023 20:59:25 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads Message-ID: Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V Placeholder ------------- Commit messages: - Initial fix Changes: https://git.openjdk.org/jdk/pull/14320/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309502 Stats: 18 lines in 1 file changed: 14 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From cslucas at openjdk.org Mon Jun 5 21:13:06 2023 From: cslucas at openjdk.org (Cesar Soares 
Lucas) Date: Mon, 5 Jun 2023 21:13:06 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: <4gr0ARilcuMl1Zfht5_7qYOd-OouT_2rIa8SgQuQWDw=.b55a2bc3-def0-4e29-bfb2-cc940d3493fb@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <4gr0ARilcuMl1Zfht5_7qYOd-OouT_2rIa8SgQuQWDw=.b55a2bc3-def0-4e29-bfb2-cc940d3493fb@github.com> Message-ID: <3bgmER7fyi8uvkp58Fwr5s4XHT0BWOoED49EVDTRSDI=.a5839cc3-c2b2-4a94-a097-f748a3cf0a29@github.com> On Mon, 5 Jun 2023 20:27:48 GMT, Vladimir Ivanov wrote: >>> If you mean the tabs on lines 303/304/306/307 >> >> Yes, it confused me. As an alternative, you could put selector and merge_pointer-related statements on the same line, but I'm not sure how much it improves readability: >> >> st->print(", selector=""); _selector->print_on(st); st->print("""); > > A couple of suggestions about the output: > * `merge`: it's clearer to call it `merge_obj` > * `obj` vs `merge` output: obj output is duplicated in ScopeDesc entries and Objects sections; before it was a short version printed in Locals/Expressions and all the details were included in Objects; I like to see field locations in the short version, but including everything looks way too much IMO; > * it makes sense to include selector and merge_pointer info in short version, but `is_root` can be omitted Thanks @iwanowww . Does the output below look good to you? It prints ObjectValue in the same format as it was before this PR and only print details of the merge in the "Objects" section. Is there other output section that you think needs to be adjusted? 
Compiled method (c2) 436 24 TestMultiSFO::test (48 bytes) total in heap [0x00007f1df5155590,0x00007f1df5155850] = 704 relocation [0x00007f1df5155700,0x00007f1df5155718] = 24 main code [0x00007f1df5155720,0x00007f1df5155788] = 104 stub code [0x00007f1df5155788,0x00007f1df51557a0] = 24 oops [0x00007f1df51557a0,0x00007f1df51557b0] = 16 metadata [0x00007f1df51557b0,0x00007f1df51557b8] = 8 scopes data [0x00007f1df51557b8,0x00007f1df51557f8] = 64 scopes pcs [0x00007f1df51557f8,0x00007f1df5155848] = 80 dependencies [0x00007f1df5155848,0x00007f1df5155850] = 8 scopes: ScopeDesc(pc=0x00007f1df515573a offset=1a): TestMultiSFO::test at -1 (line 12) ScopeDesc(pc=0x00007f1df515575c offset=3c): TestMultiSFO::test at 28 (line 19) Locals - l0: empty - l1: empty - l2: empty - l3: merge_obj[14] - l4: obj[15] Objects - merge_obj[14], selector="reg rbp [10],int", merge_pointer="nullptr", candidate_objs=[15, 16] - obj[15], is_root=1, klass: TestMultiSFO$Point Fields: stack[12], stack[8] - obj[16], is_root=0, klass: TestMultiSFO$Point Fields: stack[8], stack[12] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218596666 From vlivanov at openjdk.org Mon Jun 5 22:08:02 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 5 Jun 2023 22:08:02 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: <3bgmER7fyi8uvkp58Fwr5s4XHT0BWOoED49EVDTRSDI=.a5839cc3-c2b2-4a94-a097-f748a3cf0a29@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <4gr0ARilcuMl1Zfht5_7qYOd-OouT_2rIa8SgQuQWDw=.b55a2bc3-def0-4e29-bfb2-cc940d3493fb@github.com> <3bgmER7fyi8uvkp58Fwr5s4XHT0BWOoED49EVDTRSDI=.a5839cc3-c2b2-4a94-a097-f748a3cf0a29@github.com> Message-ID: <4CuSp8KR3SDGjc88Pd57VcwsBdjG5_FUT94U8XkoM0s=.e61ce2a5-b9b4-40aa-9399-62b3ac275634@github.com> On Mon, 5 Jun 2023 21:10:22 GMT, Cesar Soares Lucas wrote: >> A couple of 
suggestions about the output: >> * `merge`: it's clearer to call it `merge_obj` >> * `obj` vs `merge` output: obj output is duplicated in ScopeDesc entries and Objects sections; before it was a short version printed in Locals/Expressions and all the details were included in Objects; I like to see field locations in the short version, but including everything looks way too much IMO; >> * it makes sense to include selector and merge_pointer info in short version, but `is_root` can be omitted > > Thanks @iwanowww . Does the output below look good to you? It prints ObjectValue in the same format as it was before this PR and only print details of the merge in the "Objects" section. Is there other output section that you think needs to be adjusted? > > > Compiled method (c2) 436 24 TestMultiSFO::test (48 bytes) > total in heap [0x00007f1df5155590,0x00007f1df5155850] = 704 > relocation [0x00007f1df5155700,0x00007f1df5155718] = 24 > main code [0x00007f1df5155720,0x00007f1df5155788] = 104 > stub code [0x00007f1df5155788,0x00007f1df51557a0] = 24 > oops [0x00007f1df51557a0,0x00007f1df51557b0] = 16 > metadata [0x00007f1df51557b0,0x00007f1df51557b8] = 8 > scopes data [0x00007f1df51557b8,0x00007f1df51557f8] = 64 > scopes pcs [0x00007f1df51557f8,0x00007f1df5155848] = 80 > dependencies [0x00007f1df5155848,0x00007f1df5155850] = 8 > scopes: > ScopeDesc(pc=0x00007f1df515573a offset=1a): > TestMultiSFO::test at -1 (line 12) > ScopeDesc(pc=0x00007f1df515575c offset=3c): > TestMultiSFO::test at 28 (line 19) > Locals > - l0: empty > - l1: empty > - l2: empty > - l3: merge_obj[14] > - l4: obj[15] > > Objects > - merge_obj[14], selector="reg rbp [10],int", merge_pointer="nullptr", candidate_objs=[15, 16] > - obj[15], is_root=1, klass: TestMultiSFO$Point > Fields: stack[12], stack[8] > - obj[16], is_root=0, klass: TestMultiSFO$Point > Fields: stack[8], stack[12] Thanks, it looks much better now (except the position in Objects array is missing). 
It makes sense to mention `is_root` for merge_obj case even though it always equals `1`. Also, make merge_pointer optional and omit it when its value is null. BTW instead of printing `is_root=0/1`, you can introduce a more compact notation and mark relevant lines with a single symbol: - 0: R merge_obj[14], selector="reg rbp [10],int" candidates=[15, 16] - 1: R obj[15], klass: TestMultiSFO$Point Fields: stack[12], stack[8] - 2: obj[16], klass: TestMultiSFO$Point Fields: stack[8], stack[12] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218642386 From cslucas at openjdk.org Mon Jun 5 22:49:04 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 5 Jun 2023 22:49:04 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 5 Jun 2023 05:05:26 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Catching up with master branch. >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address PR review 6: refactoring around rematerialization & improve test cases. >> - Address PR review 5: refactor on rematerialization & add tests. >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - ...
and 5 more: https://git.openjdk.org/jdk/compare/46c4da7f...8f81a7c8 > > src/hotspot/share/code/debugInfo.cpp line 251: > >> 249: // Set it to true so that the object will get rematerialized >> 250: if (!_selected->is_root()) { >> 251: _selected->set_root(true); > > Why do you need `_selected` to be marked as root? I think you're right, there is no need for that. I'll remove/refactor that and run tests again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218672363 From cslucas at openjdk.org Mon Jun 5 22:49:05 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 5 Jun 2023 22:49:05 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v14] In-Reply-To: <4CuSp8KR3SDGjc88Pd57VcwsBdjG5_FUT94U8XkoM0s=.e61ce2a5-b9b4-40aa-9399-62b3ac275634@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <4gr0ARilcuMl1Zfht5_7qYOd-OouT_2rIa8SgQuQWDw=.b55a2bc3-def0-4e29-bfb2-cc940d3493fb@github.com> <3bgmER7fyi8uvkp58Fwr5s4XHT0BWOoED49EVDTRSDI=.a5839cc3-c2b2-4a94-a097-f748a3cf0a29@github.com> <4CuSp8KR3SDGjc88Pd57VcwsBdjG5_FUT94U8XkoM0s=.e61ce2a5-b9b4-40aa-9399-62b3ac275634@github.com> Message-ID: On Mon, 5 Jun 2023 22:03:59 GMT, Vladimir Ivanov wrote: >> Thanks @iwanowww . Does the output below look good to you? It prints ObjectValue in the same format as it was before this PR and only print details of the merge in the "Objects" section. Is there other output section that you think needs to be adjusted? 
>> >> >> Compiled method (c2) 436 24 TestMultiSFO::test (48 bytes) >> total in heap [0x00007f1df5155590,0x00007f1df5155850] = 704 >> relocation [0x00007f1df5155700,0x00007f1df5155718] = 24 >> main code [0x00007f1df5155720,0x00007f1df5155788] = 104 >> stub code [0x00007f1df5155788,0x00007f1df51557a0] = 24 >> oops [0x00007f1df51557a0,0x00007f1df51557b0] = 16 >> metadata [0x00007f1df51557b0,0x00007f1df51557b8] = 8 >> scopes data [0x00007f1df51557b8,0x00007f1df51557f8] = 64 >> scopes pcs [0x00007f1df51557f8,0x00007f1df5155848] = 80 >> dependencies [0x00007f1df5155848,0x00007f1df5155850] = 8 >> scopes: >> ScopeDesc(pc=0x00007f1df515573a offset=1a): >> TestMultiSFO::test at -1 (line 12) >> ScopeDesc(pc=0x00007f1df515575c offset=3c): >> TestMultiSFO::test at 28 (line 19) >> Locals >> - l0: empty >> - l1: empty >> - l2: empty >> - l3: merge_obj[14] >> - l4: obj[15] >> >> Objects >> - merge_obj[14], selector="reg rbp [10],int", merge_pointer="nullptr", candidate_objs=[15, 16] >> - obj[15], is_root=1, klass: TestMultiSFO$Point >> Fields: stack[12], stack[8] >> - obj[16], is_root=0, klass: TestMultiSFO$Point >> Fields: stack[8], stack[12] > > Thanks, it looks much better now (except the position in Objects array is missing). > > It makes sense to mention `is_root` for merge_obj case even though it's always equals to '1`. Also, make merge_pointer optional and omit it when its value is null. > > BTW instead of printing `is_root=0/1`, you can introduce a more compact notation and mark relevant lines with a single symbol: > > - 0: R merge_obj[14], selector="reg rbp [10],int" candidates=[15, 16] > - 1: R obj[15], klass: TestMultiSFO$Point > Fields: stack[12], stack[8] > - 2: obj[16], klass: TestMultiSFO$Point > Fields: stack[8], stack[12] Sounds good. I'll make the changes and push them asap. 
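The compact notation suggested in the review can be sketched roughly as follows. This is only an illustration of the proposed output format: `DebugObject`, `join` and `format_object` are hypothetical names, not HotSpot's `ObjectValue` / `ObjectMergeValue` printing code, and the layout is an assumption that puts the `Fields:` part on the same line for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified record; not HotSpot's ObjectValue/ObjectMergeValue.
struct DebugObject {
  int id;
  bool is_root;
  bool is_merge;
  std::string selector;             // used by merge objects only
  std::vector<int> candidates;      // used by merge objects only
  std::string klass;                // used by plain objects only
  std::vector<std::string> fields;  // used by plain objects only
};

// Joins the items with ", ".
template <typename T>
static std::string join(const std::vector<T>& items) {
  std::ostringstream out;
  for (std::size_t i = 0; i < items.size(); i++) {
    if (i > 0) out << ", ";
    out << items[i];
  }
  return out.str();
}

// Formats one entry of the "Objects" section. Roots get an "R" marker,
// non-roots get blanks in the same column so the entries stay aligned.
std::string format_object(int index, const DebugObject& obj) {
  std::ostringstream out;
  out << "- " << index << ": " << (obj.is_root ? "R " : "  ");
  if (obj.is_merge) {
    out << "merge_obj[" << obj.id << "], selector=\"" << obj.selector
        << "\" candidates=[" << join(obj.candidates) << "]";
  } else {
    out << "obj[" << obj.id << "], klass: " << obj.klass
        << " Fields: " << join(obj.fields);
  }
  return out.str();
}
```

Printing the position first keeps the index into the Objects array visible, and the single-character root marker replaces the longer `is_root=0/1` text.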
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1218671994 From sgibbons at openjdk.org Mon Jun 5 23:48:21 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 5 Jun 2023 23:48:21 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). > > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower 
case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix tests; need vlbwdq for vpbroadcastq ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14224/files - new: https://git.openjdk.org/jdk/pull/14224/files/9b2c1db5..e77d0817 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14224&range=08-09 Stats: 43 lines in 4 files changed: 30 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/14224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14224/head:pull/14224 PR: https://git.openjdk.org/jdk/pull/14224 From sgibbons at openjdk.org Mon Jun 5 23:48:46 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 5 Jun 2023 23:48:46 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v9] In-Reply-To: <9gyMwajVcShejHFe9dDwsiaGubd4z4x8jn67-q3YBQM=.4e27a912-e2e5-48ee-8a12-fdcb52dfdb61@github.com> References: <9gyMwajVcShejHFe9dDwsiaGubd4z4x8jn67-q3YBQM=.4e27a912-e2e5-48ee-8a12-fdcb52dfdb61@github.com> Message-ID: On Mon, 5 Jun 2023 18:36:29 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. 
>> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). >> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has 
updated the pull request incrementally with one additional commit since the last revision: > > Fix test @vnkozlov I believe this is ready for integration now. Can I ask you to run your test battery on this PR please? I should have approvals very soon. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1577694036 From luhenry at openjdk.org Tue Jun 6 06:31:54 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 6 Jun 2023 06:31:54 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 20:52:01 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: hotspot:tier1 and jdk:tier1 are clean on hifive, more tbd. Looks good! ------------- Marked as reviewed by luhenry (Committer).
PR Review: https://git.openjdk.org/jdk/pull/14320#pullrequestreview-1464359992 From duke at openjdk.org Tue Jun 6 08:34:24 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 6 Jun 2023 08:34:24 GMT Subject: RFR: 8308537: Remove BreakAtNode Message-ID: The BreakAtNode flag was unused as its utility is now insignificant with the use of "rr" in practice. See the following discussion: https://github.com/openjdk/jdk/pull/13767#issuecomment-1541805032. I removed the BreakAtNode flag and the related parts in the code. ------------- Commit messages: - 8308537: removed most of set_debug_orig(), as the most of the function was only executed when BreakAtNode was not 0, i.e. BreakAtNode was used and not set to default - 8308537: Removed BreakAtNode flag Changes: https://git.openjdk.org/jdk/pull/14311/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14311&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308537 Stats: 20 lines in 2 files changed: 0 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14311.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14311/head:pull/14311 PR: https://git.openjdk.org/jdk/pull/14311 From roland at openjdk.org Tue Jun 6 08:39:54 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 6 Jun 2023 08:39:54 GMT Subject: RFR: 8308537: Remove BreakAtNode In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 12:24:46 GMT, Eric Nothum wrote: > The BreakAtNode flag was unused as its utility is now insignificant with the use of "rr" in practice. See the following discussion: https://github.com/openjdk/jdk/pull/13767#issuecomment-1541805032. I removed the BreakAtNode flag and the related parts in the code. I actually use it (albeit rarely). rr is linux x86 only AFAIK. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14311#issuecomment-1578191697 From duke at openjdk.org Tue Jun 6 08:57:59 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 6 Jun 2023 08:57:59 GMT Subject: RFR: 8308537: Remove BreakAtNode In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 12:24:46 GMT, Eric Nothum wrote: > The BreakAtNode flag was unused as its utility is now insignificant with the use of "rr" in practice. See the following discussion: https://github.com/openjdk/jdk/pull/13767#issuecomment-1541805032. I removed the BreakAtNode flag and the related parts in the code. Okay sure, in that case I am withdrawing this PR, and keep the BreakAtNode flag. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14311#issuecomment-1578230533 From duke at openjdk.org Tue Jun 6 08:58:00 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 6 Jun 2023 08:58:00 GMT Subject: Withdrawn: 8308537: Remove BreakAtNode In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 12:24:46 GMT, Eric Nothum wrote: > The BreakAtNode flag was unused as its utility is now insignificant with the use of "rr" in practice. See the following discussion: https://github.com/openjdk/jdk/pull/13767#issuecomment-1541805032. I removed the BreakAtNode flag and the related parts in the code. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14311 From simonis at openjdk.org Tue Jun 6 09:19:00 2023 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 6 Jun 2023 09:19:00 GMT Subject: RFR: 8308537: Remove BreakAtNode In-Reply-To: References: Message-ID: <03V-J8yY9sMpXCXj2F0yYCrIs9IGy1fZiDzMUyNcKrk=.f1025a48-b364-47c2-9157-d27b3579f9e5@github.com> On Tue, 6 Jun 2023 08:36:45 GMT, Roland Westrelin wrote: >> The BreakAtNode flag was unused as its utility is now insignificant with the use of "rr" in practice. See the following discussion: https://github.com/openjdk/jdk/pull/13767#issuecomment-1541805032. 
I removed the BreakAtNode flag and the related parts in the code. > > I actually use it (albeit rarely). rr is linux x86 only AFAIK. I agree with @rwestrel. It's one of the features that's rarely used, but it doesn't hurt and for some situations it's good that it's there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14311#issuecomment-1578266557 From yzhu at openjdk.org Tue Jun 6 09:23:01 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Tue, 6 Jun 2023 09:23:01 GMT Subject: RFR: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes In-Reply-To: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> References: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> Message-ID: On Mon, 5 Jun 2023 06:09:55 GMT, Gui Cao wrote: > Hi, we note that in the C2 AddReductionVF & AddReductionVD nodes, the src1 and dst registers are constrained to be the same register, which is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 nodes. For reference, other CPUs, such as x86 and arm neon, do not need the same registers either [1]. arm64 sve constrains them to be the same register because of the use of the FADDA instruction [2], which adds a SIMD&FP scalar source and all active lanes of the vector source and places the result in the SIMD&FP scalar source register. So for arm64 sve, it is required that the two registers be the same.
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907 > [2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar- > > ### AddReductionVF/AddReductionVD > We can use Float256VectorTests.java Double256VectorTests.java to > emit these nodes and the compilation log is as follows: > #### AddReductionVF > Before this patch: > > 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f6 # castII of R19, #@castII > 0f6 addw R10, R19, zr #@convI2L_reg_reg > 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm > 0fc add R11, R31, R10 # ptr, #@addP_reg_reg > 100 addi R11, R11, #16 # ptr, #@addP_reg_imm > 102 loadV V1, [R11] # vector (rvv) > 10a spill F0 -> F1 # spill size = 32 > 10e reduce_addF F1, F1, V1 # KILL V2 > 11e bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 > > After this patch(Saving a spill operation): > > 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f6 # castII of R19, #@castII > 0f6 addw R10, R19, zr #@convI2L_reg_reg > 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm > 0fc add R11, R31, R10 # ptr, #@addP_reg_reg > 100 addi R11, R11, #16 # ptr, #@addP_reg_imm > 102 loadV V1, [R11] # vector (rvv) > 10a reduce_addF F1, F0, V1 # KILL V2 > 11a bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 > > #### AddReductionVD > Before this patch: > > 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f4 # castII of R9, #@castII > 0f4 addw R10, R9, zr #@convI2L_reg_reg > 0f8 slli R10, R10, (#3 & 0x3f) #@lShiftL_reg_imm > 0fa add R11, R30, R10 # ptr, #@addP_reg_reg > 0fe addi R11, R11, #16 # ptr, #@addP_reg_imm > 100 loadV V1, [R11] # ve... Marked as reviewed by yzhu (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14308#pullrequestreview-1464704019 From gcao at openjdk.org Tue Jun 6 09:23:03 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 6 Jun 2023 09:23:03 GMT Subject: RFR: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes In-Reply-To: References: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> Message-ID: On Mon, 5 Jun 2023 09:48:00 GMT, Ludovic Henry wrote: >> Hi, we note that in the C2 AddReductionVF & AddReductionVD nodes, the src1 and dst registers are constrained to be the same register, which is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 nodes. For reference, other CPUs, such as x86 and arm neon, do not need the same registers either [1]. arm64 sve constrains them to be the same register because of the use of the FADDA instruction [2], which adds a SIMD&FP scalar source and all active lanes of the vector source and places the result in the SIMD&FP scalar source register. So for arm64 sve, it is required that the two registers be the same.
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907 >> [2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar- >> >> ### AddReductionVF/AddReductionVD >> We can use Float256VectorTests.java Double256VectorTests.java to >> emit these nodes and the compilation log is as follows: >> #### AddReductionVF >> Before this patch: >> >> 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 >> 0f6 # castII of R19, #@castII >> 0f6 addw R10, R19, zr #@convI2L_reg_reg >> 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm >> 0fc add R11, R31, R10 # ptr, #@addP_reg_reg >> 100 addi R11, R11, #16 # ptr, #@addP_reg_imm >> 102 loadV V1, [R11] # vector (rvv) >> 10a spill F0 -> F1 # spill size = 32 >> 10e reduce_addF F1, F1, V1 # KILL V2 >> 11e bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 >> >> After this patch(Saving a spill operation): >> >> 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 >> 0f6 # castII of R19, #@castII >> 0f6 addw R10, R19, zr #@convI2L_reg_reg >> 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm >> 0fc add R11, R31, R10 # ptr, #@addP_reg_reg >> 100 addi R11, R11, #16 # ptr, #@addP_reg_imm >> 102 loadV V1, [R11] # vector (rvv) >> 10a reduce_addF F1, F0, V1 # KILL V2 >> 11a bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 >> >> #### AddReductionVD >> Before this patch: >> >> 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 >> 0f4 # castII of R9, #@castII >> 0f4 addw R10, R9, zr #@convI2L_reg_reg >> 0f8 slli R10, R10, (#3 & 0x3f) #@lShiftL_reg_imm >> 0fa add R11, R30, R10 # ptr, #@addP_reg_reg >> 0fe addi R11, R11, #16 # ptr, ... > > Marked as reviewed by luhenry (Committer). @luhenry @RealFYang Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14308#issuecomment-1578268452 From gcao at openjdk.org Tue Jun 6 09:23:04 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 6 Jun 2023 09:23:04 GMT Subject: Integrated: 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes In-Reply-To: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> References: <6h81pSz9ig3z1b4qh3dmv5jYvuiOwPrBQCLGSz1TzSY=.481b5d27-bfbf-4bbf-b855-9f158f2c7b51@github.com> Message-ID: On Mon, 5 Jun 2023 06:09:55 GMT, Gui Cao wrote: > Hi, we note that in the C2 AddReductionVF & AddReductionVD nodes, the src1 and dst registers are constrained to be the same register, which is not required, so we relax the register constraint for AddReductionVF/AddReductionVD in the C2 nodes. For reference, other CPUs, such as x86 and arm neon, do not need the same registers either [1]. arm64 sve constrains them to be the same register because of the use of the FADDA instruction [2], which adds a SIMD&FP scalar source and all active lanes of the vector source and places the result in the SIMD&FP scalar source register. So for arm64 sve, it is required that the two registers be the same.
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L2897-L2907 > [2] https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/FADDA--Floating-point-add-strictly-ordered-reduction--accumulating-in-scalar- > > ### AddReductionVF/AddReductionVD > We can use Float256VectorTests.java Double256VectorTests.java to > emit these nodes and the compilation log is as follows: > #### AddReductionVF > Before this patch: > > 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f6 # castII of R19, #@castII > 0f6 addw R10, R19, zr #@convI2L_reg_reg > 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm > 0fc add R11, R31, R10 # ptr, #@addP_reg_reg > 100 addi R11, R11, #16 # ptr, #@addP_reg_imm > 102 loadV V1, [R11] # vector (rvv) > 10a spill F0 -> F1 # spill size = 32 > 10e reduce_addF F1, F1, V1 # KILL V2 > 11e bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 > > After this patch(Saving a spill operation): > > 0f6 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f6 # castII of R19, #@castII > 0f6 addw R10, R19, zr #@convI2L_reg_reg > 0fa slli R10, R10, (#2 & 0x3f) #@lShiftL_reg_imm > 0fc add R11, R31, R10 # ptr, #@addP_reg_reg > 100 addi R11, R11, #16 # ptr, #@addP_reg_imm > 102 loadV V1, [R11] # vector (rvv) > 10a reduce_addF F1, F0, V1 # KILL V2 > 11a bgeu R19, R29, B61 #@cmpU_branch P=0.000001 C=-1.000000 > > #### AddReductionVD > Before this patch: > > 0f4 B15: # out( B61 B16 ) <- in( B14 ) Freq: 55.8033 > 0f4 # castII of R9, #@castII > 0f4 addw R10, R9, zr #@convI2L_reg_reg > 0f8 slli R10, R10, (#3 & 0x3f) #@lShiftL_reg_imm > 0fa add R11, R30, R10 # ptr, #@addP_reg_reg > 0fe addi R11, R11, #16 # ptr, #@addP_reg_imm > 100 loadV V1, [R11] # ve... This pull request has now been integrated. 
Changeset: 7d25bf77 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/7d25bf7722f6fbe3633dc718adf6f755e354adb9 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod 8309419: RISC-V: Relax register constraint for AddReductionVF & AddReductionVD nodes Reviewed-by: fyang, luhenry, yzhu ------------- PR: https://git.openjdk.org/jdk/pull/14308 From fjiang at openjdk.org Tue Jun 6 11:44:53 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 6 Jun 2023 11:44:53 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 20:52:01 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling the finagle-http test from the renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: tier1 is clean on hifive, more tbd. Looks good, I see some `load_4chr` at [1], could it also produce misaligned loads? 1.
https://github.com/openjdk/jdk/blob/01455a07a7e1f15aed43cd47222047810c826abd/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L675-L690 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1578573385 From roland at openjdk.org Tue Jun 6 12:51:53 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 6 Jun 2023 12:51:53 GMT Subject: RFR: 8309472: IGV: Add dump_igv(custom_name) for improved debugging In-Reply-To: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> References: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> Message-ID: On Mon, 5 Jun 2023 14:16:44 GMT, Christian Hagedorn wrote: > When debugging, I often add multiple IR dumps throughout the code to capture different states. To do that, I'm just re-using various `PHASE_XYZ` `CompilerPhaseType` enum values: > > Compile::current()->print_method(PHASE_END, 3); > > But this becomes confusing when using multiple such enum values and trying to remember what they actually mean. To overcome that (and to avoid creating new enum values each time), I suggest to introduce a new `dump_igv(custom_name)` method where `custom_name` can be an arbitrary string. Then we can use the following when debugging: > > Compile::current()->dump_igv("foo"); > Compile::current()->dump_igv("bar"); > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14313#pullrequestreview-1465122156 From chagedorn at openjdk.org Tue Jun 6 12:58:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jun 2023 12:58:53 GMT Subject: RFR: 8309472: IGV: Add dump_igv(custom_name) for improved debugging In-Reply-To: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> References: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> Message-ID: On Mon, 5 Jun 2023 14:16:44 GMT, Christian Hagedorn wrote: > When debugging, I often add multiple IR dumps throughout the code to capture different states. To do that, I'm just re-using various `PHASE_XYZ` `CompilerPhaseType` enum values: > > Compile::current()->print_method(PHASE_END, 3); > > But this becomes confusing when using multiple such enum values and trying to remember what they actually mean. To overcome that (and to avoid creating new enum values each time), I suggest to introduce a new `dump_igv(custom_name)` method where `custom_name` can be an arbitrary string. Then we can use the following when debugging: > > Compile::current()->dump_igv("foo"); > Compile::current()->dump_igv("bar"); > > Thanks, > Christian Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14313#issuecomment-1578717396 From duke at openjdk.org Tue Jun 6 13:29:56 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 6 Jun 2023 13:29:56 GMT Subject: RFR: 8302145: ddepth should be uint in PhaseIdealLoop::register_node() Message-ID: Changed the type of the ddepth parameter in PhaseIdealLoop::register_node from int to uint, to avoid potentially unsafe uint -> int casts. The method is often used with uint arguments, coming from dom_depth(). I have verified the calls of PhaseIdealLoop::register_node and have removed related casts. 
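To illustrate the motivation for this kind of signature change, here is a minimal sketch. It is not the HotSpot code: `DepthTableSketch`, its members, and `fits_in_int` are hypothetical names made up for this example, and only the idea (keeping the depth unsigned from producer to consumer so that no narrowing cast is needed) comes from the description above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-node dominator-depth table; NOT HotSpot code.
class DepthTableSketch {
 public:
  // Producer side: depths are handed out as unsigned values.
  unsigned dom_depth(std::size_t idx) const {
    assert(idx < _depth.size() && "index must be within the table");
    return _depth[idx];
  }

  // Consumer side: the parameter is unsigned as well, so a caller can pass
  // dom_depth(...) straight through without a narrowing cast to int.
  void register_node(std::size_t idx, unsigned ddepth) {
    if (idx >= _depth.size()) {
      _depth.resize(idx + 1, 0u);  // slots not yet written default to depth 0
    }
    _depth[idx] = ddepth;
  }

 private:
  std::vector<unsigned> _depth;
};

// Returns true when an unsigned depth would survive a cast to int unchanged.
inline bool fits_in_int(unsigned ddepth) {
  return ddepth <= static_cast<unsigned>(INT_MAX);
}
```

With this shape a caller can write `table.register_node(child, table.dom_depth(parent) + 1u)` directly. If the parameter were an `int`, the same call would need a narrowing conversion that only preserves the value when `fits_in_int` holds.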
------------- Commit messages: - For readability: moved "*" to the left for the loop parameter - 8302145: Changed type of ddepth from int to uint in PhaseIdealLoop::register_node Changes: https://git.openjdk.org/jdk/pull/14333/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14333&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302145 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14333/head:pull/14333 PR: https://git.openjdk.org/jdk/pull/14333 From chagedorn at openjdk.org Tue Jun 6 13:31:52 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 Jun 2023 13:31:52 GMT Subject: RFR: 8302145: ddepth should be uint in PhaseIdealLoop::register_node() In-Reply-To: References: Message-ID: <8iDXfjZAANuvvJgNQJfDhtEMbAv8TuuKXGLo0BPmm2o=.18002ce7-764d-4977-abc9-f6d7a8e1a8c0@github.com> On Tue, 6 Jun 2023 12:50:50 GMT, Eric Nothum wrote: > Changed the type of the ddepth parameter in PhaseIdealLoop::register_node from int to uint, to avoid potentially unsafe uint -> int casts. > The method is often used with uint arguments, coming from dom_depth(). > I have verified the calls of PhaseIdealLoop::register_node and have removed related casts. Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14333#pullrequestreview-1465216162 From cslucas at openjdk.org Tue Jun 6 14:51:46 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 6 Jun 2023 14:51:46 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v15] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR?
> > The most common use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge; 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
> > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR review 6: debug format output & some refactoring. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/8f81a7c8..3a5ed401 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=13-14 Stats: 112 lines in 6 files changed: 37 ins; 59 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From dnsimon at openjdk.org Tue Jun 6 15:03:59 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 Jun 2023 15:03:59 GMT Subject: RFR: 8309542: compiler/jvmci/TestEnableJVMCIProduct.java fails with "JVMCI compiler 'graal' specified by jvmci.Compiler not found" Message-ID: This PR fixes an intermittent failure in TestEnableJVMCIProduct.java that can happen when execution is slow enough such that a top-tier JIT compilation is scheduled. 
------------- Commit messages: - let exit value determine what output to check for Changes: https://git.openjdk.org/jdk/pull/14336/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14336&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309542 Stats: 10 lines in 1 file changed: 6 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14336.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14336/head:pull/14336 PR: https://git.openjdk.org/jdk/pull/14336 From kvn at openjdk.org Tue Jun 6 15:25:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 Jun 2023 15:25:52 GMT Subject: RFR: 8309542: compiler/jvmci/TestEnableJVMCIProduct.java fails with "JVMCI compiler 'graal' specified by jvmci.Compiler not found" In-Reply-To: References: Message-ID: <0kLc_urcPBBaiDg3rtcIpHTA_lnZDRGvoltGc-9161Y=.f5d30399-9fa9-4da7-bdf1-204ec15c518d@github.com> On Tue, 6 Jun 2023 14:54:35 GMT, Doug Simon wrote: > This PR fixes an intermittent failure in TestEnableJVMCIProduct.java that can happen when execution is slow enough such that a top-tier JIT compilation is scheduled. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14336#pullrequestreview-1465475701 From jwaters at openjdk.org Tue Jun 6 15:36:57 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 6 Jun 2023 15:36:57 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time-consuming, we can instead change jint to an int (which on Windows is still a 32-bit type, just like long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner rather than later, before a future version of Visual C++ finally starts to break on existing code. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Bumping ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1578995624 From never at openjdk.org Tue Jun 6 15:45:56 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 6 Jun 2023 15:45:56 GMT Subject: RFR: 8309542: compiler/jvmci/TestEnableJVMCIProduct.java fails with "JVMCI compiler 'graal' specified by jvmci.Compiler not found" In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 14:54:35 GMT, Doug Simon wrote: > This PR fixes an intermittent failure in TestEnableJVMCIProduct.java that can happen when execution is slow enough such that a top-tier JIT compilation is scheduled. Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14336#pullrequestreview-1465522240 From vkempik at openjdk.org Tue Jun 6 15:51:54 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 6 Jun 2023 15:51:54 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 11:41:51 GMT, Feilong Jiang wrote: > Looks good, I see some `load_4chr` at [1], could it also produce misaligned loads? > > 1.
https://github.com/openjdk/jdk/blob/01455a07a7e1f15aed43cd47222047810c826abd/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L675-L690 That's hard to say. I just run some tests with perf and check the results for trp_lam events; that kind of analysis is a lot easier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1579021949 From jbhateja at openjdk.org Tue Jun 6 16:12:03 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Jun 2023 16:12:03 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 23:48:21 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191).
>> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests; need vlbwdq for vpbroadcastq Marked as reviewed 
by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1465575042 From sviswanathan at openjdk.org Tue Jun 6 16:15:01 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 6 Jun 2023 16:15:01 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 23:48:21 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). >> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took 
: 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests; need vlbwdq for vpbroadcastq The changes look good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14224#pullrequestreview-1465580781 From dnsimon at openjdk.org Tue Jun 6 16:15:56 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 Jun 2023 16:15:56 GMT Subject: RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 17:20:48 GMT, Tom Rodriguez wrote: > 8309498: [JVMCI] race in CallSiteTargetValue recording Marked as reviewed by dnsimon (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14315#pullrequestreview-1465581497 From dnsimon at openjdk.org Tue Jun 6 16:16:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 Jun 2023 16:16:02 GMT Subject: RFR: 8309542: compiler/jvmci/TestEnableJVMCIProduct.java fails with "JVMCI compiler 'graal' specified by jvmci.Compiler not found" In-Reply-To: References: Message-ID: <1DcWrMrTwrKWJXI-3gwXOO-efNr5KjWPP0otwVnmx5s=.194089f2-090c-42bd-a23a-1d2ec1441282@github.com> On Tue, 6 Jun 2023 14:54:35 GMT, Doug Simon wrote: > This PR fixes an intermittent failure in TestEnableJVMCIProduct.java that can happen when execution is slow enough such that a top-tier JIT compilation is scheduled. Thanks for the review Tom. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14336#issuecomment-1579060432 From dnsimon at openjdk.org Tue Jun 6 16:16:03 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 Jun 2023 16:16:03 GMT Subject: Integrated: 8309542: compiler/jvmci/TestEnableJVMCIProduct.java fails with "JVMCI compiler 'graal' specified by jvmci.Compiler not found" In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 14:54:35 GMT, Doug Simon wrote: > This PR fixes an intermittent failure in TestEnableJVMCIProduct.java that can happen when execution is slow enough such that a top-tier JIT compilation is scheduled. This pull request has now been integrated. 
Changeset: 0f0fda7a Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/0f0fda7abc68693d7d764b587bf5588b7cae74d1 Stats: 10 lines in 1 file changed: 6 ins; 3 del; 1 mod 8309542: compiler/jvmci/TestEnableJVMCIProduct.java fails with "JVMCI compiler 'graal' specified by jvmci.Compiler not found" Reviewed-by: kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/14336 From cslucas at openjdk.org Tue Jun 6 16:51:04 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 6 Jun 2023 16:51:04 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v16] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list).
> > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges Catching up with master. - Address PR review 6: debug format output & some refactoring. - Catching up with master branch. Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address PR review 6: refactoring around rematerialization & improve test cases. - Address PR review 5: refactor on rematerialization & add tests. 
- Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address part of PR review 4 & fix a bug setting only_candidate - Catching up with master Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Fix tests. Remember previous reducible Phis. - Address PR review 3. Some comments and be able to abort compilation. - ... and 7 more: https://git.openjdk.org/jdk/compare/ca6f07f9...cb0b6702 ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=15 Stats: 2741 lines in 26 files changed: 2486 ins; 113 del; 142 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From xliu at openjdk.org Tue Jun 6 18:03:02 2023 From: xliu at openjdk.org (Xin Liu) Date: Tue, 6 Jun 2023 18:03:02 GMT Subject: RFR: 8308537: Remove BreakAtNode In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 12:24:46 GMT, Eric Nothum wrote: > The BreakAtNode flag was unused as its utility is now insignificant with the use of "rr" in practice. See the following discussion: https://github.com/openjdk/jdk/pull/13767#issuecomment-1541805032. I removed the BreakAtNode flag and the related parts in the code. I am using BreakAtNode in particular with debug_idx a lot. It helps me to locate when and where the node-in-question is constructed. If you plan to remove it, is there any alternative? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14311#issuecomment-1579222801 From sviswanathan at openjdk.org Tue Jun 6 18:09:02 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 6 Jun 2023 18:09:02 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: <0lQJvljjXjPCoK8TAVG2wNevqMuErq_tBTsDct7jvuI=.157e6338-4203-4857-9d51-30a6f0ab5083@github.com> On Mon, 5 Jun 2023 23:48:21 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). >> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case 
took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests; need vlbwdq for vpbroadcastq @TobiHartmann @vnkozlov Please advise if we could go ahead and integrate this PR from Scott. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1579230349 From jbhateja at openjdk.org Tue Jun 6 18:49:04 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Jun 2023 18:49:04 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 17:22:32 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix license in one file src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4140: > 4138: > 4139: if (UseAVX > 2 && VM_Version::supports_avx512dq()) { > 4140: Should this feature check be relaxed to AVX512F for 32 bit sort routines. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1220145228 From jbhateja at openjdk.org Tue Jun 6 19:09:58 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Jun 2023 19:09:58 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 17:22:32 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. >> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 4448.171 | 
696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix license in one file test/micro/org/openjdk/bench/java/util/ArraysSort.java line 2: > 1: /* > 2: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. Copyright year should be 2023 test/micro/org/openjdk/bench/java/util/ArraysSort.java line 85: > 83: ints_unsorted[i] = rnd.nextInt(); > 84: longs_unsorted[i] = rnd.nextLong(); > 85: floats_unsorted[i] = rnd.nextFloat(); Can you also introduce NaN, Infinity, +0.0, -0.0 in input floating point arrays. test/micro/org/openjdk/bench/java/util/ArraysSort.java line 104: > 102: @Benchmark > 103: public void floatSort() throws Throwable { > 104: floats_sorted = floats_unsorted.clone(); We can move clone out of benchmarking methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1220170913 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1220168276 PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1220174402 From jbhateja at openjdk.org Tue Jun 6 19:21:56 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 Jun 2023 19:21:56 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v6] In-Reply-To: References: Message-ID: <4TFxrQA6h60f8RJZBtehi8_qEanj0xZqveUOqjX3Feo=.1c9d0e49-67aa-4bf5-898d-d79e933e5cef@github.com> On Thu, 1 Jun 2023 17:22:32 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. 
>> >> **Arrays.sort performance data using JMH benchmarks** >> >> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- | >> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | >> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | >> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | >> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | >> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | >> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | >> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | >> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | >> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | >> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | >> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | >> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | >> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | >> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | >> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | >> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix license in one file src/java.base/share/classes/java/util/Arrays.java line 82: > 80: > 81: @IntrinsicCandidate > 82: private static void arraySort(int[] array, int fromIndex, int toIndex) { A minor styling comment: We can use same all small caps naming convention as used for System.arraycopy. 
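For reference on the earlier review comment in this thread asking for NaN, Infinity, +0.0 and -0.0 in the floating-point inputs: a minimal Java sketch, not taken from the PR, of the ordering that `Arrays.sort(float[])` is documented to produce for those values (the total order of `Float.compareTo`, with -0.0f before 0.0f and NaN last). The class and helper names (`SpecialFloatSortSketch`, `sortedCopy`, `sameBits`) are hypothetical and exist only for illustration; any intrinsic would presumably have to preserve the same ordering.

```java
import java.util.Arrays;

class SpecialFloatSortSketch {
    // Hypothetical helper: sorts a copy so the caller's array stays untouched.
    public static float[] sortedCopy(float[] input) {
        float[] copy = input.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Compares element bit patterns so that -0.0f and +0.0f are told apart
    // (floatToIntBits also maps every NaN to one canonical pattern).
    public static boolean sameBits(float[] a, float[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Float.floatToIntBits(a[i]) != Float.floatToIntBits(b[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        float[] in = { Float.NaN, 0.0f, Float.POSITIVE_INFINITY,
                       -0.0f, 1.5f, Float.NEGATIVE_INFINITY };
        float[] expected = { Float.NEGATIVE_INFINITY, -0.0f, 0.0f,
                             1.5f, Float.POSITIVE_INFINITY, Float.NaN };
        System.out.println(sameBits(sortedCopy(in), expected)); // prints true
    }
}
```

Comparing bit patterns rather than using `==` matters here, because `-0.0f == 0.0f` is true and `NaN == NaN` is false.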
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1220189843 From duke at openjdk.org Tue Jun 6 22:42:10 2023 From: duke at openjdk.org (duke) Date: Tue, 6 Jun 2023 22:42:10 GMT Subject: Withdrawn: 8295795: hsdis does not build with binutils 2.39+ In-Reply-To: References: Message-ID: On Fri, 21 Oct 2022 15:26:59 GMT, Aleksey Shipilev wrote: > Fails like this: > > > $ sh ./configure --with-boot-jdk=jdk19u-ea --with-hsdis=binutils --with-binutils-src=binutils-2.39 > $ make clean build-hsdis > > === Output from failing command(s) repeated here === > * For target support_hsdis_hsdis-binutils.o: > /home/shade/trunks/jdk/src/utils/hsdis/binutils/hsdis-binutils.c: In function 'init_disassemble_info_from_bfd': > /home/shade/trunks/jdk/src/utils/hsdis/binutils/hsdis-binutils.c:564:3: error: too few arguments to function 'init_disassemble_info' > 564 | init_disassemble_info(dinfo, stream, fprintf_func); > | ^~~~~~~~~~~~~~~~~~~~~ > In file included from /home/shade/trunks/jdk/src/utils/hsdis/binutils/hsdis-binutils.c:62: > /home/shade/trunks/jdk/binutils-2.39/include/dis-asm.h:472:13: note: declared here > 472 | extern void init_disassemble_info (struct disassemble_info *dinfo, void *stream, > | ^~~~~~~~~~~~~~~~~~~~~ > > > Additional testing: > - [x] Linux x86_64 build with binutils 2.38 (still works) > - [x] Linux x86_64 build with binutils 2.39 (now works) > - [ ] JMH -prof perfasm with binutils-2.39-built hsdis This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/10817 From cslucas at openjdk.org Tue Jun 6 23:14:14 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 6 Jun 2023 23:14:14 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v17] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Some minor refactorings. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/cb0b6702..14ddb63a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=15-16 Stats: 12 lines in 5 files changed: 6 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From thartmann at openjdk.org Wed Jun 7 05:04:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jun 2023 05:04:04 GMT Subject: RFR: 8308537: Remove BreakAtNode In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 18:00:24 GMT, Xin Liu wrote: > If you plan to remove it, is there any alternative? Many of us use http://rr-project.org/ for debugging. Being able to reverse execute is much more powerful because you can simply set watchpoints on the node or its inputs to trace back the creation of entire subgraphs in the IR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14311#issuecomment-1579896833 From thartmann at openjdk.org Wed Jun 7 05:10:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jun 2023 05:10:06 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: <0lQJvljjXjPCoK8TAVG2wNevqMuErq_tBTsDct7jvuI=.157e6338-4203-4857-9d51-30a6f0ab5083@github.com> References: <0lQJvljjXjPCoK8TAVG2wNevqMuErq_tBTsDct7jvuI=.157e6338-4203-4857-9d51-30a6f0ab5083@github.com> Message-ID: On Tue, 6 Jun 2023 18:06:11 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests; need vlbwdq for vpbroadcastq > > @TobiHartmann @vnkozlov Please advise if we could go ahead and integrate this PR from Scott. @sviswa7 Thanks for the notification. 
I'll run this through our testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1579902442 From thartmann at openjdk.org Wed Jun 7 05:46:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jun 2023 05:46:57 GMT Subject: RFR: 8302145: ddepth should be uint in PhaseIdealLoop::register_node() In-Reply-To: References: Message-ID: <53FuU8FyeUpNAhyt5Qxbsc9jnDizTzJrEojG5Ok3FdY=.45f36e42-3c67-45be-9f9b-93bbf69de4ab@github.com> On Tue, 6 Jun 2023 12:50:50 GMT, Eric Nothum wrote: > Changed the type of the ddepth parameter in PhaseIdealLoop::register_node from int to uint, to avoid potentially unsafe uint -> int casts. > The method is often used with uint arguments, coming from dom_depth(). > I have verified the calls of PhaseIdealLoop::register_node and have removed related casts. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14333#pullrequestreview-1466652442 From thartmann at openjdk.org Wed Jun 7 05:48:56 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jun 2023 05:48:56 GMT Subject: RFR: 8309472: IGV: Add dump_igv(custom_name) for improved debugging In-Reply-To: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> References: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> Message-ID: On Mon, 5 Jun 2023 14:16:44 GMT, Christian Hagedorn wrote: > When debugging, I often add multiple IR dumps throughout the code to capture different states. To do that, I'm just re-using various `PHASE_XYZ` `CompilerPhaseType` enum values: > > Compile::current()->print_method(PHASE_END, 3); > > But this becomes confusing when using multiple such enum values and trying to remember what they actually mean. 
To overcome that (and to avoid creating new enum values each time), I suggest to introduce a new `dump_igv(custom_name)` method where `custom_name` can be an arbitrary string. Then we can use the following when debugging: > > Compile::current()->dump_igv("foo"); > Compile::current()->dump_igv("bar"); > > Thanks, > Christian Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14313#pullrequestreview-1466654129 From fyang at openjdk.org Wed Jun 7 06:26:54 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 7 Jun 2023 06:26:54 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 20:52:01 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: tier1 is clean on hifive, more tbd. Hi, I searched and found that we have four direct callers of `C2_MacroAssembler::string_indexof_linearscan`: three in file riscv.ad (by string_indexof_conUU, string_indexof_conLL and string_indexof_conUL) and one by `C2_MacroAssembler::string_indexof`. Did you check which one is triggering these unaligned accesses? 
I am not sure but it looks to me that the three direct callers in file riscv.ad are less likely to have such an issue. If that is true, we might need to distinguish among those callers for better performance. Also, it would be better to have some numbers on other vendors like T-Head. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1579972351 From vkempik at openjdk.org Wed Jun 7 07:13:54 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 07:13:54 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 20:52:01 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. 
I'll test on T-Head. As for the first part of the patch, at line 494: on this [line](https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L482) we do `nlen_tmp -= 7;`, then on this [line](https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L494) we compute `result = haystack + nlen_tmp` and read a long word from the result address. Obviously this causes a misaligned load. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1580066972 From chagedorn at openjdk.org Wed Jun 7 07:42:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jun 2023 07:42:05 GMT Subject: RFR: 8309472: IGV: Add dump_igv(custom_name) for improved debugging In-Reply-To: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> References: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> Message-ID: On Mon, 5 Jun 2023 14:16:44 GMT, Christian Hagedorn wrote: > When debugging, I often add multiple IR dumps throughout the code to capture different states. To do that, I'm just re-using various `PHASE_XYZ` `CompilerPhaseType` enum values: > > Compile::current()->print_method(PHASE_END, 3); > > But this becomes confusing when using multiple such enum values and trying to remember what they actually mean. To overcome that (and to avoid creating new enum values each time), I suggest to introduce a new `dump_igv(custom_name)` method where `custom_name` can be an arbitrary string. Then we can use the following when debugging: > > Compile::current()->dump_igv("foo"); > Compile::current()->dump_igv("bar"); > > Thanks, > Christian Thanks Tobias for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14313#issuecomment-1580109251 From chagedorn at openjdk.org Wed Jun 7 07:42:06 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 Jun 2023 07:42:06 GMT Subject: Integrated: 8309472: IGV: Add dump_igv(custom_name) for improved debugging In-Reply-To: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> References: <4zYmosaLepVG8guTb9kOnU8LdeomSLH-n8KHd44coYo=.6d189c68-9a1d-461a-add6-e4629f8528ce@github.com> Message-ID: On Mon, 5 Jun 2023 14:16:44 GMT, Christian Hagedorn wrote: > When debugging, I often add multiple IR dumps throughout the code to capture different states. To do that, I'm just re-using various `PHASE_XYZ` `CompilerPhaseType` enum values: > > Compile::current()->print_method(PHASE_END, 3); > > But this becomes confusing when using multiple such enum values and trying to remember what they actually mean. To overcome that (and to avoid creating new enum values each time), I suggest to introduce a new `dump_igv(custom_name)` method where `custom_name` can be an arbitrary string. Then we can use the following when debugging: > > Compile::current()->dump_igv("foo"); > Compile::current()->dump_igv("bar"); > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 0ed4af76 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/0ed4af76c07ff71acc202796e504f092910215ac Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8309472: IGV: Add dump_igv(custom_name) for improved debugging Reviewed-by: roland, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14313 From vkempik at openjdk.org Wed Jun 7 07:43:55 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 07:43:55 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 06:20:33 GMT, Fei Yang wrote: > Hi, I searched and found that we have four direct callers of `C2_MacroAssembler::string_indexof_linearscan`: three in file riscv.ad (by string_indexof_conUU, string_indexof_conLL and string_indexof_conUL) and one by `C2_MacroAssembler::string_indexof`. Did you check which one is triggering these unaligned accesses? I am not sure but it looks to me that the three direct callers in file riscv.ad are less likely to have such an issue. If that is true, we might need to distinguish among those callers for better performance. Originally, when I found this misaligned load, the code `(this->*load_2chr)(ch2, Address(tmp3), noreg);` corresponded to `lhu t1, 0(t4)`. So I can say the isLL variable was true. 
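To illustrate the alignment argument, here is a minimal Java sketch (not part of the patch) of why a 2-byte load such as `lhu` can end up misaligned in the Latin-1 (isLL) case. It assumes, purely for illustration, an even-aligned base address and a scan that advances one byte per step; `HalfwordAlignmentSketch`, `isAligned` and `countMisaligned` are hypothetical names, and the base address is made up.

```java
class HalfwordAlignmentSketch {
    // An N-byte access is naturally aligned when its address is a multiple of N.
    public static boolean isAligned(long address, int accessSize) {
        return address % accessSize == 0;
    }

    // Counts misaligned accesses when visiting `count` positions that are
    // `stride` bytes apart, starting at `base`, with `accessSize`-byte loads.
    public static int countMisaligned(long base, int count, int stride, int accessSize) {
        int misaligned = 0;
        for (int i = 0; i < count; i++) {
            if (!isAligned(base + (long) i * stride, accessSize)) {
                misaligned++;
            }
        }
        return misaligned;
    }

    public static void main(String[] args) {
        long base = 0x1000L; // hypothetical, even-aligned haystack address
        // 1-byte step with 2-byte loads: every other address is odd.
        System.out.println(countMisaligned(base, 8, 1, 2)); // prints 4
        // 2-byte step with 2-byte loads: every address stays even.
        System.out.println(countMisaligned(base, 8, 2, 2)); // prints 0
    }
}
```

Under these assumptions half of the halfword loads in a byte-stepped scan are misaligned, while a 2-byte step keeps every load aligned.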
------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1580114390 From fyang at openjdk.org Wed Jun 7 07:56:59 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 7 Jun 2023 07:56:59 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 07:41:15 GMT, Vladimir Kempik wrote: > > Hi, I searched and found that we have four direct callers of `C2_MacroAssembler::string_indexof_linearscan`: three in file riscv.ad (by string_indexof_conUU, string_indexof_conLL and string_indexof_conUL) and one by `C2_MacroAssembler::string_indexof`. Did you check which one is triggering these unaligned accesses? I am not sure but it looks to me that the three direct callers in file riscv.ad are less likely to have such an issue. If that is true, we might need to distinguish among those callers for better performance. > > Originally, when I found this misaligned load, the code `(this->*load_2chr)(ch2, Address(tmp3), noreg);` corresponded to `lhu t1, 0(t4)`. So I can say the isLL variable was true. Can we simply change the two conditions `if (AvoidUnalignedAccesses) {` added in `C2_MacroAssembler::string_indexof_linearscan` into something like `if (needle_con_cnt == -1 && AvoidUnalignedAccesses) {` and see if this could also resolve the problem? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1580129550 From vkempik at openjdk.org Wed Jun 7 07:57:01 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 07:57:01 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 20:52:01 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite. 
> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
> The other two produced about 100 events combined.
> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
> Numbers on hifive before and after applying the patch:
>
>
> Benchmark                            Mode  Cnt      Score     Error  Units
> StringIndexOf.advancedWithMediumSub  avgt   25  47031.406 ± 144.005  ns/op
>
>
> After:
>
> Benchmark                            Mode  Cnt      Score     Error  Units
> StringIndexOf.advancedWithMediumSub  avgt   25   4256.830 ±  23.075  ns/op
>
>
> Testing: tier1/tier2 is clean on hifive.

See, this part of the algorithm is misaligned:

    bind(CH1_LOOP);
    add(tmp3, haystack, hlen_neg);
    (this->*load_2chr)(ch2, Address(tmp3), noreg);
    beq(ch1, ch2, MATCH);
    add(hlen_neg, hlen_neg, haystack_chr_size);
    blez(hlen_neg, CH1_LOOP);

this becomes:

    CH1_LOOP: add  t4, a1, a2
              lhu  t1, 0(t4)
              beq  t0, t1, 0xMATCH
              addi a2, a2, 1
              blez a2, CH1_LOOP

So we load a halfword on each iteration from address (a1+a2). a1 is constant while a2 is incremented by 1 each step, so every other load is misaligned.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1580138574

From yzheng at openjdk.org Wed Jun 7 08:12:21 2023
From: yzheng at openjdk.org (Yudi Zheng)
Date: Wed, 7 Jun 2023 08:12:21 GMT
Subject: RFR: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers.
Message-ID: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com>

This PR allows JVMCI compiler to implement VirtualThread notifyJvmti intrinsics.

-------------

Commit messages:
 - Export various virtual thread JVMTI fields to JVMCI compiler.
Changes: https://git.openjdk.org/jdk/pull/14348/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14348&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309562 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14348/head:pull/14348 PR: https://git.openjdk.org/jdk/pull/14348 From fyang at openjdk.org Wed Jun 7 08:30:56 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 7 Jun 2023 08:30:56 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 07:54:49 GMT, Vladimir Kempik wrote: > See, this part of algo is misaligned > > ``` > bind(CH1_LOOP); > add(tmp3, haystack, hlen_neg); > (this->*load_2chr)(ch2, Address(tmp3), noreg); > beq(ch1, ch2, MATCH); > add(hlen_neg, hlen_neg, haystack_chr_size); > blez(hlen_neg, CH1_LOOP); > ``` > > this becomes: > > ``` > CH1_LOOP: add t4, a1, a2 > lhu t1, 0(t4) > beg t0, t1, 0xMATCH > addi a2, a2, 1 > blez a2, CH1_LOOP > ``` > > so we load halfword on each iteration from address (a1+a2), and while a1 is constant, a2 is incrementing by 1 each step, so every other load is misaligned. I see it now. Thanks. Then I am expecting that we could also have similar issue for the `if (needle_con_cnt == 4)` case in the same function where we do `load_4chr` incrementally with 1 byte step when `isLL` variable is true, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1580190910 From dnsimon at openjdk.org Wed Jun 7 09:11:57 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 7 Jun 2023 09:11:57 GMT Subject: RFR: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. 
In-Reply-To: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com>
References: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com>
Message-ID:

On Wed, 7 Jun 2023 08:04:52 GMT, Yudi Zheng wrote:

> This PR allows JVMCI compiler to implement VirtualThread notifyJvmti intrinsics.

Marked as reviewed by dnsimon (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/14348#pullrequestreview-1467082906

From duke at openjdk.org Wed Jun 7 11:25:12 2023
From: duke at openjdk.org (Eric Nothum)
Date: Wed, 7 Jun 2023 11:25:12 GMT
Subject: RFR: 8309474: [IR Framework] Wrong @ForceCompile link in README
Message-ID:

Fixed the @ForceCompile link to now actually point to the ForceCompile.java

-------------

Commit messages:
 - 8309474: Fix ForceCompile link in README.md

Changes: https://git.openjdk.org/jdk/pull/14347/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14347&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8309474
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14347.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14347/head:pull/14347

PR: https://git.openjdk.org/jdk/pull/14347

From chagedorn at openjdk.org Wed Jun 7 11:39:57 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 7 Jun 2023 11:39:57 GMT
Subject: RFR: 8309474: [IR Framework] Wrong @ForceCompile link in README
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jun 2023 07:11:31 GMT, Eric Nothum wrote:

> Fixed the @ForceCompile link to now actually point to the ForceCompile.java

Looks good and trivial!

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14347#pullrequestreview-1467421730 PR Review: https://git.openjdk.org/jdk/pull/14347#pullrequestreview-1467421784 From duke at openjdk.org Wed Jun 7 11:43:07 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 7 Jun 2023 11:43:07 GMT Subject: Integrated: 8302145: ddepth should be uint in PhaseIdealLoop::register_node() In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 12:50:50 GMT, Eric Nothum wrote: > Changed the type of the ddepth parameter in PhaseIdealLoop::register_node from int to uint, to avoid potentially unsafe uint -> int casts. > The method is often used with uint arguments, coming from dom_depth(). > I have verified the calls of PhaseIdealLoop::register_node and have removed related casts. This pull request has now been integrated. Changeset: 1de40f36 Author: Eric Nothum Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/1de40f360f3beed5eb9fbd62a992989bb5bdb315 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod 8302145: ddepth should be uint in PhaseIdealLoop::register_node() Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14333 From duke at openjdk.org Wed Jun 7 12:41:14 2023 From: duke at openjdk.org (Daohan Qu) Date: Wed, 7 Jun 2023 12:41:14 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer Message-ID: This patch should fix [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). A `jtreg` test is also added. I'd appreciate any comments and reviews. Thanks in advance! ## Problem Analysis For the following program, public class Test { static boolean flag; public static void main(String[] args) { for (int i = 0; i < 10000; i++) { flag = !flag; test(); } } public static void test() { int limit = flag ? 
Integer.MAX_VALUE : 1000; int i = 0; while (i < limit) { i += 3; if (flag) { return; } } } } A `LoopLimitNode` will be generated and its `Limit` input is a `PhiNode`, as depicted in the following picture. phi_as_limit During `PhaseCCP`, the `LoopLimitNode::Value()` tries to calculate the constant final value: https://github.com/openjdk/jdk/blob/16ebf47fe3b0fac7b67acfa589a26abf8843306b/src/hotspot/share/opto/loopnode.cpp#L2289-L2301 The problem is that the assertion in `line 2299` could fail during CCP though it must hold true at the end of CCP. Here is the reason: `PhaseCCP` initializes all nodes with the type `TOP` and iterates in an "arbitrary" order. The following order may happen: 28 IfTrue => 34 Region => 36 Phi => 195 LoopLimit => ... => 29 IfFalse 1. In `ProjNode::Value()` (`IfTrue` inherits it), the type of `IfTrue` is set to `Type::CONTROL` https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/multnode.cpp#L168-L171 2. In `PhiNode::Value()`, only `28 IfTrue`'s correspondence `33 ConI` gets merged (as `29 IfFalse` has not been dealt with yet), then it has a value of `int:max`. https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/cfgnode.cpp#L1269-L1277 3. In `LoopLimitNode::Value()`, it finds its `Limit` input `36 Phi` is constant, which triggers the assertion, and the assertion fails since the final value calculated from that constant limit (`int:max`) overflows. ## Solution Move the overflow check to the end of CCP, where it must not fail. 
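The overflow in step 3 can be reproduced with plain Java arithmetic. The sketch below is not the HotSpot code: it assumes the usual trip-count formula `(limit - init + stride - 1) / stride` for a positive stride, and the class and method names are hypothetical. The constants (init 3, stride 3) are example values; the limit is the intermediate `int:max` that the `Phi` carries before `29 IfFalse` has been visited.

```java
public class LoopLimitOverflowSketch {
    /**
     * Computes the value of the induction variable at loop exit in 64-bit
     * arithmetic, for a positive stride (assumed trip-count formula).
     */
    static long finalValue(int init, int limit, int stride) {
        long strideM = stride - 1;
        long tripCount = ((long) limit - init + strideM) / stride;
        return init + (long) stride * tripCount;
    }

    public static void main(String[] args) {
        // Intermediate CCP state: the limit looks like the constant max_jint.
        long finalCon = finalValue(3, Integer.MAX_VALUE, 3);
        int finalInt = (int) finalCon; // narrowing wraps around
        System.out.println(finalCon);                    // prints 2147483649
        System.out.println(finalInt);                    // prints -2147483647
        System.out.println(finalCon == (long) finalInt); // prints false

        // With the other Phi input (1000) the final value fits in an int.
        long smallCon = finalValue(3, 1000, 3);
        System.out.println(smallCon);                        // prints 1002
        System.out.println(smallCon == (long) (int) smallCon); // prints true
    }
}
```

The `false` line corresponds to the condition `final_con == (jlong)final_int` named in the failing assertion; it only fails here because the limit is treated as the single constant `max_jint`.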
------------- Commit messages: - Add a jtreg test for this bug - Move final value overflow check of LoopLimitNode to the end of CCP - Add query and cast method for LoopLimitNode Changes: https://git.openjdk.org/jdk/pull/14353/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14353&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309266 Stats: 119 lines in 5 files changed: 110 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14353.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14353/head:pull/14353 PR: https://git.openjdk.org/jdk/pull/14353 From vkempik at openjdk.org Wed Jun 7 13:08:56 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 13:08:56 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 07:49:22 GMT, Fei Yang wrote: > > > Hi, I searched and found that we have four direct callers of `C2_MacroAssembler::string_indexof_linearscan`: three in file riscv.ad (by string_indexof_conUU, string_indexof_conLL and string_indexof_conUL) and one by `C2_MacroAssembler::string_indexof`. Did you check which one is triggering these unaligned accesses? I am not sure but it looks to me that the three direct callers in file riscv.ad are less likely to have such an issue. If that is true, we might need some distinugish among those callers for better performance. > > > > > > Originally, when I found this misaligned load, this code `(this->*load_2chr)(ch2, Address(tmp3), noreg);` was corresponding to `lhu t1, 0(t4)`. So I can say the isLL variable was true. > > Can we simply change the two conditions `if (AvoidUnalignedAccesses) {` added in `C2_MacroAssembler::string_indexof_linearscan` into something like `if (needle_con_cnt == -1 && AvoidUnalignedAccesses) {` and see if this could also resolve the problem? 
I can't put whole section ( https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L714 ) under AvoidUnalignedAccesses as it defines label DO3 which is used [earlier](https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L671) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1580777262 From thartmann at openjdk.org Wed Jun 7 13:27:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 Jun 2023 13:27:54 GMT Subject: RFR: 8309474: [IR Framework] Wrong @ForceCompile link in README In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 07:11:31 GMT, Eric Nothum wrote: > Fixed the @ForceCompile link to now actually point to the ForceCompile.java Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14347#pullrequestreview-1467663631 From epeter at openjdk.org Wed Jun 7 14:33:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 Jun 2023 14:33:55 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 12:33:48 GMT, Daohan Qu wrote: > This patch should fix [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). A `jtreg` test is also added. I'd appreciate any comments and reviews. Thanks in advance! > > ## Problem Analysis > > For the following program, > > public class Test { > > static boolean flag; > > public static void main(String[] args) { > for (int i = 0; i < 10000; i++) { > flag = !flag; > test(); > } > } > > public static void test() { > int limit = flag ? Integer.MAX_VALUE : 1000; > > int i = 0; > while (i < limit) { > i += 3; > if (flag) { > return; > } > } > } > } > > A `LoopLimitNode` will be generated and its `Limit` input is a `PhiNode`, as depicted in the following picture. 
> > phi_as_limit > > During `PhaseCCP`, the `LoopLimitNode::Value()` tries to calculate the constant final value: > https://github.com/openjdk/jdk/blob/16ebf47fe3b0fac7b67acfa589a26abf8843306b/src/hotspot/share/opto/loopnode.cpp#L2289-L2301 > > The problem is that the assertion in `line 2299` could fail during CCP though it must hold true at the end of CCP. Here is the reason: `PhaseCCP` initializes all nodes with the type `TOP` and iterates in an "arbitrary" order. The following order may happen: > > 28 IfTrue => 34 Region => 36 Phi => 195 LoopLimit => ... => 29 IfFalse > > 1. In `ProjNode::Value()` (`IfTrue` inherits it), the type of `IfTrue` is set to `Type::CONTROL` > https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/multnode.cpp#L168-L171 > > 2. In `PhiNode::Value()`, only `28 IfTrue`'s correspondence `33 ConI` gets merged (as `29 IfFalse` has not been dealt with yet), then it has a value of `int:max`. > https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/cfgnode.cpp#L1269-L1277 > > 3. In `LoopLimitNode::Value()`, it finds its `Limit` input `36 Phi` is constant, which triggers the assertion, and the assertion fails since the final value calculated from that constant limit (`int:max`) overflows. > > ## Solution > Move the overflow check to the end of CCP, where it must not fail. @quadhier Thanks for looking into this! This but is currently not assigned to you. Please always make sure that you have it assigned to you, or at least mention in JIRA that you are working on it. Currently, @enothum had it assigned and was also working on it. Why is the overflow acceptable? Does that not mean that the calculation did something wrong? 
In the example, we have

    init_con   = 3
    limit_con  = 2147483647 = max_jint
    stride_con = 3
    stride_m   = stride_con - 1 = 2
    trip_count = (limit_con - init_con + stride_m)/stride_con = 715827882
    final_con  = init_con + stride_con*trip_count = 2147483649 = max_jint + 2 (overflow!)
    final_int  = -2147483647 (overflow!)

Does that not mean that we mis-calculated the `trip_count`? If it was 1 less, we would not have an overflow. Would that not fix the issue in a simpler way?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1580950081

From duke at openjdk.org Wed Jun 7 14:57:54 2023
From: duke at openjdk.org (Eric Nothum)
Date: Wed, 7 Jun 2023 14:57:54 GMT
Subject: RFR: 8309474: [IR Framework] Wrong @ForceCompile link in README
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jun 2023 07:11:31 GMT, Eric Nothum wrote:

> Fixed the @ForceCompile link to now actually point to the ForceCompile.java

Thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14347#issuecomment-1580995549

From kvn at openjdk.org Wed Jun 7 15:24:54 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 7 Jun 2023 15:24:54 GMT
Subject: RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording
In-Reply-To:
References:
Message-ID:

On Mon, 5 Jun 2023 17:20:48 GMT, Tom Rodriguez wrote:

> intermittent failures with Graal on ContinuousCallSiteTargetChange showed that when constructing the CallSiteTargetValue Assumption we read the value twice so the dependency and the value in the program might be different leading to incorrect execution.

Looks good.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14315#pullrequestreview-1467977997 From duke at openjdk.org Wed Jun 7 15:27:56 2023 From: duke at openjdk.org (Daohan Qu) Date: Wed, 7 Jun 2023 15:27:56 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 14:31:19 GMT, Emanuel Peter wrote: > Please always make sure that you have it assigned to you, or at least mention in JIRA that you are working on it. Hi, @eme64 , I'm so sorry for that! I don't have a JBS account yet, but from now on I will avoid to work on assigned issues and try to get an account ASAP. > Why is the overflow acceptable? Does that not mean that the calculation did something wrong? If my understanding is correct, the final value should be `i`'s value at loop exit. If limit is `max_jint`, the final value should be `max_jint + 2`. So the calculation is not wrong but is an intermediate result. We need to check `LoopLimitNode::Value()` at end of CCP to ensure this calculation doesn't overflow. BTW, The `LoopLimitNode` is generated and its check happens because it believe that the code don't always overflow and overflow case will be handled by a `uncommon_trap`, which is done by the following code (If `check_stride_overflow()` returns `-1`, the overflow will always happen). https://github.com/openjdk/jdk/blob/5b147eb5e46ac7fa637ed997c6da8f238f685ea4/src/hotspot/share/opto/loopnode.cpp#L1781-L1804 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1581056520 From kvn at openjdk.org Wed Jun 7 15:43:58 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 Jun 2023 15:43:58 GMT Subject: RFR: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. 
In-Reply-To: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> References: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> Message-ID: On Wed, 7 Jun 2023 08:04:52 GMT, Yudi Zheng wrote: > This PR allows JVMCI compiler to implement VirtualThread notifyJvmti intrinsics. src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 222: > 220: JVMTI_ONLY(nonstatic_field(JavaThread, _is_in_tmp_VTMS_transition, bool)) \ > 221: \ > 222: static_field(JvmtiVTMSTransitionDisabler, _VTMS_notify_jvmti_events, bool) \ I think this should be also under `JVMTI_ONLY()`. Did you have issues if you do that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14348#discussion_r1221815421 From vkempik at openjdk.org Wed Jun 7 15:56:28 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 15:56:28 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 08:27:53 GMT, Fei Yang wrote: >> See, this part of algo is misaligned >> >> bind(CH1_LOOP); >> add(tmp3, haystack, hlen_neg); >> (this->*load_2chr)(ch2, Address(tmp3), noreg); >> beq(ch1, ch2, MATCH); >> add(hlen_neg, hlen_neg, haystack_chr_size); >> blez(hlen_neg, CH1_LOOP); >> >> this becomes: >> >> CH1_LOOP: add t4, a1, a2 >> lhu t1, 0(t4) >> beg t0, t1, 0xMATCH >> addi a2, a2, 1 >> blez a2, CH1_LOOP >> >> >> so we load halfword on each iteration from address (a1+a2), and while a1 is constant, a2 is incrementing by 1 each step, so every other load is misaligned. 
> >> See, this part of algo is misaligned >> >> ``` >> bind(CH1_LOOP); >> add(tmp3, haystack, hlen_neg); >> (this->*load_2chr)(ch2, Address(tmp3), noreg); >> beq(ch1, ch2, MATCH); >> add(hlen_neg, hlen_neg, haystack_chr_size); >> blez(hlen_neg, CH1_LOOP); >> ``` >> >> this becomes: >> >> ``` >> CH1_LOOP: add t4, a1, a2 >> lhu t1, 0(t4) >> beg t0, t1, 0xMATCH >> addi a2, a2, 1 >> blez a2, CH1_LOOP >> ``` >> >> so we load halfword on each iteration from address (a1+a2), and while a1 is constant, a2 is incrementing by 1 each step, so every other load is misaligned. > > I see it now. Thanks. Then I am expecting that we could also have similar issue for the `if (needle_con_cnt == 4)` case in the same function where we do `load_4chr` incrementally with 1 byte step when `isLL` variable is true, right? @RealFYang check last commit please, it improves DO2 by replacing one haystack_load_1chr() with one srli in expense one one load before the loop (CH1_LOOP) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1581102536 From vkempik at openjdk.org Wed Jun 7 15:56:27 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 15:56:27 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v2] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. 
> Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: make DO2 read by one character from memory per loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14320/files - new: https://git.openjdk.org/jdk/pull/14320/files/a0880518..e9596561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=00-01 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From yzheng at openjdk.org Wed Jun 7 16:16:01 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 Jun 2023 16:16:01 GMT Subject: RFR: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. [v2] In-Reply-To: References: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> Message-ID: On Wed, 7 Jun 2023 15:39:30 GMT, Vladimir Kozlov wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment. > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 222: > >> 220: JVMTI_ONLY(nonstatic_field(JavaThread, _is_in_tmp_VTMS_transition, bool)) \ >> 221: \ >> 222: static_field(JvmtiVTMSTransitionDisabler, _VTMS_notify_jvmti_events, bool) \ > > I think this should be also under `JVMTI_ONLY()`. Did you have issues if you do that? No. Thought it was accessible without JVMTI. 
Thanks for pointing this out, I have pushed an update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14348#discussion_r1221861206 From yzheng at openjdk.org Wed Jun 7 16:15:59 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 Jun 2023 16:15:59 GMT Subject: RFR: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. [v2] In-Reply-To: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> References: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> Message-ID: <1UrN0Vo4EuCUfmKj-X0g6mI3QLp49fuxDZgDurCy2IM=.ae7f8297-f70e-46f6-b7d8-ada0cb7943e1@github.com> > This PR allows JVMCI compiler to implement VirtualThread notifyJvmti intrinsics. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14348/files - new: https://git.openjdk.org/jdk/pull/14348/files/c52b76ec..cd5222b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14348&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14348&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14348/head:pull/14348 PR: https://git.openjdk.org/jdk/pull/14348 From vkempik at openjdk.org Wed Jun 7 17:01:55 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 7 Jun 2023 17:01:55 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 15:56:27 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. 
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > make DO2 read by one character from memory per loop I have made a microtest to test specifically DO2 part of string_indexof_linear diff --git a/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java b/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java index 57ced6d8e13..33c8d998d8d 100644 --- a/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java +++ b/test/micro/org/openjdk/bench/java/lang/StringIndexOf.java @@ -46,6 +46,8 @@ public class StringIndexOf { private String shortSub1; private String data2; private String shortSub2; + + private String shortSub3; private String string16Short; private String string16Medium; private String string16Long; @@ -64,6 +66,7 @@ public class StringIndexOf { shortSub1 = "1"; data2 = "00001001010100a10110101010010101110101001110110101010010101010010000010111010101010101010a100010010101110111010101101010100010010a100a0010101111111000010101010010101000010101010010101010101110a10010101010101010101010101010"; shortSub2 = "a"; + shortSub3 = "a1"; searchChar = 's'; string16Short = "scar\u01fe1"; @@ -246,6 +249,20 @@ public class StringIndexOf { return dummy; } + /** + * Benchmarks String.indexOf with a 
rather big String. Search repeatedly for a matched that is 2 chars but only with
+     * a few matches.
+     */
+    @Benchmark
+    public int advancedWithShortSub3() {
+        int dummy = 0;
+        int index = 0;
+        while ((index = data2.indexOf(shortSub3, index)) > -1) {
+            index++;
+            dummy += index;
+        }
+        return dummy;
+    }
     @Benchmark
     public void constantPattern() {
         String tmp = "simple-hash:SHA-1/UTF-8";

Results (V1: original patch in this PR, V2: latest update to DO2):

hifive
        Benchmark                            Mode  Cnt      Score    Error  Units
Before  StringIndexOf.advancedWithShortSub3  avgt   25  37302.933 ± 80.306  ns/op
V1      StringIndexOf.advancedWithShortSub3  avgt   25   1362.159 ± 37.021  ns/op
V2      StringIndexOf.advancedWithShortSub3  avgt   25   1248.750 ± 40.432  ns/op

thead
        Benchmark                            Mode  Cnt      Score    Error  Units
Before  StringIndexOf.advancedWithShortSub3  avgt   25    632.976 ± 42.601  ns/op
V1      StringIndexOf.advancedWithShortSub3  avgt   25    916.040 ± 45.086  ns/op
V2      StringIndexOf.advancedWithShortSub3  avgt   25    919.363 ± 21.977  ns/op

While hifive benefits from the update, thead shows no difference between V1 and V2 and is fastest with the original misaligned code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1581203623

From kvn at openjdk.org Wed Jun 7 17:02:55 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 7 Jun 2023 17:02:55 GMT
Subject: RFR: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. [v2]
In-Reply-To: <1UrN0Vo4EuCUfmKj-X0g6mI3QLp49fuxDZgDurCy2IM=.ae7f8297-f70e-46f6-b7d8-ada0cb7943e1@github.com>
References: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> <1UrN0Vo4EuCUfmKj-X0g6mI3QLp49fuxDZgDurCy2IM=.ae7f8297-f70e-46f6-b7d8-ada0cb7943e1@github.com>
Message-ID:

On Wed, 7 Jun 2023 16:15:59 GMT, Yudi Zheng wrote:

>> This PR allows JVMCI compiler to implement VirtualThread notifyJvmti intrinsics.
>
> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision:
>
> address comment.

Good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14348#pullrequestreview-1468188808

From vkempik at openjdk.org Wed Jun 7 18:12:14 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 7 Jun 2023 18:12:14 GMT
Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jun 2023 15:56:27 GMT, Vladimir Kempik wrote:

>> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V
>>
>> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
>> The other two produced about 100 events combined.
>> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
>> Numbers on hifive before and after applying the patch:
>>
>>
>> Benchmark                            Mode  Cnt      Score     Error  Units
>> StringIndexOf.advancedWithMediumSub  avgt   25  47031.406 ± 144.005  ns/op
>>
>>
>> After:
>>
>> Benchmark                            Mode  Cnt      Score     Error  Units
>> StringIndexOf.advancedWithMediumSub  avgt   25   4256.830 ±  23.075  ns/op
>>
>>
>> Testing: tier1/tier2 is clean on hifive.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
> make DO2 read by one character from memory per loop

Numbers on DO4 (comparing 4 characters at once)

DO4:

hifive
        Benchmark                                 Mode  Cnt      Score     Error  Units
before  StringIndexOf.advancedWithShortSub4Chars  avgt   25  69514.891 ± 128.730  ns/op
after   StringIndexOf.advancedWithShortSub4Chars  avgt   25   2481.448 ±  13.481  ns/op

thead
        Benchmark                                 Mode  Cnt      Score     Error  Units
before  StringIndexOf.advancedWithShortSub4Chars  avgt   25    753.125 ±   2.859  ns/op
after   StringIndexOf.advancedWithShortSub4Chars  avgt   25    741.031 ±   9.075  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1581288502

From vkempik at openjdk.org Wed Jun 7 18:20:26 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 7 Jun 2023 18:20:26 GMT
Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v3]
In-Reply-To:
References:
Message-ID:

> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V
>
> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
> The other two produced about 100 events combined.
> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
> Numbers on hifive before and after applying the patch:
>
>
> Benchmark                            Mode  Cnt      Score     Error  Units
> StringIndexOf.advancedWithMediumSub  avgt   25  47031.406 ± 144.005  ns/op
>
>
> After:
>
> Benchmark                            Mode  Cnt      Score     Error  Units
> StringIndexOf.advancedWithMediumSub  avgt   25   4256.830 ±  23.075  ns/op
>
>
> Testing: tier1/tier2 is clean on hifive.

Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: fix misaligned access in DO4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14320/files - new: https://git.openjdk.org/jdk/pull/14320/files/e9596561..362af5f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=01-02 Stats: 23 lines in 1 file changed: 21 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From duke at openjdk.org Wed Jun 7 20:20:03 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 7 Jun 2023 20:20:03 GMT Subject: Integrated: 8309474: [IR Framework] Wrong @ForceCompile link in README In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 07:11:31 GMT, Eric Nothum wrote: > Fixed the @ForceCompile link to now actually point to the ForceCompile.java This pull request has now been integrated. Changeset: 92beb855 Author: Eric Nothum Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/92beb85510a809b15c9bd5a4c19c305fc339a2c9 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8309474: [IR Framework] Wrong @ForceCompile link in README Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14347 From cslucas at openjdk.org Wed Jun 7 20:22:01 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 7 Jun 2023 20:22:01 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 23 May 2023 17:19:23 GMT, Vladimir Ivanov wrote: >>> I verified that the new test cases do trigger SR+NSR scenario. >>> >>> How do you test that deoptimization works as expected? 
>>> >> >> I have a copy of the tests in AllocationMergesTests.java in a separate file (not included in this PR) and I run the tests with a tool that compares the output of the test with RAM enabled and disabled. So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. >> >>> Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information which is essential to make sense of debug information at a safepoint in the code. >>> >> >> I'll take care of that. I was testing only with PrintDebugInfo. >> >>> FTR `_skip_rematerialization` flag is unused now. >>> >> >> yeah, I forgot to remove that. Thanks. >> >>> Speaking of `_only_merge_candidate` flag, I find it easier about the code when the property being tracked is whether the `ObjectValue` is referenced from corresponding JVM state or not. (Maybe call it `is_root()`?) So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()` which doesn't need to adjust the flag before returning selected object. Also, it's safer to track the flag status for every `ObjectValues`, even for `ObjectMergeValue`. >>> >> >> Sounds like a good idea. I'll do that. Thanks. >> >>> Are you sure there's no way to end up with nested `ObjectMergeValue`s in presence of iterative EA? >> >> I don't think so. This current patch only handle Phis that don't have NULL as input. As part of the reduction process we set at least one of the reducible Phi inputs to NULL. Therefore, subsequent iterations of EA won't reduce the same Phi. 
> >> So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. > > Please, enhance `AllocationMergesTests` to cover deoptimization (e.g., using WhiteBox API or additional run w/ -XX:+DeoptimizeALot) and ensure that tests are sensitive enough to fail when wrong state is rematerialized. @iwanowww - I pushed some changes to address your feedback. Please let me know if you have any more comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1581452155 From yzheng at openjdk.org Wed Jun 7 21:20:56 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 Jun 2023 21:20:56 GMT Subject: Integrated: 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. In-Reply-To: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> References: <_yfk7S3iOhM0VbvnYrw9w6DX0SLibMbcp-kx6DCRS50=.bb62411c-113f-4ad9-8dca-a41e7c31279a@github.com> Message-ID: On Wed, 7 Jun 2023 08:04:52 GMT, Yudi Zheng wrote: > This PR allows JVMCI compiler to implement VirtualThread notifyJvmti intrinsics. This pull request has now been integrated. Changeset: 99749c59 Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/99749c597b0be640ca8fd848d874222d69d66ae9 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod 8309562: [JVMCI] Export symbols used by VirtualThread notifyJvmti intrinsics to JVMCI compilers. 
Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14348 From fjiang at openjdk.org Thu Jun 8 01:04:49 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 8 Jun 2023 01:04:49 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v3] In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 18:20:26 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. >> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > fix misaligned access in DO4 Changes requested by fjiang (Author). 
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 689: > 687: if (isLL) > 688: { > 689: //need to erase 1 most significant byte in 32-bit value of ch2 Suggestion: // need to erase 1 most significant byte in 32-bit value of ch2 src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 693: > 691: srli(ch2, ch2, 32); > 692: } else { > 693: slli(ch2, ch2, 16); //2 most significant bytes will be erased by this operation Suggestion: slli(ch2, ch2, 16); // 2 most significant bytes will be erased by this operation ------------- PR Review: https://git.openjdk.org/jdk/pull/14320#pullrequestreview-1468817339 PR Review Comment: https://git.openjdk.org/jdk/pull/14320#discussion_r1222332775 PR Review Comment: https://git.openjdk.org/jdk/pull/14320#discussion_r1222332867 From fyang at openjdk.org Thu Jun 8 02:42:50 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 8 Jun 2023 02:42:50 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v3] In-Reply-To: References: Message-ID: <0jP3hzkM79NMUg3eJODqM5PPDKBGXybx75bWJ1UAvYE=.f7d2239d-01e6-47d6-b800-17c08bfd09d7@github.com> On Wed, 7 Jun 2023 18:20:26 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. >> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 
144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > fix misaligned access in DO4 @VladimirKempik : Thanks for the update. Would you mind one more tweak? Since `needle_chr_shift` and `haystack_chr_shift` could be 0 for the L case, I think we should guard the shift instructions at https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L634, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L637, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L679, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L700, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L724, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L775 with conditions `if (needle_chr_shift)` or `if (haystack_chr_shift)`. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 494: > 492: bne(tmp3, skipch, BMSKIP); // if not equal, skipch is bad char > 493: add(result, haystack, isLL ? nlen_tmp : ch2); > 494: load_long_misaligned(ch2, Address(result), ch1); // can use ch1 as tmpreg here as it will be trashed on next mv command anyway Suggestion for comment: // can use ch1 as temp register here as it will be trashed by next mv anyway src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 688: > 686: (this->*load_4chr)(ch2, Address(tmp3), noreg); > 687: if (isLL) > 688: { Suggestion for coding style: `if (isLL) {` ------------- Changes requested by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14320#pullrequestreview-1468875975
PR Review Comment: https://git.openjdk.org/jdk/pull/14320#discussion_r1222369153
PR Review Comment: https://git.openjdk.org/jdk/pull/14320#discussion_r1222369603

From duke at openjdk.org Thu Jun 8 02:52:19 2023
From: duke at openjdk.org (Chang Peng)
Date: Thu, 8 Jun 2023 02:52:19 GMT
Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8
Message-ID: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com>

This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers.

VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to take the smaller of VLENGTH and the number of unset lanes to ensure correctness.

When the boolean mask has only 2 or 4 elements, this patch sets the 16th or 32nd bit to 1 before rbit and clz. With this trick, the maximum value calculated in such cases is VLENGTH (2 or 4).

Test:
All vector and vectorapi tests passed.

Performance:
The benchmark function looks like this:

    @Benchmark
    public static int testInt() {
        int res = 0;
        for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) {
            VectorMask m = VectorMask.fromArray(INT_SPECIES, ia, i);
            res += m.firstTrue();
        }
        return res;
    }

The following data was collected on a 128-bit Neon machine.
Benchmark Before After Unit testInt 22214.740 25627.833 ops/ms testLong 11649.898 13698.535 ops/ms [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- Change-Id: I4a2de805ffa4469f88d510c96617eae165f0e025 ------------- Commit messages: - 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 Changes: https://git.openjdk.org/jdk/pull/14373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309583 Stats: 84 lines in 2 files changed: 14 ins; 58 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/14373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14373/head:pull/14373 PR: https://git.openjdk.org/jdk/pull/14373 From fyang at openjdk.org Thu Jun 8 02:54:52 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 8 Jun 2023 02:54:52 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v2] In-Reply-To: References: Message-ID: <8q9m6txSh_u_Y3SpCXBhL1KWK2HoA_6P0aAELLBUmvw=.cc546c58-93d6-44a8-9c2b-aa2cc4ba1c8d@github.com> On Wed, 7 Jun 2023 18:09:51 GMT, Vladimir Kempik wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> make DO2 read by one character from memory per loop > > Numbers on DO4 ( comparing 4 characters at once) ( substring has to be final String of 4 characters) > DO4: > > hifive > Benchmark Mode Cnt Score Error Units > before > StringIndexOf.advancedWithShortSub4Chars avgt 25 69514.891 ? 128.730 ns/op > after > StringIndexOf.advancedWithShortSub4Chars avgt 25 2481.448 ? 
13.481 ns/op > > thead > > Benchmark Mode Cnt Score Error Units > before > StringIndexOf.advancedWithShortSub4Chars avgt 25 753.125 ? 2.859 ns/op > after > StringIndexOf.advancedWithShortSub4Chars avgt 25 741.031 ? 9.075 ns/op > @VladimirKempik : Thanks for the update. Would you mind one more tweak? Since `needle_chr_shift` and `haystack_chr_shift` could be 0 for the L case, I think we should guard the shift instructions at https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L634, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L637, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L679, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L700, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L724, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L775 with conditions `if (needle_chr_shift)` or `if (haystack_chr_shift)`. Ah, this won't save us any instructions. Let's change this code snippet: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#LL754-L757 into a single line: `slli(tmp3, result_tmp, haystack_chr_shift);` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1581808899 From fgao at openjdk.org Thu Jun 8 04:48:48 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 8 Jun 2023 04:48:48 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required In-Reply-To: References: Message-ID: On Tue, 23 May 2023 07:16:48 GMT, Emanuel Peter wrote: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. 
>
> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior.
>
> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`.
>
> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>
> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling whether we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below).
>
> **Changes to Tests**
> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need for the flag.
>
> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected no vectorization because of "dependencies".
But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti... test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java line 166: > 164: // No true dependency in read-forward case. > 165: @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, > 166: counts = {IRNode.STORE_VECTOR, ">0"}) You may need add `applyIf = {"AlignVector", "false"}` for these newly added IR check rules. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14096#discussion_r1222444036 From vkempik at openjdk.org Thu Jun 8 05:49:05 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 8 Jun 2023 05:49:05 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 
144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: fix nits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14320/files - new: https://git.openjdk.org/jdk/pull/14320/files/362af5f8..185a811d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=02-03 Stats: 11 lines in 1 file changed: 1 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From vkempik at openjdk.org Thu Jun 8 05:49:07 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 8 Jun 2023 05:49:07 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v2] In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 18:09:51 GMT, Vladimir Kempik wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> make DO2 read by one character from memory per loop > > Numbers on DO4 ( comparing 4 characters at once) ( substring has to be final String of 4 characters) > DO4: > > hifive > Benchmark Mode Cnt Score Error Units > before > StringIndexOf.advancedWithShortSub4Chars avgt 25 69514.891 ? 128.730 ns/op > after > StringIndexOf.advancedWithShortSub4Chars avgt 25 2481.448 ? 13.481 ns/op > > thead > > Benchmark Mode Cnt Score Error Units > before > StringIndexOf.advancedWithShortSub4Chars avgt 25 753.125 ? 2.859 ns/op > after > StringIndexOf.advancedWithShortSub4Chars avgt 25 741.031 ? 9.075 ns/op > @VladimirKempik : Thanks for the update. Would you mind one more tweak? 
Since `needle_chr_shift` and `haystack_chr_shift` could be 0 for the L case, I think we should guard the shift instructions at https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L634, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L637, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L679, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L700, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L724, https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L775 with conditions `if (needle_chr_shift)` or `if (haystack_chr_shift)`.

It's questionable: mv(Xd, Xs) becomes addi(Xd, Xs, 0), and whether addi(Xd, Xs, 0) or slli(Xd, Xs, 0) is cheaper is an open question.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1581924217

From fyang at openjdk.org Thu Jun 8 06:17:49 2023
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 8 Jun 2023 06:17:49 GMT
Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 05:49:05 GMT, Vladimir Kempik wrote:

>> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V
>>
>> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
>> The other two produced about 100 events combined.
>> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
>> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > fix nits Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14320#pullrequestreview-1469075239 From vkempik at openjdk.org Thu Jun 8 06:20:50 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 8 Jun 2023 06:20:50 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 05:49:05 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. >> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. 
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix nits

I have asked our HW team: addi can sometimes be cheaper than slli. `addi Xd, Xs, 0` can be resolved by register renaming (though not on every microarchitecture) without using the ALU, whereas slli always uses the ALU.
So it may be worth adding a wrapper for slli in the macro assembler that emits addi instead when the shift amount is zero.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1581953365

From fjiang at openjdk.org Thu Jun 8 06:53:49 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 8 Jun 2023 06:53:49 GMT
Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 05:49:05 GMT, Vladimir Kempik wrote:

>> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V
>>
>> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
>> The other two produced about 100 events combined.
>> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
>> Numbers on hifive before and after applying the patch:
>>
>>
>> Benchmark                            Mode  Cnt      Score     Error  Units
>> StringIndexOf.advancedWithMediumSub  avgt   25  47031.406 ± 144.005  ns/op
>>
>>
>> After:
>>
>> Benchmark                            Mode  Cnt     Score    Error  Units
>> StringIndexOf.advancedWithMediumSub  avgt   25  4256.830 ± 23.075  ns/op
>>
>>
>> Testing: tier1/tier2 is clean on hifive.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix nits

Marked as reviewed by fjiang (Author).
-------------

PR Review: https://git.openjdk.org/jdk/pull/14320#pullrequestreview-1469123165

From epeter at openjdk.org Thu Jun 8 07:40:59 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 8 Jun 2023 07:40:59 GMT
Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 07:35:52 GMT, Daohan Qu wrote:

>> @quadhier Thanks for looking into this! This bug is currently not assigned to you. Please always make sure that you have it assigned to you, or at least mention in JIRA that you are working on it. Currently, @enothum had it assigned and was also working on it.
>>
>> The regression test cleanly reproduces before the patch, good.
>>
>> Why is the overflow acceptable? Does that not mean that the calculation did something wrong?
>>
>> In the example, we have
>>
>> init_con = 3
>> limit_con = 2147483647 = max_jint
>> stride_con = 3
>> stride_m = stride_con - 1 = 2
>> trip_count = (limit_con - init_con + stride_m)/stride_con = 715827882
>> final_con = init_con + stride_con*trip_count = 2147483649 = max_jint + 2 (overflow!)
>> final_int = -2147483647 (overflow!)
>>
>>
>> Does that not mean that we mis-calculated the `trip_count`? If it was 1 less, we would not have an overflow. Would that not fix the issue in a simpler way? Or did I get something wrong?
>>
>> Let's expand the formula:
>>
>> final_con = init_con + stride_con*trip_count
>> final_con = init_con + stride_con * ((limit_con - init_con + stride_m) / stride_con)
>> final_con = init_con + stride_con * ((limit_con - init_con + stride_con - 1) / stride_con)
>>
>>
>> Is the issue not that instead of coming up with a final value that is slightly below `limit_con`, we come up with one that is slightly above `limit_con`, and can thus overflow?
>>
>> Would this be correct instead (for positive stride)?
>> >> final_con = init_con + stride_con * ((limit_con - init_con + 1 - stride_con) / stride_con) >> >> >> Could the limit type ever overflow at runtime? Does the loop limit check not prevent that (unsure)? >> >> Can you explain again why exactly we calculate what we calculate here, and why that is correct? > > This patch causes many compilation timeouts. I'd close this PR and wait for a better fix. Thanks for your review and suggestion @eme64 ! @quadhier ok, good luck! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582060428 From duke at openjdk.org Thu Jun 8 07:40:59 2023 From: duke at openjdk.org (Daohan Qu) Date: Thu, 8 Jun 2023 07:40:59 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 14:31:19 GMT, Emanuel Peter wrote: >> This patch should fix [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). A `jtreg` test is also added. I'd appreciate any comments and reviews. Thanks in advance! >> >> ## Problem Analysis >> >> For the following program, >> >> public class Test { >> >> static boolean flag; >> >> public static void main(String[] args) { >> for (int i = 0; i < 10000; i++) { >> flag = !flag; >> test(); >> } >> } >> >> public static void test() { >> int limit = flag ? Integer.MAX_VALUE : 1000; >> >> int i = 0; >> while (i < limit) { >> i += 3; >> if (flag) { >> return; >> } >> } >> } >> } >> >> A `LoopLimitNode` will be generated and its `Limit` input is a `PhiNode`, as depicted in the following picture. >> >> phi_as_limit >> >> During `PhaseCCP`, the `LoopLimitNode::Value()` tries to calculate the constant final value: >> https://github.com/openjdk/jdk/blob/16ebf47fe3b0fac7b67acfa589a26abf8843306b/src/hotspot/share/opto/loopnode.cpp#L2289-L2301 >> >> The problem is that the assertion in `line 2299` could fail during CCP though it must hold true at the end of CCP. 
Here is the reason: `PhaseCCP` initializes all nodes with the type `TOP` and iterates in an "arbitrary" order. The following order may happen:
>>
>> 28 IfTrue => 34 Region => 36 Phi => 195 LoopLimit => ... => 29 IfFalse
>>
>> 1. In `ProjNode::Value()` (`IfTrue` inherits it), the type of `IfTrue` is set to `Type::CONTROL`
>> https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/multnode.cpp#L168-L171
>>
>> 2. In `PhiNode::Value()`, only the `33 ConI` input corresponding to `28 IfTrue` gets merged (as `29 IfFalse` has not been processed yet), so the Phi has a value of `int:max`.
>> https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/cfgnode.cpp#L1269-L1277
>>
>> 3. In `LoopLimitNode::Value()`, it finds its `Limit` input `36 Phi` is constant, which triggers the assertion, and the assertion fails since the final value calculated from that constant limit (`int:max`) overflows.
>>
>> ## Solution
>> Move the overflow check to the end of CCP, where it must not fail.
>
> @quadhier Thanks for looking into this! This bug is currently not assigned to you. Please always make sure that you have it assigned to you, or at least mention in JIRA that you are working on it. Currently, @enothum had it assigned and was also working on it.
>
> The regression test cleanly reproduces before the patch, good.
>
> Why is the overflow acceptable? Does that not mean that the calculation did something wrong?
>
> In the example, we have
>
> init_con = 3
> limit_con = 2147483647 = max_jint
> stride_con = 3
> stride_m = stride_con - 1 = 2
> trip_count = (limit_con - init_con + stride_m)/stride_con = 715827882
> final_con = init_con + stride_con*trip_count = 2147483649 = max_jint + 2 (overflow!)
> final_int = -2147483647 (overflow!)
>
>
> Does that not mean that we mis-calculated the `trip_count`? If it was 1 less, we would not have an overflow. Would that not fix the issue in a simpler way?
Or did I get something wrong? > > Let's expand the formula: > > final_con = init_con + stride_con*trip_count > final_con = init_con + stride_con * ((limit_con - init_con + stride_m) / stride_con) > final_con = init_con + stride_con * ((limit_con - init_con + stride_con - 1) / stride_con) > > > Is the issue not that instead of coming up with a final value that is slightly below `limit_con`, we come up with one that is slightly above `limit_con`, and can thus overflow? > > Would this be correct instead (for positive stride)? > > final_con = init_con + stride_con * ((limit_con - init_con + 1 - stride_con) / stride_con) > > > Could the limit type ever overflow at runtime? Does the loop limit check not prevent that (unsure)? > > Can you explain again why exactly we calculate what we calculate here, and why that is correct? This patch causes many compilation timeouts. I'd close this PR and wait for a better fix. Thanks for your review and suggestion @eme64 ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582057788 From duke at openjdk.org Thu Jun 8 07:41:01 2023 From: duke at openjdk.org (Daohan Qu) Date: Thu, 8 Jun 2023 07:41:01 GMT Subject: Withdrawn: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 7 Jun 2023 12:33:48 GMT, Daohan Qu wrote: > This patch should fix [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). A `jtreg` test is also added. I'd appreciate any comments and reviews. Thanks in advance! > > ## Problem Analysis > > For the following program, > > public class Test { > > static boolean flag; > > public static void main(String[] args) { > for (int i = 0; i < 10000; i++) { > flag = !flag; > test(); > } > } > > public static void test() { > int limit = flag ? 
Integer.MAX_VALUE : 1000; > > int i = 0; > while (i < limit) { > i += 3; > if (flag) { > return; > } > } > } > } > > A `LoopLimitNode` will be generated and its `Limit` input is a `PhiNode`, as depicted in the following picture. > > phi_as_limit > > During `PhaseCCP`, the `LoopLimitNode::Value()` tries to calculate the constant final value: > https://github.com/openjdk/jdk/blob/16ebf47fe3b0fac7b67acfa589a26abf8843306b/src/hotspot/share/opto/loopnode.cpp#L2289-L2301 > > The problem is that the assertion in `line 2299` could fail during CCP though it must hold true at the end of CCP. Here is the reason: `PhaseCCP` initializes all nodes with the type `TOP` and iterates in an "arbitrary" order. The following order may happen: > > 28 IfTrue => 34 Region => 36 Phi => 195 LoopLimit => ... => 29 IfFalse > > 1. In `ProjNode::Value()` (`IfTrue` inherits it), the type of `IfTrue` is set to `Type::CONTROL` > https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/multnode.cpp#L168-L171 > > 2. In `PhiNode::Value()`, only `28 IfTrue`'s correspondence `33 ConI` gets merged (as `29 IfFalse` has not been dealt with yet), then it has a value of `int:max`. > https://github.com/openjdk/jdk/blob/fa791119f0b73cd1e110d6a62d3bed58fee5740a/src/hotspot/share/opto/cfgnode.cpp#L1269-L1277 > > 3. In `LoopLimitNode::Value()`, it finds its `Limit` input `36 Phi` is constant, which triggers the assertion, and the assertion fails since the final value calculated from that constant limit (`int:max`) overflows. > > ## Solution > Move the overflow check to the end of CCP, where it must not fail. This pull request has been closed without being integrated. 
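The overflow at the centre of this thread can be reproduced with plain arithmetic. The sketch below is illustrative only and is not the HotSpot code: it reuses the constants quoted in the review (`init_con = 3`, `limit_con = max_jint`, `stride_con = 3`) and the trip-count formula quoted there, and the class and method names are made up for the example.

```java
public class LoopLimitOverflowSketch {

    // Final value of the induction variable for a positive stride, computed in
    // 64-bit arithmetic with the trip-count formula quoted in the review.
    public static long finalValue(long init, long limit, long stride) {
        long strideM = stride - 1;
        long tripCount = (limit - init + strideM) / stride;
        return init + stride * tripCount;
    }

    // True when the 64-bit result survives a round trip through int, which is
    // the property the failing assertion checks.
    public static boolean fitsInInt(long value) {
        return value == (long) (int) value;
    }

    public static void main(String[] args) {
        long finalCon = finalValue(3, Integer.MAX_VALUE, 3);
        System.out.println("final_con = " + finalCon);        // 2147483649
        System.out.println("final_int = " + (int) finalCon);  // -2147483647
        System.out.println("fits in int: " + fitsInInt(finalCon));
    }
}
```

With the other value the `Phi` can take (`limit = 1000`), the same formula gives 1002, which fits in an `int`. Only the transient constant `int:max` seen partway through CCP produces a value outside the `int` range.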
------------- PR: https://git.openjdk.org/jdk/pull/14353 From epeter at openjdk.org Thu Jun 8 07:43:58 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 07:43:58 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> On Thu, 8 Jun 2023 07:35:52 GMT, Daohan Qu wrote: >> @quadhier Thanks for looking into this! This but is currently not assigned to you. Please always make sure that you have it assigned to you, or at least mention in JIRA that you are working on it. Currently, @enothum had it assigned and was also working on it. >> >> The regression test cleanly reproduces before the patch, good. >> >> Why is the overflow acceptable? Does that not mean that the calculation did something wrong? >> >> In the example, we have >> >> init_con = 3 >> limit_con = 2147483647 = max_jint >> stride_con = 3 >> stride_m = stride_con - 1 = 2 >> trip_count = (limit_con - init_con + stride_m)/stride_con = 715827882 >> final_con = init_con + stride_con*trip_count = 2147483649 = max_jint + 2 (overflow!) >> final_int = -2147483647 (overflow!) >> >> >> Does that not mean that we mis-calculated the `trip_count`? If it was 1 less, we would not have an overflow. Would that not fix the issue in a simpler way? Or did I get something wrong? >> >> Let's expand the formula: >> >> final_con = init_con + stride_con*trip_count >> final_con = init_con + stride_con * ((limit_con - init_con + stride_m) / stride_con) >> final_con = init_con + stride_con * ((limit_con - init_con + stride_con - 1) / stride_con) >> >> >> Is the issue not that instead of coming up with a final value that is slightly below `limit_con`, we come up with one that is slightly above `limit_con`, and can thus overflow? >> >> Would this be correct instead (for positive stride)? 
>> >> final_con = init_con + stride_con * ((limit_con - init_con + 1 - stride_con) / stride_con) >> >> >> Could the limit type ever overflow at runtime? Does the loop limit check not prevent that (unsure)? >> >> Can you explain again why exactly we calculate what we calculate here, and why that is correct? > > This patch causes many compilation timeouts. I'd close this PR and wait for a better fix. Thanks for your review and suggestion @eme64 ! @quadhier So you intend to keep working on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582063718 From jwaters at openjdk.org Thu Jun 8 07:45:53 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 8 Jun 2023 07:45:53 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner rather than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Anyone? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1582066068 From duke at openjdk.org Thu Jun 8 08:06:57 2023 From: duke at openjdk.org (Daohan Qu) Date: Thu, 8 Jun 2023 08:06:57 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> References: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> Message-ID: On Thu, 8 Jun 2023 07:40:54 GMT, Emanuel Peter wrote: >> This patch causes many compilation timeouts. I'd close this PR and wait for a better fix. Thanks for your review and suggestion @eme64 ! > > @quadhier So you intend to keep working on this? @eme64 Sorry, I won't work on this bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582095589 From epeter at openjdk.org Thu Jun 8 08:16:01 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 08:16:01 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> Message-ID: On Thu, 8 Jun 2023 08:03:34 GMT, Daohan Qu wrote: >> @quadhier So you intend to keep working on this? > > @eme64 Sorry, I won't work on this bug. @quadhier I hope I did not discourage you, my feedback yesterday was a bit scattered and maybe overwhelming, I'm sorry for that. These things are not easy to get right. I was impressed how far you got!
Let me know if you want to take this back up, or want another task to work on - though a JBS account would help ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582109181 From duke at openjdk.org Thu Jun 8 08:31:58 2023 From: duke at openjdk.org (Daohan Qu) Date: Thu, 8 Jun 2023 08:31:58 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> Message-ID: On Thu, 8 Jun 2023 08:12:55 GMT, Emanuel Peter wrote: >> @eme64 Sorry, I won't work on this bug. > > @quadhier I hope I did not discourage you, my feedback yesterday was a bit scattered and maybe overwhelming, I'm sorry for that. > These things are not easy to get right. I was impressed how far you got! > Let me know if you want to take this back up, or want another task to work on - though a JBS account would help ;) @eme64 That's alright! I appreciate your telling me about some disciplines not written in the contributor guides. :P After some explorations, I realized that I didn't fully understand the root cause of this bug. Since I use my spare time to contribute, I think it would be better for some other experts to work on this so that we don't have to wait too long. I'm applying for an `Author` role and hope then I could continue contributing. Many thanks for your kindness again!
:D ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582131608 From epeter at openjdk.org Thu Jun 8 08:31:59 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 08:31:59 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> Message-ID: On Thu, 8 Jun 2023 08:26:54 GMT, Daohan Qu wrote: >> @quadhier I hope I did not discourage you, my feedback yesterday was a bit scattered and maybe overwhelming, I'm sorry for that. >> These things are not easy to get right. I was impressed how far you got! >> Let me know if you want to take this back up, or want another task to work on - though a JBS account would help ;) > > @eme64 That's alright! I appreciate your telling me about some disciplines not written in the contributor guides. :P > > After some explorations, I realized that I didn't fully understand the root cause of this bug. Since I use my spare time to contribute, I think it would be better for some other experts to work on this so that we don't have to wait too long. I'm applying for an `Author` role and hope then I could continue contributing. Many thanks for your kindness again! :D @quadhier Ok, sounds good. Looking forward to your future PRs! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1582134400 From aph at openjdk.org Thu Jun 8 08:35:54 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 8 Jun 2023 08:35:54 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4] In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 05:49:05 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > fix nits I am very concerned about the increased complexity and maintenance burden caused by these unaligned access patches. While RISC-V is not a mainstream arch at this time, it may become one, and if that happens we'll need something reasonably maintainable. Sprinkling '`if (AvoidUnalignedAccesses)`' all over the back end is disastrous for readability. I urge you to find a more abstract solution, for example by creating a memory access assembler class and subclassing it as appropriate with aligned and unaligned versions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1582139173 From epeter at openjdk.org Thu Jun 8 08:56:19 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 08:56:19 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v2] In-Reply-To: References: Message-ID: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler.
> > As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. > > This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. > > Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. > > Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). > > **Changes to Tests** > I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. > > `LoopArrayIndexComputeTest.java` had a few "negative" tests that expeced that there is no vectorization because of "dependencies". 
But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - IR whitelist AlignVector, require it false in the newly added tests - Merge branch 'master' into JDK-8308606 - Merge branch 'master' into JDK-8308606 - remove some outdated comments - Benchmark VectorAlignment - Merge branch 'master' into JDK-8308606 - remove dead code and add offset printing - fix typo - 8308606: C2 SuperWord: remove alignment checks where not required ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14096/files - new: https://git.openjdk.org/jdk/pull/14096/files/7333d115..b00a7760 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=00-01 Stats: 60053 lines in 911 files changed: 47816 ins; 7685 del; 4552 mod Patch: https://git.openjdk.org/jdk/pull/14096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096 PR: https://git.openjdk.org/jdk/pull/14096 From epeter at openjdk.org Thu Jun 8 08:56:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 08:56:20 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v2] In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 04:43:38 GMT, Fei Gao wrote: >> Emanuel Peter has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: >> >> - IR whitelist AlignVector, require it false in the newly added tests >> - Merge branch 'master' into JDK-8308606 >> - Merge branch 'master' into JDK-8308606 >> - remove some outdated comments >> - Benchmark VectorAlignment >> - Merge branch 'master' into JDK-8308606 >> - remove dead code and add offset printing >> - fix typo >> - 8308606: C2 SuperWord: remove alignment checks where not required > > test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java line 166: > >> 164: // No true dependency in read-forward case. >> 165: @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, >> 166: counts = {IRNode.STORE_VECTOR, ">0"}) > > You may need add `applyIf = {"AlignVector", "false"}` for these newly added IR check rules. @fg1417 you are right! I think we should also add `AlignVector` to the IR whitelist. It makes sense to add it with this change here, because only from now on can we actually have misaligned loads / stores on the same memory-slice! So we should also test things more thoroughly now. I also see that in `test/hotspot/jtreg/compiler/vectorization/runner/` a lot of tests have `@requires vm.flagless`. That means we actually do not check any flag combinations with those tests. I think we should file an RFE to make them more general, and add the requirements to the IR rules. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14096#discussion_r1222667056 From aph at openjdk.org Thu Jun 8 09:17:47 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 8 Jun 2023 09:17:47 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> On Thu, 8 Jun 2023 02:44:08 GMT, Chang Peng wrote: > This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. > > VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. > > This patch sets the 16th or 32nd bit to 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, the maximum value calculated in such a case will be VLENGTH (2 or 4). > > Test: > All vector and vectorapi tests passed. > > Performance: > The benchmark function is like: > > > @Benchmark > public static int testInt() { > int res = 0; > for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) { > VectorMask m = VectorMask.fromArray(INT_SPECIES, ia, i); > res += m.firstTrue(); > } > > return res; > } > > > The following data was collected on a 128-bit Neon machine.
> > Benchmark Before After Unit > testInt 22214.740 25627.833 ops/ms > testLong 11649.898 13698.535 ops/ms > > [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() > [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 > [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- Where is the benchmark? You don't seem to have included it in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1582198629 From aivanov at openjdk.org Thu Jun 8 11:22:52 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Thu, 8 Jun 2023 11:22:52 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning I'll take a look? hopefully next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1582402901 From roland at openjdk.org Thu Jun 8 11:34:07 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 8 Jun 2023 11:34:07 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class Message-ID: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> In this simple micro benchmark: https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 Performance drops sharply with polluted profile: Benchmark (typePollution) Mode Cnt Score Error Units RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ? 24.919 ops/us to: Benchmark (typePollution) Mode Cnt Score Error Units RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ? 2.280 ops/us The test has 2 type checks to 2 different interfaces so caching with `secondary_super_cache` doesn't help. The micro-benchmark only uses 2 different concrete classes (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded in profile data at the type checks. But c2 only take advantage of profile data at type checks if they report a single class. What I propose is that the full blown type check expanded in `Phase::gen_subtype_check()` takes advantage of profile data. So in the case of the micro benchmark, before checking the `secondary_super_cache`, generated code checks whether the object being type checked is a `DuplicatedContext` or a `NonDuplicatedContext`. 
This works fairly well on this micro benchmark:

Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us

It also scales much better if there are multiple threads running the same test (`secondary_super_cache` doesn't scale well: see JDK-8180450).

Now if the micro-benchmark is changed according to the comment: https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 so the type check hits in the `secondary_super_cache`, the current code performs much better:

Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us

but leveraging profiling as explained above performs even better:

Benchmark (typePollution) Mode Cnt Score Error Units
RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 1165.474 ± 70.171 ops/us

I think it's actually likely that there's a performance advantage even if profiling sees more than 2 types at a type check unless the profile is heavily polluted. The problem is that, the way current profile data is collected, we can't tell if the profile is heavily polluted because, unlike profiling at virtual calls, there's no counter for non-recorded types. The `count` field is used to count failed type checks instead. JVMCI added a `nonprofiled_count`. I thought about using that one, but after looking at the way C2 uses the failed type check count, it seems it would be simpler to collect profile data at type checks the way it's done at virtual calls. Indeed, C2 uses the unique class reported by profile data only if there were no failed type checks recorded in profile data but:
- at checkcasts, it also checks that it can prove the check would statically fold. That last check seems to be the one that matters.
- at instanceof, AFAICT, a profiled type that causes the instanceof to fail is as valuable as one that makes it succeed so it would be better to ignore failures reported by profiling. I also discussed this briefly with Tom and he said graal doesn't need the failed type check count. So, in the patch I propose, I changed the way profile data is collected so it works the same it does at virtual call. If this patch is accepted, I'll need help with platforms other than x86 and aarch64. I also modified the JVMCI code. BTW, I also wonder if `VirtualCallData.getMethodProfile()` is not obsolete. Finally, I changed `Phase::gen_subtype_check()` so it emits the extra checks. That method is now called at macro expansion when profile data is not longer available. So I attached profile data to the `SubTypeCheck` node. For each profile data entry, 2 edges are added: one for the klass, one for the profile frequency. Because `SubTypeCheck` now has extra edges, it can happen that 2 `SubTypeCheck` nodes that perform the same subtype check don't common during `IGVN` which can get in the way of some optimizations. I had to make some adjustments to the logic of split if and code that looks for dominating identical checks because of that. 
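The idea of consulting profiled classes before the full subtype check can be modelled in a few lines of Java. This is a conceptual sketch under stated assumptions, not the code C2 emits and not part of the patch: all names are hypothetical, `Class.isInstance` merely stands in for the full check (including the `secondary_super_cache` lookup), and the per-class answer is precomputed the way a compiler could fold it statically.

```java
import java.util.ArrayList;
import java.util.List;

public class ProfiledSubtypeCheckSketch {

    /** One profile entry: a class seen at the check and the precomputed answer for it. */
    public static final class ProfiledKlass {
        final Class<?> klass;
        final boolean isSubtype;

        ProfiledKlass(Class<?> klass, boolean isSubtype) {
            this.klass = klass;
            this.isSubtype = isSubtype;
        }
    }

    // "Compile time": decide once, per profiled class, whether the check succeeds.
    public static List<ProfiledKlass> buildProfile(Class<?> target, Class<?>... seen) {
        List<ProfiledKlass> profile = new ArrayList<>();
        for (Class<?> k : seen) {
            profile.add(new ProfiledKlass(k, target.isAssignableFrom(k)));
        }
        return profile;
    }

    // "Run time": exact-class compares first, the full check only as a fallback.
    public static boolean isInstance(Object obj, Class<?> target, List<ProfiledKlass> profile) {
        if (obj == null) {
            return false;
        }
        Class<?> k = obj.getClass();
        for (ProfiledKlass p : profile) {
            if (k == p.klass) {
                return p.isSubtype; // fast path: one reference comparison
            }
        }
        return target.isInstance(obj); // slow path: stands in for the full subtype check
    }

    public static void main(String[] args) {
        List<ProfiledKlass> profile = buildProfile(CharSequence.class, String.class, Integer.class);
        System.out.println(isInstance("abc", CharSequence.class, profile));               // true
        System.out.println(isInstance(42, CharSequence.class, profile));                  // false
        System.out.println(isInstance(new StringBuilder(), CharSequence.class, profile)); // true
    }
}
```

Note that a profiled class for which the check fails (`Integer` above) is answered on the fast path just like one for which it succeeds, which mirrors the observation about `instanceof` in the message.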
------------- Commit messages: - white spaces - fix & test Changes: https://git.openjdk.org/jdk/pull/14375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308869 Stats: 849 lines in 24 files changed: 523 ins; 232 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From vkempik at openjdk.org Thu Jun 8 12:24:54 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 8 Jun 2023 12:24:54 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v5] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initialy found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ? 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ? 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. 
Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Increase granularity when isLL is false ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14320/files - new: https://git.openjdk.org/jdk/pull/14320/files/185a811d..45498879 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From vkempik at openjdk.org Thu Jun 8 12:24:55 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 8 Jun 2023 12:24:55 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4] In-Reply-To: References: Message-ID: <6UqJEHkalCTR1_EcmUqrq4P_bgseywAiEHLmw_HFGrw=.28ee5ca2-05fd-4701-bd04-8a93e208b931@github.com> On Thu, 8 Jun 2023 08:32:31 GMT, Andrew Haley wrote: > I am very concerned about the increased complexity and maintenance burden caused by these unaligned access patches. While RISC-V is not a mainstream arch at this time, it may become one, and it that happens we'll need something reasonably maintainable. Sprinkling '`if (AvoidUnalignedAccesses)`' all over the back end is disastrous for readability. I urge you to find a more abstract solution, for example by creating a memory access assembler class and subclassing it as appropriate with aligned and unaligned versions. Hello, do you mean things like load_XXXX_misaligned ( e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1735 ) or more complicated things ? 
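To make the question concrete, here is one hypothetical shape such an abstraction could take. It is written in Java purely for illustration (the HotSpot backend is C++), none of these names exist in the JDK sources, and it is only a guess at what the suggestion might mean, not something proposed in this thread. The point it tries to show is that the aligned/unaligned decision is taken once, when the loader is chosen, rather than at every call site.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MemoryAccessSketch {

    /** Loads 8 bytes, little-endian, from an arbitrary offset. */
    public interface WordLoader {
        long load8(ByteBuffer mem, int offset);
    }

    /** For platforms where a misaligned 8-byte load is cheap: a single wide access. */
    public static final class DirectLoader implements WordLoader {
        @Override
        public long load8(ByteBuffer mem, int offset) {
            // Assumes the buffer was put into LITTLE_ENDIAN order by the caller.
            return mem.getLong(offset);
        }
    }

    /** For platforms where misaligned loads trap: compose the word from byte loads. */
    public static final class ByteWiseLoader implements WordLoader {
        @Override
        public long load8(ByteBuffer mem, int offset) {
            long value = 0;
            for (int i = 7; i >= 0; i--) {
                value = (value << 8) | (mem.get(offset + i) & 0xFFL);
            }
            return value;
        }
    }

    // The flag is consulted here and nowhere else.
    public static WordLoader forPlatform(boolean avoidUnalignedAccesses) {
        return avoidUnalignedAccesses ? new ByteWiseLoader() : new DirectLoader();
    }

    public static ByteBuffer sampleMemory() {
        ByteBuffer mem = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < 16; i++) {
            mem.put(i, (byte) (i + 1));
        }
        return mem;
    }

    public static void main(String[] args) {
        ByteBuffer mem = sampleMemory();
        long a = forPlatform(false).load8(mem, 3); // misaligned offset
        long b = forPlatform(true).load8(mem, 3);
        System.out.println(Long.toHexString(a) + " " + Long.toHexString(b));
    }
}
```

Callers of `load8` never mention the flag, so code that uses the loader reads the same on both kinds of hardware.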
------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1582483023 From epeter at openjdk.org Thu Jun 8 12:41:48 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 12:41:48 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v2] In-Reply-To: References: Message-ID: <5IXW95IQe1Emlp4SCKkR19qcKH0csQnIuop0QVO21dY=.aa6711f6-cf12-4226-ad46-8e305f90cb45@github.com> On Thu, 8 Jun 2023 08:48:43 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java line 166: >> >>> 164: // No true dependency in read-forward case. >>> 165: @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, >>> 166: counts = {IRNode.STORE_VECTOR, ">0"}) >> >> You may need add `applyIf = {"AlignVector", "false"}` for these newly added IR check rules. > > @fg1417 you are right! > > I think we should also add `AlignVector` to the IR whitelist. It makes sense to add it with this change here, because only from now on can we actually have misaligned loads / stores on the same memory-slice! So we should also test things more thoroughly now. > > I also see that in `test/hotspot/jtreg/compiler/vectorization/runner/` a lot of tests have `@requires vm.flagless`. That means we actually do not check any flag combinations with those tests. I think we should file an RFE to make them more general, and add the requirements to the IR rules. Ok, there are some bugs around, I cannot yet directly add `AlignVector` to the IR framework whitelist. 
We can do it in a follow up RFE https://bugs.openjdk.org/browse/JDK-8309662 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14096#discussion_r1222982009 From epeter at openjdk.org Thu Jun 8 12:47:25 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 Jun 2023 12:47:25 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v3] In-Reply-To: References: Message-ID: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. > > As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. > > This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. > > Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. > > Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. 
My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below).
>
> **Changes to Tests**
> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag.
>
> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  removed AlignVector from IR framework again, do that in RFE

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14096/files
  - new: https://git.openjdk.org/jdk/pull/14096/files/b00a7760..c554e6c7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/14096.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096

PR: https://git.openjdk.org/jdk/pull/14096

From duke at openjdk.org Thu Jun 8 13:19:57 2023
From: duke at openjdk.org (Eric Nothum)
Date: Thu, 8 Jun 2023 13:19:57 GMT
Subject: RFR: 8307620: [IR Framework] Readme mentions JTREG_WHITE_LIST_FLAGS instead of JTREG_WHITELIST_FLAGS
Message-ID:

README now refers to JTREG_WHITELIST_FLAGS instead of JTREG_WHITE_LIST_FLAGS

-------------

Commit
messages: - 8307620: Update README.md Changes: https://git.openjdk.org/jdk/pull/14377/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14377&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307620 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14377.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14377/head:pull/14377 PR: https://git.openjdk.org/jdk/pull/14377 From rcastanedalo at openjdk.org Thu Jun 8 13:29:47 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 8 Jun 2023 13:29:47 GMT Subject: RFR: 8307620: [IR Framework] Readme mentions JTREG_WHITE_LIST_FLAGS instead of JTREG_WHITELIST_FLAGS In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 12:52:06 GMT, Eric Nothum wrote: > README now refers to JTREG_WHITELIST_FLAGS instead of JTREG_WHITE_LIST_FLAGS Looks good and trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14377#pullrequestreview-1469913881 From chagedorn at openjdk.org Thu Jun 8 13:29:48 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 Jun 2023 13:29:48 GMT Subject: RFR: 8307620: [IR Framework] Readme mentions JTREG_WHITE_LIST_FLAGS instead of JTREG_WHITELIST_FLAGS In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 12:52:06 GMT, Eric Nothum wrote: > README now refers to JTREG_WHITELIST_FLAGS instead of JTREG_WHITE_LIST_FLAGS Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14377#pullrequestreview-1469918037

From vkempik at openjdk.org Thu Jun 8 13:30:51 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Thu, 8 Jun 2023 13:30:51 GMT
Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v5]
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 12:24:54 GMT, Vladimir Kempik wrote:

>> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V
>>
>> Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
>> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706)
>> The other two produced about 100 events combined.
>> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
>> Numbers on hifive before and after applying the patch:
>>
>>
>> Benchmark                            Mode  Cnt      Score     Error  Units
>> StringIndexOf.advancedWithMediumSub  avgt   25  47031.406 ± 144.005  ns/op
>>
>>
>> After:
>>
>> Benchmark                            Mode  Cnt     Score    Error  Units
>> StringIndexOf.advancedWithMediumSub  avgt   25  4256.830 ± 23.075  ns/op
>>
>>
>> Testing: tier1/tier2 is clean on hifive.
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
>   Increase granularity when isLL is false

The first change, at [Line496](https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6R496), regresses performance of the Boyer-Moore-Horspool based indexOf on thead:

Before:
org.openjdk.bench.java.lang.StringIndexOf.advancedWithMediumSub  2790.160 ± 56.442 ns/op

After:
org.openjdk.bench.java.lang.StringIndexOf.advancedWithMediumSub  3377.943 ± 42.496 ns/op

I think this could be improved.

Currently, when we compare a needle and a region of the haystack, we first read the last 8 bytes from both regions and compare them; if they match, we compare the rest byte by byte.
The 8-byte read from the haystack is sometimes aligned and sometimes misaligned, so we could read 4 or 2 bytes for the first comparison instead, reducing wasted reads from the haystack.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1582582020

From tholenstein at openjdk.org Thu Jun 8 14:29:56 2023
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Thu, 8 Jun 2023 14:29:56 GMT
Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v3]
In-Reply-To:
References:
Message-ID:

> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods.
>
> ### Old CompileOnly format
> - matching a **method name** with **class name** and **package name**:
> `-XX:CompileOnly=package/path/Class.method`
> `-XX:CompileOnly=package/path/Class::method`
> `-XX:CompileOnly=package.path.Class::method`
> BUT NOT `-XX:CompileOnly=package.path.Class.method`
>
> - just matching a **single method name**:
> `-XX:CompileOnly=.hashCode`
> `-XX:CompileOnly=::hashCode`
> BUT NOT `-XX:CompileOnly=hashCode`
>
> - Matching **all method names** in a **class name** with **package name**
> `-XX:CompileOnly=java/lang/String`
> BUT NOT `-XX:CompileOnly=java/lang/String.`
> BUT NOT `-XX:CompileOnly=java.lang.String`
> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug)
> BUT NOT `-XX:CompileOnly=String`
> BUT NOT `-XX:CompileOnly=String.`
> BUT NOT `-XX:CompileOnly=String::`
>
> - Matching **all method names** in a **class name** with **NO package name**
> `-XX:CompileOnly=String`
> BUT NOT `-XX:CompileOnly=String.`
> BUT NOT `-XX:CompileOnly=String::`
>
> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored
> e.g.
`-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileCommand=compileonly` with possibility to take lists as ... Tobias Holenstein has updated the pull request incrementally with 11 additional commits since the last revision: - Update TestStableUShort.java - Update TestStableUByte.java - Update TestStableShort.java - Update TestStableObject.java - Update TestStableLong.java - Update TestStableInt.java - Update TestStableFloat.java - Update TestStableDouble.java - Update TestStableChar.java - Update TestStableByte.java - ... 
and 1 more: https://git.openjdk.org/jdk/compare/40b17296...e320a9de

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13802/files
  - new: https://git.openjdk.org/jdk/pull/13802/files/40b17296..e320a9de

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=01-02

  Stats: 44 lines in 11 files changed: 0 ins; 0 del; 44 mod
  Patch: https://git.openjdk.org/jdk/pull/13802.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13802/head:pull/13802

PR: https://git.openjdk.org/jdk/pull/13802

From duke at openjdk.org Thu Jun 8 15:02:49 2023
From: duke at openjdk.org (Eric Nothum)
Date: Thu, 8 Jun 2023 15:02:49 GMT
Subject: RFR: 8307620: [IR Framework] Readme mentions JTREG_WHITE_LIST_FLAGS instead of JTREG_WHITELIST_FLAGS
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 13:24:40 GMT, Roberto Castañeda Lozano wrote:

>> README now refers to JTREG_WHITELIST_FLAGS instead of JTREG_WHITE_LIST_FLAGS
>
> Looks good and trivial.

Thanks for the reviews @robcasloz and @chhagedorn :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14377#issuecomment-1582751220

From roland at openjdk.org Thu Jun 8 15:05:30 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 8 Jun 2023 15:05:30 GMT
Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v2]
In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com>
References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com>
Message-ID:

> In this simple micro benchmark:
>
> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
>
> Performance drops sharply with polluted profile:
>
>
> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
> RequireNonNullCheckcastScalability.isDuplicated1            false  thrpt   10  1453.372 ± 24.919  ops/us
>
>
> to:
>
>
> Benchmark                                         (typePollution)   Mode  Cnt   Score   Error   Units
> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  28.579 ± 2.280  ops/us
>
>
> The test has 2 type checks to 2 different interfaces so caching with
> `secondary_super_cache` doesn't help.
>
> The micro-benchmark only uses 2 different concrete classes
> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
> in profile data at the type checks. But C2 only takes advantage of
> profile data at type checks if they report a single class.
>
> What I propose is that the full blown type check expanded in
> `Phase::gen_subtype_check()` takes advantage of profile data. So in
> the case of the micro benchmark, before checking the
> `secondary_super_cache`, generated code checks whether the object
> being type checked is a `DuplicatedContext` or a
> `NonDuplicatedContext`.
>
> This works fairly well on this micro benchmark:
>
>
> Benchmark                                         (typePollution)   Mode  Cnt    Score    Error   Units
> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  871.224 ± 20.750  ops/us
>
>
> It also scales much better if there are multiple threads running the
> same test (`secondary_super_cache` doesn't scale well: see
> JDK-8180450).
>
> Now if the micro-benchmark is changed according to the comment:
>
> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
>
> so the type check hits in the `secondary_super_cache`, the current
> code performs much better:
>
>
> Benchmark                                         (typePollution)   Mode  Cnt    Score    Error   Units
> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10  871.224 ± 20.750  ops/us
>
>
> but leveraging profiling as explained above performs even better:
>
>
> Benchmark (typePollution) Mode Cnt Score Error Units
> RequireNonNullCheckcastScalability.isDuplic...
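The idea of consulting the profiled classes before falling back to the general check can be pictured in plain Java. This is a loose analogy and not what C2 emits: `ProfiledTypeCheck` and its fields are hypothetical, the "profile" is simply an array of classes handed in by the caller, and `Class.isInstance` stands in for the full subtype check (including the `secondary_super_cache` lookup).

```java
// Hypothetical sketch; C2 does this in generated machine code, not in Java.
final class ProfiledTypeCheck {
    private final Class<?> target;
    private final Class<?>[] profiled;   // classes recorded at this check site
    private final boolean[] answers;     // subtype result for each, computed once
    int slowPathCalls = 0;               // counts fallbacks, for illustration only

    ProfiledTypeCheck(Class<?> target, Class<?>... profiled) {
        this.target = target;
        this.profiled = profiled;
        this.answers = new boolean[profiled.length];
        for (int i = 0; i < profiled.length; i++) {
            answers[i] = target.isAssignableFrom(profiled[i]);
        }
    }

    // Assumes a non-null argument.
    boolean check(Object o) {
        Class<?> k = o.getClass();
        // Fast path: exact class comparison against each profiled class.
        for (int i = 0; i < profiled.length; i++) {
            if (k == profiled[i]) {
                return answers[i];
            }
        }
        // Slow path: stands in for the full subtype check.
        slowPathCalls++;
        return target.isInstance(o);
    }
}
```

With two profiled classes, as in the micro-benchmark described above, every object of either class is answered by a pointer comparison, and only objects of other classes reach the slow path.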
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: 32 bit fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/7b47aec4..72ef4189 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=00-01 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From never at openjdk.org Thu Jun 8 16:14:55 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 8 Jun 2023 16:14:55 GMT Subject: Integrated: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 17:20:48 GMT, Tom Rodriguez wrote: > intermittent failures with Graal on ContinuousCallSiteTargetChange showed that when constructing the CallSiteTargetValue Assumption we read the value twice so the dependency and the value in the program might be different leading to incorrect execution. This pull request has now been integrated. Changeset: bb966827 Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/bb966827ac445d805bac5005d0fbda0c61111252 Stats: 11 lines in 1 file changed: 5 ins; 3 del; 3 mod 8309498: [JVMCI] race in CallSiteTargetValue recording Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14315 From duke at openjdk.org Thu Jun 8 17:28:51 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 8 Jun 2023 17:28:51 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v7] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. 
This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |

Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision:

 - Merge branch 'openjdk:master' into avx512sort
 - fix license in one file
 - Update test/micro/org/openjdk/bench/java/util/ArraysSort.java

   Co-authored-by: Andrew Haley
 - fix license
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
 - remove libstdc++
 - 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays)

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14227/files
  - new: https://git.openjdk.org/jdk/pull/14227/files/1dc9589e..3bd12ec5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=05-06

  Stats: 57637 lines in 809 files changed: 47983 ins; 6934 del; 2720 mod
  Patch: https://git.openjdk.org/jdk/pull/14227.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227

PR: https://git.openjdk.org/jdk/pull/14227

From aph at openjdk.org Thu Jun 8 20:48:42 2023
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 8 Jun 2023 20:48:42 GMT
Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v4]
In-Reply-To: <6UqJEHkalCTR1_EcmUqrq4P_bgseywAiEHLmw_HFGrw=.28ee5ca2-05fd-4701-bd04-8a93e208b931@github.com>
References: <6UqJEHkalCTR1_EcmUqrq4P_bgseywAiEHLmw_HFGrw=.28ee5ca2-05fd-4701-bd04-8a93e208b931@github.com>
Message-ID: <_oDUkvjvE9uIavyUqgA6axMM09r4ia4869tBdmd-dbY=.24fa89a3-fee5-4338-afe7-cd9dffa92f9e@github.com>

On Thu, 8 Jun 2023 12:19:36 GMT, Vladimir Kempik wrote:

> > I am very concerned about the increased complexity and maintenance burden caused by these unaligned access patches. While RISC-V is not a mainstream arch at this time, it may become one, and if that happens we'll need something reasonably maintainable. Sprinkling '`if (AvoidUnalignedAccesses)`' all over the back end is disastrous for readability.
I urge you to find a more abstract solution, for example by creating a memory access assembler class and subclassing it as appropriate with aligned and unaligned versions. > > Hello, do you mean things like load_XXXX_misaligned ( e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1735 ) or more complicated things ? That's certainly a good start, although I believe its implementation could be much improved. But everywhere you see `if (AvoidUnalignedAccesses)` is potentially a candidate for factoring out the parts and moving them into a misaligned memory access class. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1583318757 From pli at openjdk.org Fri Jun 9 02:16:41 2023 From: pli at openjdk.org (Pengfei Li) Date: Fri, 9 Jun 2023 02:16:41 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v3] In-Reply-To: <5IXW95IQe1Emlp4SCKkR19qcKH0csQnIuop0QVO21dY=.aa6711f6-cf12-4226-ad46-8e305f90cb45@github.com> References: <5IXW95IQe1Emlp4SCKkR19qcKH0csQnIuop0QVO21dY=.aa6711f6-cf12-4226-ad46-8e305f90cb45@github.com> Message-ID: On Thu, 8 Jun 2023 12:38:51 GMT, Emanuel Peter wrote: >> @fg1417 you are right! >> >> I think we should also add `AlignVector` to the IR whitelist. It makes sense to add it with this change here, because only from now on can we actually have misaligned loads / stores on the same memory-slice! So we should also test things more thoroughly now. >> >> I also see that in `test/hotspot/jtreg/compiler/vectorization/runner/` a lot of tests have `@requires vm.flagless`. That means we actually do not check any flag combinations with those tests. I think we should file an RFE to make them more general, and add the requirements to the IR rules. > > Ok, there are some bugs around, I cannot yet directly add `AlignVector` to the IR framework whitelist. 
We can do it in a follow up RFE https://bugs.openjdk.org/browse/JDK-8309662 Hi @eme64, I'd like to explain more about the `@requires vm.flagless`. Vladimir Kozlov had suggested removing those annotations. I didn't do that before because those annotations cannot be simply removed. All tests under `compiler/vectorization/runner/` are used for both correctness check and vectorizability (IR) check. For correctness check, each test method is invoked twice and the return results from the interpreter and C2 compiled code are compared. We use compiler control via WhiteBox API from the test runner to force these methods running in interpreter and C2 (see the logic in `VectorizationTestRunner.java`). The force compilation would fail if some extra vm option of compiler control (such as `-Xint`) is specified. A way of removing `@requires vm.flagless` I can think of may be skipping the correctness check in the vectorization test runner if the compiler control fails. I just filed [JDK-8309697](https://bugs.openjdk.org/browse/JDK-8309697) for this. Please let me know if you have any better ideas or suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14096#discussion_r1223758382 From duke at openjdk.org Fri Jun 9 06:12:49 2023 From: duke at openjdk.org (Eric Nothum) Date: Fri, 9 Jun 2023 06:12:49 GMT Subject: Integrated: 8307620: [IR Framework] Readme mentions JTREG_WHITE_LIST_FLAGS instead of JTREG_WHITELIST_FLAGS In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 12:52:06 GMT, Eric Nothum wrote: > README now refers to JTREG_WHITELIST_FLAGS instead of JTREG_WHITE_LIST_FLAGS This pull request has now been integrated. 
Changeset: 0a697e73 Author: Eric Nothum Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/0a697e73d5e444710a35a5d373431328a421a336 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8307620: [IR Framework] Readme mentions JTREG_WHITE_LIST_FLAGS instead of JTREG_WHITELIST_FLAGS Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14377 From epeter at openjdk.org Fri Jun 9 06:38:43 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jun 2023 06:38:43 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v3] In-Reply-To: References: <5IXW95IQe1Emlp4SCKkR19qcKH0csQnIuop0QVO21dY=.aa6711f6-cf12-4226-ad46-8e305f90cb45@github.com> Message-ID: On Fri, 9 Jun 2023 02:14:18 GMT, Pengfei Li wrote: >> Ok, there are some bugs around, I cannot yet directly add `AlignVector` to the IR framework whitelist. We can do it in a follow up RFE https://bugs.openjdk.org/browse/JDK-8309662 > > Hi @eme64, I'd like to explain more about the `@requires vm.flagless`. Vladimir Kozlov had suggested removing those annotations. I didn't do that before because those annotations cannot be simply removed. All tests under `compiler/vectorization/runner/` are used for both correctness check and vectorizability (IR) check. For correctness check, each test method is invoked twice and the return results from the interpreter and C2 compiled code are compared. We use compiler control via WhiteBox API from the test runner to force these methods running in interpreter and C2 (see the logic in `VectorizationTestRunner.java`). The force compilation would fail if some extra vm option of compiler control (such as `-Xint`) is specified. > > A way of removing `@requires vm.flagless` I can think of may be skipping the correctness check in the vectorization test runner if the compiler control fails. I just filed [JDK-8309697](https://bugs.openjdk.org/browse/JDK-8309697) for this. 
Please let me know if you have any better ideas or suggestions. @pfustc If I run it with `-Xint`, then it says "Test results: no tests selected". I think that is because of `@requires vm.compiler2.enabled`. But sure, there may be some other flags that mess with the compiler controls. But I think it is important to remove the `@requires vm.flagless`, there are always bugs lurking around with more flag combinations. Plus, we don't have all the hardware that exists out there. That is why it is crucial that we can run with flags `AlignVector` (some ARM machines have it on true) or `UseKNLSetting` (intel), for example. I'll temporarily add the `@requires vm.flagless` back in for `test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14096#discussion_r1223902593 From epeter at openjdk.org Fri Jun 9 06:46:30 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 Jun 2023 06:46:30 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v4] In-Reply-To: References: Message-ID: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. > > As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. > > This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). 
And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`.
>
> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>
> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below).
>
> **Changes to Tests**
> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag.
>
> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti...
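The difference between the "read forward" and "read backward" cases mentioned in the description can be illustrated with two small loops. This is a sketch under an assumption: it takes "read forward" to mean that the load index is ahead of the store index, and it uses a single array so that the effect is visible. These loops are illustrative only and are not the code of `LoopArrayIndexComputeTest.java`; `ReadDirectionSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Illustrative loops only; not taken from the JDK test sources.
final class ReadDirectionSketch {
    // "Read forward": iteration i reads a[i + 1], which no earlier iteration wrote.
    static int[] readForward(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            a[i] = a[i + 1] + 1;
        }
        return a;
    }

    // "Read backward": iteration i reads a[i - 1], which iteration i - 1 just wrote.
    static int[] readBackward(int[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] = a[i - 1] + 1;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(readForward(new int[4])));  // [1, 1, 1, 0]
        System.out.println(Arrays.toString(readBackward(new int[4]))); // [0, 1, 2, 3]
    }
}
```

In the forward loop every iteration sees only original values, so several iterations could be executed together as one vector operation. In the backward loop each result feeds the next iteration when load and store hit the same array; if the two accesses were on provably different arrays, that dependency would disappear, which is the case the description calls not a "real dependency".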
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Add vm.flagless back in for LoopArrayIndexComputeTest.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14096/files
  - new: https://git.openjdk.org/jdk/pull/14096/files/c554e6c7..06cc1c37

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14096.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096

PR: https://git.openjdk.org/jdk/pull/14096

From epeter at openjdk.org Fri Jun 9 10:03:45 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 9 Jun 2023 10:03:45 GMT
Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF)
Message-ID:

Context: `Float.floatToFloat16` -> `vcvtps2ph`.

**Problem**

vcvtps2ph
pre=Assembler::VEX_SIMD_66
opc=Assembler::VEX_OPCODE_0F_3A
VEX.128.66.0F3A
requires F16C

https://www.felixcloutier.com/x86/vcvtps2ph

So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`.

There is also an AVX512 version, but it requires `AVX512VL and AVX512F`.

So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`.

**Suggested Solution**
As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments.

**Testing**
I simulated the patch on Intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31).

Running: tier1-6 + stress testing.
------------- Commit messages: - Fix by @sviswa7 - 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) Changes: https://git.openjdk.org/jdk/pull/14379/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14379&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309660 Stats: 36 lines in 2 files changed: 32 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14379.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14379/head:pull/14379 PR: https://git.openjdk.org/jdk/pull/14379 From sviswanathan at openjdk.org Fri Jun 9 10:03:46 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 Jun 2023 10:03:46 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote: > Context: `Float.floatToFloat16` -> `vcvtps2ph`. > > **Problem** > > vcvtps2ph > pre=Assembler::VEX_SIMD_66 > opc=Assembler::VEX_OPCODE_0F_3A > VEX.128.66.0F3A > requires F16C > > https://www.felixcloutier.com/x86/vcvtps2ph > > So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`. > > There is also a AVX512 version, but it requires `AVX512VL and AVX512F`. > > So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`. > > **Suggested Solution** > As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments. > > **Testing** > I simulated the patch on intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31). > > Running: tier1-6 + stress testing. vcvtps2ph is correct. The problem is that the register from XMM16-XMM31 is being passed for KNL. The higher bank register is and AVX512 feature. 
The fix would be:

--- a/src/hotspot/cpu/x86/x86.ad
+++ b/src/hotspot/cpu/x86/x86.ad
@@ -3638,7 +3638,7 @@ instruct sqrtD_reg(regD dst) %{
   ins_pipe(pipe_slow);
 %}

-instruct convF2HF_reg_reg(rRegI dst, regF src, regF tmp) %{
+instruct convF2HF_reg_reg(rRegI dst, vlRegF src, vlRegF tmp) %{
   effect(TEMP tmp);
   match(Set dst (ConvF2HF src));
   ins_cost(125);
@@ -3682,7 +3682,7 @@ instruct vconvF2HF_mem_reg(memory mem, vec src) %{
   ins_pipe( pipe_slow );
 %}

-instruct convHF2F_reg_reg(regF dst, rRegI src) %{
+instruct convHF2F_reg_reg(vlRegF dst, rRegI src) %{
   match(Set dst (ConvHF2F src));
   format %{ "vcvtph2ps $dst,$src" %}
   ins_encode %{

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1583715351

From epeter at openjdk.org Fri Jun 9 10:03:47 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 9 Jun 2023 10:03:47 GMT
Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF)
In-Reply-To:
References:
Message-ID:

On Fri, 9 Jun 2023 00:59:21 GMT, Sandhya Viswanathan wrote:

>> Context: `Float.floatToFloat16` -> `vcvtps2ph`.
>>
>> **Problem**
>>
>> vcvtps2ph
>> pre=Assembler::VEX_SIMD_66
>> opc=Assembler::VEX_OPCODE_0F_3A
>> VEX.128.66.0F3A
>> requires F16C
>>
>> https://www.felixcloutier.com/x86/vcvtps2ph
>>
>> So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`.
>>
>> There is also an AVX512 version, but it requires `AVX512VL and AVX512F`.
>>
>> So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`.
>>
>> **Suggested Solution**
>> As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments.
>>
>> **Testing**
>> I simulated the patch on Intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31).
>>
>> Running: tier1-6 + stress testing.
>
> vcvtps2ph is correct.
> The problem is that a register from XMM16-XMM31 is being passed for KNL. The higher bank register is an AVX512 feature. The fix would be:
> --- a/src/hotspot/cpu/x86/x86.ad
> +++ b/src/hotspot/cpu/x86/x86.ad
> @@ -3638,7 +3638,7 @@ instruct sqrtD_reg(regD dst) %{
>    ins_pipe(pipe_slow);
>  %}
>
> -instruct convF2HF_reg_reg(rRegI dst, regF src, regF tmp) %{
> +instruct convF2HF_reg_reg(rRegI dst, vlRegF src, vlRegF tmp) %{
>    effect(TEMP tmp);
>    match(Set dst (ConvF2HF src));
>    ins_cost(125);
> @@ -3682,7 +3682,7 @@ instruct vconvF2HF_mem_reg(memory mem, vec src) %{
>    ins_pipe( pipe_slow );
>  %}
>
> -instruct convHF2F_reg_reg(regF dst, rRegI src) %{
> +instruct convHF2F_reg_reg(vlRegF dst, rRegI src) %{
>    match(Set dst (ConvHF2F src));
>    format %{ "vcvtph2ps $dst,$src" %}
>    ins_encode %{

@sviswa7 Thanks for the quick response!
Yes, I was finally able to test it with `sde`, and got an error:

$ /oracle-work/sde-external-9.21.1-2023-04-24-lin/sde -knl -- ./java -XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting -XX:CompileCommand=compileonly,Test2::test -XX:CompileCommand=printcompilation,Test2::* -XX:-TieredCompilation -Xbatch Test2.java
CompileCommand: compileonly Test2.test bool compileonly = true
CompileCommand: PrintCompilation Test2.* bool PrintCompilation = true
57913 82 % b Test2::test @ 2 (30 bytes)
61114 83 b Test2::test (30 bytes)
TID 1 SDE-ERROR: Executed instruction not valid for specified chip (KNL): 0x7fa0c11a5d75: vcvtps2ph xmm0, xmm16, 0x4
Instruction bytes are: 62 e3 7d 08 1d c0 04

Looks like I would indeed have been using `xmm16`, which is not allowed on KNL.
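
For reference, the instruction bytes in the SDE report can be decoded by hand to confirm which registers were encoded. The following is only a minimal sketch under stated assumptions, not HotSpot or SDE code: the class and method names are hypothetical, and it handles only register-to-register EVEX forms without embedded rounding.

```java
// Minimal sketch, not HotSpot code: decode the operand registers of an
// EVEX-encoded register-to-register instruction from its raw bytes.
// Class and method names are hypothetical.
final class EvexDecodeSketch {

    // Byte layout assumed here: 0x62, P0, P1, P2, opcode, ModRM, [imm8]
    private static void check(int[] insn) {
        if (insn.length < 6 || insn[0] != 0x62) {
            throw new IllegalArgumentException("not an EVEX-prefixed instruction");
        }
        if ((insn[5] >> 6) != 0b11) {
            throw new IllegalArgumentException("sketch handles register-direct ModRM only");
        }
    }

    /** Register number in ModRM.reg, extended by the inverted R and R' bits of P0. */
    public static int regOperand(int[] insn) {
        check(insn);
        int p0 = insn[1];
        int r = (~p0 >> 7) & 1;      // bit 3 of the register number
        int rHigh = (~p0 >> 4) & 1;  // bit 4: selects the upper bank (16-31)
        return (rHigh << 4) | (r << 3) | ((insn[5] >> 3) & 7);
    }

    /** Register number in ModRM.rm, extended by the inverted B and X bits of P0. */
    public static int rmOperand(int[] insn) {
        check(insn);
        int p0 = insn[1];
        int b = (~p0 >> 5) & 1;      // bit 3 of the register number
        int x = (~p0 >> 6) & 1;      // bit 4 when ModRM.mod == 11
        return (x << 4) | (b << 3) | (insn[5] & 7);
    }

    /** Vector length in bits from the L'L field of P2 (reserved value 11 is not rejected). */
    public static int vectorBits(int[] insn) {
        check(insn);
        return 128 << ((insn[3] >> 5) & 3);
    }

    public static void main(String[] args) {
        // Bytes copied from the SDE error message above.
        int[] insn = {0x62, 0xe3, 0x7d, 0x08, 0x1d, 0xc0, 0x04};
        System.out.println("reg = xmm" + regOperand(insn)
                + ", rm = xmm" + rmOperand(insn)
                + ", vector length = " + vectorBits(insn) + " bits");
        // prints "reg = xmm16, rm = xmm0, vector length = 128 bits"
    }
}
```

For `vcvtps2ph` the source lives in ModRM.reg and the destination in ModRM.rm, so this agrees with the disassembly `vcvtps2ph xmm0, xmm16, 0x4`: an EVEX encoding with a 128-bit vector length and a source register in the upper bank.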
Ok, I think I also understand why your fix works:
`vlRegF` is defined with `constraint(ALLOC_IN_RC(float_reg_vl))`, and `float_reg_vl` is defined as

reg_class_dynamic float_reg_vl(float_reg_evex, float_reg_legacy, %{ VM_Version::supports_evex() && VM_Version::supports_avx512vl() %} );

`reg_class_dynamic` evaluates the condition (checks whether we have `evex` and `avx512vl`); if we have both, it picks the larger `float_reg_evex` (XMM0-XMM31), otherwise it picks `float_reg_legacy` (XMM0-XMM15).

@sviswa7 thanks for the patch!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1584103090

From qamai at openjdk.org Fri Jun 9 10:15:42 2023
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 9 Jun 2023 10:15:42 GMT
Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF)
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote:

> Context: `Float.floatToFloat16` -> `vcvtps2ph`.
>
> **Problem**
>
> vcvtps2ph
> pre=Assembler::VEX_SIMD_66
> opc=Assembler::VEX_OPCODE_0F_3A
> VEX.128.66.0F3A
> requires F16C
>
> https://www.felixcloutier.com/x86/vcvtps2ph
>
> So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`.
>
> There is also an AVX512 version, but it requires `AVX512VL and AVX512F`.
>
> So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`.
>
> **Suggested Solution**
> As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments.
>
> **Testing**
> I simulated the patch on Intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31).
>
> Running: tier1-6 + stress testing.
You also need to do the same for `convF2HF_mem_reg`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1584330880

From jsjolen at openjdk.org Fri Jun 9 11:57:43 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 9 Jun 2023 11:57:43 GMT
Subject: RFR: 8309717: C2: Remove Arena::move_contents usage
In-Reply-To:
References:
Message-ID:

On Fri, 9 Jun 2023 10:17:46 GMT, Johan Sjölen wrote:

> Hi,
>
> Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code.
>
> Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`.
>
> I'm currently running tier1-3 tests.
>
> Thanks for considering this,
> Johan

Passes the tests.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14391#issuecomment-1584458476

From tholenstein at openjdk.org Fri Jun 9 12:55:47 2023
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Fri, 9 Jun 2023 12:55:47 GMT
Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v3]
In-Reply-To: <1iF_2pFExGpBX1dxqyM6TiQecD8o1qSJWeIv4HVG0vE=.930245c2-8033-402e-a0e8-0a7e3ffaff6c@github.com>
References: <1iF_2pFExGpBX1dxqyM6TiQecD8o1qSJWeIv4HVG0vE=.930245c2-8033-402e-a0e8-0a7e3ffaff6c@github.com>
Message-ID: <6e9OLFYSVDsASSN0k8mSceLxSQf5FuchlB45eokbscI=.75ea9cb1-2891-4025-9101-131e80034bdf@github.com>

On Wed, 17 May 2023 16:52:17 GMT, Vladimir Kozlov wrote:

>> Tobias Holenstein has updated the pull request incrementally with 11 additional commits since the last revision:
>>
>>  - Update TestStableUShort.java
>>  - Update TestStableUByte.java
>>  - Update TestStableShort.java
>>  - Update TestStableObject.java
>>  - Update TestStableLong.java
>>  - Update TestStableInt.java
>>  - Update TestStableFloat.java
>>  - Update TestStableDouble.java
>>  - Update TestStableChar.java
>>  - Update TestStableByte.java
>>  - ... and 1 more: https://git.openjdk.org/jdk/compare/40b17296...e320a9de
>
> Thank you for fixing this finally!
>
> FTR. We planned to do this for a long time. Main motivations: unify syntax and catch invalid commands.

Thanks for the reviews @vnkozlov, @chhagedorn and @TobiHartmann

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13802#issuecomment-1584530237

From tholenstein at openjdk.org Fri Jun 9 13:28:22 2023
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Fri, 9 Jun 2023 13:28:22 GMT
Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v4]
In-Reply-To:
References:
Message-ID:

> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods.
>
> ### Old CompileOnly format
> - matching a **method name** with **class name** and **package name**:
> `-XX:CompileOnly=package/path/Class.method`
> `-XX:CompileOnly=package/path/Class::method`
> `-XX:CompileOnly=package.path.Class::method`
> BUT NOT `-XX:CompileOnly=package.path.Class.method`
>
> - just matching a **single method name**:
> `-XX:CompileOnly=.hashCode`
> `-XX:CompileOnly=::hashCode`
> BUT NOT `-XX:CompileOnly=hashCode`
>
> - Matching **all method names** in a **class name** with **package name**
> `-XX:CompileOnly=java/lang/String`
> BUT NOT `-XX:CompileOnly=java/lang/String.`
> BUT NOT `-XX:CompileOnly=java.lang.String`
> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug)
> BUT NOT `-XX:CompileOnly=String`
> BUT NOT `-XX:CompileOnly=String.`
> BUT NOT `-XX:CompileOnly=String::`
>
> - Matching **all method names** in a **class name** with **NO package name**
> `-XX:CompileOnly=String`
> BUT NOT `-XX:CompileOnly=String.`
> BUT NOT `-XX:CompileOnly=String::`
>
> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored
>
e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command
>
> ### CompileCommand=compileonly format
> `CompileCommand` allows two different forms for paths:
> - `package/path/Class.method`
> - `package.path.Class::method`
>
> In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name.
>
> Valid forms:
> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*`
> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*`
> `-XX:CompileCommand=compileonly,java.lang.String::*`
> `-XX:CompileCommand=compileonly,*::hashCode`
> `-XX:CompileCommand=compileonly,*ng.String::hashC*`
> `-XX:CompileCommand=compileonly,*String::hash*`
>
> Invalid forms (Error: Embedded * not allowed):
> `-XX:CompileCommand=compileonly,java.*.String::has*Code`
>
> ### Use CompileCommand syntax for CompileOnly
> At the moment, in some cases it is not possible to just take a pattern used with `CompileOnly` and plug it into a compile command file. The syntax used by `CompileOnly` is also not very intuitive.
>
> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command.
>
> With this PR `CompileOnly` becomes an alias for `CompileCommand=compileonly` with the possibility to take lists as ...

Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits:

 - Merge branch 'master' into JDK-8027711
 - Update TestStableUShort.java
 - Update TestStableUByte.java
 - Update TestStableShort.java
 - Update TestStableObject.java
 - Update TestStableLong.java
 - Update TestStableInt.java
 - Update TestStableFloat.java
 - Update TestStableDouble.java
 - Update TestStableChar.java
 - ...
and 8 more: https://git.openjdk.org/jdk/compare/c0527561...c9ae4991 ------------- Changes: https://git.openjdk.org/jdk/pull/13802/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=03 Stats: 374 lines in 71 files changed: 29 ins; 69 del; 276 mod Patch: https://git.openjdk.org/jdk/pull/13802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13802/head:pull/13802 PR: https://git.openjdk.org/jdk/pull/13802 From chagedorn at openjdk.org Fri Jun 9 13:28:23 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 9 Jun 2023 13:28:23 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v4] In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 13:24:20 GMT, Tobias Holenstein wrote: >> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. >> >> ### Old CompileOnly format >> - matching a **method name** with **class name** and **package name**: >> `-XX:CompileOnly=package/path/Class.method` >> `-XX:CompileOnly=package/path/Class::method` >> `-XX:CompileOnly=package.path.Class::method` >> BUT NOT `-XX:CompileOnly=package.path.Class.method` >> >> - just matching a **single method name**: >> `-XX:CompileOnly=.hashCode` >> `-XX:CompileOnly=::hashCode` >> BUT NOT `-XX:CompileOnly=hashCode` >> >> - Matching **all method names** in a **class name** with **package name** >> `-XX:CompileOnly=java/lang/String` >> BUT NOT `-XX:CompileOnly=java/lang/String.` >> BUT NOT `-XX:CompileOnly=java.lang.String` >> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) >> BUT NOT `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - Matching **all method names** in a **class name** with **NO package name** >> `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored >> e.g. 
`-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command >> >> ### CompileCommand=compileonly format >> `CompileCommand` allows two different forms for paths: >> - `package/path/Class.method` >> - `package.path.Class::method` >> >> In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. >> >> Valid forms: >> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` >> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` >> `-XX:CompileCommand=compileonly,java.lang.String::*` >> `-XX:CompileCommand=compileonly,*::hashCode` >> `-XX:CompileCommand=compileonly,*ng.String::hashC*` >> `-XX:CompileCommand=compileonly,*String::hash*` >> >> Invalid forms (Error: Embedded * not allowed): >> `-XX:CompileCommand=compileonly,java.*.String::has*Code` >> >> ### Use CompileCommand syntax for CompileOnly >> At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. >> >> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. >> >> W... > > Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into JDK-8027711 > - Update TestStableUShort.java > - Update TestStableUByte.java > - Update TestStableShort.java > - Update TestStableObject.java > - Update TestStableLong.java > - Update TestStableInt.java > - Update TestStableFloat.java > - Update TestStableDouble.java > - Update TestStableChar.java > - ... and 8 more: https://git.openjdk.org/jdk/compare/c0527561...c9ae4991 Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1472058579 From tholenstein at openjdk.org Fri Jun 9 13:44:20 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 9 Jun 2023 13:44:20 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v5] In-Reply-To: References: Message-ID: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. > > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. 
`*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. > > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileCommand=compileonly` with possibility to take lists as ... 
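
The wildcard rule quoted above (`*` only at the beginning and/or end of the class part and of the method part, never embedded) can be illustrated with a small check. This is a minimal sketch under stated assumptions, not the matcher HotSpot actually uses: the class and method names are hypothetical, and the way the pattern is split into class and method part is a simplification chosen for this example.

```java
// Minimal sketch, not the HotSpot parser: models only the wildcard rule
// described above ('*' allowed at the start and/or end of a name, not inside).
// Class and method names are hypothetical.
final class WildcardRuleSketch {

    /** True if the name has no '*' other than one leading and/or one trailing '*'. */
    public static boolean wildcardPlacementOk(String name) {
        String core = name;
        if (core.startsWith("*")) {
            core = core.substring(1);
        }
        if (core.endsWith("*")) {
            core = core.substring(0, core.length() - 1);
        }
        return core.indexOf('*') < 0;
    }

    /**
     * Splits a pattern into class part and method part and checks both.
     * Assumption of this sketch: "::" is the separator if present,
     * otherwise the last '.'; a pattern with neither is checked as one name.
     */
    public static boolean patternOk(String pattern) {
        int sep = pattern.indexOf("::");
        int sepLen = 2;
        if (sep < 0) {
            sep = pattern.lastIndexOf('.');
            sepLen = 1;
        }
        if (sep < 0) {
            return wildcardPlacementOk(pattern);
        }
        return wildcardPlacementOk(pattern.substring(0, sep))
                && wildcardPlacementOk(pattern.substring(sep + sepLen));
    }

    public static void main(String[] args) {
        // Patterns taken from the valid and invalid forms listed above.
        String[] samples = {
            "*.lang.*::*shCo*",
            "*/lang/*.*shCo*",
            "java.lang.String::*",
            "*::hashCode",
            "*ng.String::hashC*",
            "*String::hash*",
            "java.*.String::has*Code"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (patternOk(s) ? "ok" : "embedded * not allowed"));
        }
    }
}
```

With the sample patterns from the description, only `java.*.String::has*Code` is reported as having an embedded `*`; the six valid forms pass.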
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update compilerOracle.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13802/files - new: https://git.openjdk.org/jdk/pull/13802/files/c9ae4991..c5a7f608 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13802/head:pull/13802 PR: https://git.openjdk.org/jdk/pull/13802 From chagedorn at openjdk.org Fri Jun 9 14:03:44 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 9 Jun 2023 14:03:44 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v5] In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 13:44:20 GMT, Tobias Holenstein wrote: >> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
>> >> ### Old CompileOnly format >> - matching a **method name** with **class name** and **package name**: >> `-XX:CompileOnly=package/path/Class.method` >> `-XX:CompileOnly=package/path/Class::method` >> `-XX:CompileOnly=package.path.Class::method` >> BUT NOT `-XX:CompileOnly=package.path.Class.method` >> >> - just matching a **single method name**: >> `-XX:CompileOnly=.hashCode` >> `-XX:CompileOnly=::hashCode` >> BUT NOT `-XX:CompileOnly=hashCode` >> >> - Matching **all method names** in a **class name** with **package name** >> `-XX:CompileOnly=java/lang/String` >> BUT NOT `-XX:CompileOnly=java/lang/String.` >> BUT NOT `-XX:CompileOnly=java.lang.String` >> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) >> BUT NOT `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - Matching **all method names** in a **class name** with **NO package name** >> `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored >> e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command >> >> ### CompileCommand=compileonly format >> `CompileCommand` allows two different forms for paths: >> - `package/path/Class.method` >> - `package.path.Class::method` >> >> In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. 
>> >> Valid forms: >> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` >> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` >> `-XX:CompileCommand=compileonly,java.lang.String::*` >> `-XX:CompileCommand=compileonly,*::hashCode` >> `-XX:CompileCommand=compileonly,*ng.String::hashC*` >> `-XX:CompileCommand=compileonly,*String::hash*` >> >> Invalid forms (Error: Embedded * not allowed): >> `-XX:CompileCommand=compileonly,java.*.String::has*Code` >> >> ### Use CompileCommand syntax for CompileOnly >> At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. >> >> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. >> >> W... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update compilerOracle.cpp Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1472172737 From kvn at openjdk.org Fri Jun 9 16:02:47 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 9 Jun 2023 16:02:47 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v5] In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 13:44:20 GMT, Tobias Holenstein wrote: >> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
>> >> ### Old CompileOnly format >> - matching a **method name** with **class name** and **package name**: >> `-XX:CompileOnly=package/path/Class.method` >> `-XX:CompileOnly=package/path/Class::method` >> `-XX:CompileOnly=package.path.Class::method` >> BUT NOT `-XX:CompileOnly=package.path.Class.method` >> >> - just matching a **single method name**: >> `-XX:CompileOnly=.hashCode` >> `-XX:CompileOnly=::hashCode` >> BUT NOT `-XX:CompileOnly=hashCode` >> >> - Matching **all method names** in a **class name** with **package name** >> `-XX:CompileOnly=java/lang/String` >> BUT NOT `-XX:CompileOnly=java/lang/String.` >> BUT NOT `-XX:CompileOnly=java.lang.String` >> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) >> BUT NOT `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - Matching **all method names** in a **class name** with **NO package name** >> `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored >> e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command >> >> ### CompileCommand=compileonly format >> `CompileCommand` allows two different forms for paths: >> - `package/path/Class.method` >> - `package.path.Class::method` >> >> In contrary to `CompileOnly` `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. 
>> >> Valid forms: >> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` >> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` >> `-XX:CompileCommand=compileonly,java.lang.String::*` >> `-XX:CompileCommand=compileonly,*::hashCode` >> `-XX:CompileCommand=compileonly,*ng.String::hashC*` >> `-XX:CompileCommand=compileonly,*String::hash*` >> >> Invalid forms (Error: Embedded * not allowed): >> `-XX:CompileCommand=compileonly,java.*.String::has*Code` >> >> ### Use CompileCommand syntax for CompileOnly >> At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. >> >> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. >> >> W... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update compilerOracle.cpp GHA testing failed. Please look. CompileOnly: An error occurred during parsing Error: Could not parse method pattern Line: 'TestZeroTripGuardShared' ------------- PR Comment: https://git.openjdk.org/jdk/pull/13802#issuecomment-1584812719 From cslucas at openjdk.org Fri Jun 9 17:19:52 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 9 Jun 2023 17:19:52 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 23 May 2023 17:19:23 GMT, Vladimir Ivanov wrote: >>> I verified that the new test cases do trigger SR+NSR scenario. >>> >>> How do you test that deoptimization works as expected? 
>>> >> >> I have a copy of the tests in AllocationMergesTests.java in a separate file (not included in this PR) and I run the tests with a tool that compares the output of the test with RAM enabled and disabled. So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. >> >>> Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information which is essential to make sense of debug information at a safepoint in the code. >>> >> >> I'll take care of that. I was testing only with PrintDebugInfo. >> >>> FTR `_skip_rematerialization` flag is unused now. >>> >> >> yeah, I forgot to remove that. Thanks. >> >>> Speaking of `_only_merge_candidate` flag, I find it easier about the code when the property being tracked is whether the `ObjectValue` is referenced from corresponding JVM state or not. (Maybe call it `is_root()`?) So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()` which doesn't need to adjust the flag before returning selected object. Also, it's safer to track the flag status for every `ObjectValues`, even for `ObjectMergeValue`. >>> >> >> Sounds like a good idea. I'll do that. Thanks. >> >>> Are you sure there's no way to end up with nested `ObjectMergeValue`s in presence of iterative EA? >> >> I don't think so. This current patch only handle Phis that don't have NULL as input. As part of the reduction process we set at least one of the reducible Phi inputs to NULL. Therefore, subsequent iterations of EA won't reduce the same Phi. 
>
>> So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled.
>
> Please, enhance `AllocationMergesTests` to cover deoptimization (e.g., using WhiteBox API or additional run w/ -XX:+DeoptimizeALot) and ensure that tests are sensitive enough to fail when wrong state is rematerialized.

@iwanowww - I want to clarify expectations with my colleagues, so I have to ask: how much is left for you to review, and is there any part of this PR that you're worried about in terms of correctness/performance/etc.?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1584907343

From vlivanov at openjdk.org Fri Jun 9 17:25:55 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 9 Jun 2023 17:25:55 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v17]
In-Reply-To:
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

On Tue, 6 Jun 2023 23:14:14 GMT, Cesar Soares Lucas wrote:

>> Can I please get reviews for this PR?
>>
>> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested.
>>
>> With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N:
>>
>> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png)
>>
>> What are the most common users of allocation merges?
I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Rome minor refactorings. 
Overall, I like how this patch shapes. I need to go through share/opto changes (so far, I did only a shallow review of that part), but the rest looks good. I plan to submit functional and performance testing over the weekend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1584913223 From cslucas at openjdk.org Fri Jun 9 17:35:55 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 9 Jun 2023 17:35:55 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v17] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 6 Jun 2023 23:14:14 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 
1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failures. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Rome minor refactorings. Thank you for letting me know! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1584925728 From qamai at openjdk.org Fri Jun 9 18:15:19 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 9 Jun 2023 18:15:19 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot.
Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - fix call node, address reviews - Merge branch 'master' into getandadd - flag to use imm16 - Merge branch 'master' into getandadd - fix tests - fix missing xadds_reg_no_res - Merge branch 'master' into getandadd - should not ignore blackhole - improve GetAndAdd ------------- Changes: https://git.openjdk.org/jdk/pull/14061/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14061&range=01 Stats: 279 lines in 6 files changed: 253 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/14061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14061/head:pull/14061 PR: https://git.openjdk.org/jdk/pull/14061 From sviswanathan at openjdk.org Fri Jun 9 23:03:08 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 Jun 2023 23:03:08 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: <0lQJvljjXjPCoK8TAVG2wNevqMuErq_tBTsDct7jvuI=.157e6338-4203-4857-9d51-30a6f0ab5083@github.com> References: <0lQJvljjXjPCoK8TAVG2wNevqMuErq_tBTsDct7jvuI=.157e6338-4203-4857-9d51-30a6f0ab5083@github.com> Message-ID: On Tue, 6 Jun 2023 18:06:11 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests; need vlbwdq for vpbroadcastq > > @TobiHartmann @vnkozlov Please advise if we could go ahead and integrate this PR from Scott. > @sviswa7 Thanks for the notification. I'll run this through our testing and report back. @TobiHartmann Thanks a lot. Please do let us know as and when testing is complete or if you see any issues. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1585163643 From qamai at openjdk.org Sat Jun 10 01:28:25 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 10 Jun 2023 01:28:25 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: wrong operand ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14061/files - new: https://git.openjdk.org/jdk/pull/14061/files/9b5f3814..c240b733 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14061&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14061&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14061/head:pull/14061 PR: https://git.openjdk.org/jdk/pull/14061 From qamai at openjdk.org Sat Jun 10 01:30:09 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 10 Jun 2023 01:30:09 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. 
> > In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: > > floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) > ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) > > The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. > > For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: > > c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) > c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) > > which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64. And we can perform a full multiplication. > > For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. > > More tests are added to cover the possible patterns. > > Please take a look and review. Thank you very much. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase.
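The multiply-and-shift forms (1) through (4) described above can be sketched in Java as follows. This is only an illustration and not code from the patch: the class and method names are hypothetical, and the constants for `d = 3` and `d = 7` were worked out by hand for this example (`0x55555556 = (2**32 + 2) / 3`, `0xAAAAAAAB = (2**33 + 1) / 3`, `0x92492492 = floor(2**34 / 7)`).

```java
// Hypothetical illustration of division by a constant via multiply and shift.
// The constants are hand-derived for d = 3 and d = 7, not taken from the patch.
final class MagicDivision {
    // Signed int32 division by 3, conditions (1) and (2): c = 0x55555556, m = 32.
    static int divideBy3(int x) {
        long product = (long) x * 0x55555556L; // exact, cannot overflow an int64
        int q = (int) (product >> 32);         // floor(x * c / 2**32)
        return q + (x >>> 31);                 // add 1 when x is negative
    }

    // Unsigned int32 division by 3, form (3): c1 = 0xAAAAAAAB, m1 = 33.
    static int divideUnsignedBy3(int x) {
        // The product fits in 64 bits when read as unsigned, so use >>>.
        long product = Integer.toUnsignedLong(x) * 0xAAAAAAABL;
        return (int) (product >>> 33);
    }

    // Unsigned int32 division by 7, form (4): c2 = 0x92492492, m2 = 34.
    // For d = 7 a constant of form (3) would need 33 bits, so (x + 1) is used.
    static int divideUnsignedBy7(int x) {
        long product = (Integer.toUnsignedLong(x) + 1) * 0x92492492L;
        return (int) (product >>> 34);
    }

    public static void main(String[] args) {
        System.out.println(divideBy3(-10));                                // same as -10 / 3
        System.out.println(Integer.toUnsignedString(divideUnsignedBy7(-1))); // same as 4294967295 / 7
    }
}
```

Each method can be compared against `x / 3`, `Integer.divideUnsigned(x, 3)`, and `Integer.divideUnsigned(x, 7)` respectively; the actual transformation in C2 works on IR nodes and chooses the constants per divisor.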
The pull request now contains 50 commits: - missing java_negate - Merge branch 'master' into unsignedDiv - whitespace - move asserts to use sites - windows complaints - compiler complaints - undefined internal linkage - add tests, special casing large shift - draft - Merge branch 'master' into unsignedDiv - ... and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9 ------------- Changes: https://git.openjdk.org/jdk/pull/9947/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=15 Stats: 2163 lines in 13 files changed: 1750 ins; 303 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Sat Jun 10 01:30:09 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 10 Jun 2023 01:30:09 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v15] In-Reply-To: References: Message-ID: On Mon, 27 Mar 2023 13:35:46 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>> >> For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and review. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > whitespace May I have a second review for this patch, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1585399702 From qamai at openjdk.org Sat Jun 10 04:24:52 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 10 Jun 2023 04:24:52 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative [v3] In-Reply-To: References: Message-ID: <-HOjPqbosa7jYh-E_WeqRrc2MKvYnmMlp1N4LDJXIHU=.864ea5d9-a155-4c69-94a5-ada13f4aeb02@github.com> On Sat, 20 May 2023 00:27:08 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> wrong operand > > Implementation looks good to me.
I have few comments about test. > You need second review. @vnkozlov I have addressed the reviews, the test is moved to `irTest`. @TobiHartmann I have fixed the error, it is because a call node may return a membar on `bottom_type()`, I excluded them in the check. Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14061#issuecomment-1585474467 From kvn at openjdk.org Sat Jun 10 06:38:42 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 10 Jun 2023 06:38:42 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative [v3] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 01:28:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > wrong operand Update looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14061#pullrequestreview-1473329271 From tholenstein at openjdk.org Sat Jun 10 12:25:57 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Sat, 10 Jun 2023 12:25:57 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v6] In-Reply-To: References: Message-ID: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
> > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name.
> > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileCommand=compileonly` with possibility to take lists as ... Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: update new Tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13802/files - new: https://git.openjdk.org/jdk/pull/13802/files/c5a7f608..799b10ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13802&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13802/head:pull/13802 PR: https://git.openjdk.org/jdk/pull/13802 From thartmann at openjdk.org Mon Jun 12 06:03:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jun 2023 06:03:58 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 23:48:21 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. 
This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). >> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 
17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests; need vlbwdq for vpbroadcastq All testing passed. Sorry for the delay, I was out for a few days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1586637333 From chagedorn at openjdk.org Mon Jun 12 06:42:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 06:42:53 GMT Subject: RFR: 8309814: [IR Framework] Dump socket output string in which IR encoding was not found Message-ID: To better understand the low-frequency failure reported in [JDK-8309689](https://bugs.openjdk.org/browse/JDK-8309689), I suggest to additionally dump the socket output string, in which the IR encoding was not found, to better analyze the problem. Thanks, Christian ------------- Commit messages: - 8309814: [IR Framework] Dump socket output string in which IR encoding was not found Changes: https://git.openjdk.org/jdk/pull/14410/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14410&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309814 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14410/head:pull/14410 PR: https://git.openjdk.org/jdk/pull/14410 From chagedorn at openjdk.org Mon Jun 12 06:45:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 06:45:02 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v6] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 12:25:57 GMT, Tobias Holenstein wrote: >> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
>> >> ### Old CompileOnly format >> - matching a **method name** with **class name** and **package name**: >> `-XX:CompileOnly=package/path/Class.method` >> `-XX:CompileOnly=package/path/Class::method` >> `-XX:CompileOnly=package.path.Class::method` >> BUT NOT `-XX:CompileOnly=package.path.Class.method` >> >> - just matching a **single method name**: >> `-XX:CompileOnly=.hashCode` >> `-XX:CompileOnly=::hashCode` >> BUT NOT `-XX:CompileOnly=hashCode` >> >> - Matching **all method names** in a **class name** with **package name** >> `-XX:CompileOnly=java/lang/String` >> BUT NOT `-XX:CompileOnly=java/lang/String.` >> BUT NOT `-XX:CompileOnly=java.lang.String` >> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) >> BUT NOT `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - Matching **all method names** in a **class name** with **NO package name** >> `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored >> e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command >> >> ### CompileCommand=compileonly format >> `CompileCommand` allows two different forms for paths: >> - `package/path/Class.method` >> - `package.path.Class::method` >> >> In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name.
>> >> Valid forms: >> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` >> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` >> `-XX:CompileCommand=compileonly,java.lang.String::*` >> `-XX:CompileCommand=compileonly,*::hashCode` >> `-XX:CompileCommand=compileonly,*ng.String::hashC*` >> `-XX:CompileCommand=compileonly,*String::hash*` >> >> Invalid forms (Error: Embedded * not allowed): >> `-XX:CompileCommand=compileonly,java.*.String::has*Code` >> >> ### Use CompileCommand syntax for CompileOnly >> At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. >> >> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. >> >> W... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > update new Tests Update and testing results look good now! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1474141188 From thartmann at openjdk.org Mon Jun 12 06:45:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jun 2023 06:45:03 GMT Subject: RFR: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly [v6] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 12:25:57 GMT, Tobias Holenstein wrote: >> At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
>> >> ### Old CompileOnly format >> - matching a **method name** with **class name** and **package name**: >> `-XX:CompileOnly=package/path/Class.method` >> `-XX:CompileOnly=package/path/Class::method` >> `-XX:CompileOnly=package.path.Class::method` >> BUT NOT `-XX:CompileOnly=package.path.Class.method` >> >> - just matching a **single method name**: >> `-XX:CompileOnly=.hashCode` >> `-XX:CompileOnly=::hashCode` >> BUT NOT `-XX:CompileOnly=hashCode` >> >> - Matching **all method names** in a **class name** with **package name** >> `-XX:CompileOnly=java/lang/String` >> BUT NOT `-XX:CompileOnly=java/lang/String.` >> BUT NOT `-XX:CompileOnly=java.lang.String` >> BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) >> BUT NOT `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - Matching **all method names** in a **class name** with **NO package name** >> `-XX:CompileOnly=String` >> BUT NOT `-XX:CompileOnly=String.` >> BUT NOT `-XX:CompileOnly=String::` >> >> - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored >> e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command >> >> ### CompileCommand=compileonly format >> `CompileCommand` allows two different forms for paths: >> - `package/path/Class.method` >> - `package.path.Class::method` >> >> In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name.
>> >> Valid forms: >> `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` >> `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` >> `-XX:CompileCommand=compileonly,java.lang.String::*` >> `-XX:CompileCommand=compileonly,*::hashCode` >> `-XX:CompileCommand=compileonly,*ng.String::hashC*` >> `-XX:CompileCommand=compileonly,*String::hash*` >> >> Invalid forms (Error: Embedded * not allowed): >> `-XX:CompileCommand=compileonly,java.*.String::has*Code` >> >> ### Use CompileCommand syntax for CompileOnly >> At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. >> >> `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. >> >> W... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > update new Tests Still looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13802#pullrequestreview-1474142986 From epeter at openjdk.org Mon Jun 12 06:52:41 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jun 2023 06:52:41 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v4] In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 06:46:30 GMT, Emanuel Peter wrote: >> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. >> >> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. >> >> This alignment check was originally there instead of a proper dependency checker. 
Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. >> >> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. >> >> Note: this reduces the `CompileCommand Option Vectorize` flag to now only control whether we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). >> >> **Changes to Tests** >> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. >> >> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for...
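The "read forward" and "read backward" shapes mentioned above can be sketched as two small Java loops. This is a hypothetical illustration only: the class and method names are made up, and the code is not copied from `LoopArrayIndexComputeTest.java`. It shows the dependency shape, not what C2 does with it; whether either loop is vectorized depends on the JVM build, flags, and hardware.

```java
// Hypothetical sketch of the two dependency shapes; not taken from the test file.
final class DependencyShapes {
    // "Read forward": iteration i reads index i + 1. Even if dst and src are
    // the same array, no earlier iteration has written that slot yet.
    static void readForward(int[] dst, int[] src) {
        for (int i = 0; i < dst.length - 1; i++) {
            dst[i] = src[i + 1] + 1;
        }
    }

    // "Read backward": iteration i reads index i - 1. If dst and src are the
    // same array, that slot was written by the previous iteration. If they
    // are different arrays, there is no dependency between iterations.
    static void readBackward(int[] dst, int[] src) {
        for (int i = 1; i < dst.length; i++) {
            dst[i] = src[i - 1] + 1;
        }
    }
}
```

Calling `readBackward(a, a)` makes every element depend on its predecessor, while `readBackward(dst, src)` with two distinct arrays does not, which is the aliasing question the description refers to.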
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Add vm.flagless back in for LoopArrayIndexComputeTest.java @fg1417 @pfustc I think I have addressed your concerns. Can you please re-review ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1586689149 From chagedorn at openjdk.org Mon Jun 12 07:00:50 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 07:00:50 GMT Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) In-Reply-To: References: Message-ID: On Fri, 26 May 2023 13:45:23 GMT, Emanuel Peter wrote: > I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain Java. > > I added the code above the assert, the comments explain why: > > https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763 > > Here is the graph just before the assert: > ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01) > > `120 Loop` -> need it to kick off `beautify_loop` and a second `build_loop_tree` > `71 Region` -> infinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`. > `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`. > > Testing up to tier6 and stress testing. Passed. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14178#pullrequestreview-1474162981 From duke at openjdk.org Mon Jun 12 07:18:28 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 12 Jun 2023 07:18:28 GMT Subject: RFR: 8304403: Remove unused methods in RangeCheckElimination::Bound Message-ID: Removed 3 unused methods in RangeCheckElimination: RangeCheckEliminator::Bound::set_lower RangeCheckEliminator::Bound::set_upper RangeCheckEliminator::Bound::add_constant Testing passed after removal ------------- Commit messages: - 8304403: removed unused functions in RangeCheckElimination Changes: https://git.openjdk.org/jdk/pull/14328/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14328&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304403 Stats: 23 lines in 2 files changed: 0 ins; 23 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14328/head:pull/14328 PR: https://git.openjdk.org/jdk/pull/14328 From chagedorn at openjdk.org Mon Jun 12 07:31:51 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 07:31:51 GMT Subject: RFR: 8304403: Remove unused methods in RangeCheckElimination::Bound In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 09:17:45 GMT, Eric Nothum wrote: > Removed 3 unused methods in RangeCheckElimination: > > RangeCheckEliminator::Bound::set_lower > RangeCheckEliminator::Bound::set_upper > RangeCheckEliminator::Bound::add_constant > > Testing passed after removal Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14328#pullrequestreview-1474219103 From epeter at openjdk.org Mon Jun 12 07:32:40 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jun 2023 07:32:40 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote: > Context: `Float.floatToFloat16` -> `vcvtps2ph`. > > **Problem** > > vcvtps2ph > pre=Assembler::VEX_SIMD_66 > opc=Assembler::VEX_OPCODE_0F_3A > VEX.128.66.0F3A > requires F16C > > https://www.felixcloutier.com/x86/vcvtps2ph > > So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`. > > There is also an AVX512 version, but it requires `AVX512VL and AVX512F`. > > So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`. > > **Suggested Solution** > As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments. > > **Testing** > I simulated the patch on Intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31). > > Running: tier1-6 + stress testing. @merykitty I'm not sure if you deleted your comment, but you basically asked me to check if there are any other cases that need this fix. I think this one is ok, because it already explicitly requires `avx512vl`. instruct convF2HF_mem_reg(memory mem, regF src, kReg ktmp, rRegI rtmp) %{ predicate((UseAVX > 2) && VM_Version::supports_avx512vl()); I'm not sure about all the vector cases, like this `instruct vconvHF2F(vec dst, vec src)`. Can someone tell me where to find the definition of register class `vec`? I cannot seem to find it in `x86.ad`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1586746188 From thartmann at openjdk.org Mon Jun 12 07:38:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jun 2023 07:38:47 GMT Subject: RFR: 8304403: Remove unused methods in RangeCheckElimination::Bound In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 09:17:45 GMT, Eric Nothum wrote: > Removed 3 unused methods in RangeCheckElimination: > > RangeCheckEliminator::Bound::set_lower > RangeCheckEliminator::Bound::set_upper > RangeCheckEliminator::Bound::add_constant > > Testing passed after removal Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14328#pullrequestreview-1474232337 From tholenstein at openjdk.org Mon Jun 12 07:42:59 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 12 Jun 2023 07:42:59 GMT Subject: Integrated: JDK-8027711: Unify wildcarding syntax for CompileCommand and CompileOnly In-Reply-To: References: Message-ID: On Thu, 4 May 2023 13:36:22 GMT, Tobias Holenstein wrote: > At the moment `CompileCommand` and `CompileOnly` use different syntax for matching methods. 
> > ### Old CompileOnly format > - matching a **method name** with **class name** and **package name**: > `-XX:CompileOnly=package/path/Class.method` > `-XX:CompileOnly=package/path/Class::method` > `-XX:CompileOnly=package.path.Class::method` > BUT NOT `-XX:CompileOnly=package.path.Class.method` > > - just matching a **single method name**: > `-XX:CompileOnly=.hashCode` > `-XX:CompileOnly=::hashCode` > BUT NOT `-XX:CompileOnly=hashCode` > > - Matching **all method names** in a **class name** with **package name** > `-XX:CompileOnly=java/lang/String` > BUT NOT `-XX:CompileOnly=java/lang/String.` > BUT NOT `-XX:CompileOnly=java.lang.String` > BUT NOT `-XX:CompileOnly=java.lang.String::` (This is actually a bug) > BUT NOT `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - Matching **all method names** in a **class name** with **NO package name** > `-XX:CompileOnly=String` > BUT NOT `-XX:CompileOnly=String.` > BUT NOT `-XX:CompileOnly=String::` > > - There is a bug when `CompileOnly` ends with `::` where the `CompileOnly` is just ignored > e.g. `-XX:CompileOnly=String::` compiles as many methods as when omitting the `-XX:CompileOnly=` command > > ### CompileCommand=compileonly format > `CompileCommand` allows two different forms for paths: > - `package/path/Class.method` > - `package.path.Class::method` > > In contrast to `CompileOnly`, `CompileCommand` supports wildcard matching using `*`. `*` can appear at the beginning and/or end of a `package.path.Class` and `method` name. 
> > Valid forms: > `-XX:CompileCommand=compileonly,*.lang.*::*shCo*` > `-XX:CompileCommand=compileonly,*/lang/*.*shCo*` > `-XX:CompileCommand=compileonly,java.lang.String::*` > `-XX:CompileCommand=compileonly,*::hashCode` > `-XX:CompileCommand=compileonly,*ng.String::hashC*` > `-XX:CompileCommand=compileonly,*String::hash*` > > Invalid forms (Error: Embedded * not allowed): > `-XX:CompileCommand=compileonly,java.*.String::has*Code` > > ### Use CompileCommand syntax for CompileOnly > At the moment, in some cases it is not possible to just take pattern used with `CompileOnly` and plug it into compile command file. Syntax used by CompileOnly is also not very intuitive. > > `CompileOnly` is convenient because it's shorter to write and takes lists of patterns, whereas `CompileCommand` only takes one pattern per command. > > With this PR `CompileOnly` becomes an alias for `CompileCommand=compileonly` with possibility to take lists as ... This pull request has now been integrated. Changeset: f5cbe53f Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/f5cbe53fdd736f54a140e9db2499a7439f8752c4 Stats: 376 lines in 73 files changed: 29 ins; 69 del; 278 mod 8027711: Unify wildcarding syntax for CompileCommand and CompileOnly Reviewed-by: kvn, thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/13802 From epeter at openjdk.org Mon Jun 12 07:43:52 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jun 2023 07:43:52 GMT Subject: RFR: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 15:48:02 GMT, Vladimir Kozlov wrote: >> I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain java. 
>> >> I added the code above the assert, the comments explain why: >> >> https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763 >> >> Here the graph just before the assert: >> ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01) >> >> `120 Loop` -> need it to kick off `beautify_loop` and a second `build_loop_tree` >> `71 Region` -> infinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`. >> `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`. >> >> Testing up to tier6 and stress testing. Passed. > > Marked as reviewed by kvn (Reviewer). Thanks @vnkozlov @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14178#issuecomment-1586759492 From epeter at openjdk.org Mon Jun 12 07:43:54 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jun 2023 07:43:54 GMT Subject: Integrated: 8308749: C2 failed: regular loops only (counted loop inside infinite loop) In-Reply-To: References: Message-ID: On Fri, 26 May 2023 13:45:23 GMT, Emanuel Peter wrote: > I found this failure with my jasm fuzzer. Have not tried to reproduce it with plain java. > > I added the code above the assert, the comments explain why: > > https://github.com/openjdk/jdk/blob/1fd6699e44f8caea9a1c7a8b1e946b2d1ebc0f82/src/hotspot/share/opto/loopnode.cpp#L1749-L1763 > > Here the graph just before the assert: > ![image](https://github.com/openjdk/jdk/assets/32593061/7f11875d-49fb-49df-a3d8-6d4102711b01) > > `120 Loop` -> need it to kick off `beautify_loop` and a second `build_loop_tree` > `71 Region` -> infinite loop, `NeverBranch` is inserted on first `build_loop_tree` pass. Only attached to loop tree after second `build_loop_tree`. 
> `x = 81 Region` -> looks like a counted loop, but is only attached to the loop tree after `beautify_loop` in the second `build_loop_tree`. > > Testing up to tier6 and stress testing. Passed. This pull request has now been integrated. Changeset: 6c3e621f Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6c3e621f9818fe764501e1a72c7cf8e9803da683 Stats: 114 lines in 3 files changed: 114 ins; 0 del; 0 mod 8308749: C2 failed: regular loops only (counted loop inside infinite loop) Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14178 From dholmes at openjdk.org Mon Jun 12 08:10:51 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 12 Jun 2023 08:10:51 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v2] In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 10:24:08 GMT, Martin Doerr wrote: >> We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. >> >> Testing: >> >> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp >> index f179d3ba88d..c35a1ac595e 100644 >> --- a/src/hotspot/share/opto/parse1.cpp >> +++ b/src/hotspot/share/opto/parse1.cpp >> @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { >> make_dtrace_method_entry(method()); >> } >> >> + if (UseNewCode) { >> + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); >> + C->root()->add_req(halt); >> + set_control(halt); >> + } >> + >> #ifdef ASSERT >> // Narrow receiver type when it is too broad for the method being parsed. 
>> if (!method()->is_static()) { >> >> >> "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): >> >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) >> V [jvm.dll+0x8a3afa] VMError::report+0xd6a (vmError.cpp:973) >> V [jvm.dll+0x8a5cde] VMError::report_and_die+0x5fe (vmError.cpp:1765) >> V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) >> V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) >> C 0x000001635fe021f4 >> >> >> called by the following code: >> Compiled method (c2) 87 16 4 java.lang.Object:: (1 bytes) >> total in heap [0x000001635fe02010,0x000001635fe02250] = 576 >> relocation [0x000001635fe02170,0x000001635fe02188] = 24 >> main code [0x000001635fe021a0,0x000001635fe02200] = 96 >> stub code [0x000001635fe02200,0x000001635fe02218] = 24 >> metadata [0x000001635fe02218,0x000001635fe02220] = 8 >> scopes data [0x000001635fe02220,0x000001635fe02228] = 8 >> scopes pcs [0x000001635fe02228,0x000001635fe02248] = 32 >> dependencies [0x000001635fe02248,0x000001635fe02250] = 8 >> >> [Constant Pool (empty)] >> >> [MachCode] >> [Entry Point] >> # {method} {0x0000000800478d78} '' '()V' in 'java/lang/Object' >> # [sp+0x20] (sp of caller) >> 0x000001635fe021a0: 448b 5208 | 49bb 0000 | 0000 0800 | 0000 4d03 | d349 3bc2 >> >> 0x000001635fe021b4: ; {runtime_call ic_miss_stub} >> 0x000001635fe021b4: 0f85 c6c4 | 8fff 6690 | 0f1f 4000 >> [Verified Ent... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Check result of print_code and update printed_len. C 0x000001635fe021f4 called by the following code: So does everything after `called by the following code` relate to the frame `C 0x000001635fe021f4`? 
If so I'd like to see this delineated more clearly as presently it would appear very disruptive when reading the initial sections of the hs_err file. Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/14358#issuecomment-1586803723 From duke at openjdk.org Mon Jun 12 08:35:00 2023 From: duke at openjdk.org (Harry Dinh) Date: Mon, 12 Jun 2023 08:35:00 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v4] In-Reply-To: References: Message-ID: On Thu, 1 Jun 2023 08:04:20 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. 
no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java > > Co-authored-by: Tobias Hartmann May I know if the fix applied to jdk11 as well? We got the same issue with jdk11.0.19. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1586591007 From chagedorn at openjdk.org Mon Jun 12 09:05:00 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 09:05:00 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v4] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 05:12:01 GMT, Harry Dinh wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/loopPredicate.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java >> >> Co-authored-by: Tobias Hartmann > > May I know if the fix applied to jdk11 as well? > > We got the same issue with jdk11.0.19. @hungk20 The negation of the condition for range checks was introduced in [JDK-7173584](https://bugs.openjdk.org/browse/JDK-7173584) which went into JDK 9. Therefore, JDK 11u is also affected. You could either backport this fix or backout [JDK-8297951](https://bugs.openjdk.org/browse/JDK-8297951) (see [JDK-8308884](https://bugs.openjdk.org/browse/JDK-8308884)). 
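For readers following the 8307683 thread, the soundness argument from the PR description can be sketched in plain Java. This is only an illustration under stated assumptions, not code from the PR or from HotSpot: the class `HoistedCheckSketch`, its methods, and the chosen bounds are hypothetical, and the `negated` flag merely models a check whose trap sits on the success projection. The sketch compares what a hoisted predicate would test (first and last iteration only) with what the loop itself tests (every iteration).

```java
public class HoistedCheckSketch {
    // Unsigned compare in the style of an array range check: true iff 0 <= i < limit.
    static boolean inRange(int i, int limit) {
        return Integer.compareUnsigned(i, limit) < 0;
    }

    // The per-iteration check. 'negated' is a hypothetical stand-in for a
    // check that traps on the success projection, so the condition is inverted.
    static boolean check(boolean negated, int i, int limit) {
        return negated ? !inRange(i, limit) : inRange(i, limit);
    }

    // What a hoisted predicate would evaluate: first and last iteration only.
    static boolean endpointsHold(boolean negated, int first, int last, int limit) {
        return check(negated, first, limit) && check(negated, last, limit);
    }

    // What the original loop evaluates: the check on every iteration.
    static boolean allHold(boolean negated, int first, int last, int limit) {
        for (int i = first; i <= last; i++) {
            if (!check(negated, i, limit)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Real range check over i = 0..4 with limit 5: endpoints imply all.
        System.out.println(endpointsHold(false, 0, 4, 5) + " " + allHold(false, 0, 4, 5));
        // Negated check over i = -2..8 with limit 5: endpoints pass, i = 0..4 do not.
        System.out.println(endpointsHold(true, -2, 8, 5) + " " + allHold(true, -2, 8, 5));
    }
}
```

In the second case the two endpoint checks succeed while iterations 0 through 4 fail, which is the kind of mismatch that makes a hoisted predicate unsafe for anything other than a genuine range check.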
------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1586896163 From epeter at openjdk.org Mon Jun 12 09:25:51 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 Jun 2023 09:25:51 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote: > Context: `Float.floatToFloat16` -> `vcvtps2ph`. > > **Problem** > > vcvtps2ph > pre=Assembler::VEX_SIMD_66 > opc=Assembler::VEX_OPCODE_0F_3A > VEX.128.66.0F3A > requires F16C > > https://www.felixcloutier.com/x86/vcvtps2ph > > So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`. > > There is also an AVX512 version, but it requires `AVX512VL and AVX512F`. > > So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`. > > **Suggested Solution** > As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments. > > **Testing** > I simulated the patch on Intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31). > > Running: tier1-6 + stress testing. @merykitty @sviswa7 @fg1417 Is there a way to stress-test the registers? It seems this bug only triggered because we had a moderately large unrolling factor, and then did not vectorize, leaving lots of instructions with probably a higher register pressure. Would be nice to have some sort of testing where we generate more (all?) of the possible register combinations. What do you think? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1586942377 From rcastanedalo at openjdk.org Mon Jun 12 09:29:48 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 Jun 2023 09:29:48 GMT Subject: RFR: 8309814: [IR Framework] Dump socket output string in which IR encoding was not found In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 06:36:39 GMT, Christian Hagedorn wrote: > To better understand and analyze the low-frequency failure reported in [JDK-8309689](https://bugs.openjdk.org/browse/JDK-8309689), I suggest to additionally dump the socket output string, in which the IR encoding was not found. > > Thanks, > Christian Looks good, and trivial! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14410#pullrequestreview-1474463308 From sgehwolf at openjdk.org Mon Jun 12 09:36:00 2023 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Mon, 12 Jun 2023 09:36:00 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v4] In-Reply-To: References: Message-ID: <2Rp81YlczVSBHeMb77B_m5NkAYQEAC3siXnfRbf_xxc=.d1ea8488-2ed6-4301-8c4e-a02a37118c7b@github.com> On Thu, 1 Jun 2023 08:04:20 GMT, Christian Hagedorn wrote: >> [JDK-4809552](https://bugs.openjdk.org/browse/JDK-4809552) allowed Loop Predication to be applied to `IfNodes` that have a positive value instead of a `LoadRangeNode`: >> https://github.com/openjdk/jdk/blob/48d21bd089a3f344ee5407926f8ed2af3734d2b0/src/hotspot/share/opto/loopPredicate.cpp#L854-L862 >> >> This, however, is only correct if we have an actual `RangeCheckNode` for an array. The reason for that is that if we hoist a real range check and create a Hoisted Predicate for it, we only need to check the lower and upper bound of all array accesses (i.e. the array access of the first and the last loop iteration). 
All array accesses in between are implicitly covered and do not need to be checked again. >> >> But if we face an `IfNode` without a `LoadRangeNode`, we could be comparing anything. We do not have any guarantee that if the first and last loop iteration check succeed that the other loop iteration checks will also succeed. An example of this is shown in the test case `test()`. We wrongly create a Hoisted Range Check Predicate where the lower and upper bound are always true, but for some values of the loop induction variable, the hoisted check would actually fail. We then crash because an added Assertion Predicate exactly performs this failing check (crash with halt). Without any loop splitting (i.e. no Assertion Predicates), we have a wrong execution due to never executing the branch where we increment `iFld2` because we removed it together with the check. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/predicates/TestHoistedPredicateForNonRangeCheck.java > > Co-authored-by: Tobias Hartmann OpenJDK 11.0.20 in July will have [JDK-8297951](https://bugs.openjdk.org/browse/JDK-8297951) backed out. We'll revisit the situation for OpenJDK 11.0.21 (October) where the actual fix (i.e. this bug) will likely get in. 
See https://bugs.openjdk.org/browse/JDK-8309119 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1586958904 From chagedorn at openjdk.org Mon Jun 12 10:48:57 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 10:48:57 GMT Subject: RFR: 8309814: [IR Framework] Dump socket output string in which IR encoding was not found In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 06:36:39 GMT, Christian Hagedorn wrote: > To better understand and analyze the low-frequency failure reported in [JDK-8309689](https://bugs.openjdk.org/browse/JDK-8309689), I suggest to additionally dump the socket output string, in which the IR encoding was not found. > > Thanks, > Christian Thanks Roberto for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14410#issuecomment-1587074055 From chagedorn at openjdk.org Mon Jun 12 10:48:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 12 Jun 2023 10:48:58 GMT Subject: Integrated: 8309814: [IR Framework] Dump socket output string in which IR encoding was not found In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 06:36:39 GMT, Christian Hagedorn wrote: > To better understand and analyze the low-frequency failure reported in [JDK-8309689](https://bugs.openjdk.org/browse/JDK-8309689), I suggest to additionally dump the socket output string, in which the IR encoding was not found. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 4bc6bbb2 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/4bc6bbb23f46e702a89218e06581be559d72c3ee Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8309814: [IR Framework] Dump socket output string in which IR encoding was not found Reviewed-by: rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/14410 From duke at openjdk.org Mon Jun 12 11:41:03 2023 From: duke at openjdk.org (Harry Dinh) Date: Mon, 12 Jun 2023 11:41:03 GMT Subject: RFR: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition [v4] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 09:01:53 GMT, Christian Hagedorn wrote: >> May I know if the fix applied to jdk11 as well? >> >> We got the same issue with jdk11.0.19. > > @hungk20 The negation of the condition for range checks was introduced in [JDK-7173584](https://bugs.openjdk.org/browse/JDK-7173584) which went into JDK 9. Therefore, JDK 11u is also affected. You could either backport this fix or backout [JDK-8297951](https://bugs.openjdk.org/browse/JDK-8297951) (see [JDK-8308884](https://bugs.openjdk.org/browse/JDK-8308884)). Thanks @chhagedorn @jerboaa, I will keep an eye on the new versions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14156#issuecomment-1587156137 From thartmann at openjdk.org Mon Jun 12 12:01:50 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 12 Jun 2023 12:01:50 GMT Subject: RFR: 8309717: C2: Remove Arena::move_contents usage In-Reply-To: References: Message-ID: <_1umPM5qSKc-jXzhAkdjJ2f4U6tpQUctYCTOgli36JY=.2746aab1-0e00-4584-a922-6b39a1574850@github.com> On Fri, 9 Jun 2023 10:17:46 GMT, Johan Sjölen wrote: > Hi, > > Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. 
> > Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. > > I'm currently running tier1-3 tests. > > Thanks for considering this, > Johan Looks reasonable to me. src/hotspot/share/opto/compile.hpp line 812: > 810: void set_unique(uint i) { _unique = i; } > 811: Arena* node_arena() { return _node_arena; } > 812: Arena* old_arena() { return &_node_arena_one == _node_arena ? &_node_arena_two : &_node_arena_one; } Suggestion: Arena* old_arena() { return (&_node_arena_one == _node_arena) ? &_node_arena_two : &_node_arena_one; } src/hotspot/share/opto/matcher.cpp line 338: > 336: > 337: // Swap out to old-space; emptying new-space > 338: Arena *old = C->swap_old_and_new(); Suggestion: Arena* old = C->swap_old_and_new(); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14391#pullrequestreview-1474737149 PR Review Comment: https://git.openjdk.org/jdk/pull/14391#discussion_r1226543328 PR Review Comment: https://git.openjdk.org/jdk/pull/14391#discussion_r1226543705 From mdoerr at openjdk.org Mon Jun 12 14:24:27 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Jun 2023 14:24:27 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: > We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. 
> > Testing: > > diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp > index f179d3ba88d..c35a1ac595e 100644 > --- a/src/hotspot/share/opto/parse1.cpp > +++ b/src/hotspot/share/opto/parse1.cpp > @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { > make_dtrace_method_entry(method()); > } > > + if (UseNewCode) { > + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); > + C->root()->add_req(halt); > + set_control(halt); > + } > + > #ifdef ASSERT > // Narrow receiver type when it is too broad for the method being parsed. > if (!method()->is_static()) { > > > "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) > V [jvm.dll+0x8a3afa] VMError::report+0xd6a (vmError.cpp:973) > V [jvm.dll+0x8a5cde] VMError::report_and_die+0x5fe (vmError.cpp:1765) > V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) > V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) > C 0x000001635fe021f4 > > > called by the following code: > Compiled method (c2) 87 16 4 java.lang.Object:: (1 bytes) > total in heap [0x000001635fe02010,0x000001635fe02250] = 576 > relocation [0x000001635fe02170,0x000001635fe02188] = 24 > main code [0x000001635fe021a0,0x000001635fe02200] = 96 > stub code [0x000001635fe02200,0x000001635fe02218] = 24 > metadata [0x000001635fe02218,0x000001635fe02220] = 8 > scopes data [0x000001635fe02220,0x000001635fe02228] = 8 > scopes pcs [0x000001635fe02228,0x000001635fe02248] = 32 > dependencies [0x000001635fe02248,0x000001635fe02250] = 8 > > [Constant Pool (empty)] > > [MachCode] > [Entry Point] > # {method} {0x0000000800478d78} '' '()V' in 'java/lang/Object' > # [sp+0x20] (sp of caller) > 0x000001635fe021a0: 448b 5208 | 49bb 0000 | 0000 0800 | 0000 4d03 | d349 3bc2 > > 
0x000001635fe021b4: ; {runtime_call ic_miss_stub} > 0x000001635fe021b4: 0f85 c6c4 | 8fff 6690 | 0f1f 4000 > [Verified Entry Point] > 0x000001635fe021c0: 4881 ec18 | 0000 0048 | 896c 2410 | 4181 7f20 | 0100 0000 | 0f85 1b00 > > 0x0000... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move parts to step which prints code blobs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14358/files - new: https://git.openjdk.org/jdk/pull/14358/files/3bbd2a04..56a3f4c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14358&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14358&range=01-02 Stats: 44 lines in 1 file changed: 32 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14358.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14358/head:pull/14358 PR: https://git.openjdk.org/jdk/pull/14358 From sgibbons at openjdk.org Mon Jun 12 14:33:04 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 12 Jun 2023 14:33:04 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 23:48:21 GMT, Scott Gibbons wrote: >> Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. >> >> Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
>> >> Old: >> gcc-12.2.1-4.fc36.x86_64 >> 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix >> JVM version: 21-internal >> Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 >> Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 >> Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 >> Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 >> Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 >> Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 >> Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 >> Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 >> Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 >> New: >> JVM version: 21-internal (float) >> Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 >> Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 >> Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 >> Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 >> Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 >> Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 >> Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests; need vlbwdq for vpbroadcastq Thanks, 
@TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1587464645 From mdoerr at openjdk.org Mon Jun 12 14:55:51 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 Jun 2023 14:55:51 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 14:24:27 GMT, Martin Doerr wrote: >> We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. >> >> Testing: >> >> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp >> index f179d3ba88d..c35a1ac595e 100644 >> --- a/src/hotspot/share/opto/parse1.cpp >> +++ b/src/hotspot/share/opto/parse1.cpp >> @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { >> make_dtrace_method_entry(method()); >> } >> >> + if (UseNewCode) { >> + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); >> + C->root()->add_req(halt); >> + set_control(halt); >> + } >> + >> #ifdef ASSERT >> // Narrow receiver type when it is too broad for the method being parsed. 
>> if (!method()->is_static()) { >> >> >> "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): >> >> --------------- T H R E A D --------------- >> >> Current thread (0x0000024daebb2b30): JavaThread "main" [_thread_in_Java, id=30876, stack(0x000000cdacc00000,0x000000cdacd00000) (1024K)] >> >> Stack: [0x000000cdacc00000,0x000000cdacd00000] >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) >> V [jvm.dll+0x8a3af1] VMError::report+0xd61 (vmError.cpp:991) >> V [jvm.dll+0x8a5d6e] VMError::report_and_die+0x5fe (vmError.cpp:1797) >> V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) >> V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) >> C 0x0000024dc1553cf4 >> >> the last pc belongs to nmethod (will be printed below) >> >> Compiled method (c2) 92 16 4 java.lang.Object:: (1 bytes) >> total in heap [0x0000024dc1553b10,0x0000024dc1553d50] = 576 >> relocation [0x0000024dc1553c70,0x0000024dc1553c88] = 24 >> main code [0x0000024dc1553ca0,0x0000024dc1553d00] = 96 >> stub code [0x0000024dc1553d00,0x0000024dc1553d18] = 24 >> metadata [0x0000024dc1553d18,0x0000024dc1553d20] = 8 >> scopes data [0x0000024dc1553d20,0x0000024dc1553d28] = 8 >> scopes pcs [0x0000024dc1553d28,0x0000024dc1553d48] = 32 >> dependencies [0x0000024dc1553d48,0x0000024dc1553d50] = 8 >> >> [Constant Pool (empty)] >> >> [MachCode] >> [Entry Point] >> # {method} {0x0000000800478d78} '' '()V' ... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move parts to step which prints code blobs Hi David, I've moved the verbose parts to the step "printing code blobs if possible" and updated the example output. In this particular case, the steps between the native stack and my new code don't print anything, so, the output looks very similar. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14358#issuecomment-1587507260 From sviswanathan at openjdk.org Mon Jun 12 15:10:03 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 12 Jun 2023 15:10:03 GMT Subject: RFR: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 [v10] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 06:00:45 GMT, Tobias Hartmann wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix tests; need vlbwdq for vpbroadcastq > > All testing passed. Sorry for the delay, I was out for a few days. Thanks a lot @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14224#issuecomment-1587529475 From sgibbons at openjdk.org Mon Jun 12 15:10:05 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 12 Jun 2023 15:10:05 GMT Subject: Integrated: 8308966 Add intrinsic for float/double modulo for x86 AVX2 and AVX512 In-Reply-To: References: Message-ID: On Tue, 30 May 2023 17:07:01 GMT, Scott Gibbons wrote: > Add an intrinsic for x86 AVX and AVX512 fmod. This addresses both a performance regression and acceleration of the floating point remainder operation (fmod / frem). Also addresses dmod / drem. > > Performance has increased an average of ~4x as indicated by the benchmark included with [JDK-8302191](https://bugs.openjdk.org/browse/JDK-8302191). 
> > Old: > gcc-12.2.1-4.fc36.x86_64 > 3db352d003c5996a5f86f0f465adf86326f7e1fe openjdk21 + fix > JVM version: 21-internal > Iteration 0 regression case Took : 89 noMod case took: 39 noPower case took: 68 > Iteration 1 regression case Took : 86 noMod case took: 39 noPower case took: 67 > Iteration 2 regression case Took : 41 noMod case took: 39 noPower case took: 70 > Iteration 3 regression case Took : 41 noMod case took: 39 noPower case took: 69 > Iteration 4 regression case Took : 40 noMod case took: 39 noPower case took: 44 > Iteration 5 regression case Took : 47 noMod case took: 39 noPower case took: 40 > Iteration 6 regression case Took : 41 noMod case took: 39 noPower case took: 40 > Iteration 7 regression case Took : 40 noMod case took: 39 noPower case took: 40 > Iteration 8 regression case Took : 41 noMod case took: 38 noPower case took: 41 > Iteration 9 regression case Took : 40 noMod case took: 39 noPower case took: 40 > New: > JVM version: 21-internal (float) > Iteration 0 regression case Took : 24 noMod case took: 11 noPower case took: 42 > Iteration 1 regression case Took : 35 noMod case took: 22 noPower case took: 27 > Iteration 2 regression case Took : 17 noMod case took: 19 noPower case took: 17 > Iteration 3 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 4 regression case Took : 17 noMod case took: 3 noPower case took: 17 > Iteration 5 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 6 regression case Took : 16 noMod case took: 3 noPower case took: 17 > Iteration 7 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 8 regression case Took : 17 noMod case took: 3 noPower case took: 16 > Iteration 9 regression case Took : 17 noMod case took: 3 noPower case took: 17 This pull request has now been integrated. 
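For readers less familiar with the operation being intrinsified: Java's `%` on `float` and `double` is specified to behave like C's `fmod` (truncated remainder, result carries the sign of the dividend), which differs from the IEEE 754 remainder. The sketch below only illustrates that semantic difference with the C++ standard library; it is not the intrinsic's code, and the function names `frem_model` and `ieee_rem_model` are made up for this illustration.

```cpp
#include <cmath>

// Truncated remainder, as computed by fmod: the quotient is rounded toward
// zero, so the result has the same sign as the dividend x.
double frem_model(double x, double y) {
    return std::fmod(x, y);
}

// IEEE 754 remainder, shown for contrast: the quotient is rounded to the
// nearest integer, so the result may have the opposite sign.
double ieee_rem_model(double x, double y) {
    return std::remainder(x, y);
}
```

For example, with a dividend of 5.5 and a divisor of 2.0 the truncated remainder is 1.5, while the IEEE 754 remainder is -0.5.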
Changeset: 5d5ae352 Author: Scott Gibbons Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/5d5ae35288989fcfabdea013b6e3cdafe359f0df Stats: 902 lines in 12 files changed: 901 ins; 0 del; 1 mod 8308966: Add intrinsic for float/double modulo for x86 AVX2 and AVX512 Co-authored-by: Marius Cornea Reviewed-by: jbhateja, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/14224 From never at openjdk.org Mon Jun 12 15:52:51 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 12 Jun 2023 15:52:51 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v2] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Thu, 8 Jun 2023 15:05:30 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data.
So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ...
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > 32 bit fix src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotMethodData.java line 569: > 567: } > 568: > 569: protected long getTypesNotRecordedExecutionCount(HotSpotMethodData data, int position) { I think you can inline this method into the callers as it only existed to distinguish the count and nonprofiled_count ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1226878949 From never at openjdk.org Mon Jun 12 15:55:57 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 12 Jun 2023 15:55:57 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v2] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Thu, 8 Jun 2023 15:05:30 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class.
>> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ...
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > 32 bit fix src/hotspot/share/oops/methodData.hpp line 1091: > 1089: protected: > 1090: enum { > 1091: #if INCLUDE_JVMCI I think you should add a comment to ReceiverTypeData explaining when count is incremented ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1226883442 From never at openjdk.org Mon Jun 12 15:59:51 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 12 Jun 2023 15:59:51 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v2] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Thu, 8 Jun 2023 15:05:30 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data.
So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > 32 bit fix The JVMCI, C1 and interpreter changes look ok to me. I didn't review the C2 changes. Yes, the method profiling has been unused for a long time and I'm planning to delete it when I have time.
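The idea in the quoted proposal (compare the object's class against the few classes seen in the profile before falling back to the full subtype check) can be modelled in plain C++ roughly as follows. This is only a sketch under stated assumptions: `KlassModel`, `ProfiledReceiver`, `slow_subtype_check` and `subtype_check_with_profile` are hypothetical names, the class hierarchy is simplified, and C2 would emit the equivalent comparisons as compiled code rather than call functions like these.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a class descriptor; not HotSpot's Klass.
struct KlassModel {
    const KlassModel* super;                    // superclass, or nullptr
    std::vector<const KlassModel*> interfaces;  // directly declared interfaces
};

// Slow path: walk superclasses and interfaces. In the PR discussion this role
// is played by the full check, including the secondary_super_cache lookup.
bool slow_subtype_check(const KlassModel* sub, const KlassModel* target) {
    assert(target != nullptr);
    for (const KlassModel* k = sub; k != nullptr; k = k->super) {
        if (k == target) return true;
        for (const KlassModel* i : k->interfaces) {
            if (slow_subtype_check(i, target)) return true;
        }
    }
    return false;
}

// One class recorded in the profile, with the answer for this particular
// check already known (assumed to be worked out at compile time).
struct ProfiledReceiver {
    const KlassModel* klass;
    bool is_subtype;
};

// Fast path first: a handful of pointer comparisons against profiled classes.
// Only classes not seen in the profile reach the slow path.
bool subtype_check_with_profile(const KlassModel* obj_klass,
                                const KlassModel* target,
                                const std::vector<ProfiledReceiver>& profile) {
    for (const ProfiledReceiver& r : profile) {
        if (r.klass == obj_klass) return r.is_subtype;
    }
    return slow_subtype_check(obj_klass, target);
}
```

With two profiled classes, as in the micro-benchmark described above, the common case costs at most two comparisons and never touches shared cache state, which is consistent with the scaling argument made in the proposal.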
------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1587619276 From jwilhelm at openjdk.org Mon Jun 12 19:29:00 2023 From: jwilhelm at openjdk.org (Jesper Wilhelmsson) Date: Mon, 12 Jun 2023 19:29:00 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> Message-ID: On Thu, 8 Jun 2023 08:26:54 GMT, Daohan Qu wrote: >> @quadhier I hope I did not discourage you, my feedback yesterday was a bit scattered and maybe overwhelming, I'm sorry for that. >> These things are not easy to get right. I was impressed how far you got! >> Let me know if you want to take this back up, or want another task to work on - though a JBS account would help ;) > > @eme64 That's alright! I appreciate your telling me about some disciplines not written in the contributor guides. :P > > After some explorations, I realized that I didn't fully understand the root cause of this bug. Since I use my spare time to contribute, I think it would be better for some other experts to work on this so that we don't have to wait too long. I'm applying for an `Author` role and hope then I could continue contributing. Many thanks for your kindness again! :D @quadhier Please let me know if there is anything you think is missing from the OpenJDK Developers' Guide, I'd be happy to work on any improvements.
Since you mention the guide, I assume that you have read https://openjdk.org/guide/index.html#contributing-to-an-openjdk-project and in particular https://openjdk.org/guide/index.html#socialize-your-change ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1587945280 From dholmes at openjdk.org Tue Jun 13 02:34:52 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 13 Jun 2023 02:34:52 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 14:24:27 GMT, Martin Doerr wrote: >> We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. >> >> Testing: >> >> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp >> index f179d3ba88d..c35a1ac595e 100644 >> --- a/src/hotspot/share/opto/parse1.cpp >> +++ b/src/hotspot/share/opto/parse1.cpp >> @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { >> make_dtrace_method_entry(method()); >> } >> >> + if (UseNewCode) { >> + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); >> + C->root()->add_req(halt); >> + set_control(halt); >> + } >> + >> #ifdef ASSERT >> // Narrow receiver type when it is too broad for the method being parsed. 
>> if (!method()->is_static()) { >> >> >> "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): >> >> --------------- T H R E A D --------------- >> >> Current thread (0x0000024daebb2b30): JavaThread "main" [_thread_in_Java, id=30876, stack(0x000000cdacc00000,0x000000cdacd00000) (1024K)] >> >> Stack: [0x000000cdacc00000,0x000000cdacd00000] >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) >> V [jvm.dll+0x8a3af1] VMError::report+0xd61 (vmError.cpp:991) >> V [jvm.dll+0x8a5d6e] VMError::report_and_die+0x5fe (vmError.cpp:1797) >> V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) >> V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) >> C 0x0000024dc1553cf4 >> >> the last pc belongs to nmethod (will be printed below) >> >> Compiled method (c2) 92 16 4 java.lang.Object:: (1 bytes) >> total in heap [0x0000024dc1553b10,0x0000024dc1553d50] = 576 >> relocation [0x0000024dc1553c70,0x0000024dc1553c88] = 24 >> main code [0x0000024dc1553ca0,0x0000024dc1553d00] = 96 >> stub code [0x0000024dc1553d00,0x0000024dc1553d18] = 24 >> metadata [0x0000024dc1553d18,0x0000024dc1553d20] = 8 >> scopes data [0x0000024dc1553d20,0x0000024dc1553d28] = 8 >> scopes pcs [0x0000024dc1553d28,0x0000024dc1553d48] = 32 >> dependencies [0x0000024dc1553d48,0x0000024dc1553d50] = 8 >> >> [Constant Pool (empty)] >> >> [MachCode] >> [Entry Point] >> # {method} {0x0000000800478d78} '' '()V' ... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move parts to step which prints code blobs I guess I never come across hs_err logs with this kind of detail as I was expecting to see something between the "T H R E A D" and "P R O C E S S" sections. A couple of minor nits but otherwise I guess this is okay. Thanks. 
src/hotspot/share/utilities/vmError.cpp line 998: > 996: const char* name = find_code_name(lastpc); > 997: if (name != nullptr) { > 998: st->print_cr("the last pc belongs to %s (will be printed below)", name); s/the/The s/will be// ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14358#pullrequestreview-1476133013 PR Review Comment: https://git.openjdk.org/jdk/pull/14358#discussion_r1227447171 From duke at openjdk.org Tue Jun 13 03:01:58 2023 From: duke at openjdk.org (Daohan Qu) Date: Tue, 13 Jun 2023 03:01:58 GMT Subject: RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: <8wr7O_NwK2PfD9heZ4bcp8dIc5yNorMGpar5tfsoJAc=.2d3c99a3-e4c5-4780-ac2d-3019951d882f@github.com> Message-ID: On Mon, 12 Jun 2023 19:26:06 GMT, Jesper Wilhelmsson wrote: >> @eme64 That's alright! I appreciate your telling me about some disciplines not written in the contributor guides. :P >> >> After some explorations, I realized that I didn't fully understand the root cause of this bug. Since I use my spare time to contribute, I think it would be better for some other experts to work on this so that we don't have to wait too long. I'm applying for an `Author` role and hope then I could continue contributing. Many thanks for your kindness again! :D > > @quadhier Please let me know if there is anything you think is missing from the OpenJDK Developers' Guide, I'd be happy to work on any improvements. > > Since you mention the guide, I assume that you have read https://openjdk.org/guide/index.html#contributing-to-an-openjdk-project and in particular https://openjdk.org/guide/index.html#socialize-your-change Thanks, @JesperIRL. The guide is good and thoughtful. But it seems that nowadays the paradigm has slightly changed: developers create issues in JBS, and someone picks up an issue and creates a PR in GitHub. Most of the communication happens in GitHub comments.
It would be better to emphasize that before working on an issue, you should make sure that it is assigned to you, or should mention in JBS or on the mailing list that you are working on it. BTW, maybe for an unassigned issue, we could create a PR directly? Since we can communicate in the GitHub comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14353#issuecomment-1588447159 From duke at openjdk.org Tue Jun 13 04:32:28 2023 From: duke at openjdk.org (Vladimir Petko) Date: Tue, 13 Jun 2023 04:32:28 GMT Subject: RFR: 8309847: FrameForm and RegisterForm constructors should initialize all members Message-ID: This PR fixes missing constructor initialisations in formsopt.cpp. This PR does not implement [CppCoreGuidelines#Rc-in-class-initializer](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-in-class-initializer) to keep the style consistent and minimise changes. ------------- Commit messages: - 8309847: add missing member initialisations Changes: https://git.openjdk.org/jdk/pull/14435/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14435&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309847 Stats: 11 lines in 1 file changed: 9 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14435.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14435/head:pull/14435 PR: https://git.openjdk.org/jdk/pull/14435 From kvn at openjdk.org Tue Jun 13 04:50:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 Jun 2023 04:50:52 GMT Subject: RFR: 8309847: FrameForm and RegisterForm constructors should initialize all members In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 02:26:01 GMT, Vladimir Petko wrote: > This PR fixes missing constructor initialisations in formsopt.cpp. > This PR does not implement [CppCoreGuidelines#Rc-in-class-initializer](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-in-class-initializer) to keep the style consistent and minimise changes. Looks good.
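To make the two styles mentioned in the PR description concrete, here is a small sketch contrasting initialisation inside the constructor with the in-class initializers recommended by the linked guideline. The classes and members (`ExampleFormCtor`, `ExampleFormInClass`, `_flag`, `_name`, `_alignment`) are invented for illustration and are not the real `FrameForm` or `RegisterForm`. The member initializer list is an assumption about the style; the PR may just as well assign in the constructor body.

```cpp
// Style the PR says it keeps: the constructor is responsible for giving
// every member a defined value (shown here with a member initializer list).
class ExampleFormCtor {
public:
    bool        _flag;
    const char* _name;
    int         _alignment;

    ExampleFormCtor() : _flag(false), _name(nullptr), _alignment(0) {}
};

// Style from the linked Core Guidelines rule, which the PR deliberately does
// not adopt: defaults live next to the member declarations, so any
// constructor that omits a member still leaves it initialised.
class ExampleFormInClass {
public:
    bool        _flag      = false;
    const char* _name      = nullptr;
    int         _alignment = 0;

    ExampleFormInClass() = default;
};
```

Both variants leave every member with a defined value after construction; the difference is where a reader has to look to confirm that, which is why mixing the two within one file tends to hurt consistency.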
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14435#pullrequestreview-1476243324 From stuefe at openjdk.org Tue Jun 13 06:10:56 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 13 Jun 2023 06:10:56 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 14:24:27 GMT, Martin Doerr wrote: >> We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. >> >> Testing: >> >> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp >> index f179d3ba88d..c35a1ac595e 100644 >> --- a/src/hotspot/share/opto/parse1.cpp >> +++ b/src/hotspot/share/opto/parse1.cpp >> @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { >> make_dtrace_method_entry(method()); >> } >> >> + if (UseNewCode) { >> + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); >> + C->root()->add_req(halt); >> + set_control(halt); >> + } >> + >> #ifdef ASSERT >> // Narrow receiver type when it is too broad for the method being parsed. 
>> if (!method()->is_static()) { >> >> >> "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): >> >> --------------- T H R E A D --------------- >> >> Current thread (0x0000024daebb2b30): JavaThread "main" [_thread_in_Java, id=30876, stack(0x000000cdacc00000,0x000000cdacd00000) (1024K)] >> >> Stack: [0x000000cdacc00000,0x000000cdacd00000] >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) >> V [jvm.dll+0x8a3af1] VMError::report+0xd61 (vmError.cpp:991) >> V [jvm.dll+0x8a5d6e] VMError::report_and_die+0x5fe (vmError.cpp:1797) >> V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) >> V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) >> C 0x0000024dc1553cf4 >> >> the last pc belongs to nmethod (will be printed below) >> >> Compiled method (c2) 92 16 4 java.lang.Object:: (1 bytes) >> total in heap [0x0000024dc1553b10,0x0000024dc1553d50] = 576 >> relocation [0x0000024dc1553c70,0x0000024dc1553c88] = 24 >> main code [0x0000024dc1553ca0,0x0000024dc1553d00] = 96 >> stub code [0x0000024dc1553d00,0x0000024dc1553d18] = 24 >> metadata [0x0000024dc1553d18,0x0000024dc1553d20] = 8 >> scopes data [0x0000024dc1553d20,0x0000024dc1553d28] = 8 >> scopes pcs [0x0000024dc1553d28,0x0000024dc1553d48] = 32 >> dependencies [0x0000024dc1553d48,0x0000024dc1553d50] = 8 >> >> [Constant Pool (empty)] >> >> [MachCode] >> [Entry Point] >> # {method} {0x0000000800478d78} '' '()V' ... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move parts to step which prints code blobs Looks good. Small nits only, up to you. 
src/hotspot/share/runtime/os.hpp line 1011: > 1009: public: > 1010: inline static bool platform_print_native_stack(outputStream* st, const void* context, > 1011: char *buf, int buf_size); Small nit, can we make this not a reference, but a pointer, and make it optional and possibly default it to nullptr? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14358#pullrequestreview-1476309321 PR Review Comment: https://git.openjdk.org/jdk/pull/14358#discussion_r1227569765 From stuefe at openjdk.org Tue Jun 13 06:10:58 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 13 Jun 2023 06:10:58 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 02:29:04 GMT, David Holmes wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move parts to step which prints code blobs > > src/hotspot/share/utilities/vmError.cpp line 998: > >> 996: const char* name = find_code_name(lastpc); >> 997: if (name != nullptr) { >> 998: st->print_cr("the last pc belongs to %s (will be printed below)", name); > > s/the/The > s/will be// Why not print the nmethod right here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14358#discussion_r1227569666 From vkempik at openjdk.org Tue Jun 13 06:12:03 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 13 Jun 2023 06:12:03 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v6] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite.
> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: add load_short_misaligned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14320/files - new: https://git.openjdk.org/jdk/pull/14320/files/45498879..898538f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=04-05 Stats: 32 lines in 2 files changed: 26 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From dholmes at openjdk.org Tue Jun 13 08:04:49 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 13 Jun 2023 08:04:49 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 06:02:16 GMT, Thomas Stuefe wrote: >> src/hotspot/share/utilities/vmError.cpp line 998: >> >>> 996: const char* name = find_code_name(lastpc); >>> 997: if (name != nullptr) { >>> 998: st->print_cr("the last pc belongs to %s (will be printed below)", name); >> >> s/the/The >> s/will be// > > Why not print the nmethod right here?
See my earlier comments. I was looking for some kind of delineation of all this additional stuff so that it would be easier to see where it fits into things. Not sure we've really achieved that regardless. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14358#discussion_r1227701172 From mdoerr at openjdk.org Tue Jun 13 09:40:33 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 Jun 2023 09:40:33 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v4] In-Reply-To: References: Message-ID: > We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. > > Testing: > > diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp > index f179d3ba88d..c35a1ac595e 100644 > --- a/src/hotspot/share/opto/parse1.cpp > +++ b/src/hotspot/share/opto/parse1.cpp > @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { > make_dtrace_method_entry(method()); > } > > + if (UseNewCode) { > + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); > + C->root()->add_req(halt); > + set_control(halt); > + } > + > #ifdef ASSERT > // Narrow receiver type when it is too broad for the method being parsed. 
> if (!method()->is_static()) { > > > "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): > > --------------- T H R E A D --------------- > > Current thread (0x0000024daebb2b30): JavaThread "main" [_thread_in_Java, id=30876, stack(0x000000cdacc00000,0x000000cdacd00000) (1024K)] > > Stack: [0x000000cdacc00000,0x000000cdacd00000] > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) > V [jvm.dll+0x8a3af1] VMError::report+0xd61 (vmError.cpp:991) > V [jvm.dll+0x8a5d6e] VMError::report_and_die+0x5fe (vmError.cpp:1797) > V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) > V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) > C 0x0000024dc1553cf4 > > the last pc belongs to nmethod (will be printed below) > > Compiled method (c2) 92 16 4 java.lang.Object:: (1 bytes) > total in heap [0x0000024dc1553b10,0x0000024dc1553d50] = 576 > relocation [0x0000024dc1553c70,0x0000024dc1553c88] = 24 > main code [0x0000024dc1553ca0,0x0000024dc1553d00] = 96 > stub code [0x0000024dc1553d00,0x0000024dc1553d18] = 24 > metadata [0x0000024dc1553d18,0x0000024dc1553d20] = 8 > scopes data [0x0000024dc1553d20,0x0000024dc1553d28] = 8 > scopes pcs [0x0000024dc1553d28,0x0000024dc1553d48] = 32 > dependencies [0x0000024dc1553d48,0x0000024dc1553d50] = 8 > > [Constant Pool (empty)] > > [MachCode] > [Entry Point] > # {method} {0x0000000800478d78} '' '()V' in 'java/lang/Object' > # [sp+0x20] (sp of caller) > 0x0000024dc1553ca0: 448b 5208 | 49bb 0000 | 00... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Improve output string. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14358/files - new: https://git.openjdk.org/jdk/pull/14358/files/56a3f4c6..8fc5edcd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14358&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14358&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14358.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14358/head:pull/14358 PR: https://git.openjdk.org/jdk/pull/14358 From mdoerr at openjdk.org Tue Jun 13 09:40:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 Jun 2023 09:40:36 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v3] In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 14:24:27 GMT, Martin Doerr wrote: >> We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback. >> >> Testing: >> >> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp >> index f179d3ba88d..c35a1ac595e 100644 >> --- a/src/hotspot/share/opto/parse1.cpp >> +++ b/src/hotspot/share/opto/parse1.cpp >> @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() { >> make_dtrace_method_entry(method()); >> } >> >> + if (UseNewCode) { >> + Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!")); >> + C->root()->add_req(halt); >> + set_control(halt); >> + } >> + >> #ifdef ASSERT >> // Narrow receiver type when it is too broad for the method being parsed. 
>> if (!method()->is_static()) { >> >> >> "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided): >> >> --------------- T H R E A D --------------- >> >> Current thread (0x0000024daebb2b30): JavaThread "main" [_thread_in_Java, id=30876, stack(0x000000cdacc00000,0x000000cdacd00000) (1024K)] >> >> Stack: [0x000000cdacc00000,0x000000cdacd00000] >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236) >> V [jvm.dll+0x8a3af1] VMError::report+0xd61 (vmError.cpp:991) >> V [jvm.dll+0x8a5d6e] VMError::report_and_die+0x5fe (vmError.cpp:1797) >> V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212) >> V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829) >> C 0x0000024dc1553cf4 >> >> the last pc belongs to nmethod (will be printed below) >> >> Compiled method (c2) 92 16 4 java.lang.Object:: (1 bytes) >> total in heap [0x0000024dc1553b10,0x0000024dc1553d50] = 576 >> relocation [0x0000024dc1553c70,0x0000024dc1553c88] = 24 >> main code [0x0000024dc1553ca0,0x0000024dc1553d00] = 96 >> stub code [0x0000024dc1553d00,0x0000024dc1553d18] = 24 >> metadata [0x0000024dc1553d18,0x0000024dc1553d20] = 8 >> scopes data [0x0000024dc1553d20,0x0000024dc1553d28] = 8 >> scopes pcs [0x0000024dc1553d28,0x0000024dc1553d48] = 32 >> dependencies [0x0000024dc1553d48,0x0000024dc1553d50] = 8 >> >> [Constant Pool (empty)] >> >> [MachCode] >> [Entry Point] >> # {method} {0x0000000800478d78} '' '()V' ... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move parts to step which prints code blobs Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14358#issuecomment-1588918517 From mdoerr at openjdk.org Tue Jun 13 09:40:38 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 Jun 2023 09:40:38 GMT Subject: RFR: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error [v4] In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 06:02:24 GMT, Thomas Stuefe wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve output string. > > src/hotspot/share/runtime/os.hpp line 1011: > >> 1009: public: >> 1010: inline static bool platform_print_native_stack(outputStream* st, const void* context, >> 1011: char *buf, int buf_size); > > Small nit, can we make this not a reference, but a pointer, and make it optional and possibly default it to nullptr? Then, we should probably name it `lastpc_ptr`? Would that really be better? I'm not convinced. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14358#discussion_r1227827772 From vkempik at openjdk.org Tue Jun 13 12:57:38 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 13 Jun 2023 12:57:38 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v7] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub.
> Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: fix nits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14320/files - new: https://git.openjdk.org/jdk/pull/14320/files/898538f1..a7672f4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14320&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14320.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14320/head:pull/14320 PR: https://git.openjdk.org/jdk/pull/14320 From gbarany at openjdk.org Tue Jun 13 13:12:09 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Tue, 13 Jun 2023 13:12:09 GMT Subject: RFR: 8309601: [JVMCI] AMD64#getLargestStorableKind returns incorrect mask kind Message-ID: `jdk.vm.ci.amd64.AMD64#getLargestStorableKind(RegisterCategory)` unconditionally returns `AMD64Kind.MASK64` for mask registers. This is only correct if the target supports AVX512BW. On other AVX512 versions this should be `MASK16`. The Graal compiler uses this method to determine how to spill a given register. An incorrect size will lead to compilation errors due to trying to emit a move with a size that is not supported by the target. I have manually verified that this fixes those problems.
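A minimal Java sketch of the selection logic described above, for illustration only. It does not use the JVMCI types: `CpuFeature`, `MaskKind` and `largestStorableMaskKind` are hypothetical stand-ins, and the actual patch in the PR may be structured differently. The only facts taken from the description are that `MASK64` is correct when AVX512BW is available and `MASK16` otherwise.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are NOT the JVMCI classes.
final class MaskSpillSketch {
    enum CpuFeature { AVX512F, AVX512BW }

    enum MaskKind {
        MASK16(2), MASK64(8);

        final int sizeInBytes;

        MaskKind(int sizeInBytes) {
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Pick the widest mask kind that the target can actually move to memory:
    // 64-bit mask moves need AVX512BW, plain AVX512F only offers 16-bit ones.
    static MaskKind largestStorableMaskKind(Set<CpuFeature> features) {
        return features.contains(CpuFeature.AVX512BW) ? MaskKind.MASK64 : MaskKind.MASK16;
    }

    public static void main(String[] args) {
        // Prints MASK16
        System.out.println(largestStorableMaskKind(EnumSet.of(CpuFeature.AVX512F)));
        // Prints MASK64
        System.out.println(largestStorableMaskKind(EnumSet.of(CpuFeature.AVX512F, CpuFeature.AVX512BW)));
    }
}
```

A register allocator that sizes spill slots from this result would then reserve 2 bytes rather than 8 for a mask register on a target without AVX512BW, and would never be asked to emit a 64-bit mask move there.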
------------- Commit messages: - 8309601: [JVMCI] Fix AMD64.getLargestStorableKind for AVX512 masks Changes: https://git.openjdk.org/jdk/pull/14441/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14441&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309601 Stats: 11 lines in 1 file changed: 9 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14441.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14441/head:pull/14441 PR: https://git.openjdk.org/jdk/pull/14441 From thartmann at openjdk.org Tue Jun 13 13:13:44 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Jun 2023 13:13:44 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative [v3] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 01:28:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > wrong operand Looks good to me. All tests passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14061#pullrequestreview-1477139275 From thartmann at openjdk.org Tue Jun 13 13:22:28 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Jun 2023 13:22:28 GMT Subject: RFR: 8309854: ciReplay TestServerVM test fails with Graal Message-ID: We hit an assert in the test because the VM does not crash with` -XX:CICrashAt=1` with Graal as JIT. Also, Graal does not support replay compilation (see [JDK-8181747](https://bugs.openjdk.org/browse/JDK-8181747)), so the test should simply be excluded with Graal. 
Thanks, Tobias ------------- Commit messages: - 8309854: ciReplay TestServerVM test fails with Graal Changes: https://git.openjdk.org/jdk/pull/14447/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14447&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309854 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14447/head:pull/14447 PR: https://git.openjdk.org/jdk/pull/14447 From chagedorn at openjdk.org Tue Jun 13 13:41:41 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 Jun 2023 13:41:41 GMT Subject: RFR: 8309854: ciReplay TestServerVM test fails with Graal In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 13:04:32 GMT, Tobias Hartmann wrote: > We hit an assert in the test because the VM does not crash with` -XX:CICrashAt=1` with Graal as JIT. Also, Graal does not support replay compilation (see [JDK-8181747](https://bugs.openjdk.org/browse/JDK-8181747)), so the test should simply be excluded with Graal. > > Thanks, > Tobias Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14447#pullrequestreview-1477207016 From thartmann at openjdk.org Tue Jun 13 13:48:51 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Jun 2023 13:48:51 GMT Subject: RFR: 8309854: ciReplay TestServerVM test fails with Graal In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 13:04:32 GMT, Tobias Hartmann wrote: > We hit an assert in the test because the VM does not crash with` -XX:CICrashAt=1` with Graal as JIT. Also, Graal does not support replay compilation (see [JDK-8181747](https://bugs.openjdk.org/browse/JDK-8181747)), so the test should simply be excluded with Graal. > > Thanks, > Tobias Thanks, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14447#issuecomment-1589352189 From duke at openjdk.org Tue Jun 13 13:48:57 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 13 Jun 2023 13:48:57 GMT Subject: Integrated: 8304403: Remove unused methods in RangeCheckElimination::Bound In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 09:17:45 GMT, Eric Nothum wrote: > Removed 3 unused methods in RangeCheckElimination: > > RangeCheckEliminator::Bound::set_lower > RangeCheckEliminator::Bound::set_upper > RangeCheckEliminator::Bound::add_constant > > Testing passed after removal This pull request has now been integrated. Changeset: 6d05360b Author: Eric Nothum Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/6d05360bec809ac121dae7afb0e770aaa7d79401 Stats: 23 lines in 2 files changed: 0 ins; 23 del; 0 mod 8304403: Remove unused methods in RangeCheckElimination::Bound Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14328 From dnsimon at openjdk.org Tue Jun 13 13:57:43 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 Jun 2023 13:57:43 GMT Subject: RFR: 8309601: [JVMCI] AMD64#getLargestStorableKind returns incorrect mask kind In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 08:24:23 GMT, Gergö Barany wrote: > `jdk.vm.ci.amd64.AMD64#getLargestStorableKind(RegisterCategory)` unconditionally returns `AMD64Kind.MASK64` for mask registers. This is only correct if the target supports AVX512BW. On other AVX512 versions this should be `MASK16`. > > The Graal compiler uses this method to determine how to spill a given register. An incorrect size will lead to compilation errors due to trying to emit a move with a size that is not supported by the target. I have manually verified that this fixes those problems. Marked as reviewed by dnsimon (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/14441#pullrequestreview-1477243302 From roland at openjdk.org Tue Jun 13 14:29:42 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Jun 2023 14:29:42 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v3] In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: <1F4sfZNmddE4W2Y2Uc0ABPaLnJ_rl96t9h8k7A-blbc=.39aa22cd-61b5-4f90-963e-7e6840bc4362@github.com> > In this simple micro benchmark: > > https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 > > Performance drops sharply with polluted profile: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us > > > to: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us > > > The test has 2 type checks to 2 different interfaces so caching with > `secondary_super_cache` doesn't help. > > The micro-benchmark only uses 2 different concrete classes > (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded > in profile data at the type checks. But C2 only takes advantage of > profile data at type checks if they report a single class. > > What I propose is that the full-blown type check expanded in > `Phase::gen_subtype_check()` takes advantage of profile data. So in > the case of the micro benchmark, before checking the > `secondary_super_cache`, generated code checks whether the object > being type checked is a `DuplicatedContext` or a > `NonDuplicatedContext`.
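As a rough Java-level illustration of the proposal above (the real change emits C2 IR in `Phase::gen_subtype_check()`, not Java): compare the receiver's exact class against the classes seen in the profile first, and only fall back to the general subtype check when none of them matches. The class names `DuplicatedContext` and `NonDuplicatedContext` come from the description; the `Marker` interface, the `UnprofiledContext` class, the class hierarchy and the `slowPathChecks` counter are hypothetical and exist only to make the sketch runnable.

```java
// Illustrative sketch only; hierarchy and helper names are assumptions.
final class ProfiledSubtypeCheckSketch {
    interface Context {}
    interface Marker {}

    static final class DuplicatedContext implements Context, Marker {}
    static final class NonDuplicatedContext implements Context {}
    static final class UnprofiledContext implements Context, Marker {}

    // Counts how often the general (slow) subtype check was needed.
    static int slowPathChecks = 0;

    // Answers "obj instanceof Marker" for a non-null obj. The two profiled
    // classes are tested with cheap exact-class comparisons whose outcome is
    // known statically; everything else takes the general check.
    static boolean isMarker(Object obj) {
        Class<?> klass = obj.getClass();
        if (klass == DuplicatedContext.class) {
            return true;   // profiled class, known to implement Marker
        }
        if (klass == NonDuplicatedContext.class) {
            return false;  // profiled class, known not to implement Marker
        }
        slowPathChecks++;
        return Marker.class.isInstance(obj);
    }

    public static void main(String[] args) {
        System.out.println(isMarker(new DuplicatedContext()));
        System.out.println(isMarker(new NonDuplicatedContext()));
        System.out.println(isMarker(new UnprofiledContext()));
        System.out.println("slow path checks: " + slowPathChecks);
    }
}
```

In this sketch only the `UnprofiledContext` receiver reaches the slow path, which is the behaviour the proposal aims for when the profile covers the classes that occur in practice.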
> > This works fairly well on this micro benchmark: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > It also scales much better if there are multiple threads running the > same test (`secondary_super_cache` doesn't scale well: see > JDK-8180450). > > Now if the micro-benchmark is changed according to the comment: > > https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 > > so the type check hits in the `secondary_super_cache`, the current > code performs much better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > but leveraging profiling as explained above performs even better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplic...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/72ef4189..987d8b4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=01-02 Stats: 14 lines in 2 files changed: 3 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From roland at openjdk.org Tue Jun 13 14:29:45 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 Jun 2023 14:29:45 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v2] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Mon, 12 Jun 2023 15:57:20 GMT, Tom Rodriguez wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> 32 bit fix > > The JVMCI, C1 and interpreter changes look ok to me. I didn't review the C2 changes. > > Yes the method profiling is unused for a long time and I'm planning to delete it when I have time. @tkrodriguez thanks for the review. New commit should address your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1589421859 From duke at openjdk.org Tue Jun 13 14:43:59 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 13 Jun 2023 14:43:59 GMT Subject: RFR: 8293069: Make -XX:+Verbose less verbose Message-ID: 1) Added PrintOpto guard to noisy Verbose prints in Compile::process_for_unstable_if_traps and Parse::catch_call_exceptions. 
2) Removed noisy Verbose in ciEnv::record_best_dyno_loc, which looked like a leftover from implementation of [JDK-8271911](https://bugs.openjdk.org/browse/JDK-8271911). I also rearranged the if statement around Verbose and removed the TODO comment. @dean-long is the TODO still relevant? In case it is still relevant, I think we should create a separate enhancement for further investigation rather than keeping it in the comments. ------------- Commit messages: - Rewrote the if statement in record_best_dyno_loc - Removed noisy Verbose in ciEnv.cpp - 8293069: Guarded Verbose prints additionally with PrintOpto in compile.cpp and doCall.cpp. This avoids increased printing when setting the Verbose flag without the PrintOpto flag. Changes: https://git.openjdk.org/jdk/pull/14420/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14420&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293069 Stats: 9 lines in 3 files changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14420/head:pull/14420 PR: https://git.openjdk.org/jdk/pull/14420 From never at openjdk.org Tue Jun 13 15:08:51 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 13 Jun 2023 15:08:51 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v3] In-Reply-To: <1F4sfZNmddE4W2Y2Uc0ABPaLnJ_rl96t9h8k7A-blbc=.39aa22cd-61b5-4f90-963e-7e6840bc4362@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <1F4sfZNmddE4W2Y2Uc0ABPaLnJ_rl96t9h8k7A-blbc=.39aa22cd-61b5-4f90-963e-7e6840bc4362@github.com> Message-ID: On Tue, 13 Jun 2023 14:29:42 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark 
(typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ? 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ? 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But c2 only take advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ? 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ? 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review The updates look good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1589498715 From kvn at openjdk.org Tue Jun 13 16:00:55 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 Jun 2023 16:00:55 GMT Subject: RFR: 8309854: ciReplay TestServerVM test fails with Graal In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 13:04:32 GMT, Tobias Hartmann wrote: > We hit an assert in the test because the VM does not crash with` -XX:CICrashAt=1` with Graal as JIT. Also, Graal does not support replay compilation (see [JDK-8181747](https://bugs.openjdk.org/browse/JDK-8181747)), so the test should simply be excluded with Graal. > > Thanks, > Tobias Agree. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14447#pullrequestreview-1477553589 From thartmann at openjdk.org Tue Jun 13 16:16:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Jun 2023 16:16:58 GMT Subject: Integrated: 8309854: ciReplay TestServerVM test fails with Graal In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 13:04:32 GMT, Tobias Hartmann wrote: > We hit an assert in the test because the VM does not crash with` -XX:CICrashAt=1` with Graal as JIT. Also, Graal does not support replay compilation (see [JDK-8181747](https://bugs.openjdk.org/browse/JDK-8181747)), so the test should simply be excluded with Graal. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 3eec179c Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/3eec179c726e66bc1d0638dfe6e05f46fcea9d10 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8309854: ciReplay TestServerVM test fails with Graal Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14447 From thartmann at openjdk.org Tue Jun 13 16:16:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 Jun 2023 16:16:57 GMT Subject: RFR: 8309854: ciReplay TestServerVM test fails with Graal In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 13:04:32 GMT, Tobias Hartmann wrote: > We hit an assert in the test because the VM does not crash with` -XX:CICrashAt=1` with Graal as JIT. Also, Graal does not support replay compilation (see [JDK-8181747](https://bugs.openjdk.org/browse/JDK-8181747)), so the test should simply be excluded with Graal. > > Thanks, > Tobias Thanks, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14447#issuecomment-1589622244 From cslucas at openjdk.org Tue Jun 13 17:56:00 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 13 Jun 2023 17:56:00 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v17] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 9 Jun 2023 17:23:22 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Rome minor refactorings. > > Overall, I like how this patch shapes. > > I need to go through share/opto changes (so far, I did only a shallow review of that part), but the rest looks good. I plan to submit functional and performance testing over the weekend. @iwanowww - May I ask how did the tests go? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1589775478 From Divino.Cesar at microsoft.com Tue Jun 13 18:01:59 2023 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Tue, 13 Jun 2023 18:01:59 +0000 Subject: Use IR test framework Message-ID: Hi there! In the IR test framework, is it possible to express a check that validates the output of phase N only if some IR nodes were present in compilation phase N-1? Thanks Cesar From vkempik at openjdk.org Tue Jun 13 21:32:12 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 13 Jun 2023 21:32:12 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v7] In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 12:57:38 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initially found these misaligned loads when profiling finagle-http test from renaissance suite. >> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive.
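For readers unfamiliar with the technique, here is a small Java sketch of the general idea behind avoiding a misaligned 16-bit load: read the two bytes separately (byte loads are always aligned) and combine them. This is an illustration under stated assumptions, not the HotSpot code. An earlier revision of this PR mentions adding `load_short_misaligned`, but that helper's implementation is not shown in this thread; the name `loadShortFromBytes` and the little-endian byte order below are assumptions made for the example.

```java
// Illustrative sketch only; not the RISC-V MacroAssembler helper from the PR.
final class MisalignedLoadSketch {
    // Reads an unsigned 16-bit little-endian value starting at any byte offset
    // by issuing two single-byte reads instead of one 2-byte read.
    static int loadShortFromBytes(byte[] mem, int offset) {
        int lo = mem[offset] & 0xFF;      // low byte, zero-extended
        int hi = mem[offset + 1] & 0xFF;  // high byte, zero-extended
        return lo | (hi << 8);
    }

    public static void main(String[] args) {
        byte[] mem = {0x11, 0x22, 0x33, 0x44};
        // Offset 1 is odd, so a direct 2-byte load here would be misaligned.
        System.out.printf("0x%04x%n", loadShortFromBytes(mem, 1)); // prints 0x3322
    }
}
```

The trade-off is a couple of extra instructions per load in exchange for never taking the misaligned-access path, which is where the `trp_lam` events reported above come from.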
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix nits

jmh results, StringIndexOf advancedWithShortSubXChars - three new tests which test speed of linear search with needle of size X

before:

Benchmark                                                 (loops)  (pathCnt)  (rngSeed)  Mode  Cnt       Score     Error  Units
StringIndexOf.advancedWithMediumSub                           N/A        N/A        N/A  avgt   25   47184.829 ± 227.340  ns/op
StringIndexOf.advancedWithShortSub1                           N/A        N/A        N/A  avgt   25    4121.931 ± 194.659  ns/op
StringIndexOf.advancedWithShortSub2                           N/A        N/A        N/A  avgt   25     992.714 ±  24.638  ns/op
StringIndexOf.advancedWithShortSub2Chars                      N/A        N/A        N/A  avgt   25   37315.528 ±  61.079  ns/op
StringIndexOf.advancedWithShortSub3Chars                      N/A        N/A        N/A  avgt   25   37316.547 ±  26.079  ns/op
StringIndexOf.advancedWithShortSub4Chars                      N/A        N/A        N/A  avgt   25   69367.409 ± 117.164  ns/op
StringIndexOf.constantPattern                                 N/A        N/A        N/A  avgt   25      73.671 ±   3.551  ns/op
StringIndexOf.searchChar16LongSuccess                         N/A        N/A        N/A  avgt   25     251.513 ±   3.103  ns/op
StringIndexOf.searchChar16LongWithOffsetSuccess               N/A        N/A        N/A  avgt   25     258.523 ±   2.925  ns/op
StringIndexOf.searchChar16MediumSuccess                       N/A        N/A        N/A  avgt   25     134.067 ±   3.635  ns/op
StringIndexOf.searchChar16MediumWithOffsetSuccess             N/A        N/A        N/A  avgt   25     146.327 ±   3.257  ns/op
StringIndexOf.searchChar16ShortSuccess                        N/A        N/A        N/A  avgt   25      35.564 ±   2.468  ns/op
StringIndexOf.searchChar16ShortWithOffsetSuccess              N/A        N/A        N/A  avgt   25      40.607 ±   3.270  ns/op
StringIndexOf.searchCharLongSuccess                           N/A        N/A        N/A  avgt   25     122.948 ±   4.485  ns/op
StringIndexOf.searchCharMediumSuccess                         N/A        N/A        N/A  avgt   25      63.505 ±   2.645  ns/op
StringIndexOf.searchCharShortSuccess                          N/A        N/A        N/A  avgt   25      33.107 ±   2.404  ns/op
StringIndexOf.searchString16LongLatinSuccess                  N/A        N/A        N/A  avgt   25     504.675 ±   6.297  ns/op
StringIndexOf.searchString16LongSuccess                       N/A        N/A        N/A  avgt   25     628.733 ±   3.652  ns/op
StringIndexOf.searchString16LongWithOffsetLatinSuccess        N/A        N/A        N/A  avgt   25     325.615 ±   3.355  ns/op
StringIndexOf.searchString16LongWithOffsetSuccess             N/A        N/A        N/A  avgt   25     343.145 ±   3.068  ns/op
StringIndexOf.searchString16MediumLatinSuccess                N/A        N/A        N/A  avgt   25     226.349 ±   3.635  ns/op
StringIndexOf.searchString16MediumSuccess                     N/A        N/A        N/A  avgt   25     279.963 ±   3.536  ns/op
StringIndexOf.searchString16MediumWithOffsetLatinSuccess      N/A        N/A        N/A  avgt   25     161.672 ±   3.024  ns/op
StringIndexOf.searchString16MediumWithOffsetSuccess           N/A        N/A        N/A  avgt   25     162.722 ±   3.598  ns/op
StringIndexOf.searchString16ShortLatinSuccess                 N/A        N/A        N/A  avgt   25     322.307 ±   3.027  ns/op
StringIndexOf.searchString16ShortSuccess                      N/A        N/A        N/A  avgt   25      55.243 ±   3.518  ns/op
StringIndexOf.searchString16ShortWithOffsetLatinSuccess       N/A        N/A        N/A  avgt   25      55.825 ±   2.582  ns/op
StringIndexOf.searchString16ShortWithOffsetSuccess            N/A        N/A        N/A  avgt   25      54.268 ±   3.709  ns/op
StringIndexOf.success                                         N/A        N/A        N/A  avgt   25      80.776 ±   2.500  ns/op
StringIndexOf.successBig                                      N/A        N/A        N/A  avgt   25    6283.167 ±  11.876  ns/op
StringIndexOfChar.latin1_AVX2_String                       100000       1000       1999  avgt   25  206802.394 ± 598.649  ns/op
StringIndexOfChar.latin1_AVX2_char                         100000       1000       1999  avgt   25  103587.559 ± 214.802  ns/op
StringIndexOfChar.latin1_SSE4_String                       100000       1000       1999  avgt   25  121714.481 ± 118.594  ns/op
StringIndexOfChar.latin1_SSE4_char                         100000       1000       1999  avgt   25   75014.737 ± 178.044  ns/op
StringIndexOfChar.latin1_Short_String                      100000       1000       1999  avgt   25  116975.364 ±  90.326  ns/op
StringIndexOfChar.latin1_Short_char                        100000       1000       1999  avgt   25   81844.387 ± 230.281  ns/op
StringIndexOfChar.latin1_mixed_String                      100000       1000       1999  avgt   25  210860.343 ± 159.635  ns/op
StringIndexOfChar.latin1_mixed_char                        100000       1000       1999  avgt   25  117095.518 ± 204.476  ns/op
StringIndexOfChar.utf16_AVX2_String                        100000       1000       1999  avgt   25  100868.093 ± 136.887  ns/op
StringIndexOfChar.utf16_AVX2_char                          100000       1000       1999  avgt   25   80257.944 ± 208.123  ns/op
StringIndexOfChar.utf16_SSE4_String                        100000       1000       1999  avgt   25   74831.080 ± 284.069  ns/op
StringIndexOfChar.utf16_SSE4_char                          100000       1000       1999  avgt   25   64963.525 ± 113.680  ns/op
StringIndexOfChar.utf16_Short_String                       100000       1000       1999  avgt   25   72531.734 ± 209.899  ns/op
StringIndexOfChar.utf16_Short_char                         100000       1000       1999  avgt   25   70835.907 ± 202.187  ns/op
StringIndexOfChar.utf16_mixed_String                       100000       1000       1999  avgt   25  162457.612 ± 178.987  ns/op
StringIndexOfChar.utf16_mixed_char                         100000       1000       1999  avgt   25  149974.738 ± 320.802  ns/op

Hifive, after:

Benchmark                                                 (loops)  (pathCnt)  (rngSeed)  Mode  Cnt       Score     Error  Units
StringIndexOf.advancedWithMediumSub                           N/A        N/A        N/A  avgt   25    4276.564 ±  39.149  ns/op
StringIndexOf.advancedWithShortSub1                           N/A        N/A        N/A  avgt   25    4149.350 ± 209.233  ns/op
StringIndexOf.advancedWithShortSub2                           N/A        N/A        N/A  avgt   25    1128.838 ±  20.157  ns/op
StringIndexOf.advancedWithShortSub2Chars                      N/A        N/A        N/A  avgt   25    1277.692 ±  13.031  ns/op
StringIndexOf.advancedWithShortSub3Chars                      N/A        N/A        N/A  avgt   25    1313.186 ±   9.654  ns/op
StringIndexOf.advancedWithShortSub4Chars                      N/A        N/A        N/A  avgt   25    2488.046 ±   8.964  ns/op
StringIndexOf.constantPattern                                 N/A        N/A        N/A  avgt   25      79.567 ±   5.082  ns/op
StringIndexOf.searchChar16LongSuccess                         N/A        N/A        N/A  avgt   25     251.484 ±   3.302  ns/op
StringIndexOf.searchChar16LongWithOffsetSuccess               N/A        N/A        N/A  avgt   25     256.214 ±   3.778  ns/op
StringIndexOf.searchChar16MediumSuccess                       N/A        N/A        N/A  avgt   25     133.622 ±   3.497  ns/op
StringIndexOf.searchChar16MediumWithOffsetSuccess             N/A        N/A        N/A  avgt   25     139.377 ±   3.008  ns/op
StringIndexOf.searchChar16ShortSuccess                        N/A        N/A        N/A  avgt   25      35.788 ±   2.936  ns/op
StringIndexOf.searchChar16ShortWithOffsetSuccess              N/A        N/A        N/A  avgt   25      37.000 ±   2.983  ns/op
StringIndexOf.searchCharLongSuccess                           N/A        N/A        N/A  avgt   25     124.275 ±   4.894  ns/op
StringIndexOf.searchCharMediumSuccess                         N/A        N/A        N/A  avgt   25      65.132 ±   3.882  ns/op
StringIndexOf.searchCharShortSuccess                          N/A        N/A        N/A  avgt   25      35.020 ±   3.418  ns/op
StringIndexOf.searchString16LongLatinSuccess                  N/A        N/A        N/A  avgt   25     595.135 ±   5.635  ns/op
StringIndexOf.searchString16LongSuccess                       N/A        N/A        N/A  avgt   25     630.710 ±   3.627  ns/op
StringIndexOf.searchString16LongWithOffsetLatinSuccess        N/A        N/A        N/A  avgt   25     321.968 ±   3.086  ns/op
StringIndexOf.searchString16LongWithOffsetSuccess             N/A        N/A        N/A  avgt   25     344.868 ±   5.492  ns/op
StringIndexOf.searchString16MediumLatinSuccess                N/A        N/A        N/A  avgt   25     268.289 ±   7.435  ns/op
StringIndexOf.searchString16MediumSuccess                     N/A        N/A        N/A  avgt   25     276.393 ±   3.831  ns/op
StringIndexOf.searchString16MediumWithOffsetLatinSuccess      N/A        N/A        N/A  avgt   25     161.604 ±   2.949  ns/op
StringIndexOf.searchString16MediumWithOffsetSuccess           N/A        N/A        N/A  avgt   25     166.575 ±   3.478  ns/op
StringIndexOf.searchString16ShortLatinSuccess                 N/A        N/A        N/A  avgt   25     390.758 ±   5.794  ns/op
StringIndexOf.searchString16ShortSuccess                      N/A        N/A        N/A  avgt   25      55.287 ±   4.530  ns/op
StringIndexOf.searchString16ShortWithOffsetLatinSuccess       N/A        N/A        N/A  avgt   25      48.239 ±   1.333  ns/op
StringIndexOf.searchString16ShortWithOffsetSuccess            N/A        N/A        N/A  avgt   25      51.657 ±   2.762  ns/op
StringIndexOf.success                                         N/A        N/A        N/A  avgt   25      83.580 ±   3.200  ns/op
StringIndexOf.successBig                                      N/A        N/A        N/A  avgt   25    6253.601 ±  13.245  ns/op
StringIndexOfChar.latin1_AVX2_String                       100000       1000       1999  avgt   25  180259.333 ± 428.243  ns/op
StringIndexOfChar.latin1_AVX2_char                         100000       1000       1999  avgt   25  103301.911 ± 157.780  ns/op
StringIndexOfChar.latin1_SSE4_String                       100000       1000       1999  avgt   25  106739.090 ± 206.242  ns/op
StringIndexOfChar.latin1_SSE4_char                         100000       1000       1999  avgt   25   75027.524 ± 208.941  ns/op
StringIndexOfChar.latin1_Short_String                      100000       1000       1999  avgt   25  102724.833 ± 231.911  ns/op
StringIndexOfChar.latin1_Short_char                        100000       1000       1999  avgt   25   81018.525 ± 138.541  ns/op
StringIndexOfChar.latin1_mixed_String                      100000       1000       1999  avgt   25  184633.008 ± 209.443  ns/op
StringIndexOfChar.latin1_mixed_char                        100000       1000       1999  avgt   25  116350.746 ± 298.832  ns/op
StringIndexOfChar.utf16_AVX2_String                        100000       1000       1999  avgt   25  110819.605 ± 137.955  ns/op
StringIndexOfChar.utf16_AVX2_char                          100000       1000       1999  avgt   25   79956.001 ± 254.436  ns/op
StringIndexOfChar.utf16_SSE4_String                        100000       1000       1999  avgt   25   75500.341 ± 186.736  ns/op
StringIndexOfChar.utf16_SSE4_char                          100000       1000       1999  avgt   25   64974.675 ±
211.639 ns/op StringIndexOfChar.utf16_Short_String 100000 1000 1999 avgt 25 71304.026 ? 163.559 ns/op StringIndexOfChar.utf16_Short_char 100000 1000 1999 avgt 25 70843.242 ? 173.108 ns/op StringIndexOfChar.utf16_mixed_String 100000 1000 1999 avgt 25 191690.983 ? 301.041 ns/op StringIndexOfChar.utf16_mixed_char 100000 1000 1999 avgt 25 149988.445 ? 175.224 ns/op Thead, before: Benchmark (loops) (pathCnt) (rngSeed) Mode Cnt Score Error Units StringIndexOf.advancedWithMediumSub N/A N/A N/A avgt 25 2734.898 ? 61.540 ns/op StringIndexOf.advancedWithShortSub1 N/A N/A N/A avgt 25 2440.471 ? 90.996 ns/op StringIndexOf.advancedWithShortSub2 N/A N/A N/A avgt 25 722.081 ? 29.674 ns/op StringIndexOf.advancedWithShortSub2Chars N/A N/A N/A avgt 25 679.410 ? 5.793 ns/op StringIndexOf.advancedWithShortSub3Chars N/A N/A N/A avgt 25 875.206 ? 26.224 ns/op StringIndexOf.advancedWithShortSub4Chars N/A N/A N/A avgt 25 747.692 ? 5.600 ns/op StringIndexOf.constantPattern N/A N/A N/A avgt 25 69.154 ? 0.647 ns/op StringIndexOf.searchChar16LongSuccess N/A N/A N/A avgt 25 172.494 ? 0.754 ns/op StringIndexOf.searchChar16LongWithOffsetSuccess N/A N/A N/A avgt 25 177.181 ? 0.126 ns/op StringIndexOf.searchChar16MediumSuccess N/A N/A N/A avgt 25 106.646 ? 1.143 ns/op StringIndexOf.searchChar16MediumWithOffsetSuccess N/A N/A N/A avgt 25 109.219 ? 1.165 ns/op StringIndexOf.searchChar16ShortSuccess N/A N/A N/A avgt 25 40.604 ? 0.316 ns/op StringIndexOf.searchChar16ShortWithOffsetSuccess N/A N/A N/A avgt 25 40.440 ? 0.514 ns/op StringIndexOf.searchCharLongSuccess N/A N/A N/A avgt 25 96.637 ? 0.335 ns/op StringIndexOf.searchCharMediumSuccess N/A N/A N/A avgt 25 60.237 ? 1.648 ns/op StringIndexOf.searchCharShortSuccess N/A N/A N/A avgt 25 37.428 ? 0.623 ns/op StringIndexOf.searchString16LongLatinSuccess N/A N/A N/A avgt 25 277.862 ? 12.231 ns/op StringIndexOf.searchString16LongSuccess N/A N/A N/A avgt 25 332.158 ? 
0.254 ns/op StringIndexOf.searchString16LongWithOffsetLatinSuccess N/A N/A N/A avgt 25 398.582 ? 0.380 ns/op StringIndexOf.searchString16LongWithOffsetSuccess N/A N/A N/A avgt 25 422.520 ? 0.153 ns/op StringIndexOf.searchString16MediumLatinSuccess N/A N/A N/A avgt 25 135.033 ? 2.969 ns/op StringIndexOf.searchString16MediumSuccess N/A N/A N/A avgt 25 157.165 ? 0.459 ns/op StringIndexOf.searchString16MediumWithOffsetLatinSuccess N/A N/A N/A avgt 25 178.419 ? 1.152 ns/op StringIndexOf.searchString16MediumWithOffsetSuccess N/A N/A N/A avgt 25 189.184 ? 0.507 ns/op StringIndexOf.searchString16ShortLatinSuccess N/A N/A N/A avgt 25 189.720 ? 5.050 ns/op StringIndexOf.searchString16ShortSuccess N/A N/A N/A avgt 25 48.456 ? 0.015 ns/op StringIndexOf.searchString16ShortWithOffsetLatinSuccess N/A N/A N/A avgt 25 41.523 ? 0.261 ns/op StringIndexOf.searchString16ShortWithOffsetSuccess N/A N/A N/A avgt 25 44.079 ? 0.142 ns/op StringIndexOf.success N/A N/A N/A avgt 25 56.303 ? 0.506 ns/op StringIndexOf.successBig N/A N/A N/A avgt 25 240.224 ? 0.718 ns/op StringIndexOfChar.latin1_AVX2_String 100000 1000 1999 avgt 25 151718.306 ? 697.735 ns/op StringIndexOfChar.latin1_AVX2_char 100000 1000 1999 avgt 25 101266.052 ? 975.917 ns/op StringIndexOfChar.latin1_SSE4_String 100000 1000 1999 avgt 25 101792.535 ? 341.851 ns/op StringIndexOfChar.latin1_SSE4_char 100000 1000 1999 avgt 25 55309.860 ? 154.954 ns/op StringIndexOfChar.latin1_Short_String 100000 1000 1999 avgt 25 94692.722 ? 413.354 ns/op StringIndexOfChar.latin1_Short_char 100000 1000 1999 avgt 25 60527.606 ? 534.854 ns/op StringIndexOfChar.latin1_mixed_String 100000 1000 1999 avgt 25 154694.070 ? 323.422 ns/op StringIndexOfChar.latin1_mixed_char 100000 1000 1999 avgt 25 102887.596 ? 123.646 ns/op StringIndexOfChar.utf16_AVX2_String 100000 1000 1999 avgt 25 102949.366 ? 2005.041 ns/op StringIndexOfChar.utf16_AVX2_char 100000 1000 1999 avgt 25 57791.800 ? 
104.712 ns/op StringIndexOfChar.utf16_SSE4_String 100000 1000 1999 avgt 25 62716.138 ? 163.635 ns/op StringIndexOfChar.utf16_SSE4_char 100000 1000 1999 avgt 25 46677.973 ? 161.807 ns/op StringIndexOfChar.utf16_Short_String 100000 1000 1999 avgt 25 56375.027 ? 486.974 ns/op StringIndexOfChar.utf16_Short_char 100000 1000 1999 avgt 25 50512.176 ? 383.844 ns/op StringIndexOfChar.utf16_mixed_String 100000 1000 1999 avgt 25 145740.443 ? 484.267 ns/op StringIndexOfChar.utf16_mixed_char 100000 1000 1999 avgt 25 127834.969 ? 130.643 ns/op thead, after: Benchmark (loops) (pathCnt) (rngSeed) Mode Cnt Score Error Units StringIndexOf.advancedWithMediumSub N/A N/A N/A avgt 25 3377.943 ? 42.496 ns/op StringIndexOf.advancedWithShortSub1 N/A N/A N/A avgt 25 2567.466 ? 57.557 ns/op StringIndexOf.advancedWithShortSub2 N/A N/A N/A avgt 25 844.403 ? 6.488 ns/op StringIndexOf.advancedWithShortSub2Chars N/A N/A N/A avgt 25 892.346 ? 11.231 ns/op StringIndexOf.advancedWithShortSub3Chars N/A N/A N/A avgt 25 942.688 ? 19.306 ns/op StringIndexOf.advancedWithShortSub4Chars N/A N/A N/A avgt 25 761.535 ? 20.112 ns/op StringIndexOf.constantPattern N/A N/A N/A avgt 25 75.172 ? 0.294 ns/op StringIndexOf.searchChar16LongSuccess N/A N/A N/A avgt 25 172.765 ? 1.537 ns/op StringIndexOf.searchChar16LongWithOffsetSuccess N/A N/A N/A avgt 25 177.554 ? 0.515 ns/op StringIndexOf.searchChar16MediumSuccess N/A N/A N/A avgt 25 105.234 ? 1.079 ns/op StringIndexOf.searchChar16MediumWithOffsetSuccess N/A N/A N/A avgt 25 107.671 ? 1.415 ns/op StringIndexOf.searchChar16ShortSuccess N/A N/A N/A avgt 25 40.933 ? 0.015 ns/op StringIndexOf.searchChar16ShortWithOffsetSuccess N/A N/A N/A avgt 25 42.273 ? 2.262 ns/op StringIndexOf.searchCharLongSuccess N/A N/A N/A avgt 25 99.018 ? 1.945 ns/op StringIndexOf.searchCharMediumSuccess N/A N/A N/A avgt 25 62.872 ? 3.143 ns/op StringIndexOf.searchCharShortSuccess N/A N/A N/A avgt 25 36.762 ? 0.029 ns/op StringIndexOf.searchString16LongLatinSuccess N/A N/A N/A avgt 25 395.942 ? 
0.239 ns/op StringIndexOf.searchString16LongSuccess N/A N/A N/A avgt 25 328.769 ? 0.298 ns/op StringIndexOf.searchString16LongWithOffsetLatinSuccess N/A N/A N/A avgt 25 312.369 ? 0.601 ns/op StringIndexOf.searchString16LongWithOffsetSuccess N/A N/A N/A avgt 25 422.857 ? 0.483 ns/op StringIndexOf.searchString16MediumLatinSuccess N/A N/A N/A avgt 25 175.366 ? 0.034 ns/op StringIndexOf.searchString16MediumSuccess N/A N/A N/A avgt 25 153.542 ? 0.474 ns/op StringIndexOf.searchString16MediumWithOffsetLatinSuccess N/A N/A N/A avgt 25 146.393 ? 0.080 ns/op StringIndexOf.searchString16MediumWithOffsetSuccess N/A N/A N/A avgt 25 175.485 ? 12.868 ns/op StringIndexOf.searchString16ShortLatinSuccess N/A N/A N/A avgt 25 253.175 ? 1.237 ns/op StringIndexOf.searchString16ShortSuccess N/A N/A N/A avgt 25 46.278 ? 0.316 ns/op StringIndexOf.searchString16ShortWithOffsetLatinSuccess N/A N/A N/A avgt 25 42.041 ? 0.566 ns/op StringIndexOf.searchString16ShortWithOffsetSuccess N/A N/A N/A avgt 25 44.513 ? 0.976 ns/op StringIndexOf.success N/A N/A N/A avgt 25 58.469 ? 0.017 ns/op StringIndexOf.successBig N/A N/A N/A avgt 25 240.645 ? 0.649 ns/op StringIndexOfChar.latin1_AVX2_String 100000 1000 1999 avgt 25 137297.837 ? 1618.438 ns/op StringIndexOfChar.latin1_AVX2_char 100000 1000 1999 avgt 25 99919.463 ? 264.771 ns/op StringIndexOfChar.latin1_SSE4_String 100000 1000 1999 avgt 25 93552.042 ? 412.514 ns/op StringIndexOfChar.latin1_SSE4_char 100000 1000 1999 avgt 25 55130.042 ? 228.381 ns/op StringIndexOfChar.latin1_Short_String 100000 1000 1999 avgt 25 93682.963 ? 448.951 ns/op StringIndexOfChar.latin1_Short_char 100000 1000 1999 avgt 25 60450.415 ? 544.678 ns/op StringIndexOfChar.latin1_mixed_String 100000 1000 1999 avgt 25 139723.661 ? 656.951 ns/op StringIndexOfChar.latin1_mixed_char 100000 1000 1999 avgt 25 102253.415 ? 189.882 ns/op StringIndexOfChar.utf16_AVX2_String 100000 1000 1999 avgt 25 101267.586 ? 
437.666 ns/op StringIndexOfChar.utf16_AVX2_char 100000 1000 1999 avgt 25 58385.242 ? 423.666 ns/op StringIndexOfChar.utf16_SSE4_String 100000 1000 1999 avgt 25 61231.849 ? 111.539 ns/op StringIndexOfChar.utf16_SSE4_char 100000 1000 1999 avgt 25 46524.978 ? 171.727 ns/op StringIndexOfChar.utf16_Short_String 100000 1000 1999 avgt 25 56955.300 ? 115.976 ns/op StringIndexOfChar.utf16_Short_char 100000 1000 1999 avgt 25 50042.089 ? 353.580 ns/op StringIndexOfChar.utf16_mixed_String 100000 1000 1999 avgt 25 156943.226 ? 260.089 ns/op StringIndexOfChar.utf16_mixed_char 100000 1000 1999 avgt 25 129073.240 ? 124.931 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1590050223 From kvn at openjdk.org Wed Jun 14 04:49:43 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 04:49:43 GMT Subject: RFR: 8309978: [x64] Fix useless padding Message-ID: Fixed typo in `IntelJccErratum::compute_padding()`. Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 nop # 16 bytes pad for loops and calls 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 049 jle,s B4 P=0.667944 C=6785.000000 Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. Added new IR test. Tested tier1-3, xcomp, stress. 
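
The effect of counting the `cmp` size twice can be illustrated with a small sketch. This is a simplified model and not the HotSpot implementation: the class and method names are hypothetical, the 9-byte `cmpb` size is read off the listing above (offsets 040 to 049), and the 2-byte size of the short `jle` is an assumption.

```java
// Hypothetical model of the padding decision; not the HotSpot code.
public class JccPaddingSketch {
    // The erratum concerns jump sequences that cross or end on a 32-byte boundary.
    static final int BOUNDARY = 32;

    // True if the byte range [start, start + size) crosses a boundary
    // or ends exactly on one.
    static boolean crossesOrEndsAtBoundary(int start, int size) {
        return (start / BOUNDARY) != ((start + size) / BOUNDARY);
    }

    // Number of padding bytes that moves the sequence to the next boundary,
    // or 0 if the sequence is already safe where it is.
    static int padding(int start, int combinedSize) {
        if (!crossesOrEndsAtBoundary(start, combinedSize)) {
            return 0;
        }
        return BOUNDARY - (start % BOUNDARY);
    }

    public static void main(String[] args) {
        int start = 0x30;   // offset of the cmp before any padding (assumed)
        int cmpSize = 9;    // "mach": size taken from the listing
        int jccSize = 2;    // "next": assumed size of the short jle

        // Intended computation: size of mach plus size of next.
        System.out.println("mach + next: " + padding(start, cmpSize + jccSize));
        // The typo: size of mach counted twice.
        System.out.println("mach + mach: " + padding(start, cmpSize + cmpSize));
    }
}
```

Under these assumptions the intended sum (11 bytes starting at 0x30) needs no padding, while the doubled `cmp` size (18 bytes) reaches past 0x40 and yields 16 bytes of padding, which is consistent with the `nop` in the listing.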
-------------

Commit messages:
 - JDK-8309978: [x64] Fix useless padding

Changes: https://git.openjdk.org/jdk/pull/14461/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14461&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8309978
  Stats: 99 lines in 4 files changed: 96 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14461.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14461/head:pull/14461

PR: https://git.openjdk.org/jdk/pull/14461

From christian.hagedorn at oracle.com Wed Jun 14 06:03:29 2023
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 14 Jun 2023 08:03:29 +0200
Subject: Use IR test framework
In-Reply-To:
References:
Message-ID:

Hi Cesar

That's not possible, unfortunately. If you think it's a more common and useful check to have, please file an RFE and I or someone else can give it some more thought :-)

Best regards,
Christian

On 13.06.23 20:01, Cesar Soares Lucas wrote:
> Hi there!
>
> In the IR test framework, is it possible to express a check that validates the
> output of phase N only if some IR nodes were present in compilation phase N-1?
>
> Thanks
>
> Cesar
>

From fgao at openjdk.org Wed Jun 14 06:26:56 2023
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 14 Jun 2023 06:26:56 GMT
Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 9 Jun 2023 06:46:30 GMT, Emanuel Peter wrote:

>> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler.
>>
>> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior.
>>
>> This alignment check was originally there instead of a proper dependency checker.
>> Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`.
>>
>> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>>
>> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below).
>>
>> **Changes to Tests**
>> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag.
>>
>> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add vm.flagless back in for LoopArrayIndexComputeTest.java

> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.

Hi @eme64 , nice rewrite!

May I ask if you have any benchmark data of misaligned-load-store cases for other data types? For example, `Double` or `Long` on 128-bit machines (maybe aarch64 asimd).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1590549717

From epeter at openjdk.org Wed Jun 14 06:40:57 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 14 Jun 2023 06:40:57 GMT
Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 14 Jun 2023 06:24:29 GMT, Fei Gao wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add vm.flagless back in for LoopArrayIndexComputeTest.java
>
>> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>
> Hi @eme64 , nice rewrite!
>
> May I ask if you have any benchmark data of misaligned-load-store cases for other data types? For example, `Double` or `Long` on 128-bit machines (maybe aarch64 asimd).

@fg1417 Ok, I will expand the misaligned load-store case for some other data types and test it on my two machines again!
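
For illustration, a misaligned load-store kernel for 64-bit element types might look like the sketch below. This is a hypothetical example and not the benchmark from the PR: the class and method names are made up, and it assumes that both arrays start at the same alignment, so that with 128-bit vectors (two `long` or `double` lanes) the load at `i + 1` and the store at `i` cannot both be vector-aligned. Whether C2 actually vectorizes these loops depends on the platform and flags.

```java
// Hypothetical kernels; not taken from the PR's benchmark.
public class MisalignedLoadStoreSketch {
    // Load from src[i + 1], store to dst[i]: the two accesses are offset by
    // one element (8 bytes), i.e. half of a 128-bit vector.
    static void shiftLong(long[] src, long[] dst) {
        int n = Math.min(src.length - 1, dst.length);
        for (int i = 0; i < n; i++) {
            dst[i] = src[i + 1] + 1L;
        }
    }

    // Same access pattern for double elements.
    static void shiftDouble(double[] src, double[] dst) {
        int n = Math.min(src.length - 1, dst.length);
        for (int i = 0; i < n; i++) {
            dst[i] = src[i + 1] * 2.0;
        }
    }

    public static void main(String[] args) {
        long[] ls = new long[1025];
        long[] ld = new long[1024];
        double[] ds = new double[1025];
        double[] dd = new double[1024];
        for (int i = 0; i < ls.length; i++) {
            ls[i] = i;
            ds[i] = i;
        }
        // Repeat the calls so that the JIT has a chance to compile the kernels.
        for (int iter = 0; iter < 10_000; iter++) {
            shiftLong(ls, ld);
            shiftDouble(ds, dd);
        }
        System.out.println(ld[0] + " " + dd[0]);
    }
}
```

Such kernels would normally be wrapped in a JMH benchmark and run once with and once without `-XX:+AlignVector` to compare the two behaviors.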
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1590563259

From chagedorn at openjdk.org Wed Jun 14 07:07:56 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 14 Jun 2023 07:07:56 GMT
Subject: RFR: 8309978: [x64] Fix useless padding
In-Reply-To:
References:
Message-ID:

On Wed, 14 Jun 2023 04:42:22 GMT, Vladimir Kozlov wrote:

> Fixed typo in `IntelJccErratum::compute_padding()`.
>
> Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that.
>
>
> 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999
> nop # 16 bytes pad for loops and calls
> 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42
> 049 jle,s B4 P=0.667944 C=6785.000000
>
> Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957).
>
> For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR.
>
> Added new IR test. Tested tier1-3, xcomp, stress.

Otherwise, the fix looks good. Thanks for adding an IR test!

test/hotspot/jtreg/compiler/c2/irTests/TestPadding.java line 50:

> 48: test(i);
> 49: tpf.b1++; // to take both branches in test()
> 50: }

`test_runner()` will be invoked 2000 times (default warm-up) before the explicit compilation request of `test()` by the IR framework. So, this loop will run 2000 * 11000 times. Do you need that many iterations or can the loop be removed such that we only have 2000 warm-up iterations? I.e. something like:

    @Run(test = "test")
    public static void test_runner() {
        tpf = new TestPadding();
        test(42);
        tpf.b1++; // to take both branches in test()
    }

If you need more iterations, you could still specify `@Warmup(12345)` at `test_runner()` to get more profiling in before compilation of `test()`.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14461#pullrequestreview-1478618687
PR Review Comment: https://git.openjdk.org/jdk/pull/14461#discussion_r1229110689

From thartmann at openjdk.org Wed Jun 14 07:50:57 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 14 Jun 2023 07:50:57 GMT
Subject: RFR: 8293069: Make -XX:+Verbose less verbose
In-Reply-To:
References:
Message-ID:

On Mon, 12 Jun 2023 15:17:35 GMT, Eric Nothum wrote:

> 1) Added PrintOpto guard to noisy Verbose prints in Compile::process_for_unstable_if_traps and Parse::catch_call_exceptions.
>
> 2) Removed noisy Verbose in ciEnv::record_best_dyno_loc, which looked like a leftover from implementation of [JDK-8271911](https://bugs.openjdk.org/browse/JDK-8271911).
> I also rearranged the if statement around Verbose and removed the TODO comment.
> @dean-long is the TODO still relevant? In case it is still relevant, I think we should create a separate enhancement for further investigation rather than keeping it in the comments.

Looks good to me. Thanks for fixing this annoyance.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14420#pullrequestreview-1478741823

From pli at openjdk.org Wed Jun 14 08:13:58 2023
From: pli at openjdk.org (Pengfei Li)
Date: Wed, 14 Jun 2023 08:13:58 GMT
Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 14 Jun 2023 06:38:21 GMT, Emanuel Peter wrote:

>>> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines.
>>> But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses.
>>
>> Hi @eme64 , nice rewrite!
>>
>> May I ask if you have any benchmark data of misaligned-load-store cases for other data types? For example, `Double` or `Long` on 128-bit machines (maybe aarch64 asimd).
>
> @fg1417 Ok, I will expand the misaligned load-store case for some other data types and test it on my two machines again!

Hi @eme64 , I don't see any problem after going through everything you wrote. But as I'm not an official reviewer and don't have enough confidence in complex changes, I hope someone who is more familiar with this part can have a review.

BTW: Have you generated and run some [_JavaFuzzer_](https://github.com/shipilev/JavaFuzzer) tests for this patch? Based on my personal experience, it's quite helpful for finding hidden bugs in SuperWord or other complex loop optimizations.

And a few more comments on this:

> 3. We could even create a VectorTransformGraph from a single iteration loop, and try to widen the instructions there. If this succeeds we do not have to unroll before vectorizing. This is essentially a traditional loop vectorizer. Except that we can also run the SuperWord algorithm over it first to see if we have already any parallelism in the single iteration loop. And then widen that. This makes it a hybrid vectorizer. Not having to unroll means direct time savings, but also that we could vectorize larger loops in the first place, since we would not hit the node limit for unrolling.

What we are currently doing for https://bugs.openjdk.org/browse/JDK-8308994 is like a **traditional loop vectorizer** - it can vectorize loops without unrolling. It can also support **strided accesses** (gather/scatter) with a few updates. But our current implementation is outside SuperWord and for post loops only. Perhaps a **hybrid vectorizer** implemented in SuperWord is a better idea. We will push our draft patch to GitHub soon for your feedback. Currently I'm finishing some routines before I can push the code. It's expected to be done in a few days.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1590693552

From chagedorn at openjdk.org Wed Jun 14 08:24:02 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 14 Jun 2023 08:24:02 GMT
Subject: RFR: 8293069: Make -XX:+Verbose less verbose
In-Reply-To:
References:
Message-ID:

On Mon, 12 Jun 2023 15:17:35 GMT, Eric Nothum wrote:

> 1) Added PrintOpto guard to noisy Verbose prints in Compile::process_for_unstable_if_traps and Parse::catch_call_exceptions.
>
> 2) Removed noisy Verbose in ciEnv::record_best_dyno_loc, which looked like a leftover from implementation of [JDK-8271911](https://bugs.openjdk.org/browse/JDK-8271911).
> I also rearranged the if statement around Verbose and removed the TODO comment.
> @dean-long is the TODO still relevant? In case it is still relevant, I think we should create a separate enhancement for further investigation rather than keeping it in the comments.

Looks good to me, too!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14420#pullrequestreview-1478804790

From mdoerr at openjdk.org Wed Jun 14 08:32:11 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 14 Jun 2023 08:32:11 GMT
Subject: Integrated: 8309613: [Windows] hs_err files sometimes miss information about the code containing the error
In-Reply-To:
References:
Message-ID:

On Wed, 7 Jun 2023 14:32:13 GMT, Martin Doerr wrote:

> We have seen hs_err files for errors triggered by C2 compiled methods which miss the most relevant information: the C2 method (see JBS issue for more details). I have found a possibility to add it. Please take a look and provide feedback.
>
> Testing:
>
> diff --git a/src/hotspot/share/opto/parse1.cpp b/src/hotspot/share/opto/parse1.cpp
> index f179d3ba88d..c35a1ac595e 100644
> --- a/src/hotspot/share/opto/parse1.cpp
> +++ b/src/hotspot/share/opto/parse1.cpp
> @@ -1210,6 +1210,12 @@ void Parse::do_method_entry() {
>      make_dtrace_method_entry(method());
>    }
>
> +  if (UseNewCode) {
> +    Node* halt = _gvn.transform(new HaltNode(control(), frameptr(), "Requested Halt!"));
> +    C->root()->add_req(halt);
> +    set_control(halt);
> +  }
> +
>  #ifdef ASSERT
>    // Narrow receiver type when it is too broad for the method being parsed.
>    if (!method()->is_static()) {
>
>
> "java -XX:+UseNewCode -version" shows the following output (when no hsdis lib is provided):
>
> --------------- T H R E A D ---------------
>
> Current thread (0x0000024daebb2b30): JavaThread "main" [_thread_in_Java, id=30876, stack(0x000000cdacc00000,0x000000cdacd00000) (1024K)]
>
> Stack: [0x000000cdacc00000,0x000000cdacd00000]
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [jvm.dll+0x6ca5b9] os::win32::platform_print_native_stack+0xd9 (os_windows_x86.cpp:236)
> V [jvm.dll+0x8a3af1] VMError::report+0xd61 (vmError.cpp:991)
> V [jvm.dll+0x8a5d6e] VMError::report_and_die+0x5fe (vmError.cpp:1797)
> V [jvm.dll+0x283061] report_fatal+0x71 (debug.cpp:212)
> V [jvm.dll+0x621c3e] MacroAssembler::debug64+0x8e (macroAssembler_x86.cpp:829)
> C 0x0000024dc1553cf4
>
> The last pc belongs to nmethod (printed below).
>
> Compiled method (c2) 92 16 4 java.lang.Object:: (1 bytes)
> total in heap [0x0000024dc1553b10,0x0000024dc1553d50] = 576
> relocation [0x0000024dc1553c70,0x0000024dc1553c88] = 24
> main code [0x0000024dc1553ca0,0x0000024dc1553d00] = 96
> stub code [0x0000024dc1553d00,0x0000024dc1553d18] = 24
> metadata [0x0000024dc1553d18,0x0000024dc1553d20] = 8
> scopes data [0x0000024dc1553d20,0x0000024dc1553d28] = 8
> scopes pcs [0x0000024dc1553d28,0x0000024dc1553d48] = 32
> dependencies [0x0000024dc1553d48,0x0000024dc1553d50] = 8
>
> [Constant Pool (empty)]
>
> [MachCode]
> [Entry Point]
> # {method} {0x0000000800478d78} '' '()V' in 'java/lang/Object'
> # [sp+0x20] (sp of caller)
> 0x0000024dc1553ca0: 448b 5208 | 49bb 0000 | 0000 0800...

This pull request has now been integrated.

Changeset: bd79db39
Author: Martin Doerr
URL: https://git.openjdk.org/jdk/commit/bd79db3930f192f6742e29a63a6d1c3bc3dd3385
Stats: 56 lines in 10 files changed: 41 ins; 0 del; 15 mod

8309613: [Windows] hs_err files sometimes miss information about the code containing the error

Reviewed-by: dholmes, stuefe

-------------

PR: https://git.openjdk.org/jdk/pull/14358

From dlong at openjdk.org Wed Jun 14 08:34:11 2023
From: dlong at openjdk.org (Dean Long)
Date: Wed, 14 Jun 2023 08:34:11 GMT
Subject: RFR: 8293069: Make -XX:+Verbose less verbose
In-Reply-To:
References:
Message-ID: <6RxotBpMpBfK6SDzQ-1ab34GTDoVt-zrwdYPbQcTI8w=.df578356-d62c-40fb-96a1-84d74309b8e7@github.com>

On Mon, 12 Jun 2023 15:17:35 GMT, Eric Nothum wrote:

> 1) Added PrintOpto guard to noisy Verbose prints in Compile::process_for_unstable_if_traps and Parse::catch_call_exceptions.
>
> 2) Removed noisy Verbose in ciEnv::record_best_dyno_loc, which looked like a leftover from implementation of [JDK-8271911](https://bugs.openjdk.org/browse/JDK-8271911).
> I also rearranged the if statement around Verbose and removed the TODO comment.
> @dean-long is the TODO still relevant?
> In case it is still relevant, I think we should create a separate enhancement for further investigation rather than keeping it in the comments.

The TODO is still relevant (we could choose a winner based on shortest string length, for example), but I don't think it's important enough for an RFE. Removing the TODO is fine.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14420#issuecomment-1590728046

From duke at openjdk.org Wed Jun 14 09:16:59 2023
From: duke at openjdk.org (Eric Nothum)
Date: Wed, 14 Jun 2023 09:16:59 GMT
Subject: RFR: 8293069: Make -XX:+Verbose less verbose
In-Reply-To:
References:
Message-ID:

On Mon, 12 Jun 2023 15:17:35 GMT, Eric Nothum wrote:

> 1) Added PrintOpto guard to noisy Verbose prints in Compile::process_for_unstable_if_traps and Parse::catch_call_exceptions.
>
> 2) Removed noisy Verbose in ciEnv::record_best_dyno_loc, which looked like a leftover from implementation of [JDK-8271911](https://bugs.openjdk.org/browse/JDK-8271911).
> I also rearranged the if statement around Verbose and removed the TODO comment.
> @dean-long is the TODO still relevant? In case it is still relevant, I think we should create a separate enhancement for further investigation rather than keeping it in the comments.

Thanks for your feedback! In that case I won't create an RFE and will proceed with the change.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14420#issuecomment-1590811156

From rcastanedalo at openjdk.org Wed Jun 14 09:54:33 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 14 Jun 2023 09:54:33 GMT
Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr'
Message-ID:

This changeset skips the verification at the end of `SubTypeNode::Ideal()` if the bottom type of `obj_or_subklass` is TOP, to avoid violating the contract of `LoadKlassNode::make()`.
This can happen for example in transient scenarios where `obj_or_subklass` is a projection of the TOP node, see the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8303513) for more details. The proposed fix has low risk, since it affects debug-only code. ##### Testing: - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode) - original RunThese8M test (using `-XX:+UseZGC -XX:+ZGenerational` on linux-x64, 20 repetitions) Deriving a minimal regression test is ongoing work, but might take some time due to the complex nature of the failure (see analysis on JBS). To reduce noise in test pipelines and ease work on other open RunThese8M issues such as [JDK-8308048](https://bugs.openjdk.org/browse/JDK-8308048), I propose to integrate this fix first and contribute the minimal regression test later as a follow-up enhancement. ------------- Commit messages: - Skip verification if the bottom type of obj_or_subklass is TOP Changes: https://git.openjdk.org/jdk/pull/14463/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14463&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303513 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14463.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14463/head:pull/14463 PR: https://git.openjdk.org/jdk/pull/14463 From thartmann at openjdk.org Wed Jun 14 10:35:55 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Jun 2023 10:35:55 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 08:23:44 GMT, Roberto Casta?eda Lozano wrote: > This changeset skips the verification at the end of `SubTypeNode::Ideal()` if the bottom type of `obj_or_subklass` is TOP, to avoid violating the contract of `LoadKlassNode::make()`. 
This can happen for example in transient scenarios where `obj_or_subklass` is a projection of the TOP node, see the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8303513) for more details. The proposed fix has low risk, since it affects debug-only code. > > ##### Testing: > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode) > > - original RunThese8M test (using `-XX:+UseZGC -XX:+ZGenerational` on linux-x64, 20 repetitions) > > Deriving a minimal regression test is ongoing work, but might take some time due to the complex nature of the failure (see analysis on JBS). To reduce noise in test pipelines and ease work on other open RunThese8M issues such as [JDK-8308048](https://bugs.openjdk.org/browse/JDK-8308048), I propose to integrate this fix first and contribute the minimal regression test later as a follow-up enhancement. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14463#pullrequestreview-1479069737 From thartmann at openjdk.org Wed Jun 14 10:46:56 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Jun 2023 10:46:56 GMT Subject: RFR: 8309978: [x64] Fix useless padding In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 04:42:22 GMT, Vladimir Kozlov wrote: > Fixed typo in `IntelJccErratum::compute_padding()`. > > Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. 
> > > 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 > nop # 16 bytes pad for loops and calls > 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 > 049 jle,s B4 P=0.667944 C=6785.000000 > > Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). > > For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. > > Added new IR test. Tested tier1-3, xcomp, stress. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14461#pullrequestreview-1479089241 From epeter at openjdk.org Wed Jun 14 11:04:49 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jun 2023 11:04:49 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v5] In-Reply-To: References: Message-ID: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. > > As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. > > This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. 
> > Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. > > Note: this reduces the `CompileCommand Option Vectorize` flag to only controlling whether we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). > > **Changes to Tests** > I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. > > `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - bench000: add other type examples - bench100: added versions for more types (misaligned load store) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14096/files - new: https://git.openjdk.org/jdk/pull/14096/files/06cc1c37..78bce308 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=03-04 Stats: 155 lines in 1 file changed: 153 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096 PR: https://git.openjdk.org/jdk/pull/14096 From shade at openjdk.org Wed Jun 14 11:08:57 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 14 Jun 2023 11:08:57 GMT Subject: RFR: 8309978: [x64] Fix useless padding In-Reply-To: References: Message-ID: <_1bPK0E62PUCvJfFX_z3FgrEhriMGq9CowQ6iz_6vWM=.ffaa16d1-6999-4ce7-b217-dec825d088a7@github.com> On Wed, 14 Jun 2023 04:42:22 GMT, Vladimir Kozlov wrote: > Fixed typo in `IntelJccErratum::compute_padding()`. > > Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. > > > 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 > nop # 16 bytes pad for loops and calls > 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 > 049 jle,s B4 P=0.667944 C=6785.000000 > > Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). > > For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. > > Added new IR test. 
Tested tier1-3, xcomp, stress. Looks fine to me, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14461#pullrequestreview-1479124998 From thartmann at openjdk.org Wed Jun 14 11:12:00 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 Jun 2023 11:12:00 GMT Subject: RFR: 8309847: FrameForm and RegisterForm constructors should initialize all members In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 02:26:01 GMT, Vladimir Petko wrote: > This PR fixes missing constructor initialisations in formsopt.cpp. > This PR does not implement [CppCoreGuidelines#Rc-in-class-initializer](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-in-class-initializer ) to keep the style consistent and minimise changes. Looks good to me too. I'll sponsor after some quick sanity testing. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14435#pullrequestreview-1479130577 From epeter at openjdk.org Wed Jun 14 11:13:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jun 2023 11:13:24 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. > > As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. > > This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. 
But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. > > Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. > > Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). > > **Changes to Tests** > I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. > > `LoopArrayIndexComputeTest.java` had a few "negative" tests that expeced that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Merge branch 'master' into JDK-8308606 - bench000: add other type examples - bench100: added versions for more types (misaligned load store) - Add vm.flagless back in for LoopArrayIndexComputeTest.java - removed AlignVector from IR framework again, do that in RFE - IR whitelist AlignVector, require it false in the newly added tests - Merge branch 'master' into JDK-8308606 - Merge branch 'master' into JDK-8308606 - remove some outdated comments - Benchmark VectorAlignment - ... and 4 more: https://git.openjdk.org/jdk/compare/d437a63a...0740b7bc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14096/files - new: https://git.openjdk.org/jdk/pull/14096/files/78bce308..0740b7bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14096&range=04-05 Stats: 19628 lines in 533 files changed: 10260 ins; 7763 del; 1605 mod Patch: https://git.openjdk.org/jdk/pull/14096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14096/head:pull/14096 PR: https://git.openjdk.org/jdk/pull/14096 From epeter at openjdk.org Wed Jun 14 11:30:00 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jun 2023 11:30:00 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 11:13:24 GMT, Emanuel Peter wrote: >> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. >> >> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. 
>> >> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. >> >> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. >> >> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). >> >> **Changes to Tests** >> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. >> >> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expeced that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. 
However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into JDK-8308606 > - bench000: add other type examples > - bench100: added versions for more types (misaligned load store) > - Add vm.flagless back in for LoopArrayIndexComputeTest.java > - removed AlignVector from IR framework again, do that in RFE > - IR whitelist AlignVector, require it false in the newly added tests > - Merge branch 'master' into JDK-8308606 > - Merge branch 'master' into JDK-8308606 > - remove some outdated comments > - Benchmark VectorAlignment > - ... and 4 more: https://git.openjdk.org/jdk/compare/d73a98f8...0740b7bc I'm collecting the new benchmark results here, so that we see the effect of misaligned load-stores. I have a series of control cases (aligned), and a series of misaligned cases. ------------- Machine: 11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16. With AVX512 support. 
With patch: TODO Master: TODO ------------- In comparison on an aarch64 machine with asimd support: With patch: TODO Master: TODO ------------- **Discussion** TODO ------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1591010661 From epeter at openjdk.org Wed Jun 14 11:32:57 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 Jun 2023 11:32:57 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v4] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 08:10:51 GMT, Pengfei Li wrote: >> @fg1417 Ok, I will expand the misaligned load-store case for some other data types and test it on my two machines again! > > Hi @eme64, I don't see any problem after going through everything you wrote. But as I'm not an official reviewer and don't have enough confidence in complex changes, I hope someone who is more familiar with this part can have a review. > > BTW: Have you generated and run some [_JavaFuzzer_](https://github.com/shipilev/JavaFuzzer) tests for this patch? Based on my personal experience, it's quite helpful for finding hidden bugs in SuperWord or other complex loop optimizations. > > And a few more comments on this: >> 3. We could even create a VectorTransformGraph from a single iteration loop, and try to widen the instructions there. If this succeeds we do not have to unroll before vectorizing. This is essentially a traditional loop vectorizer. Except that we can also run the SuperWord algorithm over it first to see if we already have any parallelism in the single iteration loop. And then widen that. This makes it a hybrid vectorizer. Not having to unroll means direct time savings, but also that we could vectorize larger loops in the first place, since we would not hit the node limit for unrolling. > > What we are currently doing for https://bugs.openjdk.org/browse/JDK-8308994 is like a **traditional loop vectorizer** - it can vectorize loops without unrolling. 
It can also support **strided accesses** (gather/scatter) with a few updates. But our current implementation is outside SuperWord and for post loops only. Perhaps a **hybrid vectorizer** implemented in SuperWord is a better idea. We will push our draft patch to GitHub soon for your feedback. Currently I'm finishing some routines before I can push the code. It's expected to be done in a few days. @pfustc By default we run the fuzzer a few times, but I'm running it a bit more now just to get a bit more confidence. I'm looking forward to your draft PR. Maybe we can work together towards a hybrid-vectorizer. I plan to keep working on SuperWord and vectorization in general. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1591013863 From duke at openjdk.org Wed Jun 14 11:50:07 2023 From: duke at openjdk.org (Vladimir Petko) Date: Wed, 14 Jun 2023 11:50:07 GMT Subject: Integrated: 8309847: FrameForm and RegisterForm constructors should initialize all members In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 02:26:01 GMT, Vladimir Petko wrote: > This PR fixes missing constructor initialisations in formsopt.cpp. > This PR does not implement [CppCoreGuidelines#Rc-in-class-initializer](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-in-class-initializer) to keep the style consistent and minimise changes. This pull request has now been integrated. 
Changeset: e3d6fc87 Author: Vladimir Petko Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/e3d6fc875b98c9ac2e63aec4a52bcf1515d797df Stats: 11 lines in 1 file changed: 9 ins; 1 del; 1 mod 8309847: FrameForm and RegisterForm constructors should initialize all members Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14435 From rcastanedalo at openjdk.org Wed Jun 14 12:01:58 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 Jun 2023 12:01:58 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 10:32:59 GMT, Tobias Hartmann wrote: > Looks good to me. Thanks for reviewing, Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14463#issuecomment-1591054326 From jsjolen at openjdk.org Wed Jun 14 12:34:53 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 14 Jun 2023 12:34:53 GMT Subject: RFR: 8309717: C2: Remove Arena::move_contents usage In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 10:17:46 GMT, Johan Sjölen wrote: > Hi, > > Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. > > Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. > > I'm currently running tier1-3 tests. > > Thanks for considering this, > Johan Thanks Tobias, I took your changes and applied them. 
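The "double buffering" idea in the quoted description (a pointer swap plus a clear, instead of moving contents between arenas) can be sketched roughly as below. This is only a minimal illustration under stated assumptions, not the HotSpot code: `ToyArena`, `ToyCompile`, `swap_arenas`, and `demo_swap` are hypothetical names made up for this sketch, and the real `Arena` and `Compile` classes look different.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an arena: hands out memory and tracks bytes used.
class ToyArena {
 public:
  void* allocate(std::size_t bytes) {
    _chunks.emplace_back(bytes);   // one heap chunk per request, for simplicity
    _used += bytes;
    return _chunks.back().data();
  }
  void clear() {                   // release everything at once
    _chunks.clear();
    _used = 0;
  }
  std::size_t used() const { return _used; }

 private:
  std::vector<std::vector<char>> _chunks;
  std::size_t _used = 0;
};

// Hypothetical owner of two arenas that trade roles between phases.
class ToyCompile {
 public:
  ToyCompile() : _current(&_a), _old(&_b) {}

  ToyArena* current() { return _current; }
  ToyArena* old() { return _old; }

  // Rather than moving the contents of one arena into the other,
  // swap the two pointers and clear the arena that becomes "current".
  void swap_arenas() {
    std::swap(_current, _old);
    _current->clear();
  }

 private:
  ToyArena _a;
  ToyArena _b;
  ToyArena* _current;
  ToyArena* _old;
};

// Returns true if, after the swap, the filled arena has become the old one
// (contents intact) and the current arena is empty.
bool demo_swap() {
  ToyCompile c;
  c.current()->allocate(64);       // data produced in this phase
  c.old()->allocate(16);           // stale data from an earlier phase
  ToyArena* filled = c.current();
  c.swap_arenas();
  return c.old() == filled && c.old()->used() == 64 && c.current()->used() == 0;
}
```

The cost in this sketch is one extra pointer in the owning object, which matches the remark above about `Compile` needing another pointer; no allocation is copied or relinked during the swap.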
------------- PR Comment: https://git.openjdk.org/jdk/pull/14391#issuecomment-1591097852 From jsjolen at openjdk.org Wed Jun 14 12:34:51 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 14 Jun 2023 12:34:51 GMT Subject: RFR: 8309717: C2: Remove Arena::move_contents usage [v2] In-Reply-To: References: Message-ID: > Hi, > > Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. > > Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. > > I'm currently running tier1-3 tests. > > Thanks for considering this, > Johan Johan Sj?len has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/matcher.cpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/compile.hpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14391/files - new: https://git.openjdk.org/jdk/pull/14391/files/565b3df4..a2f6aee4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14391&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14391&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14391.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14391/head:pull/14391 PR: https://git.openjdk.org/jdk/pull/14391 From kvn at openjdk.org Wed Jun 14 16:40:00 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 16:40:00 GMT Subject: RFR: 8309978: [x64] Fix useless padding [v2] In-Reply-To: References: Message-ID: > Fixed typo in `IntelJccErratum::compute_padding()`. 
> > Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. > > > 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 > nop # 16 bytes pad for loops and calls > 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 > 049 jle,s B4 P=0.667944 C=6785.000000 > > Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). > > For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. > > Added new IR test. Tested tier1-3, xcomp, stress. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Address Christian comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14461/files - new: https://git.openjdk.org/jdk/pull/14461/files/5c711eff..a70193c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14461&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14461&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14461/head:pull/14461 PR: https://git.openjdk.org/jdk/pull/14461 From kvn at openjdk.org Wed Jun 14 16:40:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 16:40:03 GMT Subject: RFR: 8309978: [x64] Fix useless padding In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 04:42:22 GMT, Vladimir Kozlov wrote: > Fixed typo in `IntelJccErratum::compute_padding()`. 
> > Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. > > > 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 > nop # 16 bytes pad for loops and calls > 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 > 049 jle,s B4 P=0.667944 C=6785.000000 > > Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). > > For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. > > Added new IR test. Tested tier1-3, xcomp, stress. Thank you, Alexey and Tobias, for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14461#issuecomment-1591615446 From kvn at openjdk.org Wed Jun 14 16:40:04 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 16:40:04 GMT Subject: RFR: 8309978: [x64] Fix useless padding [v2] In-Reply-To: References: Message-ID: <8PlPMOxeNDunZ95bu8GcEf4bPbitULSj7DmhWO1p0r0=.6f73e9da-6c4a-46eb-9837-1c971ff915d8@github.com> On Wed, 14 Jun 2023 07:03:14 GMT, Christian Hagedorn wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address Christian comments > > test/hotspot/jtreg/compiler/c2/irTests/TestPadding.java line 50: > >> 48: test(i); >> 49: tpf.b1++; // to take both branches in test() >> 50: } > > `test_runner()` will be invoked 2000 times (default warm-up) before the explicit compilation request of `test()` by the IR framework. So, this loop will run 2000 * 11000 times. Do you need that many iterations or can the loop be removed such that we only have 2000 warm-up iterations? I.e. 
something like: > > @Run(test = "test") > public static void test_runner() { > tpf = new TestPadding(); > test(42); > tpf.b1++; // to take both branches in test() > > } > > If you need more iterations, you could still specify `@Warmup(12345)` at `test_runner()` to get more profiling in before compilation of `test()`. Thank you, Christian. I updated the test as you suggested and verified that generated code for `test()` stay the same with 2000 default iterations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14461#discussion_r1229895753 From kvn at openjdk.org Wed Jun 14 17:16:57 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 17:16:57 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 08:23:44 GMT, Roberto Casta?eda Lozano wrote: > This changeset skips the verification at the end of `SubTypeNode::Ideal()` if the bottom type of `obj_or_subklass` is TOP, to avoid violating the contract of `LoadKlassNode::make()`. This can happen for example in transient scenarios where `obj_or_subklass` is a projection of the TOP node, see the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8303513) for more details. The proposed fix has low risk, since it affects debug-only code. > > ##### Testing: > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode) > > - original RunThese8M test (using `-XX:+UseZGC -XX:+ZGenerational` on linux-x64, 20 repetitions) > > Deriving a minimal regression test is ongoing work, but might take some time due to the complex nature of the failure (see analysis on JBS). To reduce noise in test pipelines and ease work on other open RunThese8M issues such as [JDK-8308048](https://bugs.openjdk.org/browse/JDK-8308048), I propose to integrate this fix first and contribute the minimal regression test later as a follow-up enhancement. Looks good. 
I agree that we should file a separate RFE for the test. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14463#pullrequestreview-1479952827 From kvn at openjdk.org Wed Jun 14 17:29:12 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 17:29:12 GMT Subject: RFR: 8309717: C2: Remove Arena::move_contents usage [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 12:34:51 GMT, Johan Sjölen wrote: >> Hi, >> >> Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. >> >> Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. >> >> I'm currently running tier1-3 tests. >> >> Thanks for considering this, >> Johan > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/matcher.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/compile.hpp > > Co-authored-by: Tobias Hartmann Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14391#pullrequestreview-1479971732 From cslucas at openjdk.org Wed Jun 14 19:29:45 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 14 Jun 2023 19:29:45 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? 
> > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. 
> > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Merge branch 'openjdk:master' into rematerialization-of-merges - Rome minor refactorings. - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges Catching up with master. - Address PR review 6: debug format output & some refactoring. - Catching up with master branch. Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address PR review 6: refactoring around rematerialization & improve test cases. - Address PR review 5: refactor on rematerialization & add tests. - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address part of PR review 4 & fix a bug setting only_candidate - Catching up with master Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - ... 
and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=17 Stats: 2732 lines in 26 files changed: 2484 ins; 108 del; 140 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From kvn at openjdk.org Wed Jun 14 20:27:00 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 Jun 2023 20:27:00 GMT Subject: RFR: 8309978: [x64] Fix useless padding [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 16:40:00 GMT, Vladimir Kozlov wrote: >> Fixed typo in `IntelJccErratum::compute_padding()`. >> >> Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. >> >> >> 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 >> nop # 16 bytes pad for loops and calls >> 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 >> 049 jle,s B4 P=0.667944 C=6785.000000 >> >> Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). >> >> For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. >> >> Added new IR test. Tested tier1-3, xcomp, stress. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address Christian comments `StressStackOverflow.java` test failure in GHA on 32-bit x86 I saw in other new PRs too. 
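As a rough illustration of the typo described in 8309978, the arithmetic can be checked by hand. The sketch below is not the `IntelJccErratum` code. It only assumes that padding is requested when a compare-and-branch pair crosses or ends on a 32-byte boundary, that the `cmpb` is 9 bytes long (offsets 0x40 to 0x49 in the listing), that the short `jle` is 2 bytes, and that the `cmpb` would start at 0x30 without the nop. The class and method names are hypothetical.

```java
// Hypothetical illustration only; not the HotSpot IntelJccErratum implementation.
class JccPaddingSketch {
    // True if the byte range [start, start + size) crosses or ends on a 32-byte boundary.
    static boolean crossesOrEndsOnBoundary(int start, int size) {
        return (start / 32) != ((start + size) / 32);
    }

    public static void main(String[] args) {
        int start = 0x30;   // assumed offset of the cmp if no padding were emitted (start of B2)
        int cmpSize = 9;    // 0x49 - 0x40 in the quoted listing
        int jccSize = 2;    // assumed size of a short conditional jump

        // With the typo, the size of the cmp is counted twice.
        System.out.println(crossesOrEndsOnBoundary(start, cmpSize + cmpSize));
        // With the fix, the size of the following jump is added instead.
        System.out.println(crossesOrEndsOnBoundary(start, cmpSize + jccSize));
    }
}
```

Under these assumptions the double-counted size reaches 0x42 and crosses the boundary at 0x40, while the corrected size ends at 0x3b and does not, which is consistent with the 16-byte nop in the listing being unnecessary.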
------------- PR Comment: https://git.openjdk.org/jdk/pull/14461#issuecomment-1591926267 From vlivanov at openjdk.org Wed Jun 14 20:54:16 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 14 Jun 2023 20:54:16 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Wed, 14 Jun 2023 19:29:45 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Rome minor refactorings. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > Catching up with master. > - Address PR review 6: debug format output & some refactoring. > - Catching up with master branch. > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. > - Address PR review 5: refactor on rematerialization & add tests. 
> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - ... and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe Overall, the testing went well. (It discovered some minor issues I commented about.) I'm rerunning some benchmarks which reported suspicious results. Will keep you posted. src/hotspot/share/opto/c2_globals.hpp line 473: > 471: " register allocation.") \ > 472: \ > 473: product(bool, ReduceAllocationMerges, true, \ I suggest to turn the flag into diagnostic one. There are much stricter requirements for product flags, so better to avoid introducing new ones. src/hotspot/share/opto/c2_globals.hpp line 476: > 474: "Try to simplify allocation merges before Scalar Replacement") \ > 475: \ > 476: develop(bool, TraceReduceAllocationMerges, false, \ The flag is debug-only while you use it in non-product code (`NOT_PRODUCT` macro). It doesn't break the build, but make the code under `NOT_PRODUCT` macro useless in optimized builds. You could either get rid of `NOT_PRODUCT` usages or turn the flag into `notproduct` one. src/hotspot/share/opto/c2compiler.cpp line 150: > 148: if (C.failure_reason_is(retry_no_reduce_allocation_merges())) { > 149: assert(do_reduce_allocation_merges, "must make progress"); > 150: do_reduce_allocation_merges = false; I consider the check here as a safety net which is intended to provide graceful degradation in performance if RAM optimization misbehaves for some reason. But bailing out an optimization is better than bailing out the whole compilation. I suggest to introduce new diagnostic flag (e.g., `VerifyReduceAllocationMerges`) and add a guarantee call here which signals whenever we encounter a problematic case. 
I'm fine with handling that as a separate enhancement (it makes sense to dump additional diagnostic info at the place where such bail outs are triggered ). test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java line 43: > 41: public static void main(String[] args) { > 42: TestFramework.runWithFlags("-XX:+ReduceAllocationMerges", > 43: "-XX:+TraceReduceAllocationMerges", `TraceReduceAllocationMerges` and `DeoptimizeALot` are not available in product binaries, so the test fails there. You need to either limit the test to debug builds only or add `-XX:+IgnoreUnrecognizedVMOptions`. If `ReduceAllocationMerges` is turned into diagnostic flag, you need to specify `-XX:+UnlockDiagnosticVMOptions`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1591966055 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230121828 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230145481 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230152798 PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1230119032 From sviswanathan at openjdk.org Wed Jun 14 23:55:58 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 14 Jun 2023 23:55:58 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 07:30:09 GMT, Emanuel Peter wrote: > vconvHF2F vconvHF2F and the reverse vconvF2HF are correct. The vec() operand class is defined in x86.ad. It is a generic vector, replaced by vecS, vecD, vecX, vecY, vecZ depending on the vector length. The operand class vecX e.g., allocates in register class vectorx_reg_vlbwdq. The vectorx_reg_vlbwdq is a dynamic register class which either gets the whole range from xmm0-xmm31 or the limited range xmm0-xmm15 based on whether AVX512vlbwdq feature availability. 
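The idea of a dynamic register class described above can be modelled in a few lines. This is a minimal sketch, assuming only what the message states: the allocatable range is xmm0-xmm31 when the AVX512 VL/BW/DQ feature set is available and xmm0-xmm15 otherwise. It does not reflect how x86.ad or the register allocator actually encode this, and all names are hypothetical.

```java
// Hypothetical model of a dynamic register class; not taken from x86.ad.
class DynamicRegClassSketch {
    // Highest XMM register index that may be allocated for the given CPU feature state.
    static int highestXmmIndex(boolean hasAvx512Vlbwdq) {
        return hasAvx512Vlbwdq ? 31 : 15;
    }

    // True if the register with this index belongs to the selected range.
    static boolean isAllocatable(int xmmIndex, boolean hasAvx512Vlbwdq) {
        return xmmIndex >= 0 && xmmIndex <= highestXmmIndex(hasAvx512Vlbwdq);
    }

    public static void main(String[] args) {
        // xmm20 is only usable when the wider range is selected.
        System.out.println(isAllocatable(20, true));
        System.out.println(isAllocatable(20, false));
    }
}
```

An instruction whose encoding cannot address xmm16-xmm31 stays correct in this model as long as its operands are drawn from a class that resolves to the limited range on the CPU in question.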
------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1592136564 From qamai at openjdk.org Thu Jun 15 02:34:00 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 15 Jun 2023 02:34:00 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 09:23:04 GMT, Emanuel Peter wrote: >> Context: `Float.floatToFloat16` -> `vcvtps2ph`. >> >> **Problem** >> >> vcvtps2ph >> pre=Assembler::VEX_SIMD_66 >> opc=Assembler::VEX_OPCODE_0F_3A >> VEX.128.66.0F3A >> requires F16C >> >> https://www.felixcloutier.com/x86/vcvtps2ph >> >> So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`. >> >> There is also a AVX512 version, but it requires `AVX512VL and AVX512F`. >> >> So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`. >> >> **Suggested Solution** >> As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments. >> >> **Testing** >> I simulated the patch on intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31). >> >> Running: tier1-6 + stress testing. > > @merykitty @sviswa7 @fg1417 Is there a way to stress-test the registers? It seems this bug only triggered because we had a moderately large unrolling factor, and then did not vectorize, leaving lots of instructions with probably a higher register pressure. Would be nice to have some sort of testing where we generate more (all?) of the possible register combinations. What do you think? @eme64 Yes that was my mistake, that node requires AVX512VL so `vlRegF` and `regF` are the same. > Is there a way to stress-test the registers? Can we randomise the allocated register during register allocation? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1592248071 From qamai at openjdk.org Thu Jun 15 02:36:00 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 15 Jun 2023 02:36:00 GMT Subject: RFR: 8308444: LoadStoreNode::result_not_used() is too conservative [v3] In-Reply-To: References: Message-ID: <85hYnf9ABQXXY71VqKzoppXrHDDglrpahVhr8NKwDqw=.f3a4733b-4600-4979-b3d1-5ece6c6d7e3c@github.com> On Sat, 10 Jun 2023 01:28:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > wrong operand Thanks very much for the reviews and testing. I will integrate the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14061#issuecomment-1592249854 From thartmann at openjdk.org Thu Jun 15 05:03:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 15 Jun 2023 05:03:58 GMT Subject: RFR: 8309717: C2: Remove Arena::move_contents usage [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 12:34:51 GMT, Johan Sjölen wrote: >> Hi, >> >> Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. >> >> Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. >> >> I'm currently running tier1-3 tests. 
>> >> Thanks for considering this, >> Johan > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/matcher.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/compile.hpp > > Co-authored-by: Tobias Hartmann Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14391#pullrequestreview-1480680863 From thartmann at openjdk.org Thu Jun 15 05:15:15 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 15 Jun 2023 05:15:15 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v3] In-Reply-To: <1F4sfZNmddE4W2Y2Uc0ABPaLnJ_rl96t9h8k7A-blbc=.39aa22cd-61b5-4f90-963e-7e6840bc4362@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <1F4sfZNmddE4W2Y2Uc0ABPaLnJ_rl96t9h8k7A-blbc=.39aa22cd-61b5-4f90-963e-7e6840bc4362@github.com> Message-ID: <_6IqIlURP3_wInalIQU-I-PVYiNhoWE_Jf4l5o5fF6Q=.4495b547-9d50-4e28-b46c-8e1a80435c5a@github.com> On Tue, 13 Jun 2023 14:29:42 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. 
>> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Impressive work, Roland! 
I did not review the code in detail yet but here are some failures from preliminary testing: `applications/ctw/modules/java_base_2.java` and some other CTW tests fails on AArch64: # Internal Error (/open/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:548), pid=60565, tid=30979 # assert(offset_ok_for_immed(offset(), size)) failed: must be, was: 32776, 3 Current CompileTask: C1: 330461 92946 b 3 sun.security.tools.keytool.Resources:: (5678 bytes) Stack: [0x000000016cc14000,0x000000016ce17000], sp=0x000000016ce13ed0, free space=2047k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x1385054] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x4bc (assembler_aarch64.hpp:548) V [libjvm.dylib+0x13859f0] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, char*)+0x40 V [libjvm.dylib+0x6a9518] report_vm_error(char const*, int, char const*, char const*, ...)+0x6c V [libjvm.dylib+0x112b08] Address::encode(Instruction_aarch64*) const+0x230 V [libjvm.dylib+0x1127d4] Assembler::ld_st2(Register, Address const&, int, int, int)+0x258 V [libjvm.dylib+0x42634c] LIR_Assembler::emit_opTypeCheck(LIR_OpTypeCheck*)+0x6f0 V [libjvm.dylib+0x412f44] LIR_OpTypeCheck::emit_code(LIR_Assembler*)+0x20 V [libjvm.dylib+0x41b7fc] LIR_Assembler::emit_lir_list(LIR_List*)+0x1f4 V [libjvm.dylib+0x41bae8] LIR_Assembler::emit_block(BlockBegin*)+0x154 V [libjvm.dylib+0x41b914] LIR_Assembler::emit_code(BlockList*)+0xa4 V [libjvm.dylib+0x3d0708] Compilation::emit_code_body()+0x150 V [libjvm.dylib+0x3d1234] Compilation::compile_java_method()+0x39c V [libjvm.dylib+0x3d15a4] Compilation::compile_method()+0x128 V [libjvm.dylib+0x3d1b18] Compilation::Compilation(AbstractCompiler*, ciEnv*, ciMethod*, int, BufferBlob*, bool, DirectiveSet*)+0x1e8 V [libjvm.dylib+0x3d45d0] Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x254 V 
[libjvm.dylib+0x620750] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa0c V [libjvm.dylib+0x61fa84] CompileBroker::compiler_thread_loop()+0x3c8 V [libjvm.dylib+0xa26908] JavaThread::thread_main_inner()+0x334 V [libjvm.dylib+0x12b0764] Thread::call_run()+0x134 V [libjvm.dylib+0x102317c] thread_native_entry(Thread*)+0x160 C [libsystem_pthread.dylib+0x706c] _pthread_start+0x94 `compiler/jvmci/compilerToVM/ReprofileTest.java` fails with: java.lang.AssertionError: 56 != 48 at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotMethodDataAccessor.getSize(HotSpotMethodDataAccessor.java:82) at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotProfilingInfo.findBCI(HotSpotProfilingInfo.java:168) at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotProfilingInfo.getExecutionCount(HotSpotProfilingInfo.java:138) at jdk.internal.vm.ci/jdk.vm.ci.meta.ProfilingInfo.toString(ProfilingInfo.java:149) at jdk.internal.vm.ci/jdk.vm.ci.hotspot.HotSpotProfilingInfo.toString(HotSpotProfilingInfo.java:202) at compiler.jvmci.compilerToVM.ReprofileTest.runSanityTest(ReprofileTest.java:101) at java.base/java.util.ArrayList.forEach(ArrayList.java:1593) at compiler.jvmci.compilerToVM.ReprofileTest.main(ReprofileTest.java:64) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1583) `serviceability/sa/ClhsdbCDSCore.java` fails with: # SIGSEGV (0xb) at pc=0x0000000103b40894, pid=6427, tid=9987 # # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-06-14-1110599.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-06-14-1110599.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) # Problematic frame: # V [libjvm.dylib+0x1308894] 
Unsafe_PutInt(JNIEnv_*, _jobject*, _jobject*, long, int)+0x170 Stack: [0x000000016f7f8000,0x000000016f9fb000], sp=0x000000016f9fa8c0, free space=2058k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x1308894] Unsafe_PutInt(JNIEnv_*, _jobject*, _jobject*, long, int)+0x170 j jdk.internal.misc.Unsafe.putInt(Ljava/lang/Object;JI)V+0 java.base at 22-internal j jdk.internal.misc.Unsafe.putInt(JI)V+4 java.base at 22-internal j CrashApp.main([Ljava/lang/String;)V+5 v ~StubRoutines::call_stub 0x0000000113e5417c V [libjvm.dylib+0x9f275c] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x648 V [libjvm.dylib+0xae32f8] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*)+0x25c V [libjvm.dylib+0xae9c7c] jni_CallStaticVoidMethod+0x248 C [libjli.dylib+0xb2bc] JavaMain+0xd60 C [libjli.dylib+0xd4c4] ThreadJavaMain+0xc C [libsystem_pthread.dylib+0x726c] _pthread_start+0x94 Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) j jdk.internal.misc.Unsafe.putInt(Ljava/lang/Object;JI)V+0 java.base at 22-internal j jdk.internal.misc.Unsafe.putInt(JI)V+4 java.base at 22-internal j CrashApp.main([Ljava/lang/String;)V+5 v ~StubRoutines::call_stub 0x0000000113e5417c `serviceability/sa/TestPrintMdo.java` fails with: stderr: [Exception in thread "main" java.lang.InternalError: 72 144 0 at jdk.hotspot.agent/sun.jvm.hotspot.oops.MethodData.dataAt(MethodData.java:282) at jdk.hotspot.agent/sun.jvm.hotspot.oops.MethodData.nextData(MethodData.java:318) at jdk.hotspot.agent/sun.jvm.hotspot.oops.MethodData.printDataOn(MethodData.java:359) at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$19$1.visit(CommandProcessor.java:938) at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderData.classesDo(ClassLoaderData.java:107) at jdk.hotspot.agent/sun.jvm.hotspot.classfile.ClassLoaderDataGraph.classesDo(ClassLoaderDataGraph.java:84) at 
jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor$19.doit(CommandProcessor.java:926) at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2212) at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.executeCommand(CommandProcessor.java:2182) at jdk.hotspot.agent/sun.jvm.hotspot.CommandProcessor.run(CommandProcessor.java:2053) at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.run(CLHSDB.java:112) at jdk.hotspot.agent/sun.jvm.hotspot.CLHSDB.main(CLHSDB.java:44) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runCLHSDB(SALauncher.java:281) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) `testlibrary_tests/ir_framework/tests/TestIRMatching.java` fails with: 1) Method "public boolean ir_framework.tests.CheckCastArray.array()" - [Failed IR rules: 2]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#CHECKCAST_ARRAY#_"}, applyIfOr={}, applyIfNot={})" > Phase "PrintOptoAssembly": - failOn: Graph contains forbidden nodes: * Constraint 1: "(((?i:cmp|CLFI|CLR).*precise \[.*:|.*(?i:mov|or).*precise \[.*:.*\\R.*(cmp|CMP|CLR)))" - Matched forbidden node: * cmpl R10, narrowklass: precise [java/lang/Object: 0x00007fcab601f3e8 * (java/lang/Cloneable,java/io/Serializable): :Constant: * @IR rule 3: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#C#CHECKCAST_ARRAY_OF#_", "MyClasss", "_#C#CHECKCAST_ARRAY_OF#_", "Object"}, applyIfOr={}, applyIfNot={})" > Phase "PrintOptoAssembly": - failOn: Graph contains forbidden nodes: * Constraint 2: "(((?i:cmp|CLFI|CLR).*precise \[.*Object:|.*(?i:mov|or).*precise \[.*Object:.*\\R.*(cmp|CMP|CLR)))" - Matched forbidden node: * cmpl R10, narrowklass: precise [java/lang/Object: 2) Method "public java.lang.Object[] 
ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, applyIfAnd={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfOr={}, applyIfNot={})" > Phase "PrintOptoAssembly": - failOn: Graph contains forbidden nodes: * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" - Matched forbidden node: * 1a5 call_leaf_nofp,runtime checkcast_arraycopy ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1592372554 From jwaters at openjdk.org Thu Jun 15 06:04:59 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 15 Jun 2023 06:04:59 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Bumping ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1592418574 From chagedorn at openjdk.org Thu Jun 15 06:11:59 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 Jun 2023 06:11:59 GMT Subject: RFR: 8309978: [x64] Fix useless padding [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 16:40:00 GMT, Vladimir Kozlov wrote: >> Fixed typo in `IntelJccErratum::compute_padding()`. >> >> Due to the typo (`mach` instead of `next`) useless padding could be generated because size of `mach` instruction (which is `cmp` in this case and big) counted twice. As result combined size most likely cross 32-bytes cache line boundary and padding is generated to avoid that. >> >> >> 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 >> nop # 16 bytes pad for loops and calls >> 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 >> 049 jle,s B4 P=0.667944 C=6785.000000 >> >> Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). >> >> For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. >> >> Added new IR test. Tested tier1-3, xcomp, stress. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address Christian comments Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14461#pullrequestreview-1480755379 From chagedorn at openjdk.org Thu Jun 15 06:12:01 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 Jun 2023 06:12:01 GMT Subject: RFR: 8309978: [x64] Fix useless padding [v2] In-Reply-To: <8PlPMOxeNDunZ95bu8GcEf4bPbitULSj7DmhWO1p0r0=.6f73e9da-6c4a-46eb-9837-1c971ff915d8@github.com> References: <8PlPMOxeNDunZ95bu8GcEf4bPbitULSj7DmhWO1p0r0=.6f73e9da-6c4a-46eb-9837-1c971ff915d8@github.com> Message-ID: On Wed, 14 Jun 2023 16:34:54 GMT, Vladimir Kozlov wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestPadding.java line 50: >> >>> 48: test(i); >>> 49: tpf.b1++; // to take both branches in test() >>> 50: } >> >> `test_runner()` will be invoked 2000 times (default warm-up) before the explicit compilation request of `test()` by the IR framework. So, this loop will run 2000 * 11000 times. Do you need that many iterations or can the loop be removed such that we only have 2000 warm-up iterations? I.e. something like: >> >> @Run(test = "test") >> public static void test_runner() { >> tpf = new TestPadding(); >> test(42); >> tpf.b1++; // to take both branches in test() >> >> } >> >> If you need more iterations, you could still specify `@Warmup(12345)` at `test_runner()` to get more profiling in before compilation of `test()`. > > Thank you, Christian. I updated the test as you suggested and verified that generated code for `test()` stay the same with 2000 default iterations. That looks good to me, thanks for the update! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14461#discussion_r1230479457 From vkempik at openjdk.org Thu Jun 15 06:25:08 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 15 Jun 2023 06:25:08 GMT Subject: RFR: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads [v7] In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 12:57:38 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V >> >> Initially found these misaligned loads when profiling finagle-http test from renaissance suite. >> The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) >> The other two produced about 100 events combined. >> Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. >> Numbers on hifive before and after applying the patch: >> >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op >> >> >> After: >> >> Benchmark Mode Cnt Score Error Units >> StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op >> >> >> Testing: tier1/tier2 is clean on hifive. 
> > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > fix nits tier1/2 clean, so ------------- PR Comment: https://git.openjdk.org/jdk/pull/14320#issuecomment-1592436435 From vkempik at openjdk.org Thu Jun 15 06:25:11 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 15 Jun 2023 06:25:11 GMT Subject: Integrated: 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads In-Reply-To: References: Message-ID: On Mon, 5 Jun 2023 20:52:01 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads in String.indexOf intrinsic on RISC-V > > Initially found these misaligned loads when profiling finagle-http test from renaissance suite. > The majority of trp_lam events (about 66k per finagle-http round) came at line 706 (https://github.com/openjdk/jdk/pull/14320/files#diff-35eb1d2f1e2f0514dd46bd7fbad49ff2c87703d5a3041a6433956df00a3fe6e6L706) > The other two produced about 100 events combined. > Later I've found this can partially be reproduced with StringIndexOf.advancedWithMediumSub. > Numbers on hifive before and after applying the patch: > > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 47031.406 ± 144.005 ns/op > > > After: > > Benchmark Mode Cnt Score Error Units > StringIndexOf.advancedWithMediumSub avgt 25 4256.830 ± 23.075 ns/op > > > Testing: tier1/tier2 is clean on hifive. This pull request has now been integrated. 
Changeset: 6b942893 Author: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/6b942893868fa1a64977288bdbdb1bbff8bd9d9c Stats: 84 lines in 3 files changed: 67 ins; 5 del; 12 mod 8309502: RISC-V: String.indexOf intrinsic may produce misaligned memory loads Reviewed-by: luhenry, fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14320 From rcastanedalo at openjdk.org Thu Jun 15 06:38:59 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 Jun 2023 06:38:59 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 17:13:54 GMT, Vladimir Kozlov wrote: > Looks good. I agree we separate RFE for test. Thanks, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14463#issuecomment-1592449284 From jsjolen at openjdk.org Thu Jun 15 08:40:10 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 15 Jun 2023 08:40:10 GMT Subject: RFR: 8309717: C2: Remove Arena::move_contents usage [v2] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 12:34:51 GMT, Johan Sjölen wrote: >> Hi, >> >> Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. >> >> Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. >> >> I'm currently running tier1-3 tests. 
>> >> Thanks for considering this, >> Johan > > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/matcher.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/share/opto/compile.hpp > > Co-authored-by: Tobias Hartmann Thank you for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14391#issuecomment-1592610015 From jsjolen at openjdk.org Thu Jun 15 08:40:12 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 15 Jun 2023 08:40:12 GMT Subject: Integrated: 8309717: C2: Remove Arena::move_contents usage In-Reply-To: References: Message-ID: On Fri, 9 Jun 2023 10:17:46 GMT, Johan Sjölen wrote: > Hi, > > Instead of using `Arena::move_contents` we can just see the arena swap as a form of double buffering, reducing this to a pointer swap and a clear. This allows us to remove `Arena::move_contents`, cleaning up the arena code. > > Since this requires allocating another pointer for `Compile`, I took the time to move some members around in order to reduce the padding. This means that this patch does *not* introduce a size change for `Compile`. > > I'm currently running tier1-3 tests. > > Thanks for considering this, > Johan This pull request has now been integrated. 
Changeset: 4c0e1642 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/4c0e164238458e0e95770a855ba84bb265ff0397 Stats: 53 lines in 5 files changed: 19 ins; 22 del; 12 mod 8309717: C2: Remove Arena::move_contents usage Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14391 From rcastanedalo at openjdk.org Thu Jun 15 10:11:15 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 Jun 2023 10:11:15 GMT Subject: Integrated: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 08:23:44 GMT, Roberto Castañeda Lozano wrote: > This changeset skips the verification at the end of `SubTypeNode::Ideal()` if the bottom type of `obj_or_subklass` is TOP, to avoid violating the contract of `LoadKlassNode::make()`. This can happen for example in transient scenarios where `obj_or_subklass` is a projection of the TOP node, see the [JBS issue description](https://bugs.openjdk.org/browse/JDK-8303513) for more details. The proposed fix has low risk, since it affects debug-only code. > > ##### Testing: > > - tier1-3 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode) > > - original RunThese8M test (using `-XX:+UseZGC -XX:+ZGenerational` on linux-x64, 20 repetitions) > > Deriving a minimal regression test is ongoing work, but might take some time due to the complex nature of the failure (see analysis on JBS). To reduce noise in test pipelines and ease work on other open RunThese8M issues such as [JDK-8308048](https://bugs.openjdk.org/browse/JDK-8308048), I propose to integrate this fix first and contribute the minimal regression test later as a follow-up enhancement. This pull request has now been integrated. 
Changeset: 83d92672 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/83d92672d4c2637fc37ddd873533c85a9b083904 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14463 From rcastanedalo at openjdk.org Thu Jun 15 10:33:21 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 Jun 2023 10:33:21 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' Message-ID: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> This pull request contains a backport of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513). The original changes are reviewed by Tobias Hartmann (@TobiHartmann) and Vladimir Kozlov (@vnkozlov). ------------- Commit messages: - Backport 83d92672d4c2637fc37ddd873533c85a9b083904 Changes: https://git.openjdk.org/jdk21/pull/21/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=21&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303513 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk21/pull/21.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/21/head:pull/21 PR: https://git.openjdk.org/jdk21/pull/21 From duke at openjdk.org Thu Jun 15 11:20:10 2023 From: duke at openjdk.org (Daniel Skantz) Date: Thu, 15 Jun 2023 11:20:10 GMT Subject: RFR: 8301489: ShortLoopOptimizer might lift instructions before their inputs Message-ID: ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. Testing: tier1-tier3. 
Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) ------------- Commit messages: - latest - fix compileonly - Remove unecessary Xbatch - Tweak comment - WIP close - WIP - WIP with some checks - WIP fix with validation Changes: https://git.openjdk.org/jdk/pull/14492/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14492&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301489 Stats: 143 lines in 2 files changed: 142 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14492/head:pull/14492 PR: https://git.openjdk.org/jdk/pull/14492 From thartmann at openjdk.org Thu Jun 15 11:35:55 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 15 Jun 2023 11:35:55 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> References: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> Message-ID: On Thu, 15 Jun 2023 10:25:30 GMT, Roberto Castañeda Lozano wrote: > This pull request contains a 
backport of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513). The original changes are reviewed by Tobias Hartmann (@TobiHartmann) and Vladimir Kozlov (@vnkozlov). Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/21#pullrequestreview-1481358682 From rcastanedalo at openjdk.org Thu Jun 15 12:10:11 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 Jun 2023 12:10:11 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> References: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> Message-ID: <8Mn7gE18TyxnKQ5yR_9FqzIlE66wdqKBHC5Oawy8mPs=.59a56139-a96b-46da-aa4c-c37df9b4c591@github.com> On Thu, 15 Jun 2023 10:25:30 GMT, Roberto Castañeda Lozano wrote: > This pull request contains a backport of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513). The original changes are reviewed by Tobias Hartmann (@TobiHartmann) and Vladimir Kozlov (@vnkozlov). Thanks for reviewing this backport as well, Tobias! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/21#issuecomment-1592919782 From rcastanedalo at openjdk.org Thu Jun 15 12:13:00 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 15 Jun 2023 12:13:00 GMT Subject: Integrated: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> References: <0a9-PNFq0GF5wHzLeDCZZBAHg4Ka7b9Miqu-SPdrHns=.a502e810-2259-40ef-9512-5d229dd466c6@github.com> Message-ID: On Thu, 15 Jun 2023 10:25:30 GMT, Roberto Castañeda Lozano wrote: > This pull request contains a backport of [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513). 
The original changes are reviewed by Tobias Hartmann (@TobiHartmann) and Vladimir Kozlov (@vnkozlov). This pull request has now been integrated. Changeset: 39e98e7b Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk21/commit/39e98e7bbf278b8772350fd28d2bd8ad8cb06315 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' Reviewed-by: thartmann Backport-of: 83d92672d4c2637fc37ddd873533c85a9b083904 ------------- PR: https://git.openjdk.org/jdk21/pull/21 From dnsimon at openjdk.org Thu Jun 15 12:50:25 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 15 Jun 2023 12:50:25 GMT Subject: RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop [v5] In-Reply-To: References: Message-ID: <_d9zCpW3VU681oBDZEfOtFPsQuL1q645BMaGNTOwnbQ=.bbb2322b-cf0c-41d9-a5f6-775be66a02fa@github.com> On Mon, 6 Mar 2023 14:26:19 GMT, Roland Westrelin wrote: >> The loop that doesn't vectorize is: >> >> >> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) { >> for (int i = start; i < stop; i++) { >> UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]); >> } >> } >> >> >> It's from a micro-benchmark in the panama >> repo. `SuperWord::find_adjacent_refs() `prevents it from vectorizing >> because it finds it cannot properly align the loop and, from the >> comment in the code, that: >> >> >> // Can't allow vectorization of unaligned memory accesses with the >> // same type since it could be overlapped accesses to the same array. >> >> >> The test for "same type" is implemented by looking at the memory >> operation type which in this case is overly conservative as the loop >> above is reading and writing with long loads/stores but from and to >> arrays of different types that can't overlap. 
Actually, with such >> mismatched accesses, it's also likely an incorrect test (reading and >> writing could be to the same array with loads/stores that use >> different operand size) even though I couldn't write a test case that >> would trigger an incorrect execution. >> >> As a fix, I propose implementing the "same type" test by looking at >> memory aliases instead. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > improved test test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java line 38: > 36: * @bug 8300258 > 37: * @key randomness > 38: * @requires (os.simpleArch == "x64") | (os.simpleArch == "aarch64") Does this require `@requires compiler.c2`? What is the expectation if this (or any IR) test runs on GraalVM? cc @chhagedorn ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12440#discussion_r1230958136 From roland at openjdk.org Thu Jun 15 13:03:21 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 Jun 2023 13:03:21 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v4] In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: > In this simple micro benchmark: > > https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 > > Performance drops sharply with polluted profile: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us > > > to: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 
2.280 ops/us > > > The test has 2 type checks to 2 different interfaces so caching with > `secondary_super_cache` doesn't help. > > The micro-benchmark only uses 2 different concrete classes > (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded > in profile data at the type checks. But c2 only takes advantage of > profile data at type checks if they report a single class. > > What I propose is that the full blown type check expanded in > `Phase::gen_subtype_check()` takes advantage of profile data. So in > the case of the micro benchmark, before checking the > `secondary_super_cache`, generated code checks whether the object > being type checked is a `DuplicatedContext` or a > `NonDuplicatedContext`. > > This works fairly well on this micro benchmark: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > It also scales much better if there are multiple threads running the > same test (`secondary_super_cache` doesn't scale well: see > JDK-8180450). > > Now if the micro-benchmark is changed according to the comment: > > https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 > > so the type check hits in the `secondary_super_cache`, the current > code performs much better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > but leveraging profiling as explained above performs even better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplic... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: test failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/987d8b4b..a2c8055c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=02-03 Stats: 34 lines in 4 files changed: 6 ins; 16 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From epeter at openjdk.org Thu Jun 15 13:06:25 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 Jun 2023 13:06:25 GMT Subject: RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop [v5] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 14:26:19 GMT, Roland Westrelin wrote: >> The loop that doesn't vectorize is: >> >> >> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) { >> for (int i = start; i < stop; i++) { >> UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]); >> } >> } >> >> >> It's from a micro-benchmark in the panama >> repo. `SuperWord::find_adjacent_refs() `prevents it from vectorizing >> because it finds it cannot properly align the loop and, from the >> comment in the code, that: >> >> >> // Can't allow vectorization of unaligned memory accesses with the >> // same type since it could be overlapped accesses to the same array. >> >> >> The test for "same type" is implemented by looking at the memory >> operation type which in this case is overly conservative as the loop >> above is reading and writing with long loads/stores but from and to >> arrays of different types that can't overlap. 
Actually, with such >> mismatched accesses, it's also likely an incorrect test (reading and >> writing could be to the same array with loads/stores that use >> different operand size) even though I couldn't write a test case that >> would trigger an incorrect execution. >> >> As a fix, I propose implementing the "same type" test by looking at >> memory aliases instead. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > improved test @dougxc I've just been discussing with @TobiHartmann @chhagedorn . The IR tests do not necessarily `@requires compiler.c2`. However, the IR-matching rules can only be executed if C2 is actually present (requires IR printing of C2 nodes), and no non-whitelisted flags are present (they would potentially change the IR graph, and hence what we match against). I don't know how you run with GraalVM, but if you were to use flags like `-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler`, then the IR test would recognize those as non-whitelisted and just run the tests, but without IR matching. It is possible that some IR tests currently do have `@requires compiler.c2`; it should be possible to remove them, but you might have to experiment a bit. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12440#issuecomment-1593004945 From roland at openjdk.org Thu Jun 15 13:08:36 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 Jun 2023 13:08:36 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v5] In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: > In this simple micro benchmark: > > https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 > > Performance drops sharply with polluted profile: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us > > > to: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us > > > The test has 2 type checks to 2 different interfaces so caching with > `secondary_super_cache` doesn't help. > > The micro-benchmark only uses 2 different concrete classes > (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded > in profile data at the type checks. But c2 only takes advantage of > profile data at type checks if they report a single class. > > What I propose is that the full blown type check expanded in > `Phase::gen_subtype_check()` takes advantage of profile data. So in > the case of the micro benchmark, before checking the > `secondary_super_cache`, generated code checks whether the object > being type checked is a `DuplicatedContext` or a > `NonDuplicatedContext`. > > This works fairly well on this micro benchmark: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 
20.750 ops/us > > > It also scales much better if there are multiple threads running the > same test (`secondary_super_cache` doesn't scale well: see > JDK-8180450). > > Now if the micro-benchmark is changed according to the comment: > > https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 > > so the type check hits in the `secondary_super_cache`, the current > code performs much better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > but leveraging profiling as explained above performs even better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplic... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/a2c8055c..6daa01d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From roland at openjdk.org Thu Jun 15 13:08:37 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 Jun 2023 13:08:37 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v3] In-Reply-To: <_6IqIlURP3_wInalIQU-I-PVYiNhoWE_Jf4l5o5fF6Q=.4495b547-9d50-4e28-b46c-8e1a80435c5a@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> 
<1F4sfZNmddE4W2Y2Uc0ABPaLnJ_rl96t9h8k7A-blbc=.39aa22cd-61b5-4f90-963e-7e6840bc4362@github.com> <_6IqIlURP3_wInalIQU-I-PVYiNhoWE_Jf4l5o5fF6Q=.4495b547-9d50-4e28-b46c-8e1a80435c5a@github.com> Message-ID: On Thu, 15 Jun 2023 05:12:31 GMT, Tobias Hartmann wrote: > I did not review the code in detail yet but here are some failures from preliminary testing: Thanks for running tests. New commit should fix the issues you reported. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1593004358 From duke at openjdk.org Thu Jun 15 13:12:21 2023 From: duke at openjdk.org (Eric Nothum) Date: Thu, 15 Jun 2023 13:12:21 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer Message-ID: **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. 
If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. ------------- Commit messages: - JDK-8309266: Cosmetic change to the if statement - JDK-8309266: 1) Added TestLoopLimitOverflowDuringCCP to the jtreg tests 2) Verify in LoopLimitNode::Value that the input nodes are ConI type nodes. Changes: https://git.openjdk.org/jdk/pull/14490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309266 Stats: 59 lines in 2 files changed: 55 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14490/head:pull/14490 PR: https://git.openjdk.org/jdk/pull/14490 From dnsimon at openjdk.org Thu Jun 15 13:38:21 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 15 Jun 2023 13:38:21 GMT Subject: RFR: 8300258: C2: vectorization fails on simple ByteBuffer loop [v5] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 14:26:19 GMT, Roland Westrelin wrote: >> The loop that doesn't vectorize is: >> >> >> public static void testByteLong4(byte[] dest, long[] src, int start, int stop) { >> for (int i = start; i < stop; i++) { >> UNSAFE.putLongUnaligned(dest, 8 * i + baseOffset, src[i]); >> } >> } >> >> >> It's from a micro-benchmark in the panama >> repo. `SuperWord::find_adjacent_refs() `prevents it from vectorizing >> because it finds it cannot properly align the loop and, from the >> comment in the code, that: >> >> >> // Can't allow vectorization of unaligned memory accesses with the >> // same type since it could be overlapped accesses to the same array. >> >> >> The test for "same type" is implemented by looking at the memory >> operation type which in this case is overly conservative as the loop >> above is reading and writing with long loads/stores but from and to >> arrays of different types that can't overlap. 
Actually, with such >> mismatched accesses, it's also likely an incorrect test (reading and >> writing could be to the same array with loads/stores that use >> different operand size) even though I couldn't write a test case that >> would trigger an incorrect execution. >> >> As a fix, I propose implementing the "same type" test by looking at >> memory aliases instead. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > improved test Ok, thanks for the explanation. As long as the tests' correctness does not depend on C2, they will pass on GraalVM, even if they aren't really testing much in that case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12440#issuecomment-1593079076 From roland at openjdk.org Thu Jun 15 15:04:12 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 Jun 2023 15:04:12 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 10:43:53 GMT, Eric Nothum wrote: > **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 > > **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. > > Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. 
When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. > > By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. Let's say CCP is running. Init and limit are not `ConINode`s at this point but once CCP is over, it will have discovered that they are actually constants so will make them `ConINode`s. The change you're making would prevent CCP from propagating the constants through `LoopLimitNode`, making CCP more pessimistic than it needs to be. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14490#issuecomment-1593242034 From kvn at openjdk.org Thu Jun 15 15:36:11 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 Jun 2023 15:36:11 GMT Subject: Integrated: 8309978: [x64] Fix useless padding In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 04:42:22 GMT, Vladimir Kozlov wrote: > Fixed typo in `IntelJccErratum::compute_padding()`. > > Due to the typo (`mach` instead of `next`) useless padding could be generated because the size of the `mach` instruction (which is `cmp` in this case and big) is counted twice. As a result, the combined size most likely crosses a 32-byte cache line boundary and padding is generated to avoid that. > > > 030 B2: # out( B4 B3 ) <- in( B1 ) Freq: 0.999999 > nop # 16 bytes pad for loops and calls > 040 cmpb [R12 + R10 << 3 + #144] (compressed oop addressing), #42 > 049 jle,s B4 P=0.667944 C=6785.000000 > > Note: only some x86 CPUs are [affected](https://github.com/openjdk/jdk/blob/ba837b4bfa2dea85653d8a8fccd0817a569b4378/src/hotspot/cpu/x86/vm_version_x86.cpp#L1957). 
> > For new IR test to work I moved `PHASE_FINAL_CODE` IR print inside `PhaseOutput` scope because padding nodes (`NOP` mach nodes) are present only in this phase IR. > > Added new IR test. Tested tier1-3, xcomp, stress. This pull request has now been integrated. Changeset: 0038491a Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/0038491abda51b8bd39fabed53624c10abcfe077 Stats: 96 lines in 4 files changed: 93 ins; 2 del; 1 mod 8309978: [x64] Fix useless padding Reviewed-by: chagedorn, thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/14461 From qamai at openjdk.org Thu Jun 15 16:03:12 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 15 Jun 2023 16:03:12 GMT Subject: Integrated: 8308444: LoadStoreNode::result_not_used() is too conservative In-Reply-To: References: Message-ID: On Fri, 19 May 2023 16:19:42 GMT, Quan Anh Mai wrote: > Hi, > > This patch improves the implementation of `LoadStoreNode::result_not_used()` to be less conservative and verifies that the preferable node is matched for `getAndAdd`. Please kindly review, thanks a lot. This pull request has now been integrated. 
Changeset: 947f1497 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/947f14977a4d1ded839712aea020eaa87c23a23f Stats: 279 lines in 6 files changed: 253 ins; 0 del; 26 mod 8308444: LoadStoreNode::result_not_used() is too conservative Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14061 From duke at openjdk.org Thu Jun 15 16:14:58 2023 From: duke at openjdk.org (Eric Nothum) Date: Thu, 15 Jun 2023 16:14:58 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 10:43:53 GMT, Eric Nothum wrote: > **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 > > **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. > > Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. > > By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. 
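The overflow described in the quoted text can be illustrated with a small Java sketch. It is not a copy of `LoopLimitNode::Value`: the class name `LoopLimitSketch`, the helper names, and the choice of `init = 0` and `stride = 2` are assumptions made for illustration. It only shows why treating the limit as the constant `Integer.MAX_VALUE` can yield a final loop value that no longer fits in an `int`, while the limit `1000` does not.

```java
public class LoopLimitSketch {
    // Final value of i for a loop "for (i = init; i < limit; i += stride)" with stride > 0.
    // The arithmetic is done in long so an out-of-range result stays visible.
    static long finalValue(int init, int limit, int stride) {
        long tripCount = ((long) limit - init + (stride - 1)) / stride;
        return init + (long) stride * tripCount;
    }

    // Mirrors the shape of the failing check: does the long value survive a round trip through int?
    static boolean fitsInInt(long value) {
        return value == (long) (int) value;
    }

    public static void main(String[] args) {
        // Limit 1000: the final value is representable as an int.
        System.out.println(fitsInInt(finalValue(0, 1000, 2)));
        // Limit Integer.MAX_VALUE: the final value is 2^31, which is not.
        System.out.println(fitsInInt(finalValue(0, Integer.MAX_VALUE, 2)));
    }
}
```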
yes that's an issue, I did not realize that the change could slow the CCP down this way. I guess, defaulting to bottom in case of an overflow might be the better option then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14490#issuecomment-1593357726 From prr at openjdk.org Thu Jun 15 23:26:01 2023 From: prr at openjdk.org (Phil Race) Date: Thu, 15 Jun 2023 23:26:01 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1089: > 1087: entry_point(); > 1088: colorBits = (jint*)safe_Malloc(MAX_ICON_SIZE * MAX_ICON_SIZE * sizeof(jint)); > 1089: GetDIBits(dc, iconInfo.hbmColor, 0, iconSize, colorBits, &bmi, DIB_RGB_COLORS); I just can't tell if the updates you are making as a result of the jni_md.h change are really the right ones and some of them don't look that way You wrote "As listed above, the native Windows API routines that the java.desktop code calls are actually expecting ints," Per the windows docs for GetDIBits() it takes an LPVOID for the parameter these are used for. For the cases above if takes an LPVOID which is really just a void*, and using jint in the malloc is just weird since jint doesn't mean anything to GDI And clearly this change requires running a whole load of client tests on windows [*] but I have no idea what your reply to my question about that means. What is "-permissive" ? 
[*] and I have no idea what VM etc tests need to be run just for the JNI change ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1231621441 From jwaters at openjdk.org Fri Jun 16 00:43:19 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 16 Jun 2023 00:43:19 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 15 Jun 2023 23:22:48 GMT, Phil Race wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the code that is actually warning > > src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1089: > >> 1087: entry_point(); >> 1088: colorBits = (jint*)safe_Malloc(MAX_ICON_SIZE * MAX_ICON_SIZE * sizeof(jint)); >> 1089: GetDIBits(dc, iconInfo.hbmColor, 0, iconSize, colorBits, &bmi, DIB_RGB_COLORS); > > I just can't tell if the updates you are making as a result of the jni_md.h change are really the right ones and some of them don't look that way > > You wrote > "As listed above, the native Windows API routines that the java.desktop code calls are actually expecting ints," > > Per the windows docs for GetDIBits() it takes an LPVOID for the parameter these are used for. > For the cases above if takes an LPVOID which is really just a void*, and using jint in the malloc is just weird since jint doesn't mean anything to GDI > > And clearly this change requires running a whole load of client tests on windows [*] > but I have no idea what your reply to my question about that means. What is "-permissive" ? 
> > [*] and I have no idea what VM etc tests need to be run just for the JNI change -permissive- is a compiler switch that forces the Microsoft Visual C compiler to be stricter in compiling C and C++ and makes it enforce the standard much more aggressively. It's becoming less permissive with every iteration of the Microsoft compiler and is stated to become enabled by default eventually by Microsoft. One of the consequences of this is that in the future our code cannot so loosely treat int and long as the same type on Windows (even though they are ultimately the same size in compiled code), as far as the compiler is concerned, they are semantically 2 entirely different types. That's the complication this Pull Request is trying to preempt by changing the jni.h typedef for Windows I missed the LPVOID change in GetDIBits, but for the other changes I really only followed the existing declarations, the jint for colorBits is because the call to SetIntArrayRegion takes a jint as a parameter. Let me know if I should change the declaration regardless, though Also, we have David for the VM reviews ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1231658403 From fgao at openjdk.org Fri Jun 16 03:36:13 2023 From: fgao at openjdk.org (Fei Gao) Date: Fri, 16 Jun 2023 03:36:13 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 11:27:35 GMT, Emanuel Peter wrote: > aarch64 asimd: vectorizing the misaligned cases leads to clear performance win compared to non-vectorization. However, we can see that the vectorized misaligned cases are consistently a bit slower than the vectorized aligned cases. Hi @eme64 , thanks for your perf data! I also tried your new benchmark on some latest `aarch64` machines using `asimd`. 
Here are some of the results:

VectorAlignment.VectorAlignmentSuperWord.bench000B_control 2048 0 avgt 152.831 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000C_control 2048 0 avgt 285.819 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000D_control 2048 0 avgt 749.996 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000F_control 2048 0 avgt 396.433 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000I_control 2048 0 avgt 560.767 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000L_control 2048 0 avgt 1131.909 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench000S_control 2048 0 avgt 285.215 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench001_control 2048 0 avgt 562.436 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100B_misaligned_load 2048 0 avgt 152.459 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100C_misaligned_load 2048 0 avgt 290.888 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100D_misaligned_load 2048 0 avgt 754.443 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100F_misaligned_load 2048 0 avgt 386.633 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100I_misaligned_load 2048 0 avgt 560.587 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100L_misaligned_load 2048 0 avgt 1134.492 ns/op
VectorAlignment.VectorAlignmentSuperWord.bench100S_misaligned_load 2048 0 avgt 284.768 ns/op

I believe that the perf gap between the vectorized misaligned cases and the vectorized aligned cases may become smaller, and in some cases may disappear, on newer `aarch64` machines. Also, I strongly agree with your conclusion: it is clearly profitable to vectorize these misaligned cases. Thanks!
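For readers without the benchmark source at hand, the two families of cases compared above can be pictured roughly as follows. This is only a sketch under assumptions: the class name `MisalignedLoadSketch` and both method bodies are hypothetical and are not taken from the `VectorAlignment` benchmark. The point is the loop shape: in the control case the load and the store use the same index, while in the misaligned case the load is shifted by one element, so both accesses cannot be aligned to the vector width at the same time.

```java
import java.util.Arrays;

public class MisalignedLoadSketch {
    // Control: load and store use the same index.
    static void control(int[] a, int[] b) {
        for (int i = 0; i < a.length - 1; i++) {
            b[i] = a[i];
        }
    }

    // Misaligned load: the load address is one element ahead of the store address.
    static void misalignedLoad(int[] a, int[] b) {
        for (int i = 0; i < a.length - 1; i++) {
            b[i] = a[i + 1];
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = new int[a.length];
        control(a, b);
        System.out.println(Arrays.toString(b));
        Arrays.fill(b, 0);
        misalignedLoad(a, b);
        System.out.println(Arrays.toString(b));
    }
}
```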
------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1594036940 From roland at openjdk.org Fri Jun 16 07:15:55 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 Jun 2023 07:15:55 GMT Subject: RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 Message-ID: In JDK-8300257, I removed by mistake some logic that's required to compute the loop alignment. This change puts it back. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/14508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14508&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308855 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14508/head:pull/14508 PR: https://git.openjdk.org/jdk/pull/14508 From gcao at openjdk.org Fri Jun 16 09:34:03 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 16 Jun 2023 09:34:03 GMT Subject: RFR: 8310192: RISC-V: Merge vector min & max instructs with similar match rules Message-ID: Hi, We merged vector min and max instructions with similar matching rules in this PR, and modified some comments of the copy_memory function in stubGenerator_riscv.cpp. Please take a look and have some reviews. Thanks a lot. 
## Testing: - [x] Tier1 tests (release) - [x] Tier2 tests (release) - [x] Tier3 tests (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - RISC-V: Merge vector min & max instructs with similar match rules Changes: https://git.openjdk.org/jdk/pull/14510/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14510&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310192 Stats: 134 lines in 4 files changed: 12 ins; 97 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/14510.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14510/head:pull/14510 PR: https://git.openjdk.org/jdk/pull/14510 From thartmann at openjdk.org Fri Jun 16 09:37:39 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 Jun 2023 09:37:39 GMT Subject: RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic Message-ID: We crash in C1 compiled code due a missing null check on the argument of the `Reference::get` method. The problem is that after [JDK-8201543](https://bugs.openjdk.org/browse/JDK-8201543), see [here](https://hg.openjdk.org/jdk/jdk/rev/4bb58f644e4e#l43.46), no `CodeEmitInfo` is passed to `access_load_at` and therefore no implicit null check is emitted. 
Thanks, Tobias ------------- Commit messages: - 8310126: C1: Missing receiver null check in Reference::get intrinsic Changes: https://git.openjdk.org/jdk/pull/14511/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14511&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310126 Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14511.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14511/head:pull/14511 PR: https://git.openjdk.org/jdk/pull/14511 From roland at openjdk.org Fri Jun 16 09:44:57 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 Jun 2023 09:44:57 GMT Subject: RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:28:38 GMT, Tobias Hartmann wrote: > We crash in C1 compiled code due a missing null check on the argument of the `Reference::get` method. The problem is that after [JDK-8201543](https://bugs.openjdk.org/browse/JDK-8201543), see [here](https://hg.openjdk.org/jdk/jdk/rev/4bb58f644e4e#l43.46), no `CodeEmitInfo` is passed to `access_load_at` and therefore no implicit null check is emitted. > > Thanks, > Tobias Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14511#pullrequestreview-1483109688 From thartmann at openjdk.org Fri Jun 16 09:56:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 Jun 2023 09:56:01 GMT Subject: RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:28:38 GMT, Tobias Hartmann wrote: > We crash in C1 compiled code due a missing null check on the argument of the `Reference::get` method. 
The problem is that after [JDK-8201543](https://bugs.openjdk.org/browse/JDK-8201543), see [here](https://hg.openjdk.org/jdk/jdk/rev/4bb58f644e4e#l43.46), no `CodeEmitInfo` is passed to `access_load_at` and therefore no implicit null check is emitted. > > Thanks, > Tobias Thanks, Roland! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14511#issuecomment-1594423318 From duke at openjdk.org Fri Jun 16 10:01:12 2023 From: duke at openjdk.org (Eric Nothum) Date: Fri, 16 Jun 2023 10:01:12 GMT Subject: Integrated: 8293069: Make -XX:+Verbose less verbose In-Reply-To: References: Message-ID: On Mon, 12 Jun 2023 15:17:35 GMT, Eric Nothum wrote: > 1) Added PrintOpto guard to noisy Verbose prints in Compile::process_for_unstable_if_traps and Parse::catch_call_exceptions. > > 2) Removed noisy Verbose in ciEnv::record_best_dyno_loc, which looked like a leftover from implementation of [JDK-8271911](https://bugs.openjdk.org/browse/JDK-8271911). > I also rearranged the if statement around Verbose and removed the TODO comment. > @dean-long is the TODO still relevant? In case it is still relevant, I think we should create a separate enhancement for further investigation rather than keeping it in the comments. This pull request has now been integrated. 
Changeset: 238c51e6 Author: Eric Nothum Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/238c51e621d17a24d35085a54c129ce99ad6d0d8 Stats: 9 lines in 3 files changed: 0 ins; 6 del; 3 mod 8293069: Make -XX:+Verbose less verbose Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14420 From shade at openjdk.org Fri Jun 16 10:13:00 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 16 Jun 2023 10:13:00 GMT Subject: RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:28:38 GMT, Tobias Hartmann wrote: > We crash in C1 compiled code due a missing null check on the argument of the `Reference::get` method. The problem is that after [JDK-8201543](https://bugs.openjdk.org/browse/JDK-8201543), see [here](https://hg.openjdk.org/jdk/jdk/rev/4bb58f644e4e#l43.46), no `CodeEmitInfo` is passed to `access_load_at` and therefore no implicit null check is emitted. > > Thanks, > Tobias Looks fine. Seems to be the only place where it is missing. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14511#pullrequestreview-1483153243 From thartmann at openjdk.org Fri Jun 16 10:26:02 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 Jun 2023 10:26:02 GMT Subject: RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:28:38 GMT, Tobias Hartmann wrote: > We crash in C1 compiled code due a missing null check on the argument of the `Reference::get` method. The problem is that after [JDK-8201543](https://bugs.openjdk.org/browse/JDK-8201543), see [here](https://hg.openjdk.org/jdk/jdk/rev/4bb58f644e4e#l43.46), no `CodeEmitInfo` is passed to `access_load_at` and therefore no implicit null check is emitted. > > Thanks, > Tobias Thanks, Aleksey!
Yes, I also inspected the other places manually and couldn't find more issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14511#issuecomment-1594459962 From thartmann at openjdk.org Fri Jun 16 10:35:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 Jun 2023 10:35:10 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 Message-ID: The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. Thanks, Tobias ------------- Commit messages: - 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 Changes: https://git.openjdk.org/jdk/pull/14514/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14514&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310143 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14514/head:pull/14514 PR: https://git.openjdk.org/jdk/pull/14514 From luhenry at openjdk.org Fri Jun 16 10:48:00 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 16 Jun 2023 10:48:00 GMT Subject: RFR: 8310192: RISC-V: Merge vector min & max instructs with similar match rules In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:27:39 GMT, Gui Cao wrote: > Hi, We merged vector min and max instructions with similar matching rules in this PR, and modified some comments of the copy_memory function in stubGenerator_riscv.cpp. 
> We can use Float256VectorTests.java Double256VectorTests.java to emit vmax_fp/vmin_fp nodes and the compilation log is as follows: > > 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 > 13e loadV V2, [R17] # vector (rvv) > 146 vmax_fp V3, V1, V2 > 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 > > > 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 > 13e loadV V2, [R17] # vector (rvv) > 146 vmin_fp V3, V1, V2 > 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 > > Please take a look and have some reviews. Thanks a lot. > > ## Testing: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Marked as reviewed by luhenry (Committer). src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 860: > 858: // of copy. If is_aligned is false, we align the source address. > 859: // > 860: /* Why the removal of that comment? ------------- PR Review: https://git.openjdk.org/jdk/pull/14510#pullrequestreview-1483200976 PR Review Comment: https://git.openjdk.org/jdk/pull/14510#discussion_r1232088111 From gcao at openjdk.org Fri Jun 16 12:13:01 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 16 Jun 2023 12:13:01 GMT Subject: RFR: 8310192: RISC-V: Merge vector min & max instructs with similar match rules In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 10:44:59 GMT, Ludovic Henry wrote: > Why the removal of that comment? Thanks for the review, this comment does not match the actual assembly logic, where the align logic of source and dst address is judged first, not only align the source address. 
https://github.com/openjdk/jdk/blob/b412fc79c3c2548df10918090beedaf6b2d08d96/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L987-L993 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14510#discussion_r1232166847 From coleenp at openjdk.org Fri Jun 16 12:41:09 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 16 Jun 2023 12:41:09 GMT Subject: RFR: 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code Message-ID: This change adds casts to nmethod and compiled method offset and size functions that return int, and checked_casts where it's not obvious or already checked that the cast is correct. Tested with tier1 on Oracle platforms, and tier1-4 linux and windows. ------------- Commit messages: - 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code Changes: https://git.openjdk.org/jdk/pull/14505/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14505&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310027 Stats: 50 lines in 8 files changed: 0 ins; 0 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/14505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14505/head:pull/14505 PR: https://git.openjdk.org/jdk/pull/14505 From kvn at openjdk.org Fri Jun 16 14:23:57 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Jun 2023 14:23:57 GMT Subject: RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: <9QqExTa_9OK8IhemmDdNYm9RgASvR-ZEvCe_M8DWTGE=.adf951cc-12d1-4e47-a4b1-5cdedc9ba86c@github.com> On Fri, 16 Jun 2023 07:09:18 GMT, Roland Westrelin wrote: > In JDK-8300257, I removed by mistake some logic that's required to > compute the loop alignment. This change puts it back. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14508#pullrequestreview-1483623749 From kvn at openjdk.org Fri Jun 16 14:29:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Jun 2023 14:29:03 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 10:28:12 GMT, Tobias Hartmann wrote: > The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. > > Thanks, > Tobias Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14514#pullrequestreview-1483633778 From kvn at openjdk.org Fri Jun 16 14:31:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 Jun 2023 14:31:03 GMT Subject: RFR: 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code In-Reply-To: References: Message-ID: <0tz_iKvd2zVu_UsgVVht5gBzlrutB-VsJZY3UAWGFW0=.80f07b1c-17b5-4834-a5c0-9f7bbfff6ffc@github.com> On Thu, 15 Jun 2023 22:49:36 GMT, Coleen Phillimore wrote: > This change adds casts to nmethod and compiled method offset and size functions that return int, and checked_casts where it's not obvious or already checked that the cast is correct. > Tested with tier1 on Oracle platforms, and tier1-4 linux and windows. Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14505#pullrequestreview-1483641236 From rcastanedalo at openjdk.org Fri Jun 16 14:58:11 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 Jun 2023 14:58:11 GMT Subject: RFR: 8303513: C2: LoadKlassNode::make fails with 'expecting TypeKlassPtr' In-Reply-To: References: Message-ID: <9p-5hZ8GcErLvggI1jBUMpON6KE8ncNpei3FHRAzAKg=.8ee23a84-da02-4f75-b035-288c7ad428f1@github.com> On Wed, 14 Jun 2023 08:23:44 GMT, Roberto Castañeda Lozano wrote: > To reduce noise in test pipelines and ease work on other open RunThese8M issues such as [JDK-8308048](https://bugs.openjdk.org/browse/JDK-8308048), I propose to integrate this fix first and contribute the minimal regression test later as a follow-up enhancement. Reported in [JDK-8310219](https://bugs.openjdk.org/browse/JDK-8310219). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14463#issuecomment-1594827282 From ecaspole at openjdk.org Fri Jun 16 15:01:53 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Fri, 16 Jun 2023 15:01:53 GMT Subject: RFR: 8309976: A JMH to create a lot of classes and compiled methods Message-ID: Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. This makes it possible to create a simulation of a large application with arbitrary java heap and code cache footprint, and take advantage of the benefits of JMH at the same time. The defaults are set very low and the intent is that they would be customized for any given study.
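The approach described above (generate classes from a source string, load them through a dedicated classloader, then instantiate and call them) can be sketched with the standard `javax.tools` API. This is a sketch under assumptions, not the benchmark's code: it does not use the test library's InMemoryJavaCompiler, and the names `GeneratedClassesSketch`, `SourceFile`, `ClassFile`, `ByteLoader`, `compileAndLoad`, and the generated `GenN` classes are all hypothetical. It needs a JDK (not a bare JRE) so that a system Java compiler is available.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class GeneratedClassesSketch {
    // Java source held in a String instead of a file on disk.
    static final class SourceFile extends SimpleJavaFileObject {
        private final String code;
        SourceFile(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    // Compiled class bytes captured in memory.
    static final class ClassFile extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ClassFile(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    // Classloader that defines classes from the captured bytes.
    static final class ByteLoader extends ClassLoader {
        private final Map<String, ClassFile> classes;
        ByteLoader(Map<String, ClassFile> classes) {
            super(GeneratedClassesSketch.class.getClassLoader());
            this.classes = classes;
        }
        @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
            ClassFile cf = classes.get(name);
            if (cf == null) throw new ClassNotFoundException(name);
            byte[] b = cf.bytes.toByteArray();
            return defineClass(name, b, 0, b.length);
        }
    }

    static Class<?> compileAndLoad(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("no system Java compiler available");
        Map<String, ClassFile> outputs = new HashMap<>();
        StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null);
        JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
            @Override
            public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                                                       JavaFileObject.Kind kind, FileObject sibling) {
                ClassFile cf = new ClassFile(name);
                outputs.put(name, cf);
                return cf;
            }
        };
        Boolean ok = compiler.getTask(null, fm, null, null, null,
                                      List.of(new SourceFile(className, source))).call();
        if (!ok) throw new IllegalStateException("compilation failed for " + className);
        return new ByteLoader(outputs).loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        int classCount = 3; // kept tiny here; a code cache study would raise this considerably
        for (int n = 0; n < classCount; n++) {
            String name = "Gen" + n;
            String src = "public class " + name + " { public int get(int x) { return x + " + n + "; } }";
            Class<?> c = compileAndLoad(name, src);
            Object instance = c.getDeclaredConstructor().newInstance();
            System.out.println(name + ".get(10) = " + c.getMethod("get", int.class).invoke(instance, 10));
        }
    }
}
```

Each generated class gets its own loader in this sketch, which keeps the example short; whether one loader per class or one shared loader is closer to the intended workload is a design choice the benchmark itself would have to settle.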
------------- Commit messages: - Merge branch 'openjdk:master' into JDK-8309976 - 8309976: A JMH that can create a lot of classes and compiled methods Changes: https://git.openjdk.org/jdk/pull/14521/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309976 Stats: 447 lines in 1 file changed: 447 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14521/head:pull/14521 PR: https://git.openjdk.org/jdk/pull/14521 From rcastanedalo at openjdk.org Fri Jun 16 15:24:01 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 Jun 2023 15:24:01 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 10:28:12 GMT, Tobias Hartmann wrote: > The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. > > Thanks, > Tobias test/hotspot/jtreg/compiler/compilercontrol/share/MultiCommand.java line 83: > 81: } else { > 82: md = AbstractTestBase.METHOD_GEN.generateRandomDescriptor(exec); > 83: isValid = false; Is there a (remote) chance that `generateRandomDescriptor()` generates a valid descriptor? Would the compiler control test fail in that case due to a "false negative" ("expected to fail but did not fail")? Maybe a clarifying comment here would help.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14514#discussion_r1232393843 From roland at openjdk.org Fri Jun 16 16:15:18 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 Jun 2023 16:15:18 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop Message-ID: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> The test contains a loop nest with 2 loops. The outer loop is an irreducible loop. The safepoint for that loop is also in the inner loop. Because `IdealLoopTree::check_safepts()` ignores irreducible loops, that safepoint is not marked as being required and is eliminated from the inner loop. The inner loop is then optimized out and the outer loop becomes an infinite loop with no safepoint (a single node loop). That, in turn, causes the loop to be eliminated because it has no use, and the assert fires. The fix I propose is to make `IdealLoopTree::check_safepts()` work with irreducible loops. I think `IdealLoopTree::allpaths_check_safepts()` can be used for that. When working on this I wondered if that method could be called with a loop whose head has more than 3 inputs. I couldn't write a test case with an irreducible loop whose head had more than 3 inputs but I added an assert in the method and ran some testing. That assert fired, so I also propose to tweak the method so it's robust in that case.
------------- Commit messages: - test Changes: https://git.openjdk.org/jdk/pull/14522/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14522&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307927 Stats: 188 lines in 3 files changed: 142 ins; 3 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/14522.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14522/head:pull/14522 PR: https://git.openjdk.org/jdk/pull/14522 From redestad at openjdk.org Fri Jun 16 17:20:28 2023 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 16 Jun 2023 17:20:28 GMT Subject: RFR: 8309976: A JMH to create a lot of classes and compiled methods In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 14:54:46 GMT, Eric Caspole wrote: > Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. > This allows to create a simulation of a large application with arbitrary java heap and code cache footprint, and take advantage of the benefits of JMH at the same time. > The defaults are set very low by default and the intent is that they would be customized for any given study. Consider changing the bug summary to "Add microbenchmark for stressing code cache". ------------- Marked as reviewed by redestad (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14521#pullrequestreview-1483990672 From shade at openjdk.org Fri Jun 16 17:31:03 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 16 Jun 2023 17:31:03 GMT Subject: RFR: 8309976: A JMH to create a lot of classes and compiled methods In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 14:54:46 GMT, Eric Caspole wrote: > Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. > This allows to create a simulation of a large application with arbitrary java heap and code cache footprint, and take advantage of the benefits of JMH at the same time. > The defaults are set very low by default and the intent is that they would be customized for any given study. I think the benchmark code needs massaging for style and other issues. See e.g. the cursory review: test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 2: > 1: /* > 2: * Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. The format looks odd, missing comma? 
test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 76: > 74: public int recurse; > 75: > 76: // How many instances of each generated class to create and call in the measured phase Suggestion: // How many instances of each generated class to create and call in the measurement phase test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 87: > 85: Map table1 = new HashMap<>(); > 86: ArrayList> mapList = new ArrayList<>(); > 87: Map instList = new HashMap<>(); All of these names are too generic; it is not clear what they hold. test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 188: > 186: + " " > 187: + " " > 188: + " public Integer get( Map m, String k, Integer depth) { " Suggestion: + " public Integer get(Map m, String k, Integer depth) { " ...and later... test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 438: > 436: Integer result = callTheMethod(m, r, k, map); > 437: assert result != null && result >= v; > 438: sum += result; This performs boxed addition. ------------- Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14521#pullrequestreview-1483780194 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1232411469 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1232411699 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1232412769 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1232536237 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1232536502 From never at openjdk.org Fri Jun 16 17:40:20 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 16 Jun 2023 17:40:20 GMT Subject: RFR: 8309601: [JVMCI] AMD64#getLargestStorableKind returns incorrect mask kind In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 08:24:23 GMT, Gergö Barany wrote: > `jdk.vm.ci.amd64.AMD64#getLargestStorableKind(RegisterCategory)` unconditionally returns `AMD64Kind.MASK64` for mask registers. 
This is only correct if the target supports AVX512BW. On other AVX512 versions this should be `MASK16`. > > The Graal compiler uses this method to determine how to spill a given register. An incorrect size will lead to compilation errors due to trying to emit a move with a size that is not supported by the target. I have manually verified that this fixes those problems. Looks good ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14441#pullrequestreview-1484020268 From coleenp at openjdk.org Fri Jun 16 18:45:09 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 16 Jun 2023 18:45:09 GMT Subject: RFR: 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 22:49:36 GMT, Coleen Phillimore wrote: > This change adds casts to nmethod and compiled method offset and size functions that return int, and checked_casts where it's not obvious or already checked that the cast is correct. > Tested with tier1 on Oracle platforms, and tier1-4 linux and windows. Thanks Vladimir. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14505#issuecomment-1595132102 From vlivanov at openjdk.org Sat Jun 17 00:44:23 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 17 Jun 2023 00:44:23 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Wed, 14 Jun 2023 19:29:45 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. 
>> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. 
I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Rome minor refactorings. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > Catching up with master. > - Address PR review 6: debug format output & some refactoring. > - Catching up with master branch. > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. > - Address PR review 5: refactor on rematerialization & add tests. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - ... and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe Testing results (both functional and performance) are good. In addition, I tested with a `C.failure_reason_is(retry_no_reduce_allocation_merges()) == true` guarantee and there were no failures observed. Once you address my latest comments I'll mark the PR as reviewed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1595520571 From jsjolen at openjdk.org Sat Jun 17 16:17:21 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Sat, 17 Jun 2023 16:17:21 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked Message-ID: Hi, `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. 
The intention might have been for these to also live in the `split_arena`.. This change is the most conservative one, however, and does fix the memory leak. Please consider, thanks. Johan ------------- Commit messages: - Add in missing ResourceMark Changes: https://git.openjdk.org/jdk/pull/14530/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310264 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14530/head:pull/14530 PR: https://git.openjdk.org/jdk/pull/14530 From duke at openjdk.org Sun Jun 18 21:20:22 2023 From: duke at openjdk.org (TheFarlandsExplorer15) Date: Sun, 18 Jun 2023 21:20:22 GMT Subject: RFR: 8291003: ARM32: constant_table.size assertion [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jul 2022 19:47:34 GMT, Boris Ulasevich wrote: >> This change fixes assertion condition as per the recent [JDK-8287373](https://bugs.openjdk.org/browse/JDK-8287373) change: the size of constants section is aligned up according to the settings of the next section (instructions section). > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > align_at_start api clarification, use explicit section index in the expression src/hotspot/cpu/arm/arm.ad line 239: > 237: Register r = as_Register(ra_->get_encode(this)); > 238: CodeSection* consts_section = __ code()->consts(); > 239: int consts_size = consts_section->align_at_start(consts_section->size()); - int consts_size = consts_section->align_at_start(consts_section->size()); I have no idea what that means and: + // constants section size is aligned according to the align_at_start settings of the next section And it is: int consts_size = CodeSection::align_at_Start(consts_section->size(), CodeBuffer::SECT_INSTS); Still no idea. 
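For readers puzzled by the expression quoted above: per the PR description, the constants section size is rounded up to the alignment required at the start of the next (instructions) section. Below is a minimal sketch of that rounding in Java, assuming a power-of-two alignment. The class name, the method name and the alignment values are hypothetical illustrations, not HotSpot code.

```java
// Hypothetical illustration of "align up"; not taken from HotSpot.
final class AlignUpSketch {
    // Rounds size up to the next multiple of alignment.
    // Assumes alignment is a positive power of two.
    static int alignUp(int size, int alignment) {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        // Adding (alignment - 1) and clearing the low bits rounds up.
        return (size + alignment - 1) & -alignment;
    }

    public static void main(String[] args) {
        // A 20-byte constants section followed by a section that must
        // start on a (hypothetical) 16-byte boundary occupies 32 bytes.
        System.out.println(alignUp(20, 16));
    }
}
```

Under these assumptions, a size that is already a multiple of the alignment is returned unchanged, which is why using the aligned size in the assertion would be safe for both cases.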
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9672#discussion_r1233383234 From duke at openjdk.org Mon Jun 19 01:45:59 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 19 Jun 2023 01:45:59 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v2] In-Reply-To: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: > This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. > > VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to take the smaller of VLENGTH and the number of unset lanes, to ensure correctness. > > This patch sets the 16th or 32nd bit to 1 before rbit and clz when the boolean mask has only 2 or 4 elements. With this trick, the maximum value computed in these cases is VLENGTH (2 or 4). > > Test: > All vector and vectorapi tests passed. > > Performance: > The benchmark function looks like this: > > > @Benchmark > public static int testInt() { > int res = 0; > for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) { > VectorMask<Integer> m = VectorMask.fromArray(INT_SPECIES, ia, i); > res += m.firstTrue(); > } > > return res; > } > > > The following data was collected on a 128-bit Neon machine. 
> > Benchmark Before After Unit > testInt 22214.740 25627.833 ops/ms > testLong 11649.898 13698.535 ops/ms > > [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() > [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 > [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- Chang Peng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into optimize_firsttrue2e4e_neon - 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. VectorMask.firstTrue() should return VLEGNTH when vector mask is all false [1]. Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). Test: All vector and vectorapi test passed. Performance: The benchmark function is like: ``` @Benchmark public static int testInt() { int res = 0; for (int i = 0; i < LENGTH; i += INT_SPECIES.length()) { VectorMask m = VectorMask.fromArray(INT_SPECIES, ia, i); res += m.firstTrue(); } return res; } ``` Following data is collected on a 128-bit Neon machine. 
Benchmark Before After Unit testInt 22214.740 25627.833 ops/ms testLong 11649.898 13698.535 ops/ms [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- Change-Id: I4a2de805ffa4469f88d510c96617eae165f0e025 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14373/files - new: https://git.openjdk.org/jdk/pull/14373/files/24b6d738..d8507105 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=00-01 Stats: 82117 lines in 1520 files changed: 59805 ins; 16698 del; 5614 mod Patch: https://git.openjdk.org/jdk/pull/14373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14373/head:pull/14373 PR: https://git.openjdk.org/jdk/pull/14373 From xgong at openjdk.org Mon Jun 19 01:57:15 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 19 Jun 2023 01:57:15 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 Message-ID: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) public static void testAndMaskSameValue1() The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. 
To guarantee that the IR rule is run with SVE as expected, the rule needs an additional condition such as `applyIf = {"UseSVE", "> 0"}`. Since `UseSVE` is an ARM-specific option, it can be used only on ARM CPUs. So the right IR rules should be: @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) public static void testAndMaskSameValue1() This patch also changes the order in which the conditions of an IR rule are checked. It's better to check `applyIfCPUFeature` before `applyIf`, in case the VM option is invalid on the running hardware, which would make the test throw an exception and abort. Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and on SVE systems with `UseSVE=0/1`. ------------- Commit messages: - 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 Changes: https://git.openjdk.org/jdk/pull/14533/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14533&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309894 Stats: 61 lines in 4 files changed: 37 ins; 9 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/14533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14533/head:pull/14533 PR: https://git.openjdk.org/jdk/pull/14533 From duke at openjdk.org Mon Jun 19 02:06:27 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 19 Jun 2023 02:06:27 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3] In-Reply-To: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: > This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. > > VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. 
Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. > > This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). > > Test: > All vector and vectorapi test passed. > > Performance: > The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies above benchmark to measure mask operations' performance more effectively. > > Following data is collected on a 128-bit Neon machine. > > Benchmark (inputs) Mode Before After Units > MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms > > [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() > [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 > [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- Chang Peng has updated the pull request incrementally with one additional commit since the last revision: Update MaskQueryOperationsBenchmark.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14373/files - new: https://git.openjdk.org/jdk/pull/14373/files/d8507105..62a6522c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=02 - incr: 
https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=01-02 Stats: 195 lines in 1 file changed: 103 ins; 34 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/14373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14373/head:pull/14373 PR: https://git.openjdk.org/jdk/pull/14373 From never at openjdk.org Mon Jun 19 02:31:14 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 19 Jun 2023 02:31:14 GMT Subject: RFR: 8309601: [JVMCI] AMD64#getLargestStorableKind returns incorrect mask kind In-Reply-To: References: Message-ID: On Tue, 13 Jun 2023 08:24:23 GMT, Gergö Barany wrote: > `jdk.vm.ci.amd64.AMD64#getLargestStorableKind(RegisterCategory)` unconditionally returns `AMD64Kind.MASK64` for mask registers. This is only correct if the target supports AVX512BW. On other AVX512 versions this should be `MASK16`. > > The Graal compiler uses this method to determine how to spill a given register. An incorrect size will lead to compilation errors due to trying to emit a move with a size that is not supported by the target. I have manually verified that this fixes those problems. The linux-x86 failure looks unrelated so the testing looks clean to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14441#issuecomment-1596391842 From gbarany at openjdk.org Mon Jun 19 02:31:16 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Mon, 19 Jun 2023 02:31:16 GMT Subject: Integrated: 8309601: [JVMCI] AMD64#getLargestStorableKind returns incorrect mask kind In-Reply-To: References: Message-ID: <8gdpyrYyan1AzwfKBU9LxszA8EuQCGXwWwtSpWPsq8A=.9705176f-b706-422f-ae7e-ef2a384dbcec@github.com> On Tue, 13 Jun 2023 08:24:23 GMT, Gergö Barany wrote: > `jdk.vm.ci.amd64.AMD64#getLargestStorableKind(RegisterCategory)` unconditionally returns `AMD64Kind.MASK64` for mask registers. This is only correct if the target supports AVX512BW. On other AVX512 versions this should be `MASK16`. 
> > The Graal compiler uses this method to determine how to spill a given register. An incorrect size will lead to compilation errors due to trying to emit a move with a size that is not supported by the target. I have manually verified that this fixes those problems. This pull request has now been integrated. Changeset: 492d25c8 Author: Gerg? Barany Committer: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/492d25c8df0f818d6f6e3a18a82bfad8fa95c282 Stats: 11 lines in 1 file changed: 9 ins; 0 del; 2 mod 8309601: [JVMCI] AMD64#getLargestStorableKind returns incorrect mask kind Reviewed-by: dnsimon, never ------------- PR: https://git.openjdk.org/jdk/pull/14441 From thartmann at openjdk.org Mon Jun 19 05:12:19 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 05:12:19 GMT Subject: Integrated: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:28:38 GMT, Tobias Hartmann wrote: > We crash in C1 compiled code due a missing null check on the argument of the `Reference::get` method. The problem is that after [JDK-8201543](https://bugs.openjdk.org/browse/JDK-8201543), see [here](https://hg.openjdk.org/jdk/jdk/rev/4bb58f644e4e#l43.46), no `CodeEmitInfo` is passed to `access_load_at` and therefore no implicit null check is emitted. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 02aaab12 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/02aaab12e331e5a4c249f1d281c4439e2e7c914f Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod 8310126: C1: Missing receiver null check in Reference::get intrinsic Reviewed-by: roland, shade ------------- PR: https://git.openjdk.org/jdk/pull/14511 From thartmann at openjdk.org Mon Jun 19 05:29:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 05:29:06 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked In-Reply-To: References: Message-ID: On Sat, 17 Jun 2023 16:08:53 GMT, Johan Sjölen wrote: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. > > Johan Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14530#pullrequestreview-1485429468 From chagedorn at openjdk.org Mon Jun 19 06:23:06 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jun 2023 06:23:06 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked In-Reply-To: References: Message-ID: On Sat, 17 Jun 2023 16:08:53 GMT, Johan Sjölen wrote: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. > > Johan Looks good! I'm wondering why we don't stack-allocate both `Node_List`s instead of using `new Node_List()`. But regardless of that, we should indeed add a `ResourceMark`. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14530#pullrequestreview-1485480358 From thartmann at openjdk.org Mon Jun 19 06:33:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 06:33:08 GMT Subject: RFR: 8301489: ShortLoopOptimizer might lift instructions before their inputs In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 11:13:02 GMT, Daniel Skantz wrote: > ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for a hoisting candidate is not higher up the dominator tree than any of the instruction's inputs. > > Testing: tier1-tier3. > > Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. > Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. > > Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in B3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. > > ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) Great work investigating this, Daniel! From your comments in JBS, it seems that the underlying issue is additional exception edges in the graph that affect dominator computation. Could you elaborate a bit more on that with respect to the example that you provided in the PR description? I'm not an expert in C1 though (paging @veresov and @rwestrel as the author of JDK-7153771). 
Thanks, Tobias src/hotspot/share/c1/c1_ValueMap.cpp line 367: > 365: bool _valid = true; > 366: > 367: void visit(Value* vp) { Since `Value` is already a pointer type, can't we use `Value v` here? src/hotspot/share/c1/c1_ValueMap.cpp line 375: > 373: > 374: public: > 375: bool is_valid() {return _valid; } Suggestion: bool is_valid() { return _valid; } src/hotspot/share/c1/c1_ValueMap.cpp line 380: > 378: #ifdef ASSERT > 379: assert(insert != nullptr, "insertion point should not be null"); > 380: #endif Suggestion: assert(insert != nullptr, "insertion point should not be null"); ------------- PR Review: https://git.openjdk.org/jdk/pull/14492#pullrequestreview-1485433201 PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1233546636 PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1233545065 PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1233544383 From thartmann at openjdk.org Mon Jun 19 06:40:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 06:40:07 GMT Subject: RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: <4c9Fl7kRlVtr-XUHuOnOXHk0hZRMR-vjdw8DNCkKkuQ=.05ce24bf-0247-4dff-9728-65d542f33997@github.com> On Fri, 16 Jun 2023 07:09:18 GMT, Roland Westrelin wrote: > In JDK-8300257, I removed by mistake some logic that's required to > compute the loop alignment. This change puts it back. Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14508#pullrequestreview-1485500509 From roland at openjdk.org Mon Jun 19 07:05:28 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jun 2023 07:05:28 GMT Subject: RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: <4c9Fl7kRlVtr-XUHuOnOXHk0hZRMR-vjdw8DNCkKkuQ=.05ce24bf-0247-4dff-9728-65d542f33997@github.com> References: <4c9Fl7kRlVtr-XUHuOnOXHk0hZRMR-vjdw8DNCkKkuQ=.05ce24bf-0247-4dff-9728-65d542f33997@github.com> Message-ID: On Mon, 19 Jun 2023 06:37:29 GMT, Tobias Hartmann wrote: >> In JDK-8300257, I removed by mistake some logic that's required to >> compute the loop alignment. This change puts it back. > > Looks good. @TobiHartmann @vnkozlov thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14508#issuecomment-1596619979 From roland at openjdk.org Mon Jun 19 07:05:30 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jun 2023 07:05:30 GMT Subject: Integrated: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 07:09:18 GMT, Roland Westrelin wrote: > In JDK-8300257, I removed by mistake some logic that's required to > compute the loop alignment. This change puts it back. This pull request has now been integrated. 
Changeset: 266f9838 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/266f9838ee28fb49b5368fc9778854c456b02b7c Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8308855: ARM32: TestBooleanVector crashes after 8300257 Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14508 From fyang at openjdk.org Mon Jun 19 07:23:09 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 19 Jun 2023 07:23:09 GMT Subject: RFR: 8310192: RISC-V: Merge vector min & max instructs with similar match rules In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:27:39 GMT, Gui Cao wrote: > Hi, We merged vector min and max instructions with similar matching rules in this PR, and modified some comments of the copy_memory function in stubGenerator_riscv.cpp. > We can use Float256VectorTests.java Double256VectorTests.java to emit vmax_fp/vmin_fp nodes and the compilation log is as follows: > > 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 > 13e loadV V2, [R17] # vector (rvv) > 146 vmax_fp V3, V1, V2 > 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 > > > 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 > 13e loadV V2, [R17] # vector (rvv) > 146 vmin_fp V3, V1, V2 > 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 > > Please take a look and have some reviews. Thanks a lot. > > ## Testing: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14510#pullrequestreview-1485563905 From duke at openjdk.org Mon Jun 19 07:29:23 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 19 Jun 2023 07:29:23 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v2] In-Reply-To: References: Message-ID: > **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 > > **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. > > Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. > > By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. Eric Nothum has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - merge - JDK-8309266: Cosmetic change to the if statement - JDK-8309266: 1) Added TestLoopLimitOverflowDuringCCP to the jtreg tests 2) Verify in LoopLimitNode::Value that the input nodes are ConI type nodes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14490/files - new: https://git.openjdk.org/jdk/pull/14490/files/04a5a3d5..35017b8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=00-01 Stats: 25527 lines in 777 files changed: 13951 ins; 8958 del; 2618 mod Patch: https://git.openjdk.org/jdk/pull/14490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14490/head:pull/14490 PR: https://git.openjdk.org/jdk/pull/14490 From shade at openjdk.org Mon Jun 19 07:32:06 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 Jun 2023 07:32:06 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked In-Reply-To: References: Message-ID: On Sat, 17 Jun 2023 16:08:53 GMT, Johan Sjölen wrote: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. > > Johan Looking at https://github.com/openjdk/jdk/commit/f0d08c04f1312045c5e6f77935b04fe90967a186 -- it would seem that `def` and `phi` just missed the allocation in `split_arena`? I think it would be cleaner/safer to change `def`/`phi` to be allocated in `split_arena`, rather than doing the blank `ResourceMark` here. Are we sure nothing RA-allocated is used after we leave `Split`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14530#issuecomment-1596656404 From duke at openjdk.org Mon Jun 19 07:58:05 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 19 Jun 2023 07:58:05 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> Message-ID: On Thu, 8 Jun 2023 09:15:27 GMT, Andrew Haley wrote: > Where is the benchmark? You don't seem to have included it in this PR. @theRealAph Sorry for the delay. Original performance was measured by a simple benchmark only measuring firstTrue()'s performance written by myself. When I wanted to add it to JDK I found an existing benchmark used to measure different mask operations' performance ([jdk/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java at master · openjdk/jdk · GitHub](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java)). I tried to measure firstTrue()'s performance by this benchmark, but I found Blackhole's proportion of the hottest region is too high, as follows:
[Hottest Methods (after inlining)]..............................................................
57.15% c2, level 4 org.openjdk.jmh.infra.Blackhole::consumeFull, version 867
41.99% c2, level 4 org.openjdk.bench.jdk.incubator.vector.jmh_generated.MaskQueryOperationsBenchmark_testFirstTrueLong_jmhTest::testFirstTrueLong_thrpt_jmhStub, version 883
So I spent some time on fixing this benchmark to measure mask operations' performance effectively. After this update, the proportion of blackhole is below 10% for each benchmark function.
And I also updated the performance of firstTrue() measured by this benchmark when there are only 2 or 4 elements in boolean masks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1596687337 From duke at openjdk.org Mon Jun 19 08:17:31 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 19 Jun 2023 08:17:31 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v3] In-Reply-To: References: Message-ID: > **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 > > **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. > > Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. > > By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: JDK-8309266: reverted previous changes. 
assert was adapted and in case of overflow bottom is now returned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14490/files - new: https://git.openjdk.org/jdk/pull/14490/files/35017b8d..af53ebea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=01-02 Stats: 15 lines in 1 file changed: 7 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14490/head:pull/14490 PR: https://git.openjdk.org/jdk/pull/14490 From duke at openjdk.org Mon Jun 19 08:27:08 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 19 Jun 2023 08:27:08 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 08:17:31 GMT, Eric Nothum wrote: >> **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 >> >> **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. >> >> Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. 
>> >> By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8309266: reverted previous changes. assert was adapted and in case of overflow bottom is now returned I have now reverted my previous changes. The new fix: 1) I have changed the assert to only trigger if the input nodes are `ConINode` and an overflow happens 2) if an overflow is detected, `bottom_type()` is returned. This no longer slows down CCP, and the test case now executes properly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14490#issuecomment-1596736240 From epeter at openjdk.org Mon Jun 19 08:53:16 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jun 2023 08:53:16 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction Message-ID: Removed a spurious assert before optimization bailout. I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect: partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so an optimization bailout is appropriate. I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have led to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that it bails out of the optimization.
This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain. I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue. Testing up to tier6 and stress testing. **Running, passing except for some IR rules I had to fix, rerunning...** ------------- Commit messages: - make test restrictions tighter - fixed IR rule of new test - fix whitespace - fix bailout of another assert - improved tests - 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction Changes: https://git.openjdk.org/jdk/pull/14494/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14494&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310130 Stats: 145 lines in 3 files changed: 141 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14494/head:pull/14494 PR: https://git.openjdk.org/jdk/pull/14494 From rcastanedalo at openjdk.org Mon Jun 19 08:57:11 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 Jun 2023 08:57:11 GMT Subject: RFR: 8301489: ShortLoopOptimizer might lift instructions before their inputs In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 11:13:02 GMT, Daniel Skantz wrote: > ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. > > Testing: tier1-tier3. > > Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. > Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. 
> > Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. > > ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) Good catch, analysis, and regression test Daniel. Great that you could generalize the fix to cover all other LICM cases. Just a couple of comments/suggestions. src/hotspot/share/c1/c1_ValueMap.cpp line 362: > 360: } > 361: > 362: class CheckInsertionPoint: public ValueVisitor { For consistency with other similar cases in `c1_ValueMap.cpp`: Suggestion: class CheckInsertionPoint : public ValueVisitor { src/hotspot/share/c1/c1_ValueMap.cpp line 421: > 419: // Check that insertion point has higher dom depth than all inputs to cur > 420: CheckInsertionPoint v(_insertion_point); > 421: cur->input_values_do(&v); For compilation efficiency, would it be possible to perform this computation only when `cur_invariant` holds? You could for example encapsulate the creation of `v`, call to `cur->input_values_do(&v)`, and `v.is_valid()` check into an auxiliary function `is_dominated_by_inputs(_insertion_point, cur)` or similar and call that function below (`if (cur_invariant && is_dominated_by_inputs(_insertion_point, cur)) {`). ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14492#pullrequestreview-1485684935 PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1233707598 PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1233714992 From roland at openjdk.org Mon Jun 19 09:05:20 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jun 2023 09:05:20 GMT Subject: RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases Message-ID: Before 8275201, loading the element klass of an array returned: TypeKlassPtr::make(tkls->ptr(), elem, 0/*offset*/); that is exact if the array type is exact. I changed it to: tkls->is_aryklassptr()->elem(); When the array type is exact (newly allocated array for instance) but the element class has subclasses, this doesn't return an exact class (so the logic is different from the one that was there before). That affects array store checks that no longer constant fold. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/14536/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14536&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310299 Stats: 64 lines in 3 files changed: 63 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14536.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14536/head:pull/14536 PR: https://git.openjdk.org/jdk/pull/14536 From aph at openjdk.org Mon Jun 19 09:55:08 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 19 Jun 2023 09:55:08 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> Message-ID: On Mon, 19 Jun 2023 07:53:23 GMT, Chang Peng wrote: > Sorry for the delay. 
Original performance was measured by a simple benchmark only measuring firstTrue()'s performance written by myself. When I wanted to add it to JDK I found an existing benchmark used to measure different mask operations' performance ([jdk/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java at master · openjdk/jdk · GitHub](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java)). I tried to measure firstTrue()'s performance by this benchmark, but I found Blackhole's proportion of the hottest region is too high, as follows: Can you please send the entire output of JMH? Blackhole should not appear at all in the output because it's been intrinsified. I'd like to know why the intrinsic isn't working for you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1596871646 From duke at openjdk.org Mon Jun 19 10:08:05 2023 From: duke at openjdk.org (Chang Peng) Date: Mon, 19 Jun 2023 10:08:05 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> Message-ID: On Mon, 19 Jun 2023 09:52:22 GMT, Andrew Haley wrote: > > Sorry for the delay. Original performance was measured by a simple benchmark only measuring firstTrue()'s performance written by myself. When I wanted to add it to JDK I found an existing benchmark used to measure different mask operations' performance ([jdk/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java at master · openjdk/jdk · GitHub](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/jdk/incubator/vector/MaskQueryOperationsBenchmark.java)).
I tried to measure firstTrue()'s performance by this benchmark, but I found Blackhole's proportion of the hottest region is too high, as follows: > > Can you please send the entire output of JMH? Blackhole should not appear at all in the output because it's been intrinsified. I'd like to know why the intrinsic isn't working for you. Output before this patch: https://gist.github.com/changpeng1997/734aa176577bfff56f5a87db9c8db69a Output after this patch: https://gist.github.com/changpeng1997/73098069b8f814310d6606dfd7dc56c5 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1596894355 From aph at openjdk.org Mon Jun 19 10:11:08 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 19 Jun 2023 10:11:08 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3] In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: On Mon, 19 Jun 2023 02:06:27 GMT, Chang Peng wrote: >> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. >> >> VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. >> >> This patch sets the 16th or 32nd bit to 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, the maximum value calculated in such cases will be VLENGTH (2 or 4). >> >> Test: >> All vector and vectorapi tests passed. >> >> Performance: >> The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies the above benchmark to measure mask operations' performance more effectively. >> >> The following data was collected on a 128-bit Neon machine.
>>
>> Benchmark                                       (inputs)  Mode     Before     After  Units
>> MaskQueryOperationsBenchmark.testFirstTrueInt          1  thrpt  5952.670  7298.491  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt          2  thrpt  5951.513  7297.620  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt          3  thrpt  5953.048  7298.072  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong         1  thrpt  3496.990  4003.188  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong         2  thrpt  3497.755  4002.577  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong         3  thrpt  3500.085  4002.471  ops/ms
>> >> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() >> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 >> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update MaskQueryOperationsBenchmark.java > Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect) Could you try this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1596900531 From thartmann at openjdk.org Mon Jun 19 11:25:19 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:25:19 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2] In-Reply-To: References: Message-ID: > The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. > > Thanks, > Tobias Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Set isValid properly and inside of the loop - Merge branch 'JDK-8310143' of https://github.com/TobiHartmann/jdk into JDK-8310143 - 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14514/files - new: https://git.openjdk.org/jdk/pull/14514/files/61f44405..689bf927 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14514&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14514&range=00-01 Stats: 4331 lines in 207 files changed: 3184 ins; 303 del; 844 mod Patch: https://git.openjdk.org/jdk/pull/14514.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14514/head:pull/14514 PR: https://git.openjdk.org/jdk/pull/14514 From thartmann at openjdk.org Mon Jun 19 11:25:20 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:25:20 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 10:28:12 GMT, Tobias Hartmann wrote: > The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. > > Thanks, > Tobias Thanks for the reviews, Vladimir and Roberto! My testing caught failures because I accidentally declared the `isValid` field outside of the loop but it should be re-set to `true` after each iteration. 
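
The scoping mistake described above can be illustrated with a minimal sketch. This is not the code from `MultiCommand.java`; the class and method names below are hypothetical, and the `boolean[]` input merely stands in for "was the generated command valid". It only shows why a flag declared outside the loop stays `false` once a single invalid command has been seen.

```java
import java.util.ArrayList;
import java.util.List;

public class ValidityFlagScope {
    // Buggy shape (hypothetical): the flag lives outside the loop, so one
    // invalid command makes every later command look invalid as well.
    static List<Boolean> flagOutsideLoop(boolean[] generatedValid) {
        List<Boolean> seen = new ArrayList<>();
        boolean isValid = true;
        for (boolean valid : generatedValid) {
            if (!valid) {
                isValid = false;
            }
            seen.add(isValid);
        }
        return seen;
    }

    // Fixed shape (hypothetical): the flag is declared inside the loop and
    // starts out as true for every command.
    static List<Boolean> flagInsideLoop(boolean[] generatedValid) {
        List<Boolean> seen = new ArrayList<>();
        for (boolean valid : generatedValid) {
            boolean isValid = true;
            if (!valid) {
                isValid = false;
            }
            seen.add(isValid);
        }
        return seen;
    }

    public static void main(String[] args) {
        boolean[] commands = {true, false, true};
        System.out.println("outside loop: " + flagOutsideLoop(commands));
        System.out.println("inside loop:  " + flagInsideLoop(commands));
    }
}
```

With the input `{true, false, true}`, the first variant reports the third command as invalid even though it is valid, which matches the kind of spurious failure mentioned in the message.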
------------- PR Comment: https://git.openjdk.org/jdk/pull/14514#issuecomment-1597004486 From thartmann at openjdk.org Mon Jun 19 11:25:23 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:25:23 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2] In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 15:20:50 GMT, Roberto Castañeda Lozano wrote: >> Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Set isValid properly and inside of the loop >> - Merge branch 'JDK-8310143' of https://github.com/TobiHartmann/jdk into JDK-8310143 >> - 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 > > test/hotspot/jtreg/compiler/compilercontrol/share/MultiCommand.java line 83: > >> 81: } else { >> 82: md = AbstractTestBase.METHOD_GEN.generateRandomDescriptor(exec); >> 83: isValid = false; > > Is there a (remote) chance that `generateRandomDescriptor()` generates a valid descriptor? Would the compiler control test fail in that case due to a "false negative" ("expected to fail but did not fail")? Maybe a clarifying comment here would help. Yes, I think that could happen. I updated that line to use `md.isValid()`.
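
The reasoning behind switching to `md.isValid()` can be sketched as follows. Everything here is an assumption made for illustration: the `Descriptor` class, its toy validity rule, and the `expectFailure` helper are hypothetical and do not reflect the real test library. The point is only that the expected outcome is derived from the generated value instead of being assumed to be "invalid".

```java
public class DerivedValidity {
    // Hypothetical stand-in for a method descriptor. The validity rule
    // (identifier followed by a parenthesised argument list) is a toy rule.
    static final class Descriptor {
        final String text;

        Descriptor(String text) {
            this.text = text;
        }

        boolean isValid() {
            return text.matches("[A-Za-z_][A-Za-z0-9_]*\\(.*\\)");
        }
    }

    // The test expects the VM to reject the command only when the descriptor
    // really is invalid, so a randomly generated descriptor that happens to
    // be well-formed does not produce a false negative.
    static boolean expectFailure(Descriptor md) {
        return !md.isValid();
    }

    public static void main(String[] args) {
        String[] samples = {"foo()", "foo(", ")bar", "baz(I)"};
        for (String s : samples) {
            System.out.println(s + " -> expect failure: " + expectFailure(new Descriptor(s)));
        }
    }
}
```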
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14514#discussion_r1233914640 From thartmann at openjdk.org Mon Jun 19 11:26:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:26:01 GMT Subject: Withdrawn: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 10:28:12 GMT, Tobias Hartmann wrote: > The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. > > Thanks, > Tobias This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14514 From thartmann at openjdk.org Mon Jun 19 11:30:24 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:30:24 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 11:25:19 GMT, Tobias Hartmann wrote: >> The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Set isValid properly and inside of the loop > - Merge branch 'JDK-8310143' of https://github.com/TobiHartmann/jdk into JDK-8310143 > - 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 I screwed up the merge and re-opened the PR with https://github.com/openjdk/jdk/pull/14538. 
Sorry for the noise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14514#issuecomment-1597014161 From thartmann at openjdk.org Mon Jun 19 11:37:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:37:10 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 Message-ID: The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. I also removed `Executor::isValid` because it's unused. Thanks, Tobias ------------- Commit messages: - Screwed up the merge Changes: https://git.openjdk.org/jdk/pull/14538/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14538&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310143 Stats: 14 lines in 3 files changed: 3 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14538.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14538/head:pull/14538 PR: https://git.openjdk.org/jdk/pull/14538 From thartmann at openjdk.org Mon Jun 19 11:47:14 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 11:47:14 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v5] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Thu, 15 Jun 2023 13:08:36 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ±
24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full-blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ...
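
The idea in the quoted proposal can be modelled at the Java source level as a rough sketch. This is only an analogy: the real change is in code generated by C2, not in Java, and the `Marker` interface, the class hierarchy, and the method name below are hypothetical. The sketch compares against the two classes assumed to be recorded in the profile first, and falls back to the general subtype check only for anything else.

```java
public class ProfiledSubtypeCheck {
    interface Marker {}

    // Hypothetical profiled classes: one implements the interface, one does not.
    static final class DuplicatedContext implements Marker {}
    static final class NonDuplicatedContext {}

    // Exact-class comparisons for the profiled classes come first; each has a
    // statically known answer. Unprofiled classes take the general (slower) path.
    static boolean isMarker(Object o) {
        if (o == null) {
            return false; // same result as `o instanceof Marker` for null
        }
        Class<?> k = o.getClass();
        if (k == DuplicatedContext.class) {
            return true;
        }
        if (k == NonDuplicatedContext.class) {
            return false;
        }
        return Marker.class.isInstance(o);
    }

    public static void main(String[] args) {
        System.out.println(isMarker(new DuplicatedContext()));
        System.out.println(isMarker(new NonDuplicatedContext()));
        System.out.println(isMarker(new Marker() {}));
        System.out.println(isMarker("not a marker"));
    }
}
```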
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces `testlibrary_tests/ir_framework/tests/TestIRMatching.java` still fails: `Failed IR Rules (21) of Methods (8)`: 1) Method "public void ir_framework.tests.Traps.classCheck()" - [Failed IR rules: 3]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#TRAP#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "PrintIdeal": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*uncommon_trap.*reason)" - Matched forbidden nodes (2): * 39 CallStaticJava === 33 6 7 8 9 (38 1 1 10 29 ) [[ 40 ]] # Static uncommon_trap(reason * 60 CallStaticJava === 245 6 7 8 9 (59 1 1 10 24 ) [[ 61 ]] # Static uncommon_trap(reason * @IR rule 2: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CLASS_CHECK_TRAP#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "PrintIdeal": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*uncommon_trap.*class_check)" - Matched forbidden node: * 60 CallStaticJava === 245 6 7 8 9 (59 1 1 10 24 ) [[ 61 ]] # Static uncommon_trap(reason='class_check * @IR rule 3: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#NULL_CHECK_TRAP#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "PrintIdeal": - failOn: Graph contains forbidden nodes: * Constraint 1: "(\\d+(\\s){2}(CallStaticJava.*)+(\\s){2}===.*uncommon_trap.*null_check)" - Matched forbidden node: * 39 CallStaticJava === 33 6 7 8 9 (38 1 1 10 29 ) [[ 40 ]] # Static uncommon_trap(reason='null_check [...] 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1597036069 From rcastanedalo at openjdk.org Mon Jun 19 11:47:22 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 19 Jun 2023 11:47:22 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 Message-ID: This changeset instruments Iterative GVN (IGVN) in C2 to dump the Ideal graph after each effective step (i.e. when the graph is rewritten or the recorded types are refined). This enables fine-grain tracing of IGVN transformation sequences using Ideal Graph Visualizer. This technique has proved useful for the investigation of [JDK-8310220](https://bugs.openjdk.org/browse/JDK-8310220), and can be also useful for educational purposes: ![igv-level4](https://github.com/openjdk/jdk/assets/8792647/56dc9729-d5eb-44f3-8614-dc72e17f1bef) These new dumps are emitted at print level 4 (`PrintIdealGraphLevel=4`), the highest level of detail. Following [feedback](https://bugs.openjdk.org/browse/JDK-8310220?focusedCommentId=14590132&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14590132) and offline discussions with Christian Hagedorn, the changeset also dumps the Ideal graph before and after IGVN at print level 3. This makes it possible to identify the source of graph changes between IGVN and other phases such as loop transformations. Finally, the existing phase `PHASE_MACH_ANALYSIS` is also promoted to print level 3, since it prints a single graph per compilation unit only (see print level documentation updates in this changeset): ![igv-level3](https://github.com/openjdk/jdk/assets/8792647/9bccc78b-13b8-428d-8c98-ef3f0f769f4c) The changeset increases the number of graph dumps per compilation at print levels 3 and 4 by 30-40%. This additional overhead is in my opinion justified by the value provided by the additional dumps, and the high print level at which they are produced. 
#### Testing
- tier1-3 (linux-x64; release and debug mode).
- Verified that thousands of new IGVN graph dumps are correctly opened and visualized with the Ideal Graph Visualizer, at print levels 3 and 4.

-------------

Commit messages:
- Dump graph before IGVN (by popular demand) and after IGVN (for symmetry)
- Update IGV's README
- Promote PHASE_MACH_ANALYSIS dump to print level 3 (since it runs once per compilation)
- Dump Ideal graph after each IGVN step (in print level 4)

Changes: https://git.openjdk.org/jdk/pull/14537/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14537&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8310220
Stats: 23 lines in 4 files changed: 12 ins; 1 del; 10 mod
Patch: https://git.openjdk.org/jdk/pull/14537.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14537/head:pull/14537

PR: https://git.openjdk.org/jdk/pull/14537

From volker.simonis at gmail.com Mon Jun 19 12:07:58 2023
From: volker.simonis at gmail.com (Volker Simonis)
Date: Mon, 19 Jun 2023 14:07:58 +0200
Subject: Question regarding ReplayCompiles and multiple inlining
Message-ID:

Hi,

I'm trying to reproduce a compiler issue with a ReplayDataFile but unfortunately I can't reproduce the crash. I hacked the VM to print out the inlining tree just before the crash and realized that the original inlining differs from the inlining done by ReplayCompiles.

In my specific case I have the following inlining pattern during the crash (`foo::f1()` gets inlined twice into `foo::f0()`):

.
.
@ 57 foo::f0() inline (hot)
  @ 48 foo::f1() inline (hot)
    @ 2 bar::f2() inline (hot)
.
.
  @ 48 foo::f1() inline (hot)
    @ 2 bar::f2() NodeCountInliningCutoff

In the ReplayDataFile (in the `inline` part of the `compile` line) both `foo::f1()` and `bar::f2()` are recorded only once (because they have the same bci, name/signature and inlining depth).

When running the replay, I get the following inlining pattern:

.
.
@ 57 foo::f0() force inline by ciReplay
  @ 48 foo::f1() force inline by ciReplay
    @ 2 bar::f2() force inline by ciReplay
.
.
  @ 48 foo::f1() force inline by ciReplay
    @ 2 bar::f2() force inline by ciReplay

This is clearly different because in the replay we inline `bar::f2()` a second time (while in the original run it was skipped due to NodeCountInliningCutoff).

From looking at `find_ciInlineRecord()` [1], it looks like the replay file only records the bci, inlining depth and method name/signature for an inlinee? How is this supposed to work if a method is inlined differently at the same level like in this example?

Notice that I'm currently working with JDK 17 (because my problem doesn't reproduce with HEAD) but it seems the relevant code hasn't changed much in this area since JDK 17.

Please let me know if this is a known problem and if there's any way to work around it?

Thank you and best regards,
Volker

[1] https://github.com/openjdk/jdk17u-dev/blob/852c26c0/src/hotspot/share/ci/ciReplay.cpp#L992

From roland at openjdk.org Mon Jun 19 12:22:56 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 19 Jun 2023 12:22:56 GMT
Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6]
In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com>
References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com>
Message-ID:

> In this simple micro benchmark:
>
> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
>
> Performance drops sharply with polluted profile:
>
>
> Benchmark (typePollution) Mode Cnt Score Error Units
> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us
>
>
> to:
>
>
> Benchmark (typePollution) Mode Cnt Score Error Units
> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ±
2.280 ops/us
>
>
> The test has 2 type checks to 2 different interfaces so caching with
> `secondary_super_cache` doesn't help.
>
> The micro-benchmark only uses 2 different concrete classes
> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
> in profile data at the type checks. But C2 only takes advantage of
> profile data at type checks if they report a single class.
>
> What I propose is that the full-blown type check expanded in
> `Phase::gen_subtype_check()` takes advantage of profile data. So in
> the case of the micro benchmark, before checking the
> `secondary_super_cache`, generated code checks whether the object
> being type checked is a `DuplicatedContext` or a
> `NonDuplicatedContext`.
>
> This works fairly well on this micro benchmark:
>
>
> Benchmark (typePollution) Mode Cnt Score Error Units
> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us
>
>
> It also scales much better if there are multiple threads running the
> same test (`secondary_super_cache` doesn't scale well: see
> JDK-8180450).
>
> Now if the micro-benchmark is changed according to the comment:
>
> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
>
> so the type check hits in the `secondary_super_cache`, the current
> code performs much better:
>
>
> Benchmark (typePollution) Mode Cnt Score Error Units
> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us
>
>
> but leveraging profiling as explained above performs even better:
>
>
> Benchmark (typePollution) Mode Cnt Score Error Units
> RequireNonNullCheckcastScalability.isDuplic...

Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - more test failures - Merge branch 'master' into JDK-8308869 - whitespaces - test failures - review - 32 bit fix - white spaces - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/6daa01d0..684f7520 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=04-05 Stats: 55722 lines in 1001 files changed: 38895 ins; 13822 del; 3005 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From roland at openjdk.org Mon Jun 19 12:23:17 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jun 2023 12:23:17 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v5] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Mon, 19 Jun 2023 11:44:22 GMT, Tobias Hartmann wrote: > `testlibrary_tests/ir_framework/tests/TestIRMatching.java` still fails: `Failed IR Rules (21) of Methods (8)`: Thanks for running testing again. Hopefully the new commit fixes `TestIRMatching.java` for good. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1597089696 From roland at openjdk.org Mon Jun 19 12:28:08 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 Jun 2023 12:28:08 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 08:17:31 GMT, Eric Nothum wrote: >> **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 >> >> **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. >> >> Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. >> >> By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8309266: reverted previous changes. assert was adapted and in case of overflow bottom is now returned Looks reasonable to me. 
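The overflow that the assert guards against can be reproduced with plain integer arithmetic. The following is a minimal Java sketch, not the HotSpot source: the class and method names are hypothetical, and the trip-count formula is an assumption modeled on how the final value of a counted loop is usually derived from init, limit and stride. It computes the final value in `long` and reports an empty result when that value does not fit in `int`, which is the situation a Phi temporarily typed as `Integer.MAX_VALUE` during CCP can produce.

```java
import java.util.OptionalInt;

// Hypothetical illustration only; this is not LoopLimitNode::Value.
public final class LoopLimitSketch {
    private LoopLimitSketch() {}

    /**
     * Final induction value of a counted loop, computed in 64-bit arithmetic.
     * Returns an empty OptionalInt when the value does not fit in an int.
     */
    public static OptionalInt finalValue(int init, int limit, int stride) {
        if (stride == 0) {
            throw new IllegalArgumentException("stride must be non-zero");
        }
        // Round the distance to a whole number of steps (assumed formula).
        long strideM = stride - (stride > 0 ? 1 : -1);
        long tripCount = ((long) limit - init + strideM) / stride;
        long finalCon = init + (long) stride * tripCount;
        int finalInt = (int) finalCon;
        // Same comparison as in the assert message: final_con == (jlong)final_int.
        return finalCon == (long) finalInt
                ? OptionalInt.of(finalInt)
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Limit 1000 with stride 3: the final value fits in an int.
        System.out.println(finalValue(0, 1000, 3));
        // Limit Integer.MAX_VALUE with stride 3: the final value exceeds the int range.
        System.out.println(finalValue(0, Integer.MAX_VALUE, 3));
    }
}
```

With `limit = 1000` the sketch yields a value, while with `limit = Integer.MAX_VALUE` and the same stride it yields an empty result, which corresponds to the case where the PR now returns bottom instead of asserting.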
src/hotspot/share/opto/loopnode.cpp line 2316: > 2314: // Assert checks for overflow only if all input nodes are ConINodes, as during CCP > 2315: // there might be a temporary overflow from PhiNodes see JDK-8309266 > 2316: assert(in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI() \ Do we really need a backslash here? ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14490#pullrequestreview-1486109092 PR Review Comment: https://git.openjdk.org/jdk/pull/14490#discussion_r1233984481 From duke at openjdk.org Mon Jun 19 12:50:00 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 19 Jun 2023 12:50:00 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v4] In-Reply-To: References: Message-ID: <8zoAgs1KCQSNKcvYI0CJ3ne-oIW91pnUwa4zyQ617ew=.b4a8d7d2-c0a8-4e48-b146-4172c8460350@github.com> > **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 > > **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. > > Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. 
> > By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: JDK-8309266: cosmetic change in assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14490/files - new: https://git.openjdk.org/jdk/pull/14490/files/af53ebea..1b3989a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14490&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14490/head:pull/14490 PR: https://git.openjdk.org/jdk/pull/14490 From duke at openjdk.org Mon Jun 19 12:50:05 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 19 Jun 2023 12:50:05 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 12:25:30 GMT, Roland Westrelin wrote: >> Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8309266: reverted previous changes. assert was adapted and in case of overflow bottom is now returned > > src/hotspot/share/opto/loopnode.cpp line 2316: > >> 2314: // Assert checks for overflow only if all input nodes are ConINodes, as during CCP >> 2315: // there might be a temporary overflow from PhiNodes see JDK-8309266 >> 2316: assert(in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI() \ > > Do we really need a backslash here? 
No, not really, I just thought the line was getting long

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14490#discussion_r1234008854

From thartmann at openjdk.org Mon Jun 19 12:56:14 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 19 Jun 2023 12:56:14 GMT
Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2]
In-Reply-To:
References:
Message-ID:

> The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. I also removed `Executor::isValid` because it's unused.
>
> Thanks,
> Tobias

Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:

Missed another usage of Executor constructor

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/14538/files
- new: https://git.openjdk.org/jdk/pull/14538/files/4d24727f..f477f9e2

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=14538&range=01
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=14538&range=00-01

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14538.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14538/head:pull/14538

PR: https://git.openjdk.org/jdk/pull/14538

From rcastanedalo at openjdk.org Mon Jun 19 12:56:15 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 19 Jun 2023 12:56:15 GMT
Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Jun 2023 12:51:54 GMT, Tobias Hartmann wrote:

>> The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. I also removed `Executor::isValid` because it's unused.
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
> Missed another usage of Executor constructor

Thanks for addressing my question! Changes in `MultiCommand.java` look good.

Marked as reviewed by rcastanedalo (Reviewer).

test/hotspot/jtreg/compiler/compilercontrol/share/scenario/Executor.java line 65:

> 63: */
> 64: public Executor(List vmOptions, Map states,
> 65: List jcmdCommands) {

`compiler.compilercontrol.jcmd.StressAddJcmdBase` extends `Executor`, so its constructor needs to be updated.

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14538#pullrequestreview-1486154925
PR Review: https://git.openjdk.org/jdk/pull/14538#pullrequestreview-1486161829
PR Review Comment: https://git.openjdk.org/jdk/pull/14538#discussion_r1234014073

From rcastanedalo at openjdk.org Mon Jun 19 12:56:16 2023
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 19 Jun 2023 12:56:16 GMT
Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2]
In-Reply-To:
References:
Message-ID: <88wWCFd59Mm9rIqg7bCdgAoPZxPvrsdqTCo2B9tbqpc=.0dd9fd9a-c4a2-45cf-aed6-c5026fe7b4f8@github.com>

On Mon, 19 Jun 2023 12:47:39 GMT, Roberto Castañeda Lozano wrote:

>> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Missed another usage of Executor constructor
>
> test/hotspot/jtreg/compiler/compilercontrol/share/scenario/Executor.java line 65:
>
>> 63: */
>> 64: public Executor(List vmOptions, Map states,
>> 65: List jcmdCommands) {
>
> `compiler.compilercontrol.jcmd.StressAddJcmdBase` extends `Executor`, so its constructor needs to be updated.
I see you just addressed this in [f477f9e](https://github.com/openjdk/jdk/pull/14538/commits/f477f9e2d5c38d248fa22ff0550b84390fa075ef), thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14538#discussion_r1234017461 From chagedorn at openjdk.org Mon Jun 19 12:59:09 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jun 2023 12:59:09 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v4] In-Reply-To: <8zoAgs1KCQSNKcvYI0CJ3ne-oIW91pnUwa4zyQ617ew=.b4a8d7d2-c0a8-4e48-b146-4172c8460350@github.com> References: <8zoAgs1KCQSNKcvYI0CJ3ne-oIW91pnUwa4zyQ617ew=.b4a8d7d2-c0a8-4e48-b146-4172c8460350@github.com> Message-ID: On Mon, 19 Jun 2023 12:50:00 GMT, Eric Nothum wrote: >> **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 >> >> **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. >> >> Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. >> >> By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. 
If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8309266: cosmetic change in assert Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14490#pullrequestreview-1486175138 From chagedorn at openjdk.org Mon Jun 19 12:59:11 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 Jun 2023 12:59:11 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v3] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 12:44:39 GMT, Eric Nothum wrote: >> src/hotspot/share/opto/loopnode.cpp line 2316: >> >>> 2314: // Assert checks for overflow only if all input nodes are ConINodes, as during CCP >>> 2315: // there might be a temporary overflow from PhiNodes see JDK-8309266 >>> 2316: assert(in(Init)->is_ConI() && in(Limit)->is_ConI() && in(Stride)->is_ConI() \ >> >> Do we really need a backslash here? > > No not really, I just thought the line was getting long You can wrap lines in asserts without backslashes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14490#discussion_r1234026309 From thartmann at openjdk.org Mon Jun 19 13:02:12 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 Jun 2023 13:02:12 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2] In-Reply-To: References: Message-ID: <1JckAHxiKOnwpimJFm005uNRAhc9Z-eG8M_iobB67Js=.326592f8-89fb-494c-98e8-72bf239fe500@github.com> On Mon, 19 Jun 2023 12:56:14 GMT, Tobias Hartmann wrote: >> The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. 
I also removed `Executor::isValid` because it's unused.
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
> Missed another usage of Executor constructor

Thanks again for the review, Roberto!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14538#issuecomment-1597145749

From thartmann at openjdk.org Mon Jun 19 13:02:14 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 19 Jun 2023 13:02:14 GMT
Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2]
In-Reply-To: <88wWCFd59Mm9rIqg7bCdgAoPZxPvrsdqTCo2B9tbqpc=.0dd9fd9a-c4a2-45cf-aed6-c5026fe7b4f8@github.com>
References: <88wWCFd59Mm9rIqg7bCdgAoPZxPvrsdqTCo2B9tbqpc=.0dd9fd9a-c4a2-45cf-aed6-c5026fe7b4f8@github.com>
Message-ID:

On Mon, 19 Jun 2023 12:49:43 GMT, Roberto Castañeda Lozano wrote:

>> test/hotspot/jtreg/compiler/compilercontrol/share/scenario/Executor.java line 65:
>>
>>> 63: */
>>> 64: public Executor(List vmOptions, Map states,
>>> 65: List jcmdCommands) {
>>
>> `compiler.compilercontrol.jcmd.StressAddJcmdBase` extends `Executor`, so its constructor needs to be updated.
>
> I see you just addressed this in [f477f9e](https://github.com/openjdk/jdk/pull/14538/commits/f477f9e2d5c38d248fa22ff0550b84390fa075ef), thanks!

Thanks, I noticed as well and updated the file in parallel to your review.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14538#discussion_r1234029479

From duke at openjdk.org Mon Jun 19 13:25:13 2023
From: duke at openjdk.org (Eric Nothum)
Date: Mon, 19 Jun 2023 13:25:13 GMT
Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Jun 2023 12:56:13 GMT, Christian Hagedorn wrote:

>> No not really, I just thought the line was getting long
>
> You can wrap lines in asserts without backslashes.

Ahh yes. Not sure why, I probably just used the backslash after having read it in some other parts of the code. Sorry for the confusion from my side

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14490#discussion_r1234060136

From epeter at openjdk.org Mon Jun 19 13:44:10 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 19 Jun 2023 13:44:10 GMT
Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 16 Jun 2023 03:30:58 GMT, Fei Gao wrote:

>> I'm collecting the new benchmark results here, so that we see the effect of misaligned load-stores.
>> I have a series of control cases (aligned), and a series of misaligned cases.
>>
>> -------------
>> Machine: 11th Gen Intel® Core™ i7-11850H @ 2.50GHz × 16. With AVX512 support.
>>
>> With patch:
>>
>> Benchmark (COUNT) (seed) Mode Cnt Score Error Units
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000B_control 2048 0 avgt 2465.281 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000C_control 2048 0 avgt 2467.440 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000D_control 2048 0 avgt 1276.895 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000F_control 2048 0 avgt 1313.390 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000I_control 2048 0 avgt 2465.260 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000L_control 2048 0 avgt 2469.814 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench000S_control 2048 0 avgt 2466.305 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench001_control 2048 0 avgt 2470.130 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100B_misaligned_load 2048 0 avgt 2463.569 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100C_misaligned_load 2048 0 avgt 2467.426 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100D_misaligned_load 2048 0 avgt 1244.256 ns/op
>> VectorAlignment.VectorAlignmentNoSuperWord.bench100F_misaligned_load 2048
0 avgt 1268.847 ns/op >> VectorAlignment.VectorAlignmentNoSuperWord.bench100I_misaligned_load 2048 0 avgt 2465.870 ns/op >> VectorAlignment.VectorAlign... > >> aarch64 asimd: vectorizing the misaligned cases leads to clear performance win compared to non-vectorization. However, we can see that the vectorized misaligned cases are consistently a bit slower than the vectorized aligned cases. > > Hi @eme64 , thanks for your perf data! I also tried your new benchmark on some latest `aarch64` machines using `asimd`. Here are part of results: > > VectorAlignment.VectorAlignmentSuperWord.bench000B_control 2048 0 avgt 152.831 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench000C_control 2048 0 avgt 285.819 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench000D_control 2048 0 avgt 749.996 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench000F_control 2048 0 avgt 396.433 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench000I_control 2048 0 avgt 560.767 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench000L_control 2048 0 avgt 1131.909 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench000S_control 2048 0 avgt 285.215 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench001_control 2048 0 avgt 562.436 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100B_misaligned_load 2048 0 avgt 152.459 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100C_misaligned_load 2048 0 avgt 290.888 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100D_misaligned_load 2048 0 avgt 754.443 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100F_misaligned_load 2048 0 avgt 386.633 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100I_misaligned_load 2048 0 avgt 560.587 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100L_misaligned_load 2048 0 avgt 1134.492 ns/op > VectorAlignment.VectorAlignmentSuperWord.bench100S_misaligned_load 2048 ... @fg1417 perfect, thanks for looking into that! Is there something you still want me to change on this RFE? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1597213651 From epeter at openjdk.org Mon Jun 19 13:46:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 Jun 2023 13:46:11 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 02:30:53 GMT, Quan Anh Mai wrote: >> @merykitty @sviswa7 @fg1417 Is there a way to stress-test the registers? It seems this bug only triggered because we had a moderately large unrolling factor, and then did not vectorize, leaving lots of instructions with probably a higher register pressure. Would be nice to have some sort of testing where we generate more (all?) of the possible register combinations. What do you think? > > @eme64 Yes that was my mistake, that node requires AVX512VL so `vlRegF` and `regF` are the same. > >> Is there a way to stress-test the registers? > > Can we randomise the allocated register during register allocation? > > Thanks. @merykitty Yes, randomization would be great. I don't know much about the register allocator, so feel free to do something like that if you want and have time ;) @sviswa7 Is there something you want me to change still? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1597219792 From cslucas at openjdk.org Mon Jun 19 15:46:34 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 19 Jun 2023 15:46:34 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <-A7bd8C0q5o1WuRSeSkYYnUoApV4s9uijPmiNB2Wteo=.c5bc944c-88a3-4228-bd41-091ac6c8fb1d@github.com> On Sat, 17 Jun 2023 00:41:32 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits:
>>
>> - Merge branch 'openjdk:master' into rematerialization-of-merges
>> - Rome minor refactorings.
>> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>> Catching up with master.
>> - Address PR review 6: debug format output & some refactoring.
>> - Catching up with master branch.
>>
>> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>> - Address PR review 6: refactoring around rematerialization & improve test cases.
>> - Address PR review 5: refactor on rematerialization & add tests.
>> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>> - Address part of PR review 4 & fix a bug setting only_candidate
>> - Catching up with master
>>
>> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>> - ... and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe
>
> Testing results (both functional and performance) are good.
>
> In addition, I tested with a guarantee that no `retry_no_reduce_allocation_merges()` failures are observed and there were no failures observed.
>
> Once you address my latest comments I'll mark the PR as reviewed.

Thank you once more for the comments @iwanowww. I'll address them asap. Can I ask what requirements there are for a product flag?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1597395112

From chagedorn at openjdk.org Mon Jun 19 16:00:58 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 19 Jun 2023 16:00:58 GMT
Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Jun 2023 12:56:14 GMT, Tobias Hartmann wrote:

>> The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid.
I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. I also removed `Executor::isValid` because it's unused. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Missed another usage of Executor constructor Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14538#pullrequestreview-1486515640 From fgao at openjdk.org Tue Jun 20 02:43:20 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 20 Jun 2023 02:43:20 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: <-xYDUunR4hkbjns3y9i0fJt1piblt8Np7iJGt3ZFVMg=.dc125802-17a2-4072-afe6-0624d36a35ae@github.com> On Wed, 14 Jun 2023 11:13:24 GMT, Emanuel Peter wrote: >> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. >> >> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. >> >> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. >> >> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. 
But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. >> >> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). >> >> **Changes to Tests** >> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. >> >> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into JDK-8308606 > - bench000: add other type examples > - bench100: added versions for more types (misaligned load store) > - Add vm.flagless back in for LoopArrayIndexComputeTest.java > - removed AlignVector from IR framework again, do that in RFE > - IR whitelist AlignVector, require it false in the newly added tests > - Merge branch 'master' into JDK-8308606 > - Merge branch 'master' into JDK-8308606 > - remove some outdated comments > - Benchmark VectorAlignment > - ... and 4 more: https://git.openjdk.org/jdk/compare/b4a23bb4...0740b7bc LGTM. Thanks for your work. ------------- Marked as reviewed by fgao (Committer). PR Review: https://git.openjdk.org/jdk/pull/14096#pullrequestreview-1487205202 From epeter at openjdk.org Tue Jun 20 05:55:15 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 Jun 2023 05:55:15 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 Message-ID: This is another case where imprecise type computation leads to corrupted control flow. The loop incr `AddI` does never overflow at runtime, this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, the type so far did often overflow, and returns type `int`. The loop tripcount `Phi` does not have such type overflow. So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI. Since JDK-8303466, the type of the limit can now not overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop). 
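To make the type arithmetic in this description concrete, here is a minimal sketch in Java (it is not HotSpot code). It models an int type as a closed range and shows why adding `-3` to `minint...7` has to widen to `int` when overflow must be assumed, but gives `minint...4` when overflow is known to be impossible, and why `9...maxint` and `minint...7` share no value. The names `IntRangeSketch`, `addMayOverflow`, `addNoOverflow` and `meet` are invented for illustration, and the overflow handling is deliberately simpler than what C2 really does.

```java
final class IntRangeSketch {
    static final long MIN = Integer.MIN_VALUE;
    static final long MAX = Integer.MAX_VALUE;

    // Range [r[0], r[1]] plus a constant, assuming the addition may wrap around.
    // Simplification (assumption): as soon as an endpoint leaves the int range,
    // give up and return the full int range.
    static long[] addMayOverflow(long[] r, int con) {
        long lo = r[0] + con;
        long hi = r[1] + con;
        if (lo < MIN || hi > MAX) {
            return new long[] {MIN, MAX};
        }
        return new long[] {lo, hi};
    }

    // Same addition, but assuming the runtime value never overflows
    // (e.g. because a loop limit check guarantees it): clamp the endpoints.
    static long[] addNoOverflow(long[] r, int con) {
        long lo = Math.max(MIN, r[0] + con);
        long hi = Math.min(MAX, r[1] + con);
        return new long[] {lo, hi};
    }

    // Intersection of two ranges; null stands in for the empty range (TOP).
    static long[] meet(long[] a, long[] b) {
        long lo = Math.max(a[0], b[0]);
        long hi = Math.min(a[1], b[1]);
        return lo > hi ? null : new long[] {lo, hi};
    }

    public static void main(String[] args) {
        long[] phi = {MIN, 7};   // tripcount Phi: minint...7
        long[] wide = addMayOverflow(phi, -3);
        long[] tight = addNoOverflow(phi, -3);
        System.out.println("incr, overflow assumed:   " + wide[0] + "..." + wide[1]);
        System.out.println("incr, overflow ruled out: " + tight[0] + "..." + tight[1]);
        long[] cast = {9, MAX};  // CastII after the zero-trip-guard: 9...maxint
        System.out.println("meet is empty: " + (meet(cast, phi) == null));
    }
}
```

With the wide `int` type the comparison against the limit cannot rule anything out, while the clamped range is consistent with the type of the Phi.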
Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection. **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. So before that change I think the loop limit just had a type that would have overflown, and returned `int`. That would have not allowed the main-loop internals to detect that it could never be reached. The main-loop would not have died at all, but it would still never have been entered. **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better. And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop. I also had to improve the notification in IGVN, I encountered a case where an incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow). **Risk** Just as JDK-8303466, this fix here makes types more precise. That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know what such bugs are still lurking for us. **Testing** Attached one regression test (takes less than `1.5sec`). Tested up to tier6 and stress testing. **Running** -------- **Details** ![example](https://github.com/openjdk/jdk/assets/32593061/92d5d801-1a41-4dfd-ab39-cd6b69527dcd) 621 CountedLoop (pre-loop) 622 Phi (tripcount, minint...7) 612 AddI (incr, int) ---> with patch type minint...4 650 CmpI / 702 Bool (cannot detect that tripcount < limit) ---> with patch it can detect it!
871 ConvL2I (loop limit, 8...maxint) 703 If (zero-trip-guard) 704 IfFalse (projection towards main-loop) 654 CastII (value of 612 AddI, if zero-trip-guard goes to main-loop, so value > limit, hence type 9...maxint) 809 CastII (remembers that tripcount cannot overflow, so has type minint...7) (when the Opaque nodes vanish, then the two types create an empty range, and TOP starts corrupting the CFG) ------------- Commit messages: - notify counted loop incr - added -XX:+UnlockDiagnosticVMOptions to test - merge with master - 8308504: C2: "malformed control flow" after JDK-8303466 Changes: https://git.openjdk.org/jdk/pull/14331/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14331&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308504 Stats: 127 lines in 4 files changed: 105 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/14331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14331/head:pull/14331 PR: https://git.openjdk.org/jdk/pull/14331 From thartmann at openjdk.org Tue Jun 20 06:29:18 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 20 Jun 2023 06:29:18 GMT Subject: RFR: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 [v2] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 12:56:14 GMT, Tobias Hartmann wrote: >> The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. I also removed `Executor::isValid` because it's unused. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Missed another usage of Executor constructor Thanks, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14538#issuecomment-1598186073 From haosun at openjdk.org Tue Jun 20 06:41:04 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 20 Jun 2023 06:41:04 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 Message-ID: `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. In JDK-8297092, the intrinsic was evaluated on existing hardware with the support of SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was auto-enabled by default on Apple silicon only. See the code [1]. As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message: JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value Option 'UseSHA3Intrinsics' should be enabled by default The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` are designed to verify that, option `UseXXXIntrinsics` should be enabled by default if the underlying hardware supports the corresponding CPU feature. Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796. Fix: One `@requires` condition is added in this patch to limit that this test case is only run on macOS on Apple silicon. Note that SHA3 feature is currently supported by AArch64 only. Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon.
[1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345 ------------- Commit messages: - 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 Changes: https://git.openjdk.org/jdk/pull/14551/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14551&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309109 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14551.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14551/head:pull/14551 PR: https://git.openjdk.org/jdk/pull/14551 From jwaters at openjdk.org Tue Jun 20 07:14:11 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 20 Jun 2023 07:14:11 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Bumping ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1598234398 From duke at openjdk.org Tue Jun 20 08:38:11 2023 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 20 Jun 2023 08:38:11 GMT Subject: RFR: 8301489: ShortLoopOptimizer might lift instructions before their inputs [v2] In-Reply-To: References: Message-ID: > ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. > > Testing: tier1-tier3. > > Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. > Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. > > Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. 
> > ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14492/files - new: https://git.openjdk.org/jdk/pull/14492/files/b4c550a0..ebb6655a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14492&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14492&range=00-01 Stats: 17 lines in 1 file changed: 7 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14492/head:pull/14492 PR: https://git.openjdk.org/jdk/pull/14492 From aph at openjdk.org Tue Jun 20 08:38:21 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 Jun 2023 08:38:21 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 In-Reply-To: References: Message-ID: <1RyJNK0oT-Z8AuDV8MW9WievErroIxB9ly7AZpDXAMM=.72276808-68cf-462e-af75-51fb3e7eb7ef@github.com> On Tue, 20 Jun 2023 06:35:15 GMT, Hao Sun wrote: > `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. In JDK-8297092, the intrinsic was evaluated on existing hardware with the support of SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was auto-enabled by default on Apple silicon only. See the code [1]. 
> > As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message: > > > JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value > Option 'UseSHA3Intrinsics' should be enabled by default > > > The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` are designed to verify that, option `UseXXXIntrinsics` should be enabled by default if the underlying hardware supports the corresponding CPU feature. > > Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796. > > Fix: One `@requires` condition is added in this patch to limit that this test case is only run on macOS on Apple silicon. Note that SHA3 feature is currently supported by AArch64 only. > > Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon. > > [1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345 test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java line 31: > 29: * @library /test/lib / > 30: * @requires vm.flagless > 31: * @requires os.arch == "aarch64" & os.family == "mac" I think this requires a comment. Something to the effect that we don't always enable the use of SHA3 instructions because they don't always help.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14551#discussion_r1234931737 From duke at openjdk.org Tue Jun 20 08:38:14 2023 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 20 Jun 2023 08:38:14 GMT Subject: RFR: 8301489: ShortLoopOptimizer might lift instructions before their inputs [v2] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 05:34:47 GMT, Tobias Hartmann wrote: >> Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > src/hotspot/share/c1/c1_ValueMap.cpp line 367: > >> 365: bool _valid = true; >> 366: >> 367: void visit(Value* vp) { > > Since `Value` is already a pointer type, can't we use `Value v` here? I am not sure if this is possible without changing the ValueVisitor ([ref](https://java.se.oracle.com/source/xref/jdk-jdk/jdk-open/src/hotspot/share/c1/c1_Instruction.hpp#123)) itself ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1234929080 From duke at openjdk.org Tue Jun 20 08:41:06 2023 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 20 Jun 2023 08:41:06 GMT Subject: RFR: 8301489: ShortLoopOptimizer might lift instructions before their inputs [v2] In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 06:30:12 GMT, Tobias Hartmann wrote: > Great work investigating this, Daniel! > > From your comments in JBS, it seems that the underlying issue is additional exception edges in the graph that affect dominator computation. Could you elaborate a bit more on that with respect to the example that you provided in the PR description? > > I'm not an expert in C1 though (paging @veresov and @rwestrel as the author of JDK-7153771). > > Thanks, Tobias Thank you very much for review! My understanding is, this might have been introduced in JDK-7153771, where we have extra edges to the exception handler of all successors during dominator calculation -- `Additional edge to xhandler of all our successors`. 
The client RangeCheckElimination optimization needed this additional information, but the short loop optimization was not updated accordingly. In this specific case, I observed using print debugging and -XX:TraceLinearScanLevel=4, that the dominator of B10 is being calculated as common_dominator(B7, B1). But B1 has dominator B0 because of the extra edge from B14 to B1 (the exception handler of B3). The short loop optimization lifts loop invariant instructions to the dominator of the loop header, which in this case becomes B0. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14492#issuecomment-1598357520 From jsjolen at openjdk.org Tue Jun 20 09:07:03 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 20 Jun 2023 09:07:03 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v2] In-Reply-To: References: Message-ID: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`.. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. > > Johan Johan Sjölen has updated the pull request incrementally with three additional commits since the last revision: - Skip the resource mark, add in separate temporary arenas for each.
- Allocate whole of VectorSet on arena - Stack allocate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14530/files - new: https://git.openjdk.org/jdk/pull/14530/files/1b0c549a..820b0841 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=00-01 Stats: 20 lines in 1 file changed: 4 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/14530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14530/head:pull/14530 PR: https://git.openjdk.org/jdk/pull/14530 From duke at openjdk.org Tue Jun 20 10:21:51 2023 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 20 Jun 2023 10:21:51 GMT Subject: RFR: 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs [v3] In-Reply-To: References: Message-ID: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> > ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. > > Testing: tier1-tier3. > > Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. > Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. > > Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. 
We can reproduce the error on other instructions too -- the reader may refer to the test case provided. > > ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Tweak test #iterations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14492/files - new: https://git.openjdk.org/jdk/pull/14492/files/ebb6655a..074d1ed7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14492&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14492&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14492/head:pull/14492 PR: https://git.openjdk.org/jdk/pull/14492 From jsjolen at openjdk.org Tue Jun 20 10:23:29 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 20 Jun 2023 10:23:29 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v3] In-Reply-To: References: Message-ID: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`.. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. 
> > Johan Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Move these defs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14530/files - new: https://git.openjdk.org/jdk/pull/14530/files/820b0841..e6ab09ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=01-02 Stats: 9 lines in 1 file changed: 4 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14530/head:pull/14530 PR: https://git.openjdk.org/jdk/pull/14530 From jsjolen at openjdk.org Tue Jun 20 10:35:25 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 20 Jun 2023 10:35:25 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`.. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks.
> > Johan Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Be conservative in sizing to be close to original behavior ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14530/files - new: https://git.openjdk.org/jdk/pull/14530/files/e6ab09ed..7035bb4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14530/head:pull/14530 PR: https://git.openjdk.org/jdk/pull/14530 From jsjolen at openjdk.org Tue Jun 20 10:35:28 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 20 Jun 2023 10:35:28 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v3] In-Reply-To: References: Message-ID: <2yWltBq9ODonDLUyXLthAnWboLSvelk2ONWPA7J16bE=.c07c0cf7-d4c7-40d1-87ec-4c88aaa2fa83@github.com> On Tue, 20 Jun 2023 10:23:29 GMT, Johan Sjölen wrote: >> Hi, >> >> `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`.. This change is the most conservative one, however, and does fix the memory leak. >> >> Please consider, thanks. >> >> Johan > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Move these defs Hi, In pursuit of answering your questions I saw some more opportunities for improvement. I did end up removing the `ResourceMark` as the size of `Split` makes it difficult to see whether it changes something that's used later on or not. First of all, the `VectorSet`s allocated their elements on the arena, but themselves on the resource area, this was another unhandled memory leak. I fixed that by allocating the VSets themselves on the arena, also.
This removes a leak of `spill_cnt*56` bytes. Second of all @shipilev asked a good question, and I think that we can move `defs` and `phis` into the arena. I was worried about the sizes of these arrays being quite large and thus triggering a lot of reallocations, but they're typically quite small (I measured) while sometimes reaching to defs being 256 and phis 512 elements. I picked a conservative lower bound of 8/16. For what it's worth, the number of `VectorSet`s seems to strongly correlate with the size of the sets. a future RFE might make these larger by default. All of these measurements are done by running a Spring Hello World-app as a substitute for a 'real'/'typical' workload. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14530#issuecomment-1598520575 From fparain at openjdk.org Tue Jun 20 12:37:02 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 20 Jun 2023 12:37:02 GMT Subject: RFR: 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 22:49:36 GMT, Coleen Phillimore wrote: > This change adds casts to nmethod and compiled method offset and size functions that return int, and checked_casts where it's not obvious or already checked that the cast is correct. > Tested with tier1 on Oracle platforms, and tier1-4 linux and windows. Looks good to me ------------- Marked as reviewed by fparain (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14505#pullrequestreview-1488044615 From coleenp at openjdk.org Tue Jun 20 13:38:19 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 20 Jun 2023 13:38:19 GMT Subject: RFR: 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 22:49:36 GMT, Coleen Phillimore wrote: > This change adds casts to nmethod and compiled method offset and size functions that return int, and checked_casts where it's not obvious or already checked that the cast is correct. > Tested with tier1 on Oracle platforms, and tier1-4 linux and windows. Thanks for reviewing, Fred. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14505#issuecomment-1598802128 From coleenp at openjdk.org Tue Jun 20 13:38:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 20 Jun 2023 13:38:20 GMT Subject: Integrated: 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 22:49:36 GMT, Coleen Phillimore wrote: > This change adds casts to nmethod and compiled method offset and size functions that return int, and checked_casts where it's not obvious or already checked that the cast is correct. > Tested with tier1 on Oracle platforms, and tier1-4 linux and windows. This pull request has now been integrated. 
Changeset: e1906e76 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/e1906e76412fa506cf72447dcb9adc896b92ae81 Stats: 50 lines in 8 files changed: 0 ins; 0 del; 50 mod 8310027: Fix -Wconversion warnings in nmethod and compiledMethod related code Reviewed-by: kvn, fparain ------------- PR: https://git.openjdk.org/jdk/pull/14505 From duke at openjdk.org Tue Jun 20 13:42:05 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 20 Jun 2023 13:42:05 GMT Subject: RFR: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer [v4] In-Reply-To: References: <8zoAgs1KCQSNKcvYI0CJ3ne-oIW91pnUwa4zyQ617ew=.b4a8d7d2-c0a8-4e48-b146-4172c8460350@github.com> Message-ID: On Mon, 19 Jun 2023 12:56:41 GMT, Christian Hagedorn wrote: >> Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8309266: cosmetic change in assert > > Looks good! Thanks @chhagedorn and @rwestrel for the feedback and reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14490#issuecomment-1598811919 From duke at openjdk.org Tue Jun 20 13:56:17 2023 From: duke at openjdk.org (Eric Nothum) Date: Tue, 20 Jun 2023 13:56:17 GMT Subject: Integrated: JDK-8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 10:43:53 GMT, Eric Nothum wrote: > **Acknowledgments**: Thanks to @quadhier for the preliminary work on this issue: https://github.com/openjdk/jdk/pull/14353 > > **JDK-8309266**: TestLoopLimitOverflowDuringCCP.java causes an assertion error (overflow check) in LoopLimitNode::Value. To fix the issue I added a check in LoopLimitNode::Value that verifies that the input nodes are ConI type nodes. > > Previously, TestLoopLimitOverflowDuringCCP would cause the assertion error in LoopLimitNode::Value, during PhaseCCP::analyze. 
The problem originated from PhaseCCP initializing all types to TOP, resulting in the Phi node from `int limit = flag ? 1000 : Integer.MAX_VALUE` being temporarily considered as Integer.MAX_VALUE. This happens as the Node for the Integer.MAX_VALUE case was already analyzed by CCP while the Node for the 1000 case was still initialized to TOP. When resolving the value of the Phi node, Integer.MAX_VALUE and TOP get merged to Integer.MAX_VALUE, which is then processed by LoopLimitNode::Value as a constant, resulting in the integer overflow. > > By checking that the input nodes are ConI nodes in LoopLimitNode::Value, we avoid Phi nodes being misinterpreted during PhaseCCP. If the Phi nodes turn out to be constant they should rather be first transformed to ConI nodes. This pull request has now been integrated. Changeset: 4a9cc8a0 Author: Eric Nothum Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/4a9cc8a000cafb3ad77a33710054b567e8553652 Stats: 61 lines in 2 files changed: 59 ins; 0 del; 2 mod 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer Reviewed-by: roland, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14490 From ecaspole at openjdk.org Tue Jun 20 14:01:14 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Tue, 20 Jun 2023 14:01:14 GMT Subject: RFR: 8309976: A JMH to create a lot of classes and compiled methods [v2] In-Reply-To: References: Message-ID: <-aFFr3LZWMdvUMhkoJBXHkcOYnA8Zk6ieBXVM0j5Ufc=.de0557d9-d084-4017-bf42-71286acb825b@github.com> > Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. 
Then these classes are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. > This makes it possible to simulate a large application with arbitrary Java heap and code cache footprint, and take advantage of the benefits of JMH at the same time. > The defaults are set very low and the intent is that they would be customized for any given study. Eric Caspole has updated the pull request incrementally with two additional commits since the last revision: - Update test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java Co-authored-by: Aleksey Shipilëv - Update test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14521/files - new: https://git.openjdk.org/jdk/pull/14521/files/6922b9dd..e182ab98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14521/head:pull/14521 PR: https://git.openjdk.org/jdk/pull/14521 From kvn at openjdk.org Tue Jun 20 16:10:15 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 Jun 2023 16:10:15 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 11:13:24 GMT, Emanuel Peter wrote: >> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. >> >> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`).
We did not do it directly in the same task to avoid too many changes of behavior. >> >> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. >> >> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. >> >> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). >> >> **Changes to Tests** >> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. >> >> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize.
However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into JDK-8308606 > - bench000: add other type examples > - bench100: added versions for more types (misaligned load store) > - Add vm.flagless back in for LoopArrayIndexComputeTest.java > - removed AlignVector from IR framework again, do that in RFE > - IR whitelist AlignVector, require it false in the newly added tests > - Merge branch 'master' into JDK-8308606 > - Merge branch 'master' into JDK-8308606 > - remove some outdated comments > - Benchmark VectorAlignment > - ... and 4 more: https://git.openjdk.org/jdk/compare/b7339abc...0740b7bc Nice re-write. Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14096#pullrequestreview-1488512253 From vlivanov at openjdk.org Tue Jun 20 16:47:16 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jun 2023 16:47:16 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: <-A7bd8C0q5o1WuRSeSkYYnUoApV4s9uijPmiNB2Wteo=.c5bc944c-88a3-4228-bd41-091ac6c8fb1d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <-A7bd8C0q5o1WuRSeSkYYnUoApV4s9uijPmiNB2Wteo=.c5bc944c-88a3-4228-bd41-091ac6c8fb1d@github.com> Message-ID: <72OcyhmFKGyTwDy8LQ0blp5HG5dg5l9OsU5dh9osVxo=.73b3a79e-ff24-4f41-b39b-650a9036ee76@github.com> On Mon, 19 Jun 2023 15:36:15 GMT, Cesar Soares Lucas wrote: > Can I ask what requirements are there for a product flag? Product flags are treated as part of the public API of the JVM. So, changes in behavior have to go through the CSR process. Also, a product flag has to be deprecated/obsoleted first before it can be removed, which takes multiple releases to happen. Better to avoid introducing new product flags unless it is well-justified or necessary. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1599149350 From kvn at openjdk.org Tue Jun 20 16:55:04 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 Jun 2023 16:55:04 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 10:36:21 GMT, Emanuel Peter wrote: > This is another case where imprecise type computation leads to corrupted control flow. > > The loop incr `AddI` never overflows at runtime; this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, the type so far did often overflow, and returned type `int`. The loop tripcount `Phi` does not have such type overflow.
So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI. > > Since JDK-8303466, the type of the limit can no longer overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop). > > Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection. > > **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. So before that change I think the loop limit just had a type that would have overflowed, and returned `int`. That would not have allowed the main-loop internals to detect that it could never be reached. The main-loop would not have died at all, but it would still never have been entered. > > **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better. And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop. > > I also had to improve the notification in IGVN: I encountered a case where an incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow). > > **Risk** Just like JDK-8303466, this fix makes types more precise.
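To make the range reasoning in the 8308504 description above concrete, here is a minimal Java sketch of the two ways of typing the increment. It is a simplified model and not the HotSpot `TypeInt` code: the class `IntRangeAdd`, its method names, and the `long[]` range representation are all hypothetical, chosen only for illustration.

```java
// Simplified model of integer range types; NOT the HotSpot TypeInt implementation.
// All names here (IntRangeAdd, addMayOverflow, addNoOverflow, intersect) are hypothetical.
final class IntRangeAdd {
    // A range is modeled as {lo, hi}; null stands for the empty range (TOP).

    // Adds a constant to [lo, hi]. If either bound leaves the int range,
    // give up and return the full int range, as a conservative type would.
    static long[] addMayOverflow(int lo, int hi, int con) {
        long newLo = (long) lo + con;
        long newHi = (long) hi + con;
        if (newLo < Integer.MIN_VALUE || newHi > Integer.MAX_VALUE) {
            return new long[] { Integer.MIN_VALUE, Integer.MAX_VALUE };
        }
        return new long[] { newLo, newHi };
    }

    // Same addition when overflow is known to be impossible at runtime:
    // bounds that leave the int range are clamped instead of widened.
    // Assumes at least part of the shifted range still lies within int.
    static long[] addNoOverflow(int lo, int hi, int con) {
        long newLo = Math.max((long) lo + con, Integer.MIN_VALUE);
        long newHi = Math.min((long) hi + con, Integer.MAX_VALUE);
        return new long[] { newLo, newHi };
    }

    // Intersection of two ranges; null if they do not overlap.
    static long[] intersect(long[] a, long[] b) {
        long lo = Math.max(a[0], b[0]);
        long hi = Math.min(a[1], b[1]);
        return lo <= hi ? new long[] { lo, hi } : null;
    }

    public static void main(String[] args) {
        // Phi has type minint...7 and the loop adds -3.
        long[] widened = addMayOverflow(Integer.MIN_VALUE, 7, -3);
        long[] clamped = addNoOverflow(Integer.MIN_VALUE, 7, -3);
        System.out.println("may overflow: " + widened[0] + "..." + widened[1]);
        System.out.println("no overflow:  " + clamped[0] + "..." + clamped[1]);

        // The CastII range 9...maxint does not overlap the Phi range minint...7.
        long[] cast = { 9, Integer.MAX_VALUE };
        long[] phi = { Integer.MIN_VALUE, 7 };
        System.out.println("empty intersection: " + (intersect(cast, phi) == null));
    }
}
```

In this model the widened result still overlaps `9...maxint` while the Phi range does not, which mirrors the inconsistency the description attributes to the old typing; the clamped result `minint...4` agrees with the Phi.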
That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know which such bugs are still lurking for us. > > **Testing** Attached one regre... src/hotspot/share/opto/addnode.cpp line 439: > 437: if (hi < (jlong)min_jint || lo > (jlong)max_jint) { > 438: // [lo, hi] is outside of int range -> never valid > 439: assert(false, "is there any such case?"); // TODO remove Please dump nodes/types to get more information when we hit this assert. src/hotspot/share/opto/addnode.cpp line 462: > 460: assert((jlong)min_jint <= lo && > 461: lo <= hi && > 462: hi <= (jlong)max_jint, "no overflow"); Maybe print the `lo`, `hi` values in the assert. test/hotspot/jtreg/compiler/loopopts/TestLoopIncrNoOverflow.java line 29: > 27: * @summary With JDK-8303466 the unroll-limit became more precise, and does not overflow. > 28: * We must ensure that the type of the loop incr also does not overflow. > 29: * @run main/othervm -Xcomp -XX:-TieredCompilation Should we require C2 to be enabled for this test? `StressIGVN` is a C2 flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14331#discussion_r1235522017 PR Review Comment: https://git.openjdk.org/jdk/pull/14331#discussion_r1235535906 PR Review Comment: https://git.openjdk.org/jdk/pull/14331#discussion_r1235517492 From kvn at openjdk.org Tue Jun 20 17:19:05 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 Jun 2023 17:19:05 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 14:42:14 GMT, Emanuel Peter wrote: > Removed a spurious assert before optimization bailout. > > I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect: partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time.
In those cases, we cannot move the UnorderedReductions out of the loop, so an optimization bailout is appropriate. > > I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have led to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain. > > I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue. > Testing up to tier6 and stress testing. **Running, passing except for some IR rules I had to fix, rerunning...** Looks fine. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14494#pullrequestreview-1488632276 From ecaspole at openjdk.org Tue Jun 20 20:13:19 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Tue, 20 Jun 2023 20:13:19 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v3] In-Reply-To: References: Message-ID: > Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. > This makes it possible to simulate a large application with arbitrary Java heap and code cache footprint, and take advantage of the benefits of JMH at the same time.
> The defaults are set very low and the intent is that they would be customized for any given study. Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: Fix copyright header and apply Alekseys comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14521/files - new: https://git.openjdk.org/jdk/pull/14521/files/e182ab98..8fdb3c96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=01-02 Stats: 23 lines in 1 file changed: 0 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/14521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14521/head:pull/14521 PR: https://git.openjdk.org/jdk/pull/14521 From vlivanov at openjdk.org Tue Jun 20 23:39:06 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 Jun 2023 23:39:06 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Mon, 19 Jun 2023 12:22:56 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with a polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help.
>> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full-blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - more test failures > - Merge branch 'master' into JDK-8308869 > - whitespaces > - test failures > - review > - 32 bit fix > - white spaces > - fix & test Nice enhancement, Roland! (I believe the monomorphic case you refer to is covered by `GraphKit::maybe_cast_profiled_receiver()`.) I'm trying to understand what are the implications if you generate profile-based type guards early (during parsing). Any particular benefits from late expansion or downsides from early expansion? I'd expect that additional type info may be helpful (even though bimorphic/polymorphic cases are less useful than monomorphic one). Handling it during parsing would relieve `SubTypeCheck` from caring about profile data and enable placing an uncommon trap on slow path for bimorphic case. (Doing that during macro expansion would require `SubTypeCheckNode` to keep JVM state.) What are the implications if you make profiling changes separately? My understanding is you gain access to accurate probabilities and ability to distinguish between bi-/poly-/mega-morphic cases. Is it correct? Speaking of alternative ways to pass profile info around, you could just embed `ciCallProfile` in `SubTypeCheck`. Any particular reasons not to do so? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1599733943 From pli at openjdk.org Wed Jun 21 01:28:07 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 21 Jun 2023 01:28:07 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Wed, 14 Jun 2023 11:13:24 GMT, Emanuel Peter wrote: >> This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. 
>> >> As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. >> >> This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. >> >> Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. >> >> Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). >> >> **Changes to Tests** >> I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. >> >> `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies".
But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into JDK-8308606 > - bench000: add other type examples > - bench100: added versions for more types (misaligned load store) > - Add vm.flagless back in for LoopArrayIndexComputeTest.java > - removed AlignVector from IR framework again, do that in RFE > - IR whitelist AlignVector, require it false in the newly added tests > - Merge branch 'master' into JDK-8308606 > - Merge branch 'master' into JDK-8308606 > - remove some outdated comments > - Benchmark VectorAlignment > - ... and 4 more: https://git.openjdk.org/jdk/compare/1b5dbcd6...0740b7bc Marked as reviewed by pli (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14096#pullrequestreview-1489475059 From haosun at openjdk.org Wed Jun 21 02:55:31 2023 From: haosun at openjdk.org (Hao Sun) Date: Wed, 21 Jun 2023 02:55:31 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 [v2] In-Reply-To: References: Message-ID: > `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. 
In JDK-8297092, the intrinsic was evaluated on existing hardware that supports the SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was enabled by default on Apple silicon only. See the code [1]. > > As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message: > > > JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value > Option 'UseSHA3Intrinsics' should be enabled by default > > > The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` is designed to verify that option `UseXXXIntrinsics` is enabled by default if the underlying hardware supports the corresponding CPU feature. > > Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796. > > Fix: One `@requires` condition is added in this patch so that this test case only runs on macOS on Apple silicon. Note that the SHA3 feature is currently supported by AArch64 only. > > Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon.
> > [1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345 Hao Sun has updated the pull request incrementally with one additional commit since the last revision: Add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14551/files - new: https://git.openjdk.org/jdk/pull/14551/files/f41a2ebd..1304e146 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14551&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14551&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14551.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14551/head:pull/14551 PR: https://git.openjdk.org/jdk/pull/14551 From haosun at openjdk.org Wed Jun 21 02:55:31 2023 From: haosun at openjdk.org (Hao Sun) Date: Wed, 21 Jun 2023 02:55:31 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 [v2] In-Reply-To: <1RyJNK0oT-Z8AuDV8MW9WievErroIxB9ly7AZpDXAMM=.72276808-68cf-462e-af75-51fb3e7eb7ef@github.com> References: <1RyJNK0oT-Z8AuDV8MW9WievErroIxB9ly7AZpDXAMM=.72276808-68cf-462e-af75-51fb3e7eb7ef@github.com> Message-ID: On Tue, 20 Jun 2023 08:35:02 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > test/hotspot/jtreg/compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java line 31: > >> 29: * @library /test/lib / >> 30: * @requires vm.flagless >> 31: * @requires os.arch == "aarch64" & os.family == "mac" > > I think this requires a comment. Something to the effect that we don't always enable the use of SHA3 instructions because they don't always help. Agree. Updated in the latest commit. Would you mind taking another look? Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14551#discussion_r1236220561 From fgao at openjdk.org Wed Jun 21 03:33:23 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 21 Jun 2023 03:33:23 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes Message-ID: Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: match(Set dst (FmaF src3 (Binary (NegF src1) src2))); match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); Since `Fma` is partially commutative, the patch converts `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in the GVN phase, making node patterns canonical. Then we can remove redundant rules. Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before the matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on the aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms. ------------- Commit messages: - 8308340: C2: Idealize Fma nodes Changes: https://git.openjdk.org/jdk/pull/14576/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308340 Stats: 581 lines in 18 files changed: 368 ins; 117 del; 96 mod Patch: https://git.openjdk.org/jdk/pull/14576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576 PR: https://git.openjdk.org/jdk/pull/14576 From epeter at openjdk.org Wed Jun 21 06:43:34 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jun 2023 06:43:34 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 16:07:30 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
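As a small illustration of why the canonicalization in the 8308340 RFR above is value-preserving, the following Java sketch compares the negated-operand forms at the `Math.fma` level. It only checks the arithmetic identity on sample inputs; it says nothing about how C2 represents or matches `Fma` nodes, and the class and method names (`FmaNegSymmetry`, `sameBits`) are hypothetical.

```java
// Illustrative sketch only: checks that moving the negation between the two
// multiplicands of Math.fma does not change the result. Names are hypothetical.
final class FmaNegSymmetry {
    // True when fma(-a, b, c), fma(b, -a, c) and fma(a, -b, c) agree bit for bit.
    // doubleToLongBits collapses all NaNs to one canonical value before comparing.
    static boolean sameBits(double a, double b, double c) {
        long negFirst = Double.doubleToLongBits(Math.fma(-a, b, c));
        long swapped = Double.doubleToLongBits(Math.fma(b, -a, c));
        long negSecond = Double.doubleToLongBits(Math.fma(a, -b, c));
        return negFirst == swapped && negFirst == negSecond;
    }

    public static void main(String[] args) {
        double[][] samples = {
            { 1.5, 2.0, 0.25 },
            { 0.0, 3.0, -0.0 },
            { 1e308, 10.0, 1e308 },
            { Double.NaN, 1.0, 2.0 },
        };
        for (double[] s : samples) {
            System.out.println(s[0] + ", " + s[1] + ", " + s[2] + " -> " + sameBits(s[0], s[1], s[2]));
        }
        System.out.println(Math.fma(-1.5, 2.0, 0.25)); // -2.75
    }
}
```

The identity holds because the exact product of the first two operands is the same whichever one carries the sign, and `fma` rounds only once after adding the third operand.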
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8308606 >> - bench000: add other type examples >> - bench100: added versions for more types (misaligned load store) >> - Add vm.flagless back in for LoopArrayIndexComputeTest.java >> - removed AlignVector from IR framework again, do that in RFE >> - IR whitelist AlignVector, require it false in the newly added tests >> - Merge branch 'master' into JDK-8308606 >> - Merge branch 'master' into JDK-8308606 >> - remove some outdated comments >> - Benchmark VectorAlignment >> - ... and 4 more: https://git.openjdk.org/jdk/compare/4bec7aef...0740b7bc > > Nice re-write. Looks good to me. Thanks @vnkozlov @pfustc @fg1417 for the suggestions and reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1600267740 From epeter at openjdk.org Wed Jun 21 06:43:35 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jun 2023 06:43:35 GMT Subject: Integrated: 8308606: C2 SuperWord: remove alignment checks when not required In-Reply-To: References: Message-ID: On Tue, 23 May 2023 07:16:48 GMT, Emanuel Peter wrote: > This change should strictly expand the set of vectorized loops. And this change also makes `SuperWord` conceptually simpler. > > As discussed in https://github.com/openjdk/jdk/pull/12350, we should remove the alignment checks when alignment is actually not required (either by the hardware or explicitly asked for with `-XX:+AlignVector`). We did not do it directly in the same task to avoid too many changes of behavior. > > This alignment check was originally there instead of a proper dependency checker. Requiring alignments on the packs per memory slice meant that all vector lanes were aligned, and there could be no cross-iteration dependencies that lead to cycles. 
But this is not general enough (we may for example allow the vector lanes to cross at some point). And we now have proper independence checks in `SuperWord::combine_packs`, as well as the cycle check in `SuperWord::schedule`. > > Alignment is nice when we can make it happen, as it ensures that we do not have memory accesses across cache lines. But we should not prevent vectorization just because we cannot align all memory accesses for the same memory slice. As the benchmark shows below, we get a good speedup from vectorizing unaligned memory accesses. > > Note: this reduces the `CompileCommand Option Vectorize` flag to now only controlling if we use the `CloneMap` or not. Read more about that in this PR https://github.com/openjdk/jdk/pull/13930. In the benchmarks below you can find some examples that only vectorize with or only vectorize without the `Vectorize` flag. My goal is to eventually try out both approaches and pick the better one, removing the need for the flag entirely (see "**Unifying multiple SuperWord Strategies and beyond**" below). > > **Changes to Tests** > I could remove the `CompileCommand Option Vectorize` from `TestDependencyOffsets.java`, which means that those loops now vectorize without the need of the flag. > > `LoopArrayIndexComputeTest.java` had a few "negative" tests that expected that there is no vectorization because of "dependencies". But they were not real dependencies since they were "read forward" cases. I now check that those do vectorize, and added symmetric tests that are "read backward" cases which should currently not vectorize. However, these are still not "real dependencies" either: the arrays that are used could in theory be proven to be not equal, and then the dependencies could be dropped. But I think it is ok to leave them as "negative" tests for now, until we add such opti... This pull request has now been integrated.
Changeset: 886ac1c2 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/886ac1c261a1b7e91e3981e32810c405a0d90329 Stats: 592 lines in 5 files changed: 425 ins; 75 del; 92 mod 8308606: C2 SuperWord: remove alignment checks when not required Reviewed-by: fgao, kvn, pli ------------- PR: https://git.openjdk.org/jdk/pull/14096 From jwaters at openjdk.org Wed Jun 21 06:47:06 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 21 Jun 2023 06:47:06 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: <0zdhhpte62afwO6bR4wuM19NtaZ9tIUudr4f4S2sXdQ=.7ee922ac-8e24-45e5-8148-423cbbc348ee@github.com> On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Bumping ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1600271647 From roland at openjdk.org Wed Jun 21 07:05:07 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 Jun 2023 07:05:07 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> On Tue, 20 Jun 2023 23:35:59 GMT, Vladimir Ivanov wrote: > I'm trying to understand what are the implications if you generate profile-based type guards early (during parsing). Any particular benefits from late expansion or downsides from early expansion? I'd expect that additional type info may be helpful (even though bimorphic/polymorphic cases are less useful than monomorphic one). Late expansion allows some optimizations to trigger that wouldn't otherwise. Something like: if (o.klass = profile1) { goto success; } else if (o.klass = profile2) { goto success; } else if (o instanceof super) { goto success; } else { goto failure; } is unlikely to optimize as well as: if (o instanceof super) { } with split if, loop predication or finding a dominating if with an identical type check. > Handling it during parsing would relieve `SubTypeCheck` from caring about profile data and enable placing an uncommon trap on slow path for bimorphic case. (Doing that during macro expansion would require `SubTypeCheckNode` to keep JVM state.) Carrying the JVM state in the `SubTypeCheck` looks like too much extra complexity to me. 
With profiling of branches, we could get an uncommon trap in: if (!(a instanceof super)) { // never taken } even if profile data reports more than a single receiver at the checkcast. > What are the implications if you make profiling changes separately? My understanding is you gain access to accurate probabilities and ability to distinguish between bi-/poly-/mega-morphic cases. Is it correct? Pushing this without the change to profile data collection? We have no way to tell if, when profile data reports 2 classes, they are common or not. So yes, that's correct. > Speaking of alternative ways to pass profile info around, you could just embed `ciCallProfile` in `SubTypeCheck`. Any particular reasons not to do so? It felt easier in terms of memory management. If we have some extra data embedded in the `SubTypeCheck` node, is it a pointer or the full data structure? If it is a pointer, do we clone the data on node clone? Try to reclaim memory on node destruction? Where should the data live so it's not destroyed while we need it but doesn't live longer than required? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1600291223 From thartmann at openjdk.org Wed Jun 21 07:09:14 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 07:09:14 GMT Subject: Integrated: 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 In-Reply-To: References: Message-ID: <-CAXCwPPcKccdHd62S0WEwfBZqRpufmqsA4WhgNV4DA=.d9df89c8-ed5e-40f2-960b-418d3abebd82@github.com> On Mon, 19 Jun 2023 11:27:10 GMT, Tobias Hartmann wrote: > The fix for [JDK-8282797](https://bugs.openjdk.org/browse/JDK-8282797) missed that randomly generated Compile Commands can be invalid. I verified with the corresponding seeds that all occurrences of this intermittent failure are now fixed. I also removed `Executor::isValid` because it's unused. > > Thanks, > Tobias This pull request has now been integrated.
Changeset: 67fbd873 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/67fbd87378a9b3861f1676977f9f2b36052add29 Stats: 15 lines in 4 files changed: 3 ins; 4 del; 8 mod 8310143: RandomCommandsTest fails due to unexpected VM exit code after JDK-8282797 Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14538 From chagedorn at openjdk.org Wed Jun 21 07:12:04 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jun 2023 07:12:04 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 14:42:14 GMT, Emanuel Peter wrote: > Removed a spurious assert before optimization bailout. > > I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect, partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so an optimization bailout is appropriate. > > I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have led to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain. > > I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue. > Testing up to tier6 and stress testing. **Running, passing except for some IR rules I had to fix, rerunning...** Looks good!
test/hotspot/jtreg/compiler/loopopts/superword/TestUnorderedReductionPartialVectorization.java line 43: > 41: public static void main(String[] args) { > 42: TestFramework.runWithFlags("-Xbatch", "-XX:-TieredCompilation", > 43: "-XX:CompileCommand=compileonly,compiler.loopopts.superword.TestUnorderedReductionPartialVectorization::test*"); It should probably also trigger without the flags by just specifying `TestFramework.run()` as `test1()` is not using other methods and the IR framework will implicitly use `-Xbatch` and wait for the compilation of `test1()` to be finished. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14494#pullrequestreview-1489912364 PR Review Comment: https://git.openjdk.org/jdk/pull/14494#discussion_r1236475698 From rcastanedalo at openjdk.org Wed Jun 21 08:08:06 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 21 Jun 2023 08:08:06 GMT Subject: RFR: 8310220: IGV: dump graph after each IGVN step at level 4 In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 10:21:57 GMT, Roberto Castañeda Lozano wrote: > This changeset instruments Iterative GVN (IGVN) in C2 to dump the Ideal graph after each effective step (i.e. when the graph is rewritten or the recorded types are refined). This enables fine-grain tracing of IGVN transformation sequences using Ideal Graph Visualizer. This technique has proved useful for the investigation of [JDK-8310220](https://bugs.openjdk.org/browse/JDK-8310220), and can be also useful for educational purposes: > > ![igv-level4](https://github.com/openjdk/jdk/assets/8792647/56dc9729-d5eb-44f3-8614-dc72e17f1bef) > > These new dumps are emitted at print level 4 (`PrintIdealGraphLevel=4`), the highest level of detail.
> > Following [feedback](https://bugs.openjdk.org/browse/JDK-8310220?focusedCommentId=14590132&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14590132) and offline discussions with Christian Hagedorn, the changeset also dumps the Ideal graph before and after IGVN at print level 3. This makes it possible to identify the source of graph changes between IGVN and other phases such as loop transformations. Finally, the existing phase `PHASE_MACH_ANALYSIS` is also promoted to print level 3, since it prints a single graph per compilation unit only (see print level documentation updates in this changeset): > > ![igv-level3](https://github.com/openjdk/jdk/assets/8792647/9bccc78b-13b8-428d-8c98-ef3f0f769f4c) > > The changeset increases the number of graph dumps per compilation at print levels 3 and 4 by 30-40%. This additional overhead is in my opinion justified by the value provided by the additional dumps, and the high print level at which they are produced. > > #### Testing > > - tier1-3 (linux-x64; release and debug mode). > > - Verified that thousands of new IGVN graph dumps are correctly opened and visualized with the Ideal Graph Visualizer, at print levels 3 and 4. Putting this changeset on hold, since I will not be able to react to reviewers' feedback in the next weeks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14537#issuecomment-1600380837 From aph at openjdk.org Wed Jun 21 08:20:06 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Jun 2023 08:20:06 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 02:55:31 GMT, Hao Sun wrote: >> `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. 
In JDK-8297092, the intrinsic was evaluated on existing hardware with the support of SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was auto-enabled by default on Apple silicon only. See the code [1]. >> >> As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message: >> >> >> JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value >> Option 'UseSHA3Intrinsics' should be enabled by default >> >> >> The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` are designed to verify that, option `UseXXXIntrinsics` should be enabled by default if the underlying hardware supports the corresponding CPU feature. >> >> Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796. >> >> Fix: One `@requires` condition is added in this patch to limit that this test case is only run on macOS on Apple silicon. Note that SHA3 feature is currently supported by AArch64 only. >> >> Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon. >> >> [1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Marked as reviewed by aph (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/14551#pullrequestreview-1490095910 From jwaters at openjdk.org Wed Jun 21 08:22:05 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 21 Jun 2023 08:22:05 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: <8mSr3SYUR0t4eP-T6eJxu4EM4jhnz-2YMO8RBtrAoIQ=.99c452b4-dc5c-4579-a89b-d46d744a6579@github.com> On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning ... 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1600400436 From pli at openjdk.org Wed Jun 21 08:36:33 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 21 Jun 2023 08:36:33 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization Message-ID: ## TL;DR This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. ------------- Commit messages: - JDK-8308994: C2: Re-implement experimental post loop vectorization Changes: https://git.openjdk.org/jdk/pull/14581/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14581&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308994 Stats: 2533 lines in 42 files changed: 1951 ins; 521 del; 61 mod Patch: https://git.openjdk.org/jdk/pull/14581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14581/head:pull/14581 PR: https://git.openjdk.org/jdk/pull/14581 From pli at openjdk.org Wed Jun 21 08:36:33 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 21 Jun 2023 08:36:33 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. 
The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. ## Background & Problems Post loop vectorization takes advantage of vector mask (predicate) features of some hardware platforms, such as x86 AVX-512 and AArch64 SVE, to vectorize tail iterations of loops for better performance. The existing implementation in the C2 compiler has a long history. It was first implemented in [JDK-8153998](https://bugs.openjdk.org/browse/JDK-8153998) in 2016 under C2's experimental feature PostLoopMultiversioning to support x86 AVX-512 vector masks. Due to insufficient maintenance, it had been broken for a very long time. Last year, we took over [JDK-8183390](https://bugs.openjdk.org/browse/JDK-8183390) to fix and re-enable this feature. Several issues were fixed and AArch64 vector mask support was added at that time. As we proposed to make post loop vectorization non-experimental in future JDK releases, we did some stress tests early this year but found more problems inside. The problems include stability, maintainability and performance. 1. Stability Multiple C2 crash or mis-compilation issues related to post loop vectorization were filed on JBS, including [JDK-8301657](https://bugs.openjdk.org/browse/JDK-8301657), [JDK-8301904](https://bugs.openjdk.org/browse/JDK-8301904), [JDK-8301944](https://bugs.openjdk.org/browse/JDK-8301944), [JDK-8304774](https://bugs.openjdk.org/browse/JDK-8304774), [JDK-8308949](https://bugs.openjdk.org/browse/JDK-8308949) and perhaps more with recent C2 patches. 2. Maintainability The original implementation is based on multi-versioned post loops and the code is mixed in SuperWord. But post loop vectorization does not actually use the SLP algorithm. So there is a lot of special handling for post loops in current SuperWord code.
As more and more features are added in SuperWord, the legacy code is becoming more and more difficult to maintain and extend. 3. Performance Post loop vectorization was expected to bring obvious performance benefit for small iteration loops. But JMH tests showed it didn't. A main reason is that the multi-versioned vector post loop is jumped over from main loop's minimum-trip guard if the whole loop has very few iterations (read [JDK-8307084](https://bugs.openjdk.org/browse/JDK-8307084) to learn more). The previous implementation also has limited vectorization ability: for example, it can only vectorize loop statements with a single data size. ## About this patch The main idea of post loop vectorization is widening scalar operations in the post loop and adding vector masks to them. The whole logic is not related to the SLP algorithm and does not depend on pre-requisite transformations of SLP, such as loop unrolling. And considering that current `superword.[cpp|hpp]` are large enough, we propose to create new source files `vmaskloop.[cpp|hpp]` and a new ideal loop phase in class `VectorMaskedLoop` for this new implementation. To reduce duplicated code, this patch still reuses `SWPointer` data structures and utilities in SuperWord. The newly added code in `vmaskloop.[cpp|hpp]` can be thought of as an implementation of a brand-new vectorizer. As its vectorization approach is completely different from SLP, the vectorization ability is not the same. A major difference is for **partially vectorizable** loops, like the case below. for (int i = 0; i < SIZE; i++) { c[i] = a[i] + b[i]; // vectorizable statement k = 3 * k + 1; // non-vectorizable statement } In a partially vectorizable loop, only some statements in the loop body can be vectorized with vector masks. In the main loop vectorization, SLP can transform the vectorizable part and leave the non-vectorizable part as it is (unrolled only).
But in post loop vectorization with vector masks, statements in the loop body should be either "all transformed" or "none transformed". Hence, we implement this new vectorizer as two stages - "analysis" and "transformation". At the analysis stage, we collect enough loop information and check the vectorizability of the whole loop. The ideal graph is "read-only" at this time. If a loop is considered vectorizable after analysis, the transformation stage begins and ideal graph transformations will be performed. The entry function `VectorMaskedLoop::try_vectorize_loop()` shows the two stages. ### The analysis stage The first step of analysis is collecting all loop nodes into some data structures. In C2's ideal graph, all scalar nodes in loop statements need to be replaced by corresponding vector nodes. Considering it's better to replace an ideal node after all its input nodes are replaced, in this step, a reverse post order (RPO) of all loop body nodes is created to facilitate later node replacement work in top-down (def-use) order. As we mentioned above, a loop may have multiple statements and only part of them are vectorizable. Even in a loop where all statements are vectorizable, we may vectorize them in different ways because they have different types. Therefore, for each scalar node in the loop body, we need to know which statement(s) it belongs to. That's the goal of our second step of the analysis. As current post loop vectorization only supports non-reductions, in this step, we start from store nodes and include their input nodes recursively to find statements. Note that a node may belong to multiple statements due to common sub-expressions. The final goal of the analysis is checking the vectorizability and finding future vector element types of operations in the loop. In this patch, we have a bunch of steps of checking the vectorizability and assigning vector element types to all scalar operations. 
Functions called in `VectorMaskedLoop::analyze_vectorizability()` implement all these steps. If a loop has different data types (like the below case), on some platforms, we need multiple types of vector masks. for (int i = 0; i < SIZE; i++) { ib[i] = ia[i]; // copy an array of 32-bit int db[i] = da[i]; // copy an array of 64-bit double } This case needs at least two types of vector masks on AArch64, int and double (long). To make vector mask generation simpler, we only generate one vector mask per loop iteration with the smallest data size in loop statements. We call this the **root vector mask** and all other vector masks can be extracted from this root mask. This is a bit complex and will be discussed later in the below transformation stage. ### The transformation stage Ideal graph transformations start after the whole loop is confirmed to be vectorizable. The first step of transformation is creating a vector mask tree which contains all masks needed by vectorized operations in the loop. The vector mask tree is always a perfect binary tree. Its root node is the root vector mask which is generated with the smallest data size in loop statements, as we mentioned above. The depth of the tree depends on how many different data sizes are used in total. If only one data size appears, the depth is one and the tree only has the root mask node. The maximum depth is four as all Java primitive types have four different data sizes (8-bit, 16-bit, 32-bit and 64-bit). Let's look at the above case again to understand more about the tree. for (int i = 0; i < SIZE; i++) { ib[i] = ia[i]; // copy an array of 32-bit int db[i] = da[i]; // copy an array of 64-bit double } This vectorizable loop copies an int array and a double array. As Java int and double have different sizes, the numbers of vector lanes for them are also different. Suppose the hardware uses 512-bit (64-byte) vectors, then one vector operation can process 16 ints, but only 8 doubles. 
However, with a certain loop stride, the number of elements processed in one loop iteration for either int or double should be the same. To solve this problem, operations with larger data sizes need to be duplicated in the transformation. In addition, their vector masks needed are only slices of the root vector mask because they process fewer lanes per operation. As data sizes of Java primitive types grow by a factor of two, the vector mask of every next larger data size takes the two halves of the previous mask of smaller data size. That's why we implement a perfect binary tree to extract vector masks. In above loop case, we create a vector mask tree of two levels. The root vector mask is generated for int vector operations. Double vector operations are duplicated once so they need two different masks extracted from the lower half and higher half of the int vector mask. They are located at the left child and right child of the root mask node respectively. After the vector mask tree is created, we can widen scalar operations in the loop body and add vector masks to them. This step is mainly about replacing scalar nodes by vector nodes with extra vector mask inputs. The replacement work is done in the reverse post order established at the beginning of the analysis stage and calls some existing utility functions in `vectornode.[cpp|hpp]`. For statements with larger data sizes, we need to duplicate the vectorized nodes. The number of copies to duplicate depends on how many times the data size is compared to the loop's smallest data size. Moreover, we need to adjust some vector nodes after duplication because duplicated operations need different vector mask or memory address input compared with the original ones. The last step of the transformation stage is updating the loop stride. Intuitively, we should multiply the stride value by the vector length of the smallest data size in the loop. 
But that is not quite right, because the induction variable would then be advanced too far in the loop's last iteration. Consider the case below, where the loop induction variable is used after the loop. int i; for (i = 0; i < SIZE; i++) { c[i] = a[i] + b[i]; } return i; // `i` is used after loop For this case, if we increase `i` by the value of vector length in the vectorized loop, the return value may be incorrect because the number of elements processed in its last iteration is usually less than the vector length. To solve this problem, we instead create a new loop increment node and use the output of a `VectorMaskTrueCountNode` as the vectorized loop's increment (or decrement for counting-down loops). This makes sure we increase (or decrease) the loop induction variable by an exact value in each vectorized iteration. ### New IR nodes Besides `vmaskloop.[cpp|hpp]`, this patch also makes some other changes to facilitate new post loop vectorization, including new IR nodes and small changes in C2's loop framework. We define the following 3 new IR nodes in `vectornode.hpp`. #### `LoopVectorMaskNode` Current C2 code has an existing `VectorMaskGenNode` for vector mask generation which is used in the original implementation and some arraycopy intrinsics. This patch does not reuse it as it does not apply to large iteration post loops on x86. The `VectorMaskGenNode` uses an x86 [`BZHI`](https://www.felixcloutier.com/x86/bzhi.html) instruction which can only take the low 8 bits of the length register while clearing upper bits of the mask value. In other words, it can generate correct vector masks only if the length input is less than 256. But post loop trip count can be greater than 256 due to super-unrolling of the main loop. In this new implementation, we use `LoopVectorMaskNode` which has two versions of matching rules on x86, one for small iteration loops and another general one for potentially large iterations.
The general rule has additional "cmp + branch" instructions to generate all-true vector masks if the length input is greater than or equal to 256. It's like a kind of saturation operation. #### `ExtractHighMaskNode` & `ExtractLowMaskNode` These two nodes are used in pair in vector mask tree to extract vector mask slices from their parent. Please note that vector mask formats on x86 and AArch64 are different. X86 vector masks are always 64-bit but AArch64 uses scalable vector masks (one bit of vector mask corresponds to one byte in the vector). Their bit representations are not the same even for the same size of vector masks. For example, suppose a vector mask indicates that only the first five int elements are active in a 512-bit vector. Both x86 and AArch64 use 64-bit vector masks. On x86, the five 1's are compacted at the least significant bits, like below. 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00011111 But on AArch64, 1's are not compacted. Instead, each least significant bit per lane represents the activity, like below. 00000000 00000000 00000000 00000000 00000000 00000001 00010001 00010001 Therefore, this patch uses different operations for mask extraction in backend rules. On x86, a mask right shift is used for extracting the high part and no instruction is needed for the low part. On AArch64, a pair of mask unpacking instructions is used. ### New VM options The original implementation and all code related to multi-versioned post loops are deleted in this patch. This patch adds a new VM option named `UseMaskedLoop` for this new implementation. To reduce risks, we still propose to keep it experimental in the short term. You may add VM options `-XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop` to enable post loop vectorization after this patch. Another new VM option added in this patch is `TraceMaskedLoop`. You may add it to trace each step of the vectorization.
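The two mask layouts described under `ExtractHighMaskNode` & `ExtractLowMaskNode` above can be modeled in plain Java. This is only an illustrative sketch, not code from the patch: the class `MaskLayoutSketch` and all of its method names are hypothetical, a `long` stands in for a 64-bit mask register, and `extractLow`/`extractHigh` model only the compact (x86-style) layout, where the high half is a right shift and the low half is simply the low lanes of the parent.

```java
// Hypothetical model of the two mask bit layouts; not part of the patch.
class MaskLayoutSketch {
    // Compact (x86-style) layout: one bit per lane, packed at the low end.
    static long compactMask(int activeLanes) {
        return activeLanes >= 64 ? -1L : (1L << activeLanes) - 1;
    }

    // Per-byte (SVE-style) layout: one bit per vector byte; the lowest bit
    // of each lane's group of bits marks whether that lane is active.
    static long perByteMask(int activeLanes, int laneBytes) {
        long mask = 0;
        for (int lane = 0; lane < activeLanes; lane++) {
            mask |= 1L << (lane * laneBytes);
        }
        return mask;
    }

    // Compact layout only: the child mask for the lower half of the lanes.
    static long extractLow(long parent, int childLanes) {
        return parent & ((1L << childLanes) - 1);
    }

    // Compact layout only: the child mask for the upper half is a right shift.
    static long extractHigh(long parent, int childLanes) {
        return parent >>> childLanes;
    }

    public static void main(String[] args) {
        // Five active int lanes in a 512-bit vector, as in the example above.
        System.out.println(Long.toHexString(compactMask(5)));    // 1f
        System.out.println(Long.toHexString(perByteMask(5, 4))); // 11111

        // Root int mask with 11 of 16 lanes active, split for 8-lane doubles.
        long root = compactMask(11);
        System.out.println(Long.toHexString(extractLow(root, 8)));  // ff
        System.out.println(Long.toHexString(extractHigh(root, 8))); // 7
    }
}
```

In the second half of `main`, the low child covers the first eight double elements (all active) and the high child covers the next eight (three active), which mirrors how duplicated double operations would each take one half of the int root mask.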
## Generated code With this patch, C2 can transform original scalar post loops to vector masked post loops on both x86 AVX-512 and AArch64 SVE. There is no code generation change in other parts of loops. for (int i = start; i < limit; i++) { c[i] = a[i] + b[i]; // a, b and c are arrays of int } For above simple case of int vector add, the assembly code generated for vector masked post loop on x86 is like below. LOOP: mov %r8d,%r9d sub %r11d,%r9d movabs $0xffff,%rcx bzhi %r9,%rcx,%rcx kmovq %rcx,%k7 vmovdqu32 0x10(%rdx,%r11,4),%zmm0{%k7}{z} vmovdqu32 0x10(%rsi,%r11,4),%zmm1{%k7}{z} kmovq %k7,%rbx popcnt %rbx,%r10 vpaddd %zmm1,%zmm0,%zmm0 vmovdqu32 %zmm0,0x10(%rax,%r11,4){%k7} add %r10d,%r11d cmp $0x21f,%r11d jl LOOP Note that above code snippet is generated for small iteration loops. For potentially large iterations, the `LoopVectorMaskNode` matches the other rule of x86 and generates additional "cmp + branch" instructions. Below is the assembly code generated from the same loop on AArch64. LOOP: whilelt p0.s, w2, w1 sbfiz x10, x2, #2, #32 add x11, x0, x10 add x12, x3, x10 add x11, x11, #0x10 ld1w {z16.s}, p0/z, [x11] add x11, x12, #0x10 ld1w {z17.s}, p0/z, [x11] cntp x11, p7, p0.s add z16.s, z16.s, z17.s add x10, x16, x10 add x10, x10, #0x10 st1w {z16.s}, p0, [x10] add w2, w2, w11 cmp w2, #0x21f b.lt LOOP ## Tests So far, we have done a bunch of tests for this re-implementation, including both correctness tests and performance tests. ### Correctness For correctness, we tested on some latest x86 AVX-512 and AArch64 SVE CPUs with full jtreg. We also ran 150,000 JavaFuzzer tests with multiple VM options. No test failure is found for current patch. We also added new IR rules in jtreg vectorization tests in this patch. ### Performance As C2's post loop is just a small portion of the original loop before iteration split. Usually, it doesn't run a lot of iterations. 
We don't expect the vectorization to bring obvious performance benefit if the original loop has a large trip count. So we just test JMH benchmark of small iteration loops. We write below JMH to test loops with trip counts from 0 to 200. import org.openjdk.jmh.annotations.*; import java.util.concurrent.TimeUnit; @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) @Fork(value = 1) @Warmup(iterations = 3) public class TestSmallLoop { @Param({ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", "98", "99", "100","101","102","103","104","105","106","107","108","109", "110","111","112","113","114","115","116","117","118","119", "120","121","122","123","124","125","126","127","128","129", "130","131","132","133","134","135","136","137","138","139", "140","141","142","143","144","145","146","147","148","149", "150","151","152","153","154","155","156","157","158","159", "160","161","162","163","164","165","166","167","168","169", "170","171","172","173","174","175","176","177","178","179", "180","181","182","183","184","185","186","187","188","189", "190","191","192","193","194","195","196","197","198","199", "200"}) private int size; private int[] a, b, c; @Benchmark public void addVector() { for (int i = 0; i < size; i++) { c[i] = a[i] + b[i]; } } @Setup public void setup() { a = new int[size]; b = new int[size]; c = new int[size]; } } Testing the performance of post loops is a bit tricky. 
The JMH numbers can be unstable because a given loop may run a different number of post-loop iterations in different runs. C2 adjusts the pre-loop's trip count in order to make memory operations in the main loop as aligned as possible. So for a loop with array accesses, different array alignments may result in different numbers of pre-loop iterations, and eventually in different numbers of iterations remaining in the post loop. To eliminate this interference, we add an extra VM option `ObjectAlignmentInBytes` and set its value to `MaxVectorSize` to guarantee that post loops run the same number of iterations each time.

The line chart below shows the test results on x86 AVX-512. The x-axis is the number of iterations and the y-axis is the loop execution time (smaller is better).

![JMH results on AVX-512](https://cr.openjdk.org/~pli/rfr/8308994/data-avx512.png)

Before post loop vectorization, loop execution time increases in a zigzag pattern as the number of iterations increases. After post loop vectorization, the curve looks more stable. An obvious performance benefit is seen on x86 when the trip count is greater than 150. We also ran the same JMH benchmark on AArch64 with 256-bit SVE and found a more obvious performance benefit, even for smaller trip counts (see the chart below).

![JMH results on SVE](https://cr.openjdk.org/~pli/rfr/8308994/data-sve.png)

## Future work

This patch is a bit large, but there is more we can do in the future, including but not limited to extending vectorizability to reduction operations and strided accesses. We are also considering applying this new vectorization approach to C2's normal loops (the loops before iteration split) if it can bring more benefits than SLP. In addition, adding more backend support, such as RVV (RISC-V Vector Extension), is also on the to-do list.

## Epilogue

Thanks for reading all the way here. I'm looking forward to seeing your feedback.
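
For readers who prefer Java to assembly, the vector masked post loop from the "Generated code" section can be modelled in scalar code roughly as follows. This is only a sketch: the class name, the method name and the fixed width of 16 int lanes (one 512-bit vector) are assumptions made for illustration, not code from the patch. Each pass builds a mask covering the remaining elements (the job of `bzhi` on x86 or `whilelt` on SVE), operates only on active lanes, and advances the index by the number of active lanes (`popcnt` / `cntp`).

```java
class MaskedPostLoopSketch {
    // Assumed lane count: 16 ints, as in one 512-bit vector.
    static final int LANES = 16;

    // Scalar model of a vector masked post loop for c[i] = a[i] + b[i].
    // Returns how many masked "vector" iterations were executed.
    static int maskedAdd(int[] a, int[] b, int[] c, int start, int limit) {
        int vectorIterations = 0;
        int i = start;
        while (i < limit) {
            // Mask with min(LANES, limit - i) low bits set.
            int active = Math.min(LANES, limit - i);
            int mask = (1 << active) - 1;
            // Masked load, add and store: inactive lanes are not touched.
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    c[i + lane] = a[i + lane] + b[i + lane];
                }
            }
            // Advance the induction variable by the number of active lanes.
            i += Integer.bitCount(mask);
            vectorIterations++;
        }
        return vectorIterations;
    }

    public static void main(String[] args) {
        int n = 40;
        int[] a = new int[n];
        int[] b = new int[n];
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // 35 remaining elements are covered by 16 + 16 + 3 active lanes.
        int iterations = maskedAdd(a, b, c, 3, 38);
        System.out.println(iterations + " masked iterations");
    }
}
```

In this model a post loop with up to 16 remaining elements finishes in a single masked pass, which is consistent with the smoother curve described in the performance section.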
------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1600408191 From pli at openjdk.org Wed Jun 21 08:52:17 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 21 Jun 2023 08:52:17 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 06:39:58 GMT, Emanuel Peter wrote: >> Nice re-write. Looks good to me. > > Thanks @vnkozlov @pfustc @fg1417 for the suggestions and reviews! Hi @eme64 , I have just pushed our post loop patch to Github for review. I also attached some documents in the reply of the PR for reviewers to understand the code. See https://github.com/openjdk/jdk/pull/14581 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1600445377 From epeter at openjdk.org Wed Jun 21 08:58:02 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jun 2023 08:58:02 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com> On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. 
To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. @XiaohongGong Thanks for looking into this. But it seems to me this is not the same approach as we are taking with x86 SSE and AVX, where the `UseAVX` and `UseSSE` flags affect both the VM features and also the `applyIfCPUFeature` from the IR framework. We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1600454917 From djelinski at openjdk.org Wed Jun 21 09:27:06 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Wed, 21 Jun 2023 09:27:06 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning src/jdk.accessibility/windows/native/jaccesswalker/jaccesswalker.cpp line 475: > 473: topLevelWindow = hwnd; > 474: } else { > 475: EnumChildWindows(hwnd, (WNDENUMPROC) EnumChildProc, nullptr); Did you try to compile this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1236688830 From epeter at openjdk.org Wed Jun 21 09:42:16 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jun 2023 09:42:16 GMT Subject: RFR: 8308606: C2 SuperWord: remove alignment checks when not required [v6] In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:48:51 GMT, Pengfei Li wrote: >> Thanks @vnkozlov @pfustc @fg1417 for the suggestions and reviews! > > Hi @eme64 , I have just pushed our post loop patch to Github for review. I also attached some documents in the reply of the PR for reviewers to understand the code. See https://github.com/openjdk/jdk/pull/14581 @pfustc Thanks for the info, I'll look at it soon! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14096#issuecomment-1600518992

From xgong at openjdk.org Wed Jun 21 09:48:05 2023
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 21 Jun 2023 09:48:05 GMT
Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0
In-Reply-To: <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com>
References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com>
Message-ID: <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com>

On Wed, 21 Jun 2023 08:55:19 GMT, Emanuel Peter wrote:

> @XiaohongGong Thanks for looking into this. But it seems to me this is not the same approach as we are taking with x86 SSE and AVX, where the `UseAVX` and `UseSSE` flags affect both the VM features and also the `applyIfCPUFeature` from the IR framework. We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test?
>
> https://github.com/openjdk/jdk/blob/8d899925dc281c5dabbef14d85a6df807f8d300e/src/hotspot/cpu/x86/vm_version_x86.cpp#L954-L955
>
> Can you do a similar thing in `src/hotspot/cpu/aarch64/vm_version_aarch64.cpp` ?
>
> It would be nice not to have to check for the flag and the features in every test, but just for the features. And the features should depend on what is present on the hardware, minus the restrictions by the flags.

Thanks for looking at this PR @eme64! Yes, that's the main difference between the aarch64 and x86 platforms. Changing the CPU features based on the VM option does make things simpler. But as I understand it, CPU features describe the hardware, which is an objective fact, while `UseSVE` is a JVM option that people can set to different values. And they cannot be mixed.
Besides, x86 just masks off the CPU features for the JVM instead of really changing the hardware's features. I'm not sure, but I'm afraid that making the same kind of change as x86 may carry some risk in the current aarch64 backend.

> We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test?

For each SVE test, we have tried to add the flag `UseSVE=1` in the test's `main` function to make sure this option is not changed by others and the test runs with the expected SVE feature. For example:

    public static void main(String[] args) {
        TestFramework.runWithFlags("--add-modules=jdk.incubator.vector",
                                   "-XX:UseSVE=1");
    }

For this test, we cannot add such an option in the test file, since it is also used to test other platforms like x86.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1600528971

From epeter at openjdk.org Wed Jun 21 10:26:04 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 21 Jun 2023 10:26:04 GMT
Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0
In-Reply-To: <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com>
References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com> <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com>
Message-ID: <_uo0O8gltMhBsKob4LX6Ri8o-403iDN9Kv3TB2n2hRg=.fbf02543-6c1c-4134-89e2-3331d348badc@github.com>

On Wed, 21 Jun 2023 09:45:40 GMT, Xiaohong Gong wrote:

>> @XiaohongGong Thanks for looking into this. But it seems to me this is not the same approach as we are taking with x86 SSE and AVX, where the `UseAVX` and `UseSSE` flags affect both the VM features and also the `applyIfCPUFeature` from the IR framework.
We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? >> >> https://github.com/openjdk/jdk/blob/8d899925dc281c5dabbef14d85a6df807f8d300e/src/hotspot/cpu/x86/vm_version_x86.cpp#L954-L955 >> >> Can you do a similar thing in `src/hotspot/cpu/aarch64/vm_version_aarch64.cpp` ? >> >> It would be nice not to have to check for the flag and the features in every test, but just for the features. And the features should depend on what is present on the hardware, minus the restrictions by the flags. > >> @XiaohongGong Thanks for looking into this. But it seems to me this is not the same approach as we are taking with x86 SSE and AVX, where the `UseAVX` and `UseSSE` flags affect both the VM features and also the `applyIfCPUFeature` from the IR framework. We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? >> >> https://github.com/openjdk/jdk/blob/8d899925dc281c5dabbef14d85a6df807f8d300e/src/hotspot/cpu/x86/vm_version_x86.cpp#L954-L955 >> >> Can you do a similar thing in `src/hotspot/cpu/aarch64/vm_version_aarch64.cpp` ? >> >> It would be nice not to have to check for the flag and the features in every test, but just for the features. And the features should depend on what is present on the hardware, minus the restrictions by the flags. > > Thanks for looking at this PR @eme64 ! Yes, that's the main difference between aarch64 and x86 platforms. It actually makes things simpler that changing the CPU features based on the vm option. But per my understanding, CPU features are the hardware's feature which is the objective fact, while the `UseSVE` are the JVM's option that people can set different values. And they cannot be mixed. Besides, x86 just mask off the CPU features for JVM instead of really changing the hardware's features. I'm not sure, but I'm afraid doing such changes like x86 may have some risks in current aarch64's backend. 
> >> We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? > > For each SVE test, we have tried to add flag `UseSVE=1` in the test's `main` function to make sure this option is not changed by others, and current test is run with the expected sve feature. For example: > > public static void main(String[] args) { > TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", > "-XX:UseSVE=1"); > } > > For this test, we cannot add such an option in the test file, since it is also used to test other platforms like x86. @XiaohongGong I see, you are worried that it would take a lot of work in the `aarch64` code? So in the backend you are using the `UseSVE` flag instead of feature support? Are you going to add the `UseSVE` flag to all these cases too? emanuel at emanuel-oracle:/oracle-work/jdk-fork8/open$ grep sve test/hotspot/jtreg/compiler/ -r test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.AND_VB, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.AND_VS, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.AND_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.AND_VL, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.AND_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.OR_VB, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.OR_VB, applyIfCPUFeatureAnd = 
{"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.OR_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.OR_VL, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.OR_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.XOR_VS, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx512dq", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: 
@IR(applyIfCPUFeatureOr = {"sve", "true", "avx512dq", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "sse4.1", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "sse4.1", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "sse4.1", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopReductionOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopReductionOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, I just really don't like the duplication of IR rules, and the checking of both the flag and the cpu feature. Another solution: we filter out the `sve` feature at the IR framework, if we have `UseSVE=0`? 
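
To make the last suggestion concrete, here is a rough sketch of what filtering at the framework level could look like. Everything in it is hypothetical: `FeatureFilterSketch`, `effectiveFeatures` and `anyFeatureMatches` are names made up for illustration and are not the IR framework's real API. It only shows the idea that the feature set seen by the `applyIfCPUFeature*` checks would be the hardware features minus whatever the flags disable.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class FeatureFilterSketch {
    // Hypothetical helper: start from the features the hardware reports and
    // drop "sve" when the VM runs with UseSVE=0.
    static Set<String> effectiveFeatures(Set<String> hardwareFeatures, int useSVE) {
        Set<String> result = new LinkedHashSet<>(hardwareFeatures);
        if (useSVE == 0) {
            result.remove("sve");
        }
        return result;
    }

    // Hypothetical evaluation of an applyIfCPUFeatureOr-style condition given
    // as {feature, "true"/"false"} pairs: true if any single pair matches.
    static boolean anyFeatureMatches(Set<String> features, String... pairs) {
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            boolean expected = Boolean.parseBoolean(pairs[i + 1]);
            if (features.contains(pairs[i]) == expected) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> hardware = Set.of("asimd", "sve");
        // With UseSVE=0 a rule guarded by {"sve", "true", "avx512", "true"}
        // would no longer apply; with UseSVE=1 it still would.
        System.out.println(anyFeatureMatches(effectiveFeatures(hardware, 0),
                                             "sve", "true", "avx512", "true"));
        System.out.println(anyFeatureMatches(effectiveFeatures(hardware, 1),
                                             "sve", "true", "avx512", "true"));
    }
}
```

With something along these lines the existing `applyIfCPUFeatureOr = {"sve", "true", ...}` rules could stay as they are, without a second rule that also checks the flag.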
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1600580843 From epeter at openjdk.org Wed Jun 21 10:34:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 Jun 2023 10:34:04 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com> <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com> Message-ID: On Wed, 21 Jun 2023 09:45:40 GMT, Xiaohong Gong wrote: > Besides, x86 just mask off the CPU features for JVM instead of really changing the hardware's features. That is the question: what is the ground truth. On x86, the idea is to keep the flags and the cpu features in sync. So if the hardware does not support a flag value to be larger (eg trying to set `UseAVX=3` if only avx2 is available, we force it down to `UseAVX=2`). And if a flag restricts cpu features, we mask them off. Of course, the hardware may support more features than the VM assumes. But still, that allows us to do some "cross cpu" testing - we can simulate avx1 features on a avx2 machine for example. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1600591097 From thartmann at openjdk.org Wed Jun 21 10:36:24 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:36:24 GMT Subject: [jdk21] RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer Message-ID: Backport of [JDK-8309266](https://bugs.openjdk.java.net/browse/JDK-8309266). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer Changes: https://git.openjdk.org/jdk21/pull/49/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=49&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309266 Stats: 61 lines in 2 files changed: 59 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk21/pull/49.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/49/head:pull/49 PR: https://git.openjdk.org/jdk21/pull/49 From thartmann at openjdk.org Wed Jun 21 10:36:30 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:36:30 GMT Subject: [jdk21] RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic Message-ID: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> Backport of [JDK-8310126](https://bugs.openjdk.java.net/browse/JDK-8310126). Applies cleanly. Thanks, Tobias ------------- Commit messages: - 8310126: C1: Missing receiver null check in Reference::get intrinsic Changes: https://git.openjdk.org/jdk21/pull/51/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=51&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310126 Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk21/pull/51.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/51/head:pull/51 PR: https://git.openjdk.org/jdk21/pull/51 From thartmann at openjdk.org Wed Jun 21 10:36:43 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:36:43 GMT Subject: [jdk21] RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording Message-ID: Backport of [JDK-8309498](https://bugs.openjdk.java.net/browse/JDK-8309498). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8309498: [JVMCI] race in CallSiteTargetValue recording Changes: https://git.openjdk.org/jdk21/pull/50/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=50&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309498 Stats: 11 lines in 1 file changed: 5 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk21/pull/50.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/50/head:pull/50 PR: https://git.openjdk.org/jdk21/pull/50 From thartmann at openjdk.org Wed Jun 21 10:36:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:36:54 GMT Subject: [jdk21] RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 Message-ID: Backport of [JDK-8308855](https://bugs.openjdk.java.net/browse/JDK-8308855). Applies cleanly. Thanks, Tobias ------------- Commit messages: - 8308855: ARM32: TestBooleanVector crashes after 8300257 Changes: https://git.openjdk.org/jdk21/pull/48/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=48&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308855 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk21/pull/48.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/48/head:pull/48 PR: https://git.openjdk.org/jdk21/pull/48 From chagedorn at openjdk.org Wed Jun 21 10:44:04 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jun 2023 10:44:04 GMT Subject: [jdk21] RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:27:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308855](https://bugs.openjdk.java.net/browse/JDK-8308855). Applies cleanly. > > Thanks, > Tobias Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk21/pull/48#pullrequestreview-1490402325 From chagedorn at openjdk.org Wed Jun 21 10:46:03 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jun 2023 10:46:03 GMT Subject: [jdk21] RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: <9ipVCW7lfoKS1k7tQfmgztN18MN9l2SR32uPPlgnmzE=.a301743f-faee-4858-aa5d-f06a2b7f4e15@github.com> On Wed, 21 Jun 2023 10:28:11 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309266](https://bugs.openjdk.java.net/browse/JDK-8309266). Applies cleanly. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/49#pullrequestreview-1490404207 From chagedorn at openjdk.org Wed Jun 21 10:47:03 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jun 2023 10:47:03 GMT Subject: [jdk21] RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:28:33 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309498](https://bugs.openjdk.java.net/browse/JDK-8309498). Applies cleanly. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/50#pullrequestreview-1490405626 From chagedorn at openjdk.org Wed Jun 21 10:47:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jun 2023 10:47:05 GMT Subject: [jdk21] RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> References: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> Message-ID: On Wed, 21 Jun 2023 10:28:51 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310126](https://bugs.openjdk.java.net/browse/JDK-8310126). Applies cleanly. 
> > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/51#pullrequestreview-1490407201 From thartmann at openjdk.org Wed Jun 21 10:52:02 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:52:02 GMT Subject: [jdk21] RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: <4UQXnNCf0-AxTCLHzlIVlvulhEIirM7ktD6bxzXzTNs=.2691b31e-6e86-4a69-a262-35868e067937@github.com> On Wed, 21 Jun 2023 10:28:33 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309498](https://bugs.openjdk.java.net/browse/JDK-8309498). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/50#issuecomment-1600613875 From thartmann at openjdk.org Wed Jun 21 10:52:12 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:52:12 GMT Subject: [jdk21] RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:28:11 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309266](https://bugs.openjdk.java.net/browse/JDK-8309266). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/49#issuecomment-1600613794 From thartmann at openjdk.org Wed Jun 21 10:52:12 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:52:12 GMT Subject: [jdk21] RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> References: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> Message-ID: On Wed, 21 Jun 2023 10:28:51 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310126](https://bugs.openjdk.java.net/browse/JDK-8310126). Applies cleanly. 
> > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/51#issuecomment-1600613942 From thartmann at openjdk.org Wed Jun 21 10:52:14 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 10:52:14 GMT Subject: [jdk21] RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:27:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308855](https://bugs.openjdk.java.net/browse/JDK-8308855). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/48#issuecomment-1600613712 From thartmann at openjdk.org Wed Jun 21 11:01:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 11:01:04 GMT Subject: RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 08:56:26 GMT, Roland Westrelin wrote: > Before 8275201, loading the element klass of an array returned: > > > TypeKlassPtr::make(tkls->ptr(), elem, 0/*offset*/); > > > that is exact if the array type is exact. I changed it to: > > > tkls->is_aryklassptr()->elem(); > > > When the array type is exact (newly allocated array for instance) but > the element class has subclasses, this doesn't return an exact class > (so the logic is different from the one that was there before). That > affects array store checks that no longer constant fold. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14536#pullrequestreview-1490429175 From rcastanedalo at openjdk.org Wed Jun 21 12:00:05 2023 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 21 Jun 2023 12:00:05 GMT Subject: RFR: 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs [v3] In-Reply-To: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> References: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> Message-ID: On Tue, 20 Jun 2023 10:21:51 GMT, Daniel Skantz wrote: >> ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. >> >> Testing: tier1-tier3. >> >> Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. >> Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. >> >> Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. 
>> >> ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test #iterations Thanks for addressing the suggestions and for the additional explanation, looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14492#pullrequestreview-1490552606 From djelinski at openjdk.org Wed Jun 21 12:07:09 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Wed, 21 Jun 2023 12:07:09 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Compilation should be a good enough test for the `long` -> `jint` changes. These changes are supposed to address [this difference](https://learn.microsoft.com/en-us/cpp/overview/cpp-conformance-improvements-2019?view=msvc-170#overload-resolution-involving-integral-overloads-and-long-arguments) between MSVC behavior and the C++ standard. When compiled with `-permissive-` or with a compiler that puts more emphasis on standards conformance (like clang), the current code fails to compile. I verified some of the generated binaries by comparing the results of `dumpbin /all` before and after the change. Most of the time the changes were limited to timestamp, UUID and mangled function names. `Jaccesswalker.exe` had a few more changes because of a changed format string. None of the changed function names in client libs area are externally visible, but there are some observable changes to `c2v` functions exported from jvm.dll. I had to revert some of the `NULL`->`nullptr` changes to get this to compile; I assume this will be addressed before this PR is merged. Judging by the PR title, these changes don't belong here anyway. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1600710570 From jwaters at openjdk.org Wed Jun 21 12:13:09 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 21 Jun 2023 12:13:09 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 1 Jun 2023 11:49:24 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fix the code that is actually warning Yeah, those were code cleanups I thought I could do out of convenience, I'll revert them all before this goes in ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1600719023 From duke at openjdk.org Wed Jun 21 12:47:09 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 21 Jun 2023 12:47:09 GMT Subject: RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB Message-ID: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> Removed TLAB from the IR Framework whitelist. If TLAB allocations are disabled by `-XX:-UseTLAB` the IR verification can fail, therefore `"TLAB"` should not be whitelisted. See [JDK-8295210](https://bugs.openjdk.org/browse/JDK-8295210) for an example of such a failure.
------------- Commit messages: - removed TLAB from whitelist Changes: https://git.openjdk.org/jdk/pull/14583/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14583&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295210 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14583/head:pull/14583 PR: https://git.openjdk.org/jdk/pull/14583 From thartmann at openjdk.org Wed Jun 21 12:53:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 21 Jun 2023 12:53:04 GMT Subject: RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> References: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> Message-ID: On Wed, 21 Jun 2023 11:29:51 GMT, Eric Nothum wrote: > Removed TLAB from the IR Framework whitelist. If TLAB allocations are disabled by `-XX:-UseTLAB` the IR verification can fail, therefore `"TLAB"` should not be whitelisted. See [JDK-8295210](https://bugs.openjdk.org/browse/JDK-8295210) for an example of such a failure. Looks good and trivial. Thanks for fixing. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14583#pullrequestreview-1490662542 From roland at openjdk.org Wed Jun 21 13:33:39 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 Jun 2023 13:33:39 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions Message-ID: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> This change adds a new loop opts pass to optimize redundant conditions such as the second one in: if (i < 10) { if (i < 42) { In the branch of the first if, the type of i can be narrowed down to [min_jint, 9] which can then be used to constant fold the second condition.
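The narrowing described in the paragraph above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `IntRange`, `narrowLessThan` and `foldLessThan` are hypothetical names invented for this sketch, and the class is a toy interval, not C2's own integer type representation.

```java
import java.util.Optional;

/** Toy interval type for an int value; a hypothetical stand-in, not C2 code. */
final class IntRange {
    final int lo;
    final int hi;

    IntRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        this.lo = lo;
        this.hi = hi;
    }

    /** The range of an int about which nothing is known yet. */
    static IntRange full() {
        return new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);
    }

    /** Range of the value inside the true branch of `if (value < bound)`. */
    IntRange narrowLessThan(int bound) {
        if (bound <= lo) {
            // No value in [lo, hi] is below bound, so the branch is dead.
            throw new IllegalStateException("branch is unreachable for this range");
        }
        // bound > lo >= MIN_VALUE, so bound - 1 cannot overflow.
        return new IntRange(lo, Math.min(hi, bound - 1));
    }

    /** Folds `value < bound` to a constant when the range decides it. */
    Optional<Boolean> foldLessThan(int bound) {
        if (hi < bound) {
            return Optional.of(true);   // every value in the range is below bound
        }
        if (lo >= bound) {
            return Optional.of(false);  // no value in the range is below bound
        }
        return Optional.empty();        // undecided: the test must stay
    }
}
```

With this model, `IntRange.full().narrowLessThan(10)` gives `[min_jint, 9]`, and asking that range about `i < 42` folds to true, whereas the unnarrowed range cannot decide the same test.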
The compiler already keeps track of type[n] for every node in the current compilation unit. That's not sufficient to optimize the snippet above though because the type of i can only be narrowed in some sections of the control flow (that is, a subset of all controls). The solution is to build a new table that tracks the type of n at every control c: type'[n, root] = type[n] // initialized from igvn's type table type'[n, c] = type'[n, idom(c)] This pass iterates over the CFG looking for conditions such as: if (i < 10) { that allow narrowing the type of i and updates the type' table accordingly. At a region r: type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) For a Phi phi at a region r: type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) Once a type is narrowed, uses are enqueued and their types are computed by calling the Value() methods. If a use's type is narrowed, it's recorded at c in the type' table. Value() methods retrieve types from the type table, not the type' table. To address that issue while leaving Value() methods unchanged, before calling Value() at c, the type table is updated so: type[n] = type'[n, c] An exception is for Phi::Value which needs to retrieve the type of nodes at various controls: there, a new type(Node* n, Node* c) method is used. For most n and c, type'[n, c] is likely the same as type[n], the type recorded in the global igvn table (that is, there shouldn't be many nodes, and only at a few controls, for which we can narrow the type down). As a consequence, the type'[n, c] table is implemented with: - At c, narrowed down types are stored in a GrowableArray. Each entry records the previous type at idom(c) and the narrowed down type at c. - The GrowableArray of type updates is recorded in a hash table indexed by c. If there's no update at c, there's no entry in the hash table.
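The sparse table described in the two bullets above can be modelled roughly as follows. This is a minimal sketch under stated assumptions: `ControlTypeTable`, `Update`, `narrow` and `typeAt` are hypothetical names, nodes, controls and types are plain strings, and resolving a lookup by walking up the immediate dominators is one assumed way to realize type'[n, c] falling back to idom(c); the actual patch may resolve lookups differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a per-control type table; not the code from the PR. */
final class ControlTypeTable {
    /** One narrowing at a control: the type seen at idom(c) and the type at c. */
    static final class Update {
        final String node;
        final String previous;
        final String narrowed;

        Update(String node, String previous, String narrowed) {
            this.node = node;
            this.previous = previous;
            this.narrowed = narrowed;
        }
    }

    private final Map<String, String> globalTypes = new HashMap<>();   // type[n]
    private final Map<String, String> idom = new HashMap<>();          // c -> idom(c)
    private final Map<String, List<Update>> updates = new HashMap<>(); // only controls with updates

    void setGlobalType(String node, String type) {
        globalTypes.put(node, type);
    }

    void setIdom(String control, String dominator) {
        idom.put(control, dominator);
    }

    /** Records that node has a narrower type at control. */
    void narrow(String control, String node, String narrowed) {
        String previous = typeAt(node, idom.get(control));
        updates.computeIfAbsent(control, k -> new ArrayList<>())
               .add(new Update(node, previous, narrowed));
    }

    /** Looks up type'[node, control]: nearest update up the dominator tree, else type[node]. */
    String typeAt(String node, String control) {
        for (String c = control; c != null; c = idom.get(c)) {
            List<Update> atControl = updates.get(c);
            if (atControl == null) {
                continue; // no entry in the hash table means no update at c
            }
            for (int i = atControl.size() - 1; i >= 0; i--) {
                if (atControl.get(i).node.equals(node)) {
                    return atControl.get(i).narrowed;
                }
            }
        }
        return globalTypes.get(node);
    }
}
```

The design point the sketch tries to show is that controls without any narrowing cost nothing: they have no hash table entry, and a lookup there simply falls through to a dominating control or to the global table.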
This pass operates in 2 steps: - it first iterates over the graph looking for conditions that narrow the types of some nodes and propagates type updates to uses until a fixed point. - it transforms the graph so newly found constant nodes are folded. The new pass is run on every loop opts. There are a couple of reasons for that: 1- one of the goals is to avoid the many bugs we've been hitting where a CastII node captures a type that, at some point, conflicts with its input type. The CastII becomes top, that control path is dead but the compiler fails to prove it. By running the new pass often, there's a better chance that these inconsistencies can be caught and fixed. 2- I tried running it less often and ran into inconsistencies where: when the new pass is run, it eliminates a range check. At a later point, the range check CastII becomes top. If the pass is not run then, we hit the problem of 1- above. If it is run, then it can prove the path that leads to the CastII is dead. I looked at compilation time by running ctw on java.base and looking at times reported by CITime. Overall, the new pass adds +50% to the time spent in loop opts and about 5% to total compilation time. I already spent quite a bit of time trying to decrease the compilation time overhead and I don't see any obvious bottleneck at this point. The pass runs until a fixed point is reached and can go over the cfg several times. If it goes only once over the cfg, the compilation time overhead goes down to ~40%. That's a fairly small improvement in terms of compilation time in my opinion and, I would say, it's better to let it run until a fixed point and find all constants that it can find. There are several changes that I had to make outside the new pass itself. Some of them could be pushed separately even though they are solving issues that may not exist without the new pass. In particular, I had to add assert predicates for all eliminated conditions during range check elimination.
------------- Commit messages: - conditional propagation Changes: https://git.openjdk.org/jdk/pull/14586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8275202 Stats: 2924 lines in 29 files changed: 2750 ins; 110 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/14586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14586/head:pull/14586 PR: https://git.openjdk.org/jdk/pull/14586 From ecaspole at openjdk.org Wed Jun 21 13:35:19 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 21 Jun 2023 13:35:19 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v3] In-Reply-To: References: Message-ID: <6T4PgyTA5LRf4A-ltcP-uIMVwPhYWaVaolqA4c5SejI=.d69aeaf4-183e-43db-9d3a-bbe9ed2529da@github.com> On Tue, 20 Jun 2023 20:13:19 GMT, Eric Caspole wrote: >> Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. >> This allows creating a simulation of a large application with arbitrary java heap and code cache footprint, and taking advantage of the benefits of JMH at the same time. >> The defaults are set very low and the intent is that they would be customized for any given study. > > Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright header and apply Alekseys comments Aleksey, I think I applied all your comments, is this OK with you?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14521#issuecomment-1600843509 From chagedorn at openjdk.org Wed Jun 21 13:59:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 Jun 2023 13:59:08 GMT Subject: RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> References: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> Message-ID: <_gmU_i8NMoo9u8hVTpsjfToPtTk_z0BpmnMiq4V0PWo=.d4529a64-5690-46c0-91b3-fa4d36afd5b7@github.com> On Wed, 21 Jun 2023 11:29:51 GMT, Eric Nothum wrote: > Removed TLAB from the IR Framework whitelist. If TLAB allocations are disabled by `-XX:-UseTLAB` the IR verification can fail, therefore `"TLAB"` should not be whitelisted. See [JDK-8295210](https://bugs.openjdk.org/browse/JDK-8295210) for an example of such a failure. Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14583#pullrequestreview-1490811478 From aph at openjdk.org Wed Jun 21 14:30:18 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Jun 2023 14:30:18 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3] In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: On Mon, 19 Jun 2023 02:06:27 GMT, Chang Peng wrote: >> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. >> >> VectorMask.firstTrue() should return VLENGTH when vector mask is all false [1]. Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness.
>> >> This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). >> >> Test: >> All vector and vectorapi test passed. >> >> Performance: >> The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies above benchmark to measure mask operations' performance more effectively. >> >> Following data is collected on a 128-bit Neon machine. >> >> Benchmark (inputs) Mode Before After Units >> MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms >> >> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() >> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 >> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update MaskQueryOperationsBenchmark.java Something is wrong with your setup. 
You should be seeing this: `# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)` not this: `# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1600940540 From roland at openjdk.org Wed Jun 21 14:36:21 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 Jun 2023 14:36:21 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> Message-ID: <3FilSkgaRewtvarggYxWX4MUH7u2BIUhdHKJymashPg=.30942339-d95c-4e91-94ea-fb5bec740e47@github.com> On Wed, 21 Jun 2023 07:02:35 GMT, Roland Westrelin wrote: > > Speaking of alternative ways to pass profile info around, you could just embed `ciCallProfile` in `SubTypeCheck`. Any particular reasons not to do so? > > It felt easier in terms of memory management. If we have some extra data embedded in the `SubTypeCheck` node, is it a pointer or the full data structure? If it is a pointer, do we clone the data on node clone? Try to reclaim memory on node destruction? Where should the data live so it's not destroyed while we need it but doesn't live longer that's required? Let me see if I can simplify that part. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1600953233 From shade at openjdk.org Wed Jun 21 14:52:11 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 Jun 2023 14:52:11 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> Message-ID: On Mon, 19 Jun 2023 10:05:37 GMT, Chang Peng wrote: > Output before this patch: https://gist.github.com/changpeng1997/734aa176577bfff56f5a87db9c8db69a > Output after this patch: https://gist.github.com/changpeng1997/73098069b8f814310d6606dfd7dc56c5 Blackhole mode autodetection was added in JMH 1.33, and enabled in JMH 1.34. The logs above say they run with JMH 1.33. Current version is 1.36, you need to upgrade, @changpeng1997. Also, I notice that your before/after logs use different JVM modes, one uses `release`, and another uses `fastdebug`. These are not comparable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1600982239 From aph at openjdk.org Wed Jun 21 15:06:05 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Jun 2023 15:06:05 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3] In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: On Mon, 19 Jun 2023 02:06:27 GMT, Chang Peng wrote: >> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. >> >> VectorMask.firstTrue() should return VLENGTH when vector mask is all false [1]. Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness.
>> >> This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). >> >> Test: >> All vector and vectorapi test passed. >> >> Performance: >> The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies above benchmark to measure mask operations' performance more effectively. >> >> Following data is collected on a 128-bit Neon machine. >> >> Benchmark (inputs) Mode Before After Units >> MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms >> >> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() >> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 >> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update MaskQueryOperationsBenchmark.java So I'm looking at the results of the patch and I see: Before: Benchmark (inputs) Mode Cnt Score Error Units MaskQueryOperationsBenchmark.testFirstTrueInt 1 avgt 3 69.547 ± 2.837 ns/op MaskQueryOperationsBenchmark.testFirstTrueInt 2 avgt 3 69.549 ± 0.497 ns/op MaskQueryOperationsBenchmark.testFirstTrueInt 3 avgt 3 69.506 ± 1.360 ns/op After: Benchmark (inputs) Mode Cnt Score Error Units MaskQueryOperationsBenchmark.testFirstTrueInt 1 avgt 3 58.955 ±
0.838 ns/op MaskQueryOperationsBenchmark.testFirstTrueInt 2 avgt 3 58.690 ± 2.940 ns/op MaskQueryOperationsBenchmark.testFirstTrueInt 3 avgt 3 58.923 ± 1.088 ns/op which corresponds to a change from 0x00000001158ef748: fmov x11, d16 0x00000001158ef74c: rbit x11, x11 0x00000001158ef750: clz x11, x11 0x00000001158ef754: lsr w11, w11, #3 ;; 0x4 0x00000001158ef758: orr w8, wzr, #0x4 0x00000001158ef75c: cmp w11, w8 0x00000001158ef760: csel w11, w8, w11, ge // ge = tcont to 0x0000000115f3f8e8: fmov x14, d16 0x0000000115f3f8ec: orr x14, x14, #0x100000000 0x0000000115f3f8f0: rbit x14, x14 0x0000000115f3f8f4: clz x14, x14 0x0000000115f3f8f8: lsr w14, w14, #3 That's a pretty decent speedup when you consider that the benchmark is dominated by memory operations and vector->core register moves. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1601008191 From mbaesken at openjdk.org Wed Jun 21 15:26:16 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 21 Jun 2023 15:26:16 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar Message-ID: There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted.
------------- Commit messages: - JDK-8310550 Changes: https://git.openjdk.org/jdk/pull/14593/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310550 Stats: 14 lines in 12 files changed: 0 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14593/head:pull/14593 PR: https://git.openjdk.org/jdk/pull/14593 From aph at openjdk.org Wed Jun 21 16:53:02 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 21 Jun 2023 16:53:02 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3] In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: On Mon, 19 Jun 2023 02:06:27 GMT, Chang Peng wrote: >> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. >> >> VectorMask.firstTrue() should return VLENGTH when vector mask is all false [1]. Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. >> >> This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). >> >> Test: >> All vector and vectorapi test passed. >> >> Performance: >> The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies above benchmark to measure mask operations' performance more effectively. >> >> Following data is collected on a 128-bit Neon machine.
>> >> Benchmark (inputs) Mode Before After Units >> MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms >> >> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() >> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 >> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Update MaskQueryOperationsBenchmark.java If we care about memory ops, note that we can get a useful speedup with `match(Set dst (VectorMaskFirstTrue (LoadVector mem)))` but perhaps that's not worth doing. Benchmark (inputs) Mode Cnt Score Error Units MaskQueryOperationsBenchmark.testFirstTrueInt 1 avgt 3 49.591 ? 0.477 ns/op I will say that in general if you have to work in the core integer processor on an in-memory vector , it might be worth loading straight into core registers rather than going via the SIMD regs. Maybe we should write a general-purpose function that bypasses the SIMD unit in all such cases. 
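For readers following the thread, the sentinel-bit trick from the quoted patch description can be modelled in plain Java. This is a minimal sketch, assuming the mask sits in the low bytes of a 64-bit value with one byte per lane and lane 0 in the lowest byte; `FirstTrueSketch` and its two methods are hypothetical names for illustration, not HotSpot code. `Long.numberOfTrailingZeros` plays the role of the `rbit` + `clz` pair, and the shift by 3 converts a bit index into a lane index.

```java
/** Hypothetical model of the firstTrue() computation on a byte-per-lane mask. */
final class FirstTrueSketch {
    /** Baseline shape: count trailing zeros, convert to a lane index, then clamp. */
    static int firstTrueWithClamp(long maskBytes, int lanes) {
        // An all-false mask yields 64 trailing zeros, that is lane index 8.
        int lane = Long.numberOfTrailingZeros(maskBytes) >>> 3;
        // The clamp models the cmp + csel pair in the original code.
        return Math.min(lane, lanes);
    }

    /** Sentinel shape: set the bit just past the last lane so no clamp is needed. */
    static int firstTrueWithSentinel(long maskBytes, int lanes) {
        if (lanes != 2 && lanes != 4) {
            throw new IllegalArgumentException("sketch covers 2 or 4 lanes only");
        }
        // Bit 16 for 2 lanes, bit 32 for 4 lanes (the 0x100000000 in the disassembly).
        long sentinel = 1L << (8 * lanes);
        return Long.numberOfTrailingZeros(maskBytes | sentinel) >>> 3;
    }
}
```

Both methods return the lane count for an all-false mask; the sentinel variant gets there without the compare and conditional select, which is the instruction saving visible in the before/after disassembly earlier in the thread.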
------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1601216501 From erikj at openjdk.org Wed Jun 21 17:01:04 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 21 Jun 2023 17:01:04 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 15:18:19 GMT, Matthias Baesken wrote: > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. The update to Java.gmk is good. ------------- Marked as reviewed by erikj (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14593#pullrequestreview-1491263836 From simonis at openjdk.org Wed Jun 21 17:32:05 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 21 Jun 2023 17:32:05 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) Message-ID: This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). I could reproduce it with JDK 17, 18 and 21, and it results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details): # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124 # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer # # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk) ... Current CompileTask: C2: 91009 8214 !
4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes) Stack: [0x00007fff1306b000,0x00007fff1316c000], sp=0x00007fff13166fe0, free space=1007k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x61d636] Type::is_oopptr() const+0x4e (type.hpp:2059) V [libjvm.so+0x14cee55] SubTypeCheckNode::sub(Type const*, Type const*) const+0x53 (subtypenode.cpp:37) V [libjvm.so+0x14c66b0] SubNode::Value(PhaseGVN*) const+0xa6 (subnode.cpp:107) V [libjvm.so+0xcdacb3] split_if(IfNode*, PhaseIterGVN*)+0x2ce (ifnode.cpp:111) V [libjvm.so+0xce044c] IfNode::Ideal_common(PhaseGVN*, bool)+0x128 (ifnode.cpp:1438) V [libjvm.so+0xce0496] IfNode::Ideal(PhaseGVN*, bool)+0x30 (ifnode.cpp:1448) V [libjvm.so+0x1298244] PhaseGVN::apply_ideal(Node*, bool)+0x70 (phaseX.cpp:667) V [libjvm.so+0x129a0fd] PhaseIterGVN::transform_old(Node*)+0x12d (phaseX.cpp:1196) V [libjvm.so+0x12998df] PhaseIterGVN::optimize()+0x16b (phaseX.cpp:1045) V [libjvm.so+0x93f89e] Compile::Optimize()+0xce0 (compile.cpp:2378) V [libjvm.so+0x9385fa] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16ca (compile.cpp:842) V [libjvm.so+0x806ab4] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a0 (c2compiler.cpp:118) V [libjvm.so+0x958bc8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa04 (compileBroker.cpp:2265) V [libjvm.so+0x9576fa] CompileBroker::compiler_thread_loop()+0x462 (compileBroker.cpp:1944) V [libjvm.so+0x97b14a] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:58) V [libjvm.so+0xd434ce] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:719) V [libjvm.so+0xd43368] JavaThread::run()+0x258 (javaThread.cpp:704) V [libjvm.so+0x15481ea] Thread::call_run()+0x1a8 (thread.cpp:217) V [libjvm.so+0x1230036] thread_native_entry(Thread*)+0x1a5 (os_linux.cpp:778) ... ``` `SubTypeCheckNode::sub()` expects that it's `sub_t` input `Type` is either a Klasspointer (i.e. 
`Type::KlassPtr`) or an Ooppointer (i.e. `Type::OopPtr`, `Type::InstPtr` or `Type::AryPtr`). It only checks for a Klasspointer and if that's not the case it assumes an Ooppointer. However, in the crashing case, `sub_t` has the generic pointer type `Type::AnyPtr` so debug builds will run into an assertion and product builds will just crash. The `SubTypeCheckNode` in question has the following shape in `split_if()`: Con (#top) | | __IfTrue |/ || __IfFalse |// Region | __ ConP (#NULL) | / | __/ _ Phi (Oop:kotlinx/coroutines/internal/LockFreeLinkedListNode:NotNull) || ___/ ||| ____ Phi (Oop:kotlinx/coroutines/internal/LockFreeLinkedListNode:NotNull) |||| |/// Phi | ConP (Klass:precise klass kotlinx/coroutines/channels/Send) | | \ / SubTypeCheck `split_if()` then searches for the first constant input of the `SubTypeCheck`'s `Phi`-node and finds `ConP (#NULL)`. It then calls `SubTypeCheckNode::sub()` with `sub_t` as `ConP (#NULL)`'s type which is `Type::AnyPtr` and crashes. I've verified that returning `bottom_type()` from `SubTypeCheckNode::sub` for the `(!sub_t->isa_klassptr() && !sub_t->isa_oopptr())` case fixes the crash (by instrumenting the VM to ensure that the compilation as well as the further program execution succeeds if we take the new branch). I'm just not sure if the unusual graph which leads to this crash is caused by the *uncommon* bytecode generated by the Kotlin compiler or if it is the result of another problem in an earlier optimization stage. While browsing JBS, I found [JDK-8303513](https://bugs.openjdk.org/browse/JDK-8303513) which seems similar to this issue (i.e. also caused by a SubTypeCheckNode with an input of the TOP constant node).
While looking at `SubTypeCheckNode::Ideal()` I found that it already has exactly the same safeguard as proposed for `SubTypeCheckNode::sub()` in this PR, namely: if (!super_t->isa_klassptr() || (!sub_t->isa_klassptr() && !sub_t->isa_oopptr())) { return NULL; } I'd really appreciate if @rwestrel could take a look at this issue. ------------- Commit messages: - 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) Changes: https://git.openjdk.org/jdk/pull/14600/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14600&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303279 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14600.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14600/head:pull/14600 PR: https://git.openjdk.org/jdk/pull/14600 From kvn at openjdk.org Wed Jun 21 18:13:06 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 21 Jun 2023 18:13:06 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 03:26:38 GMT, Fei Gao wrote: > Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: > > > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > > > Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. Check for `UseFMA` should be moved from `c2compiler.cpp` to `Matcher::match_rule_supported` in `.ad` files. 
I see we have such a check for Fma vectors in `x86.ad` but not for scalars. Similar issues exist for other platforms. ------------- PR Review: https://git.openjdk.org/jdk/pull/14576#pullrequestreview-1491402018 From dlong at openjdk.org Wed Jun 21 19:02:03 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Jun 2023 19:02:03 GMT Subject: [jdk21] RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: <_gHr__o7f2Kx0J9mcnWg5hs1r4HU6ORcYS2xllYImak=.0a4b9e9d-acb8-448e-a8ca-70b505527d70@github.com> On Wed, 21 Jun 2023 10:28:33 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309498](https://bugs.openjdk.java.net/browse/JDK-8309498). Applies cleanly. > > Thanks, > Tobias Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk21/pull/50#pullrequestreview-1491507920 From dlong at openjdk.org Wed Jun 21 19:45:06 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Jun 2023 19:45:06 GMT Subject: [jdk21] RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> References: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> Message-ID: On Wed, 21 Jun 2023 10:28:51 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310126](https://bugs.openjdk.java.net/browse/JDK-8310126). Applies cleanly. > > Thanks, > Tobias Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk21/pull/51#pullrequestreview-1491705068 From dlong at openjdk.org Wed Jun 21 19:47:04 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Jun 2023 19:47:04 GMT Subject: [jdk21] RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: <39K8Pk2u7oT85atVBQZrzX8B_S34f2ej0h090Pc7tSE=.06022b01-ea67-47ce-b157-7f13852cd29c@github.com> On Wed, 21 Jun 2023 10:28:11 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309266](https://bugs.openjdk.java.net/browse/JDK-8309266). Applies cleanly. > > Thanks, > Tobias Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk21/pull/49#pullrequestreview-1491707600 From dlong at openjdk.org Wed Jun 21 19:48:08 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 21 Jun 2023 19:48:08 GMT Subject: [jdk21] RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: <6ex2WE3nRgRgRO05Qm6lMJId9Z8EiLgdZ61D9_ny1Zw=.d8d0a788-4d4c-4841-ab8f-f3f127686700@github.com> On Wed, 21 Jun 2023 10:27:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308855](https://bugs.openjdk.java.net/browse/JDK-8308855). Applies cleanly. > > Thanks, > Tobias Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk21/pull/48#pullrequestreview-1491708972 From dholmes at openjdk.org Wed Jun 21 21:51:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 21 Jun 2023 21:51:03 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 15:18:19 GMT, Matthias Baesken wrote: > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. Mostly seems okay - a couple of things need further adjusting I think. Thanks. 
src/jdk.compiler/share/classes/com/sun/tools/javac/file/JavacFileManager.java line 196: > 194: > 195: /** > 196: * Set whether or not to use ct.sym as an alternate As an alternate to what? This needs something else. test/langtools/tools/javap/4798312/JavapShouldLoadClassesFromRTJarTest.java line 1: > 1: /* The name of this test includes RTJar. It needs to be changed too I think. Does this test actually still test something? ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14593#pullrequestreview-1491961660 PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1237747922 PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1237749197 From sviswanathan at openjdk.org Wed Jun 21 23:45:03 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 Jun 2023 23:45:03 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 02:30:53 GMT, Quan Anh Mai wrote: >> @merykitty @sviswa7 @fg1417 Is there a way to stress-test the registers? It seems this bug only triggered because we had a moderately large unrolling factor, and then did not vectorize, leaving lots of instructions with probably a higher register pressure. Would be nice to have some sort of testing where we generate more (all?) of the possible register combinations. What do you think? > > @eme64 Yes that was my mistake, that node requires AVX512VL so `vlRegF` and `regF` are the same. > >> Is there a way to stress-test the registers? > > Can we randomise the allocated register during register allocation? > > Thanks. > @merykitty Yes, randomization would be great. I don't know much about the register allocator, so feel free to do something like that if you want and have time ;) > > @sviswa7 Is there something you want me to change still? No additional changes from my side. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1601826050 From sviswanathan at openjdk.org Thu Jun 22 00:14:09 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 22 Jun 2023 00:14:09 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote: > Context: `Float.floatToFloat16` -> `vcvtps2ph`. > > **Problem** > > vcvtps2ph > pre=Assembler::VEX_SIMD_66 > opc=Assembler::VEX_OPCODE_0F_3A > VEX.128.66.0F3A > requires F16C > > https://www.felixcloutier.com/x86/vcvtps2ph > > So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`. > > There is also a AVX512 version, but it requires `AVX512VL and AVX512F`. > > So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`. > > **Suggested Solution** > As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments. > > **Testing** > I simulated the patch on intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31). > > Running: tier1-6 + stress testing. Marked as reviewed by sviswanathan (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14379#pullrequestreview-1492107433 From sviswanathan at openjdk.org Thu Jun 22 00:21:03 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 22 Jun 2023 00:21:03 GMT Subject: RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: <7HOB55d8ZyXWd5_IbmDnBXHYjDGTEAbz6DD6WiGfmBo=.3d23bb8f-0d2d-4e83-ab95-ee097e2a9deb@github.com> On Mon, 19 Jun 2023 08:56:26 GMT, Roland Westrelin wrote: > Before 8275201, loading the element klass of an array returned: > > > TypeKlassPtr::make(tkls->ptr(), elem, 0/*offset*/); > > > that is exact if the array type is exact. I changed it to: > > > tkls->is_aryklassptr()->elem(); > > > When the array type is exact (newly allocated array for instance) but > the element class has subclasses, this doesn't return an exact class > (so the logic is different from the one that was there before). That > affects array store checks that no longer constant fold. @rwestrel Thanks a lot for this fix. It fully recovers the performance drop that we have been observing since JDK 19. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14536#issuecomment-1601848258 From jwaters at openjdk.org Thu Jun 22 02:55:07 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 02:55:07 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v5] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert NULL to nullptr changes in jaccesswalker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/5fa2d3eb..9a8a9158 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=03-04 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Thu Jun 22 03:00:16 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 03:00:16 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Revert NULL to nullptr changes in jaccesswalker ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/9a8a9158..a31e52e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From thartmann at openjdk.org Thu Jun 22 05:48:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:48:10 GMT Subject: [jdk21] RFR: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:27:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308855](https://bugs.openjdk.java.net/browse/JDK-8308855). Applies cleanly. > > Thanks, > Tobias Thanks, Dean! 
------------- PR Comment: https://git.openjdk.org/jdk21/pull/48#issuecomment-1602043437 From thartmann at openjdk.org Thu Jun 22 05:48:11 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:48:11 GMT Subject: [jdk21] RFR: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:28:11 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309266](https://bugs.openjdk.java.net/browse/JDK-8309266). Applies cleanly. > > Thanks, > Tobias Thanks, Dean! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/49#issuecomment-1602043617 From thartmann at openjdk.org Thu Jun 22 05:49:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:49:07 GMT Subject: [jdk21] RFR: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: <3KJcQYmCTj8wB9EWlqOfpoa3OU18peiH5EegLApFFDk=.f61760be-2eb7-421a-b0ab-0d4c582591b7@github.com> On Wed, 21 Jun 2023 10:28:33 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309498](https://bugs.openjdk.java.net/browse/JDK-8309498). Applies cleanly. > > Thanks, > Tobias Thanks, Dean! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/50#issuecomment-1602043812 From thartmann at openjdk.org Thu Jun 22 05:50:09 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:50:09 GMT Subject: [jdk21] Integrated: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> References: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> Message-ID: On Wed, 21 Jun 2023 10:28:51 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310126](https://bugs.openjdk.java.net/browse/JDK-8310126). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 89ac41be Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/89ac41be57a072589fea5400ca7797cdcf712e17 Stats: 63 lines in 2 files changed: 62 ins; 0 del; 1 mod 8310126: C1: Missing receiver null check in Reference::get intrinsic Reviewed-by: chagedorn, dlong Backport-of: 02aaab12e331e5a4c249f1d281c4439e2e7c914f ------------- PR: https://git.openjdk.org/jdk21/pull/51 From thartmann at openjdk.org Thu Jun 22 05:50:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:50:06 GMT Subject: [jdk21] RFR: 8310126: C1: Missing receiver null check in Reference::get intrinsic In-Reply-To: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> References: <3HO2N6Zgs0xQC9iuClqxu_dYIRlegISMuV4Q7ErmOq0=.851cff5e-7240-4c11-a366-2d89ac7e9b69@github.com> Message-ID: On Wed, 21 Jun 2023 10:28:51 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310126](https://bugs.openjdk.java.net/browse/JDK-8310126). Applies cleanly. > > Thanks, > Tobias Thanks, Dean! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/51#issuecomment-1602043725 From thartmann at openjdk.org Thu Jun 22 05:51:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:51:08 GMT Subject: [jdk21] Integrated: 8308855: ARM32: TestBooleanVector crashes after 8300257 In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:27:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308855](https://bugs.openjdk.java.net/browse/JDK-8308855). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 5357bcd7 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/5357bcd7762379a2e32ad99ef6f482009f4437a2 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8308855: ARM32: TestBooleanVector crashes after 8300257 Reviewed-by: chagedorn, dlong Backport-of: 266f9838ee28fb49b5368fc9778854c456b02b7c ------------- PR: https://git.openjdk.org/jdk21/pull/48 From thartmann at openjdk.org Thu Jun 22 05:51:11 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:51:11 GMT Subject: [jdk21] Integrated: 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:28:11 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309266](https://bugs.openjdk.java.net/browse/JDK-8309266). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 7621d988 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/7621d988f9fd6ec1b075d355b500b768518100c2 Stats: 61 lines in 2 files changed: 59 ins; 0 del; 2 mod 8309266: C2: assert(final_con == (jlong)final_int) failed: final value should be integer Reviewed-by: chagedorn, dlong Backport-of: 4a9cc8a000cafb3ad77a33710054b567e8553652 ------------- PR: https://git.openjdk.org/jdk21/pull/49 From thartmann at openjdk.org Thu Jun 22 05:52:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 22 Jun 2023 05:52:07 GMT Subject: [jdk21] Integrated: 8309498: [JVMCI] race in CallSiteTargetValue recording In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:28:33 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309498](https://bugs.openjdk.java.net/browse/JDK-8309498). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 55aa4cb4 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/55aa4cb48af5bf13ebc745c6464ab3c42e87b375 Stats: 11 lines in 1 file changed: 5 ins; 3 del; 3 mod 8309498: [JVMCI] race in CallSiteTargetValue recording Reviewed-by: chagedorn, dlong Backport-of: bb966827ac445d805bac5005d0fbda0c61111252 ------------- PR: https://git.openjdk.org/jdk21/pull/50 From dean.long at oracle.com Thu Jun 22 06:35:28 2023 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 21 Jun 2023 23:35:28 -0700 Subject: Question regarding ReplayCompiles and multiple inlining In-Reply-To: References: Message-ID: I noticed this problem before too. Unfortunately I can't think of a workaround. It seems like the right fix is to change the replay file format to record more information. dl On 6/19/23 5:07 AM, Volker Simonis wrote: > Hi, > > I am trying to reproduce a compiler issue with a ReplayDataFile but > unfortunately I can't reproduce the crash. > > I hacked the VM to print out the inlining tree just before the > crash and realized that the original inlining differs from the > inlining done by ReplayCompiles. > > In my specific case I have the following inlining pattern during the > crash (`foo::f1()` gets inlined twice into `foo::f0()`): > . > . > @ 57 foo::f0() inline (hot) > @ 48 foo::f1() inline (hot) > @ 2 bar::f2() inline (hot) > . > . > @ 48 foo::f1() inline (hot) > @ 2 bar::f2() NodeCountInliningCutoff > > In the ReplayDataFile (in the `inline` part of the `compile` line) > both `foo::f1()` and `bar::f2()` are recorded only once (because they > have the same bci, name/signature and inlining depth). > > When running the replay, I get the following inlining pattern: > . > . > @ 57 foo::f0() force inline by ciReplay > @ 48 foo::f1() force inline by ciReplay > @ 2 bar::f2() force inline by ciReplay > . > . 
> @ 48 foo::f1() force inline by ciReplay > @ 2 bar::f2() force inline by ciReplay > > This is clearly different because in the replay we inline `bar::f2()` > a second time (while in the original run it was skipped due to > NodeCountInliningCutoff). > > From looking at `find_ciInlineRecord()` [1], it looks like the replay > file only records the bci, inlining depth and method name/signature > for an inlinee. How is this supposed to work if a method is inlined > differently at the same level like in this example? > > Notice that I'm currently working with JDK 17 (because my problem > doesn't reproduce with HEAD) but it seems the relevant code hasn't > changed much in this area since JDK 17. > > Please let me know if this is a known problem and if there's any way > to work around it. > > Thank you and best regards, > Volker > > [1] https://github.com/openjdk/jdk17u-dev/blob/852c26c0/src/hotspot/share/ci/ciReplay.cpp#L992 From duke at openjdk.org Thu Jun 22 06:58:03 2023 From: duke at openjdk.org (Eric Nothum) Date: Thu, 22 Jun 2023 06:58:03 GMT Subject: RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> References: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> Message-ID: <00qalnTbS398_BRW1EbEjoM79YgpFUku1Phw5TsdLWs=.25158d33-ca49-4db2-9a19-3bc1176916de@github.com> On Wed, 21 Jun 2023 11:29:51 GMT, Eric Nothum wrote: > Removed TLAB from the IR Framework whitelist. If TLAB allocations are disabled by `-XX:-UseTLAB` the IR verification can fail; therefore `"TLAB"` should not be whitelisted. See [JDK-8295210](https://bugs.openjdk.org/browse/JDK-8295210) for an example of such a failure. 
\integrate ------------- PR Comment: https://git.openjdk.org/jdk/pull/14583#issuecomment-1602104627 From roland at openjdk.org Thu Jun 22 07:02:05 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jun 2023 07:02:05 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: <3FilSkgaRewtvarggYxWX4MUH7u2BIUhdHKJymashPg=.30942339-d95c-4e91-94ea-fb5bec740e47@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> <3FilSkgaRewtvarggYxWX4MUH7u2BIUhdHKJymashPg=.30942339-d95c-4e91-94ea-fb5bec740e47@github.com> Message-ID: On Wed, 21 Jun 2023 14:33:10 GMT, Roland Westrelin wrote: > Speaking of alternative ways to pass profile info around, you could just embed `ciCallProfile` in `SubTypeCheck`. Any particular reasons not to do so? One thing to consider is that we don't necessarily want to common 2 `SubTypeCheck` nodes with the same object/super kass inputs. If we have something like: if (...) { if (o instanceof C) { } } else { if (o instanceof C) { } } There are 2 `SubTypeCheck` nodes with possibly different profile data. In order to common them, we would have to decide what profile data to keep (or drop profile data entirely) which could lead to less performant subtype checks. I think we want to decide on a case by case basis whether 2 `SubTypeCheck` nodes need to be commoned or not which is essentially what the logic I added does. Having extra edges makes that fairly straightforward. 
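A minimal Java sketch of the situation described above, added for illustration only: two separate `o instanceof C` sites that observe different receiver classes at run time. The class names (`C1`, `C2`, `C3`), the method names and the input mix are hypothetical and do not come from the PR; whether C2 actually records and uses distinct profiles for the two sites depends on the JIT and is not demonstrated by this code.

```java
// Hypothetical illustration; none of these names appear in the PR.
public class TwoSubtypeChecks {
    static class C {}
    static final class C1 extends C {}
    static final class C2 extends C {}
    static final class C3 extends C {}

    // Two syntactically identical checks on the same object and the same
    // super class, located in different branches (and thus different bytecode sites).
    static int check(boolean flag, Object o) {
        if (flag) {
            if (o instanceof C) { return 1; }   // site A
        } else {
            if (o instanceof C) { return 2; }   // site B
        }
        return 0;
    }

    // Site A only ever sees C1; site B sees C2, C3 and a non-C object.
    // A profiling JIT could therefore hold different type profiles per site.
    static int run(int iterations) {
        Object onlyForA = new C1();
        Object[] forB = { new C2(), new C3(), "not a C" };
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += check(true, onlyForA);
            sum += check(false, forB[i % forB.length]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(30_000));
    }
}
```

If the two checks were merged into one node, only one of the two per-site profiles could be kept, which is the trade-off the comment above refers to.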
------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1602108426 From roland at openjdk.org Thu Jun 22 07:08:05 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jun 2023 07:08:05 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 In-Reply-To: References: Message-ID: On Tue, 6 Jun 2023 10:36:21 GMT, Emanuel Peter wrote: > This is another case where imprecise type computation leads to corrupted control flow. > > The loop incr `AddI` never overflows at runtime; this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, its type so far often did overflow and became `int`. The loop tripcount `Phi` does not have such type overflow. So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI. > > Since JDK-8303466, the type of the limit can no longer overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop). > > Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection. > > **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. So before that change I think the loop limit just had a type that would have overflowed, and so was `int`. That would not have allowed the main-loop internals to detect that it could never be reached. 
The main-loop would not have died at all, but it would still never have been entered. > > **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better. And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop. > > I also had to improve the notification in IGVN, I encountered a case where a incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow). > > **Risk** Just as JDK-8303466, this fix here makes types more precise. That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know what such bugs are still lurking for us. > > **Testing** Attached one regre... If in `PhaseIdealLoop::is_counted_loop`, when the loop limit check is added, we added a `CastII` to cast the limit to a narrower type wouldn't that have the same effect? `PhiNode::value()` would compute a narrower type for the iv Phi. Would the loop incr still overflow? Assuming that does work wouldn't that be a simpler change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602115596 From davleopo at openjdk.org Thu Jun 22 07:52:12 2023 From: davleopo at openjdk.org (David Leopoldseder) Date: Thu, 22 Jun 2023 07:52:12 GMT Subject: RFR: JDK-8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null Message-ID: Fix JVMCI handling of null dynamic constants and dynamic constant errors. For null dynamic constants the jvmci code wrongly was checking for the_null_sentinel while constantpool resolve_possibly_cached_constant_at returns nullptr for null constants. 
And for errors during the bootstrap method resolve_possibly_cached_constant_at already throws the BootstrapMethodError so we can never reach the next branch. ------------- Commit messages: - 8310425: [JVMCI] fix jvmci support for null DynamicConstant and DynamicConstant in error. Changes: https://git.openjdk.org/jdk/pull/14582/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14582&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310425 Stats: 14 lines in 2 files changed: 10 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14582/head:pull/14582 PR: https://git.openjdk.org/jdk/pull/14582 From dnsimon at openjdk.org Thu Jun 22 08:19:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 Jun 2023 08:19:02 GMT Subject: RFR: JDK-8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:26:19 GMT, David Leopoldseder wrote: > Fix JVMCI handling of null dynamic constants and dynamic constant errors. > > For null dynamic constants the jvmci code wrongly was checking for the_null_sentinel while constantpool resolve_possibly_cached_constant_at returns nullptr for null constants. > > And for errors during the bootstrap method resolve_possibly_cached_constant_at already throws the BootstrapMethodError so we can never reach the next branch. Marked as reviewed by dnsimon (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14582#pullrequestreview-1492642352 From epeter at openjdk.org Thu Jun 22 09:08:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jun 2023 09:08:06 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 07:04:59 GMT, Roland Westrelin wrote: >> This is another case where imprecise type computation leads to corrupted control flow. 
>> >> The loop incr `AddI` never overflows at runtime; this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, its type so far often did overflow and became `int`. The loop tripcount `Phi` does not have such type overflow. So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI. >> >> Since JDK-8303466, the type of the limit can no longer overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop). >> >> Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection. >> >> **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. So before that change I think the loop limit just had a type that would have overflowed, and so was `int`. That would not have allowed the main-loop internals to detect that it could never be reached. The main-loop would not have died at all, but it would still never have been entered. >> >> **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better. And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop. 
>> >> I also had to improve the notification in IGVN, I encountered a case where a incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow). >> >> **Risk** Just as JDK-8303466, this fix here makes types more precise. That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know what such bugs are still lurking for us. >>... > > If in `PhaseIdealLoop::is_counted_loop`, when the loop limit check is added, we added a `CastII` to cast the limit to a narrower type wouldn't that have the same effect? `PhiNode::value()` would compute a narrower type for the iv Phi. Would the loop incr still overflow? Assuming that does work wouldn't that be a simpler change? @rwestrel thanks for the idea. So you mean I should make the `Phi` type narrow enough, such that adding the `stride` to it should never lead to a type overflow, right? Then there would be no type overflow in the `incr`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602280067 From mbaesken at openjdk.org Thu Jun 22 09:26:06 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 22 Jun 2023 09:26:06 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 21:46:03 GMT, David Holmes wrote: >> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. > > src/jdk.compiler/share/classes/com/sun/tools/javac/file/JavacFileManager.java line 196: > >> 194: >> 195: /** >> 196: * Set whether or not to use ct.sym as an alternate > > As an alternate to what? This needs something else. should "to the image modules files" be used instead ? > test/langtools/tools/javap/4798312/JavapShouldLoadClassesFromRTJarTest.java line 1: > >> 1: /* > > The name of this test includes RTJar. It needs to be changed too I think. Does this test actually still test something? 
It seems to start javap, so I think it tests something. How important this is, and what other tests might cover similar functionality, I unfortunately do not know.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1238252196
PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1238254149

From djelinski at openjdk.org Thu Jun 22 10:40:09 2023
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Thu, 22 Jun 2023 10:40:09 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6]
In-Reply-To:
References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com>
Message-ID:

On Thu, 22 Jun 2023 03:00:16 GMT, Julian Waters wrote:

>> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner rather than later, when a future version of Visual C++ finally starts to break on existing code.
>
> Julian Waters has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
> The pull request contains one new commit since the last revision:
>
>   Revert NULL to nullptr changes in jaccesswalker

src/jdk.accessibility/windows/native/jaccesswalker/jaccesswalker.cpp line 547:

> 545: snprintf( s, sizeof(s),
> 546: "ERROR calling GetAccessibleContextInfo; vmID = %lX, context = %p",
> 547: reinterpret_cast(vmID), (void*)context );

Do you need this cast? I checked a few compilers and passing a signed long to "%lX" was fine with them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238338713

From roland at openjdk.org Thu Jun 22 11:13:04 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 22 Jun 2023 11:13:04 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466
In-Reply-To:
References:
Message-ID: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>

On Tue, 6 Jun 2023 10:36:21 GMT, Emanuel Peter wrote:

> This is another case where imprecise type computation leads to corrupted control flow.
>
> The loop incr `AddI` never overflows at runtime; this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, the type so far often did overflow, and type `int` was returned. The loop tripcount `Phi` does not have such type overflow. So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI.
>
> Since JDK-8303466, the type of the limit can no longer overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop).
>
> Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection.
>
> **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. So before that change I think the loop limit just had a type that would have overflown, and returned `int`. That would not have allowed the main-loop internals to detect that it could never be reached. The main-loop would not have died at all, but it would still never have been entered.
>
> **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better. And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop.
>
> I also had to improve the notification in IGVN: I encountered a case where an incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow).
>
> **Risk** Just like JDK-8303466, this fix makes types more precise. That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know which such bugs are still lurking.
>
> **Testing** Attached one regre...

It doesn't seem to be true that the loop incr never overflows in the general case.
See this example:

    public class TestOverflowCountedLoopIncr {
        public static void main(String[] args) {
            for (int i = 0; i < 20_000; i++) {
                test(Integer.MAX_VALUE);
            }
        }

        private static float test(int start) {
            float v = 1;
            int i = start;
            do {
                synchronized (new Object()) {}
                v *= 2;
                i++;
            } while (i < Integer.MIN_VALUE + 100);
            return v;
        }
    }

That one also hangs with -XX:LoopMaxUnroll=0. A bug in LSM it seems. I will file a bug for that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602452171

From roland at openjdk.org Thu Jun 22 11:13:04 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 22 Jun 2023 11:13:04 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466
In-Reply-To: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
Message-ID:

On Thu, 22 Jun 2023 11:08:36 GMT, Roland Westrelin wrote:

>> This is another case where imprecise type computation leads to corrupted control flow.
>>
>> The loop incr `AddI` never overflows at runtime; this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, the type so far often did overflow, and type `int` was returned. The loop tripcount `Phi` does not have such type overflow. So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI.
>>
>> Since JDK-8303466, the type of the limit can no longer overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop).
>>
>> Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection.
>>
>> **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. So before that change I think the loop limit just had a type that would have overflown, and returned `int`. That would not have allowed the main-loop internals to detect that it could never be reached. The main-loop would not have died at all, but it would still never have been entered.
>>
>> **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better. And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop.
>>
>> I also had to improve the notification in IGVN: I encountered a case where an incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow).
>>
>> **Risk** Just like JDK-8303466, this fix makes types more precise. That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know which such bugs are still lurking.
>>...
>
> It doesn't seem to be true that the loop incr never overflows in the general case.
> See this example:
>
>     public class TestOverflowCountedLoopIncr {
>         public static void main(String[] args) {
>             for (int i = 0; i < 20_000; i++) {
>                 test(Integer.MAX_VALUE);
>             }
>         }
>
>         private static float test(int start) {
>             float v = 1;
>             int i = start;
>             do {
>                 synchronized (new Object()) {}
>                 v *= 2;
>                 i++;
>             } while (i < Integer.MIN_VALUE + 100);
>             return v;
>         }
>     }
>
> That one also hangs with -XX:LoopMaxUnroll=0. A bug in LSM it seems. I will file a bug for that.

> @rwestrel thanks for the idea. So you mean I should make the `Phi` type narrow enough, such that adding the `stride` to it should never lead to a type overflow, right? Then there would be no type overflow in the `incr`.

That's what I was thinking indeed.

> You want to `CastII` the limit for the exit check, I assume? And we would have to add a `CastII` again if the limit is ever changed during the unrolling, right?

Would that be required? I was hoping that the CastII on the limit before unrolling would cause the phi type to be narrow enough.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602453662

From epeter at openjdk.org Thu Jun 22 12:21:12 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 22 Jun 2023 12:21:12 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466
In-Reply-To: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
Message-ID:

On Thu, 22 Jun 2023 11:08:36 GMT, Roland Westrelin wrote:

> It doesn't seem to be true that the loop incr never overflows in the general case. See this example:

But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think.
Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version?

> Would that be required? I was hoping that the CastII on the limit before unrolling would cause the phi type to be narrow enough.

Ok, let's think this through for going from pre-loop to main loop / zero-trip-guard of the main loop:

`incr = iv + stride`
exit check: `iv + stride = incr < limit`

So that the incr does not overflow, it would have to have type `init+stride ... max_int` (`stride > 0`). That means we would want the phi to have a type that has `stride` subtracted from `hi`, hence `init ... max_int-stride`. For `stride < 0` we want the incr to have type `min_int ... init+stride` and the `phi` should have `min_int-stride ... init`.

So the issue in the example above is that the pre-loop phi has type `minint...7`, and not `minint+3...7`, if I understand you correctly, @rwestrel. Because with that type, the `incr` would then have type `minint...4`, and that would let the zero-trip-guard recognize that it is always false, since the limit has type `8...maxint`.

`PhiNode::Value` determines the range for counted-loops. We take the limit into account there. But it seems we don't do the right thing there yet. I'll have a look into that, and whether a `CastII` on the limit helps.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602538125

From roland at openjdk.org Thu Jun 22 12:29:03 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 22 Jun 2023 12:29:03 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466
In-Reply-To:
References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
Message-ID:

On Thu, 22 Jun 2023 12:18:21 GMT, Emanuel Peter wrote:

> > It doesn't seem to be true that the loop incr never overflows in the general case. See this example:
>
> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow?
> We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think. Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version?

Ran with:

    -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:CompileOnly=TestOverflowCountedLoopIncr::test -XX:CompileCommand=quiet -XX:LoopMaxUnroll=0 -XX:+UseCountedLoopSafepoints

I see a single counted loop, no uncommon trap in the IR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602548414

From epeter at openjdk.org Thu Jun 22 12:40:05 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 22 Jun 2023 12:40:05 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466
In-Reply-To:
References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
Message-ID: <7lExBpd5CEfsaEjWd9pf7zKDqpe7SqtGDZmSFCESZ3M=.1350a8f9-fe4d-4b2a-8cd0-74784b137b1e@github.com>

On Thu, 22 Jun 2023 12:26:06 GMT, Roland Westrelin wrote:

>>> It doesn't seem to be true that the loop incr never overflows in the general case. See this example:
>>
>> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think.
>> Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version?
>>
>>> Would that be required? I was hoping that the CastII on the limit before unrolling would cause the phi type to be narrow enough.
>>
>> Ok, let's think this through for going from pre-loop to main loop / zero-trip-guard of the main loop:
>>
>> `incr = iv + stride`
>> exit check: `iv + stride = incr < limit`
>>
>> So that the incr does not overflow, it would have to have type `init+stride ... max_int` (`stride > 0`). That means we would want the phi to have a type that has `stride` subtracted from `hi`, hence `init ... max_int-stride`. For `stride < 0` we want the incr to have type `min_int ... init+stride` and the `phi` should have `min_int-stride ... init`.
>>
>> So the issue in the example above is that the pre-loop phi has type `minint...7`, and not `minint+3...7`, if I understand you correctly, @rwestrel. Because with that type, the `incr` would then have type `minint...4`, and that would let the zero-trip-guard recognize that it is always false, since the limit has type `8...maxint`.
>>
>> `PhiNode::Value` determines the range for counted-loops. We take the limit into account there. But it seems we don't do the right thing there yet. I'll have a look into that, and whether a `CastII` on the limit helps.
>>
>> Now what happens for the main-loop to post-loop? There, the stride has changed with the unrolling. So maybe we'd have to somehow do more to ensure the incr of the main-loop cannot overflow. I guess that could maybe be an issue for the vectorized drain-loop for vector-super-unrolling, or the post-loop.
>
>> > It doesn't seem to be true that the loop incr never overflows in the general case. See this example:
>>
>> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think. Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version?
>
> Ran with:
>
>     -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:CompileOnly=TestOverflowCountedLoopIncr::test -XX:CompileCommand=quiet -XX:LoopMaxUnroll=0 -XX:+UseCountedLoopSafepoints
>
> I see a single counted loop, no uncommon trap in the IR.

@rwestrel so you think the `incr` can indeed overflow, and that is ok? Or would that be a bug? Why do we even have the loop limit check in the first place, if overflow is allowed?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602562601

From aph at openjdk.org Thu Jun 22 12:50:06 2023
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 22 Jun 2023 12:50:06 GMT
Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3]
In-Reply-To:
References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com>
Message-ID:

On Mon, 19 Jun 2023 02:06:27 GMT, Chang Peng wrote:

>> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers.
>>
>> VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness.
>>
>> This patch sets the 16th or 32nd bit to 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, the maximum value calculated in such a case will be VLENGTH (2 or 4).
>>
>> Test:
>> All vector and vectorapi tests passed.
>>
>> Performance:
>> The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies the above benchmark to measure mask operations' performance more effectively.
>>
>> The following data was collected on a 128-bit Neon machine.
>>
>> Benchmark                                       (inputs)   Mode    Before     After   Units
>> MaskQueryOperationsBenchmark.testFirstTrueInt          1  thrpt  5952.670  7298.491  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt          2  thrpt  5951.513  7297.620  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueInt          3  thrpt  5953.048  7298.072  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong         1  thrpt  3496.990  4003.188  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong         2  thrpt  3497.755  4002.577  ops/ms
>> MaskQueryOperationsBenchmark.testFirstTrueLong         3  thrpt  3500.085  4002.471  ops/ms
>>
>> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue()
>> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540
>> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select-
>
> Chang Peng has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update MaskQueryOperationsBenchmark.java

OK.

-------------

Marked as reviewed by aph (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14373#pullrequestreview-1493110503

From epeter at openjdk.org Thu Jun 22 12:58:23 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 22 Jun 2023 12:58:23 GMT
Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v2]
In-Reply-To:
References:
Message-ID:

> Removed a spurious assert before optimization bailout.
>
> I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect: partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so an optimization bailout is appropriate.
>
> I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have led to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain.
>
> I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue.
> Testing up to tier6 and stress testing. **Running, passing except for some IR rules I had to fix, rerunning...**

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  removed unnecessary flags from test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14494/files
  - new: https://git.openjdk.org/jdk/pull/14494/files/bd53d95e..53e8913e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14494&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14494&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14494.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14494/head:pull/14494

PR: https://git.openjdk.org/jdk/pull/14494

From epeter at openjdk.org Thu Jun 22 12:58:26 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 22 Jun 2023 12:58:26 GMT
Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 21 Jun 2023 07:04:40 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   removed unnecessary flags from test
>
> test/hotspot/jtreg/compiler/loopopts/superword/TestUnorderedReductionPartialVectorization.java
> line 43:
>
>> 41: public static void main(String[] args) {
>> 42: TestFramework.runWithFlags("-Xbatch", "-XX:-TieredCompilation",
>> 43: "-XX:CompileCommand=compileonly,compiler.loopopts.superword.TestUnorderedReductionPartialVectorization::test*");
>
> It should probably also trigger without the flags by just specifying `TestFramework.run()` as `test1()` is not using other methods and the IR framework will implicitly use `-Xbatch` and wait for the compilation of `test1()` to be finished.

@chhagedorn I was able to remove them, thanks for the catch!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14494#discussion_r1238483623

From lucy at openjdk.org Thu Jun 22 13:37:18 2023
From: lucy at openjdk.org (Lutz Schmidt)
Date: Thu, 22 Jun 2023 13:37:18 GMT
Subject: RFR: 8299683: [S390X] Problems with -XX:+VerifyStack [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 6 Feb 2023 04:35:03 GMT, sid8606 wrote:

>> Deoptimization and uncommon trap stubs require last Java PC to point to a PC which has an appropriate OopMap. Adjusting an offset for PC in last java frame.
>
> sid8606 has updated the pull request incrementally with one additional commit since the last revision:
>
>   Calculate Oop Map offset for last java frame

I decided to sponsor since @TheRealMDoerr is currently on and off.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12161#issuecomment-1419109326

From jwaters at openjdk.org Thu Jun 22 13:52:12 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Thu, 22 Jun 2023 13:52:12 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6]
In-Reply-To:
References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com>
Message-ID:

On Thu, 22 Jun 2023 10:37:11 GMT, Daniel Jeliński wrote:

>> Julian Waters has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
>> The pull request contains one new commit since the last revision:
>>
>>   Revert NULL to nullptr changes in jaccesswalker
>
> src/jdk.accessibility/windows/native/jaccesswalker/jaccesswalker.cpp line 547:
>
>> 545: snprintf( s, sizeof(s),
>> 546: "ERROR calling GetAccessibleContextInfo; vmID = %lX, context = %p",
>> 547: reinterpret_cast(vmID), (void*)context );
>
> do you need this cast? I checked a few compilers and passing a signed long to "%lX" was fine with them.

gcc will crash with a warning about a mismatched format specifier between signed and unsigned if this isn't done, unfortunately

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238560245

From roland at openjdk.org Thu Jun 22 13:57:05 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 22 Jun 2023 13:57:05 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466
In-Reply-To:
References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com>
Message-ID:

On Thu, 22 Jun 2023 12:26:06 GMT, Roland Westrelin wrote:

>>> It doesn't seem to be true that the loop incr never overflows in the general case. See this example:
>>
>> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think.
>> Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version?
>>
>>> Would that be required? I was hoping that the CastII on the limit before unrolling would cause the phi type to be narrow enough.
>>
>> Ok, let's think this through for going from pre-loop to main loop / zero-trip-guard of the main loop:
>>
>> `incr = iv + stride`
>> exit check: `iv + stride = incr < limit`
>>
>> So that the incr does not overflow, it would have to have type `init+stride ...
>> max_int` (`stride > 0`). That means we would want the phi to have a type that has `stride` subtracted from `hi`, hence `init ... max_int-stride`. For `stride < 0` we want the incr to have type `min_int ... init+stride` and the `phi` should have `min_int-stride ... init`.
>>
>> So the issue in the example above is that the pre-loop phi has type `minint...7`, and not `minint+3...7`, if I understand you correctly, @rwestrel. Because with that type, the `incr` would then have type `minint...4`, and that would let the zero-trip-guard recognize that it is always false, since the limit has type `8...maxint`.
>>
>> `PhiNode::Value` determines the range for counted-loops. We take the limit into account there. But it seems we don't do the right thing there yet. I'll have a look into that, and whether a `CastII` on the limit helps.
>>
>> Now what happens for the main-loop to post-loop? There, the stride has changed with the unrolling. So maybe we'd have to somehow do more to ensure the incr of the main-loop cannot overflow. I guess that could maybe be an issue for the vectorized drain-loop for vector-super-unrolling, or the post-loop.
>
>> > It doesn't seem to be true that the loop incr never overflows in the general case. See this example:
>>
>> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think. Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version?
>
> Ran with:
>
>     -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:CompileOnly=TestOverflowCountedLoopIncr::test -XX:CompileCommand=quiet -XX:LoopMaxUnroll=0 -XX:+UseCountedLoopSafepoints
>
> I see a single counted loop, no uncommon trap in the IR.

> @rwestrel so you think the `incr` can indeed overflow, and that is ok?
> Or would that be a bug? Why do we even have the loop limit check in the first place, if overflow is allowed?

Guaranteeing no overflow requires init < limit (for a loop going up). Nothing guarantees that when C2 pattern-matches a counted loop. Whether overflow is a problem or not would require taking a closer look at individual optimizations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602683762

From djelinski at openjdk.org Thu Jun 22 14:08:08 2023
From: djelinski at openjdk.org (Daniel Jeliński)
Date: Thu, 22 Jun 2023 14:08:08 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6]
In-Reply-To:
References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com>
Message-ID: <4az-3bnmXmvuQ_p-IHOJCvxhpFaXNp0qG9REPV4j-4U=.b1439cea-de5d-4073-bdc3-b20aedf25347@github.com>

On Thu, 22 Jun 2023 13:48:53 GMT, Julian Waters wrote:

>> src/jdk.accessibility/windows/native/jaccesswalker/jaccesswalker.cpp line 547:
>>
>>> 545: snprintf( s, sizeof(s),
>>> 546: "ERROR calling GetAccessibleContextInfo; vmID = %lX, context = %p",
>>> 547: reinterpret_cast(vmID), (void*)context );
>>
>> do you need this cast? I checked a few compilers and passing a signed long to "%lX" was fine with them.
>
> gcc will crash with a warning about a mismatched format specifier between signed and unsigned if this isn't done, unfortunately

Which gcc? This code compiles without warnings:

    #include <stdio.h>
    int main() {
        unsigned long i = 1;
        long j = 2;
        printf("%ld %ld %lx %lx %lu %lu\n", i, j, i, j, i, j);
        return 0;
    }


    # gcc -Wall -Wextra -Wformat=2 test.c
    # gcc --version
    gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
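
As a side note, here is a minimal sketch (not code from the PR) of what an explicit conversion for `%lX` could look like if a compiler does complain about signedness. The helper name `format_vm_id` is hypothetical and only serves as illustration. As far as I know, gcc only reports signed/unsigned format mismatches when `-Wformat-signedness` is given, and that option is not part of `-Wall -Wextra -Wformat=2`, which may explain differing results between setups.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from jaccesswalker.cpp: formats a signed
 * ID as upper-case hex. The explicit conversion to unsigned long makes the
 * argument type match what %lX expects, independent of warning settings.
 */
int format_vm_id(char *buf, size_t size, long vm_id) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "vmID = %lX", (unsigned long) vm_id);
}
```

Calling `format_vm_id(buf, sizeof(buf), 255L)` writes `vmID = FF` into `buf`.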

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238581381

From jwaters at openjdk.org Thu Jun 22 14:08:08 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Thu, 22 Jun 2023 14:08:08 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6]
In-Reply-To: <4az-3bnmXmvuQ_p-IHOJCvxhpFaXNp0qG9REPV4j-4U=.b1439cea-de5d-4073-bdc3-b20aedf25347@github.com>
References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4az-3bnmXmvuQ_p-IHOJCvxhpFaXNp0qG9REPV4j-4U=.b1439cea-de5d-4073-bdc3-b20aedf25347@github.com>
Message-ID:

On Thu, 22 Jun 2023 14:03:26 GMT, Daniel Jeliński wrote:

>> gcc will crash with a warning about a mismatched format specifier between signed and unsigned if this isn't done, unfortunately
>
> Which gcc? This code compiles without warnings:
>
>     #include <stdio.h>
>     int main() {
>         unsigned long i = 1;
>         long j = 2;
>         printf("%ld %ld %lx %lx %lu %lu\n", i, j, i, j, i, j);
>         return 0;
>     }
>
>
>     # gcc -Wall -Wextra -Wformat=2 test.c
>     # gcc --version
>     gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
>     Copyright (C) 2019 Free Software Foundation, Inc.
>     This is free software; see the source for copying conditions. There is NO
>     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I'm currently running version 13.1, win32 threads

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238583171

From jwaters at openjdk.org Thu Jun 22 14:08:09 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Thu, 22 Jun 2023 14:08:09 GMT
Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6]
In-Reply-To:
References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4az-3bnmXmvuQ_p-IHOJCvxhpFaXNp0qG9REPV4j-4U=.b1439cea-de5d-4073-bdc3-b20aedf25347@github.com>
Message-ID:

On Thu, 22 Jun 2023 14:04:48 GMT, Julian Waters wrote:

>> Which gcc?
This code compiles without warnings: >> >> #include <stdio.h> >> int main() { >> unsigned long i = 1; >> long j = 2; >> printf("%ld %ld %lx %lx %lu %lu\n", i, j, i, j, i, j); >> return 0; >> } >> >> >> # gcc -Wall -Wextra -Wformat=2 test.c >> # gcc --version >> gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 >> Copyright (C) 2019 Free Software Foundation, Inc. >> This is free software; see the source for copying conditions. There is NO >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > I'm currently running version 13.1, win32 threads I'll retry again, maybe the warning has changed now ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238583800 From jwaters at openjdk.org Thu Jun 22 14:16:28 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 14:16:28 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v7] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition.
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert Cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/a31e52e9..775a3b05 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Thu Jun 22 14:16:29 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 14:16:29 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v6] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4az-3bnmXmvuQ_p-IHOJCvxhpFaXNp0qG9REPV4j-4U=.b1439cea-de5d-4073-bdc3-b20aedf25347@github.com> Message-ID: On Thu, 22 Jun 2023 14:05:15 GMT, Julian Waters wrote: >> I'm currently running version 13.1, win32 threads > > I'll retry again, maybe the warning has changed now Seems like it doesn't trigger any longer, I'll revert the cast. Thanks for catching this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238589176 From alanb at openjdk.org Thu Jun 22 14:23:04 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 22 Jun 2023 14:23:04 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 15:18:19 GMT, Matthias Baesken wrote: > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. 
src/java.sql/share/classes/java/sql/DriverManager.java line 658: > 656: * (which is invoking this class indirectly) > 657: * classloader, so that the JDBC driver class outside the image > 658: * can be loaded from here. This code should probably be changed to use VM.isSystemDomainLoader(callerCL). I think the comment should be replaced because it doesn't match what it actually does and it's nothing to do with the whether the JDBC driver is in the run-time image or not. How about: "If the caller is defined to the bootstrap or platform class loader then use the Thread CCL as the initiating class loader so that a JDBC on the class path, or bundled with an application, is found." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1238604300 From jwaters at openjdk.org Thu Jun 22 14:23:25 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 14:23:25 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v8] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: GetDIBits should take an LPVOID ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/775a3b05..7dbe5dea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Thu Jun 22 14:23:27 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 14:23:27 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v7] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Thu, 22 Jun 2023 14:16:28 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert Cast Alright, I've addressed the issues brought up by the reviews, including the GetDIBits cast. In general I left everything that passes into a Java call at least once as a jint, and everything else as a native type ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1602725175 From djelinski at openjdk.org Thu Jun 22 14:32:10 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 22 Jun 2023 14:32:10 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v8] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Thu, 22 Jun 2023 14:23:25 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > GetDIBits should take an LPVOID src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1089: > 1087: entry_point(); > 1088: colorBits = (jint*)safe_Malloc(MAX_ICON_SIZE * MAX_ICON_SIZE * sizeof(jint)); > 1089: GetDIBits(dc, iconInfo.hbmColor, 0, iconSize, (LPVOID) colorBits, &bmi, DIB_RGB_COLORS); I don't believe casting to LPVOID was what @prrace was asking here. The function takes a void* parameter because any other pointer type is implicitly convertible to that. We are using long/int/jint because we're asking for 32 bits per pixel (see biBitCount above); if we asked for 24/16/8/any other number of bits per pixel, we would have used a different pointer type. Please revert these casts. They don't add any value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238616005 From epeter at openjdk.org Thu Jun 22 14:34:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 Jun 2023 14:34:06 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 In-Reply-To: References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com> Message-ID: On Thu, 22 Jun 2023 13:54:37 GMT, Roland Westrelin wrote: >>> > It doesn't seem to be true that the loop incr never overflows in the general case. See this example: >>> >>> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think. Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version? 
>> >> Ran with: >> >> -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:CompileOnly=TestOverflowCountedLoopIncr::test -XX:CompileCommand=quiet -XX:LoopMaxUnroll=0 -XX:+UseCountedLoopSafepoints >> >> I see a single counted loop, no uncommon trap in the IR. > >> @rwestrel so you think the `incr` can indeed overflow, and that is ok? Or would that be a bug? Why do we even have the loop limit check in the first place, if overflow is allowed? > > To guarantee no overflow requires init < limit (for a loop going up). Nothing guarantees that when c2 pattern matches a counted loop. Whether overflow is a problem or not would require taking a closer look at individual optimizations. @rwestrel ok. Well if overflow for the `incr` in indeed in general allowed, then my approach is fundamentally flawed. But maybe we can fix this specific case with your idea, of inserting a `CastII` for the limit, after we do the loop limit check, so that the check actually has an effect on the type of the limit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1602746932 From jwaters at openjdk.org Thu Jun 22 14:40:23 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 14:40:23 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> > On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert "GetDIBits should take an LPVOID" This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/7dbe5dea..84f8e08c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From djelinski at openjdk.org Thu Jun 22 14:53:05 2023 From: djelinski at openjdk.org (Daniel =?UTF-8?B?SmVsacWEc2tp?=) Date: Thu, 22 Jun 2023 14:53:05 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: On Thu, 22 Jun 2023 14:40:23 GMT, Julian Waters wrote: 
>> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert "GetDIBits should take an LPVOID" > > This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. Thanks. The code changes look good to me now. Some of the files have old copyright, please update them to 2023 before integrating. ------------- Marked as reviewed by djelinski (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14125#pullrequestreview-1493384042 From jwaters at openjdk.org Thu Jun 22 15:12:06 2023 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 22 Jun 2023 15:12:06 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: On Thu, 22 Jun 2023 14:40:23 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert "GetDIBits should take an LPVOID" > > This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. Will do, thanks Daniel @prrace @dholmes-ora Are both of you happy with the changes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1602811926 From dqu at openjdk.org Thu Jun 22 15:56:14 2023 From: dqu at openjdk.org (Daohan Qu) Date: Thu, 22 Jun 2023 15:56:14 GMT Subject: RFR: 8310581: retry_class_loading_during_parsing() is not used Message-ID: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> The failure recording for `retry_class_loading_during_parsing()` was removed in [8222446](https://bugs.openjdk.org/browse/JDK-8222446). As it is never used to set a failure reason in the current code base, this function and the code checking for it should be removed. Tests `tier1-3` for the release build on Linux x86-64 have been run. The test failure is unrelated to this patch (it is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)).
------------- Commit messages: - Update copyright year - Remove some unused code Changes: https://git.openjdk.org/jdk/pull/14615/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14615&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310581 Stats: 15 lines in 3 files changed: 0 ins; 10 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14615.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14615/head:pull/14615 PR: https://git.openjdk.org/jdk/pull/14615 From roland at openjdk.org Thu Jun 22 16:24:05 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 Jun 2023 16:24:05 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 17:25:38 GMT, Volker Simonis wrote: > This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). It could reproduce it with JDK 17, 18 and 21 and results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details): > > > # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124 > # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer > # > # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk) > ... > Current CompileTask: > C2: 91009 8214 ! 
4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes) > > Stack: [0x00007fff1306b000,0x00007fff1316c000], sp=0x00007fff13166fe0, free space=1007k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x61d636] Type::is_oopptr() const+0x4e (type.hpp:2059) > V [libjvm.so+0x14cee55] SubTypeCheckNode::sub(Type const*, Type const*) const+0x53 (subtypenode.cpp:37) > V [libjvm.so+0x14c66b0] SubNode::Value(PhaseGVN*) const+0xa6 (subnode.cpp:107) > V [libjvm.so+0xcdacb3] split_if(IfNode*, PhaseIterGVN*)+0x2ce (ifnode.cpp:111) > V [libjvm.so+0xce044c] IfNode::Ideal_common(PhaseGVN*, bool)+0x128 (ifnode.cpp:1438) > V [libjvm.so+0xce0496] IfNode::Ideal(PhaseGVN*, bool)+0x30 (ifnode.cpp:1448) > V [libjvm.so+0x1298244] PhaseGVN::apply_ideal(Node*, bool)+0x70 (phaseX.cpp:667) > V [libjvm.so+0x129a0fd] PhaseIterGVN::transform_old(Node*)+0x12d (phaseX.cpp:1196) > V [libjvm.so+0x12998df] PhaseIterGVN::optimize()+0x16b (phaseX.cpp:1045) > V [libjvm.so+0x93f89e] Compile::Optimize()+0xce0 (compile.cpp:2378) > V [libjvm.so+0x9385fa] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16ca (compile.cpp:842) > V [libjvm.so+0x806ab4] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a0 (c2compiler.cpp:118) > V [libjvm.so+0x958bc8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa04 (compileBroker.cpp:2265) > V [libjvm.so+0x9576fa] CompileBroker::compiler_thread_loop()+0x462 (compileBroker.cpp:1944) > V [libjvm.so+0x97b14a] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:58) > V [libjvm.so+0xd434ce] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:719) > V [libjvm.so+0xd43368] JavaThread::run()+0x258 (javaThread.cpp:704) > V [libjvm.so+0x15481ea] Thread::call_run()+0x1a8 (thread.cpp:217) > V [libjvm.so+0x1230036] thread_native_entry(Thread*)+0x1a5 (os_linux.cpp:778) > ... > ``` > > `SubTypeC... 
@simonis I reproduced it and I'm taking a closer look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14600#issuecomment-1602965165 From vlivanov at openjdk.org Thu Jun 22 18:02:04 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 Jun 2023 18:02:04 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> Message-ID: On Wed, 21 Jun 2023 07:02:35 GMT, Roland Westrelin wrote: > Late expansion allows some optimizations to trigger that wouldn't otherwise. I agree that early expansion may hinder optimizations for the polymorphic case. But the bimorphic case may enrich the receiver with more precise type information. > Carrying the JVM state in the SubTypeCheck looks like too much extra complexity to me. Yes, I agree. > It felt easier in terms of memory management. If we have some extra data embedded in the SubTypeCheck node, is it a pointer or the full data structure? `ciCallProfile` has fixed size and is passed by value. Embedding the whole structure inside `SubTypeCheck` doesn't look problematic. It refers to CI entities which should be kept alive for the duration of the compilation. > One thing to consider is that we don't necessarily want to common 2 SubTypeCheck nodes with the same object/super klass inputs. I thought about commoning of SubTypeCheck nodes as well and there are other cases when it may be undesirable. For example, when 2 checks are performed in 2 subbranches with low frequencies, we don't want to place the commoned check in a hot dominating block. I'd prefer to see `SubTypeCheck` have a control input which is explicitly relaxed to accommodate commoning.
Overall, I'm fine with late expansion of profile-guided type checks for now, but embedding profile data into SubTypeCheck should significantly simplify the patch without compromising the benefits. Enhancing profiling support separately may be a viable tradeoff as well. Inaccuracies in code shape classification don't look like a critical issue when the guards are introduced during macro expansion. We can explore other optimization opportunities later (e.g., profile for superclasses may help the reflection case). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1603092517 From vlivanov at openjdk.org Thu Jun 22 19:40:07 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 Jun 2023 19:40:07 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 02:30:53 GMT, Quan Anh Mai wrote: > Is there a way to stress-test the registers? As an idea for such a stress test mode, is it possible to make `regF`/`vlRegF`, `regD`/`vlRegD` (and `vec`/`legVec` family of register classes) disjoint sets (`xmm0-xmm15` and `xmm16-xmm31`)? It should be enough to trigger relevant asserts whenever an AD instruction is used.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1603211564 From aivanov at openjdk.org Thu Jun 22 20:08:09 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Thu, 22 Jun 2023 20:08:09 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: On Thu, 22 Jun 2023 14:40:23 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert "GetDIBits should take an LPVOID" > > This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. Looks fine to me, except for a few comments. The size of the types for `jint` and `jlong` remains the same after amending the typedef `jni_md.h`. Yet I'm still cautious about it. You should have an approval from hotspot. Please also update the copyright year in the modified files. 
src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp line 325: > 323: } > 324: > 325: jint sx, sy, ex, ey; I agree with David, these should have type `int` as accepted by the [`::Arc`](https://learn.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-arc) function. However, the `AngleToCoord` function is declared with `jint` parameters. https://github.com/openjdk/jdk/blob/84f8e08c2ecc90ec50a13406fb99b8cd52f33b7c/src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp#L52 Its declaration can be changed to `int`; it's an internal function used by `*_doDrawArc` and `*_doFillArc`. src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1084: > 1082: > 1083: jint *colorBits = nullptr; > 1084: int *maskBits = nullptr; Suggestion: jint *colorBits = NULL; int *maskBits = NULL; I'd rather keep `NULL`; it's used consistently inside the `_Win32ShellFolder2_getIconBits` function as well as throughout the file, so `nullptr` is out of place. src/java.desktop/windows/native/libawt/windows/awt_MenuBar.cpp line 148: > 146: } > 147: > 148: AwtMenuItem* AwtMenuBar::GetItem(jobject target, jint index) What is the reason for using `jint` instead of `int`? The member function is used in a for-loop which iterates with an `int` loop variable. Yet the implementation of `GetItem` up-calls into Java.
------------- PR Review: https://git.openjdk.org/jdk/pull/14125#pullrequestreview-1493854643 PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238938194 PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238948758 PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238966069 From aivanov at openjdk.org Thu Jun 22 20:08:10 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Thu, 22 Jun 2023 20:08:10 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Thu, 25 May 2023 01:30:34 GMT, David Holmes wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "GetDIBits should take an LPVOID" >> >> This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. > > src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp line 605: > >> 603: return; >> 604: } >> 605: jint sx, sy, ex, ey; > > Again these don't seem to need to be Java types. I've got the same concern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238941743 From aivanov at openjdk.org Thu Jun 22 20:08:12 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Thu, 22 Jun 2023 20:08:12 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: On Thu, 22 Jun 2023 19:33:17 GMT, Alexey Ivanov wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "GetDIBits should take an LPVOID" >> >> This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. 
> > src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1084: > >> 1082: >> 1083: jint *colorBits = nullptr; >> 1084: int *maskBits = nullptr; > > Suggestion: > > jint *colorBits = NULL; > int *maskBits = NULL; > > I'd rather keep `NULL` ? it's used consistently inside `_Win32ShellFolder2_getIconBits` function as well as through the file, so `nullptr` is out of place. The type of `colorBits` is `jint` because it's set to `iconBits` array, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1238952370 From jwaters at openjdk.org Fri Jun 23 00:16:09 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 00:16:09 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: On Thu, 22 Jun 2023 19:20:09 GMT, Alexey Ivanov wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "GetDIBits should take an LPVOID" >> >> This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. > > src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp line 325: > >> 323: } >> 324: >> 325: jint sx, sy, ex, ey; > > I agree with David, these should have type `int` as accepted by the [`::Arc`](https://learn.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-arc) function. However, the `AngleToCoord` function is declared with `jint` as parameters. > > https://github.com/openjdk/jdk/blob/84f8e08c2ecc90ec50a13406fb99b8cd52f33b7c/src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp#L52 > > Its declaration can be changed to `int`, it's an internal function used by `*_doDrawArc` and `*_doFillArc`. 
Resolved, will push soon ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239142969 From jwaters at openjdk.org Fri Jun 23 00:16:10 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 00:16:10 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <9RJ8-w0M5_3QeoXfshnOhtwVvi7g89XaOH35RTFKIZg=.8b19b043-0b16-4f28-a124-22ee2e644940@github.com> On Thu, 22 Jun 2023 19:24:29 GMT, Alexey Ivanov wrote: >> src/java.desktop/windows/native/libawt/java2d/windows/GDIRenderer.cpp line 605: >> >>> 603: return; >>> 604: } >>> 605: jint sx, sy, ex, ey; >> >> Again these don't seem to need to be Java types. > > I've got the same concern. Resolved, pushing soon ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239143080 From jwaters at openjdk.org Fri Jun 23 00:16:13 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 00:16:13 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: On Thu, 22 Jun 2023 19:37:56 GMT, Alexey Ivanov wrote: >> src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1084: >> >>> 1082: >>> 1083: jint *colorBits = nullptr; >>> 1084: int *maskBits = nullptr; >> >> Suggestion: >> >> jint *colorBits = NULL; >> int *maskBits = NULL; >> >> I'd rather keep `NULL`; it's used consistently inside the `_Win32ShellFolder2_getIconBits` function as well as throughout the file, so `nullptr` is out of place. > > The type of `colorBits` is `jint` because it's set to the `iconBits` array, right?
Yes, colorBits is a jint because of that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239143202 From jwaters at openjdk.org Fri Jun 23 00:19:18 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 00:19:18 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> Message-ID: <42a4Nj_iDfQRh-eOXo4PSl7eag1EVv9JW1y2Uvqt2vg=.1dac1318-9637-46f5-9c40-f090c9e2640e@github.com> On Thu, 22 Jun 2023 19:51:42 GMT, Alexey Ivanov wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "GetDIBits should take an LPVOID" >> >> This reverts commit 7dbe5dea84b1afb2235b66da581bcd3c1da4d6ac. > > src/java.desktop/windows/native/libawt/windows/awt_MenuBar.cpp line 148: > >> 146: } >> 147: >> 148: AwtMenuItem* AwtMenuBar::GetItem(jobject target, jint index) > > What is the reason for using `jint` instead of `int`? > > The member function is used in for-loop which iterates with `int` loop variable. Yet the implementation of `GetItem` up-calls into Java. I had it as a jint since it upcalls into Java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239144537 From jwaters at openjdk.org Fri Jun 23 00:31:28 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 00:31:28 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v10] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Fixups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/84f8e08c..80b6f787 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=08-09 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From kvn at openjdk.org Fri Jun 23 01:00:02 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 Jun 2023 01:00:02 GMT Subject: RFR: 8310581: retry_class_loading_during_parsing() is not used In-Reply-To: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> References: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> Message-ID: On Thu, 22 Jun 2023 15:48:42 GMT, Daohan Qu wrote: > The failure recording for `retry_class_loading_during_parsing()` is removed in [JDK-8222446](https://bugs.openjdk.org/browse/JDK-8222446). 
As it is never used to set failure reason in the current code base, this function and the code checking for it should be removed. > > Test `tier1-3` for release build on Linux x86-64 has done. The test failure is irrelevant to this patch (which is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)). Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14615#pullrequestreview-1494178441 From dqu at openjdk.org Fri Jun 23 01:04:01 2023 From: dqu at openjdk.org (Daohan Qu) Date: Fri, 23 Jun 2023 01:04:01 GMT Subject: RFR: 8310581: retry_class_loading_during_parsing() is not used In-Reply-To: References: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> Message-ID: <4yIezcB_jE2Oj3R8AIBJhj4FHH83cbP14Mh0_NqHkWk=.90e8b00d-51f1-4072-ab95-94ea1ef2b5aa@github.com> On Fri, 23 Jun 2023 00:57:32 GMT, Vladimir Kozlov wrote: >> The failure recording for `retry_class_loading_during_parsing()` is removed in [JDK-8222446](https://bugs.openjdk.org/browse/JDK-8222446). As it is never used to set failure reason in the current code base, this function and the code checking for it should be removed. >> >> Test `tier1-3` for release build on Linux x86-64 has done. The test failure is irrelevant to this patch (which is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)). > > Good. Thanks for your review @vnkozlov ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14615#issuecomment-1603492184 From dqu at openjdk.org Fri Jun 23 01:07:03 2023 From: dqu at openjdk.org (Daohan Qu) Date: Fri, 23 Jun 2023 01:07:03 GMT Subject: RFR: 8310581: retry_class_loading_during_parsing() is not used In-Reply-To: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> References: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> Message-ID: On Thu, 22 Jun 2023 15:48:42 GMT, Daohan Qu wrote: > The failure recording for `retry_class_loading_during_parsing()` is removed in [JDK-8222446](https://bugs.openjdk.org/browse/JDK-8222446). As it is never used to set failure reason in the current code base, this function and the code checking for it should be removed. > > Test `tier1-3` for release build on Linux x86-64 has done. The test failure is irrelevant to this patch (which is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)). There are two failures, but one in langtools seems unrelated while another in serviceability should have been fixed in another PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14615#issuecomment-1603494380 From dholmes at openjdk.org Fri Jun 23 02:08:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 23 Jun 2023 02:08:07 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v10] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Fri, 23 Jun 2023 00:31:28 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. 
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fixups Hotspot changes still approved. Other changes seem okay to me. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14125#pullrequestreview-1494291399 From jwaters at openjdk.org Fri Jun 23 02:12:07 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 02:12:07 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v10] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Fri, 23 Jun 2023 00:31:28 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Fixups Alright, I'll integrate once Alexey approves. Does anyone else have further objections? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1603594332 From jwaters at openjdk.org Fri Jun 23 02:38:13 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 02:38:13 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition.
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with two additional commits since the last revision: - Revert wrong Copyright - Copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/80b6f787..16b5a914 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=09-10 Stats: 7 lines in 7 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From chagedorn at openjdk.org Fri Jun 23 06:13:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jun 2023 06:13:05 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v2] In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 12:58:23 GMT, Emanuel Peter wrote: >> Removed a spurious assert before optimization bailout. >> >> I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect, partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so a optimization bailout is appropriate. >> >> I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have lead to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. 
This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain. >> >> I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue. >> Testing up to tier6 and stress testing. **Running, passing except for some IR rules I had to fix, rerunning...** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > removed unnecessary flags from test Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14494#pullrequestreview-1494450478 From chagedorn at openjdk.org Fri Jun 23 06:13:08 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jun 2023 06:13:08 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v2] In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 12:51:06 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/TestUnorderedReductionPartialVectorization.java line 43: >> >>> 41: public static void main(String[] args) { >>> 42: TestFramework.runWithFlags("-Xbatch", "-XX:-TieredCompilation", >>> 43: "-XX:CompileCommand=compileonly,compiler.loopopts.superword.TestUnorderedReductionPartialVectorization::test*"); >> >> It should probably also trigger without the flags by just specifying `TestFramework.run()` as `test1()` is not using other methods and the IR framework will implicitly use `-Xbatch` and wait for the compilation of `test1()` to be finished. > > @chhagedorn I was able to remove them, thanks for the catch! Nice! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14494#discussion_r1239358447 From jwaters at openjdk.org Fri Jun 23 06:16:08 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 06:16:08 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 8 Jun 2023 11:20:05 GMT, Alexey Ivanov wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the code that is actually warning > > I'll take a look, hopefully next week. Wait a minute, I was right, it was a jint the whole time! Oh well, I'll wait for what @aivanov-jdk has to say, but I don't like the idea of leaving both inconsistent ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1603730539 From djelinski at openjdk.org Fri Jun 23 06:13:07 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Fri, 23 Jun 2023 06:13:07 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Fri, 23 Jun 2023 02:38:13 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled.
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Revert wrong Copyright > - Copyright src/java.desktop/windows/native/libawt/windows/awt_Menu.h line 76: > 74: /*for multifont menu */ > 75: BOOL IsTopMenu(); > 76: virtual AwtMenuItem* GetItem(jobject target, int index); Hi @aivanov-jdk are you OK leaving this inconsistent with the definition? https://github.com/openjdk/jdk/blob/16b5a91461db1765e2e7596ebaaf1299cec9b0c8/src/java.desktop/windows/native/libawt/windows/awt_Menu.cpp#L261 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239358334 From thartmann at openjdk.org Fri Jun 23 06:35:24 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 Jun 2023 06:35:24 GMT Subject: RFR: 8310581: retry_class_loading_during_parsing() is not used In-Reply-To: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> References: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> Message-ID: On Thu, 22 Jun 2023 15:48:42 GMT, Daohan Qu wrote: > The failure recording for `retry_class_loading_during_parsing()` is removed in [JDK-8222446](https://bugs.openjdk.org/browse/JDK-8222446). As it is never used to set failure reason in the current code base, this function and the code checking for it should be removed. > > Test `tier1-3` for release build on Linux x86-64 has done. The test failure is irrelevant to this patch (which is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)). 
Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14615#pullrequestreview-1494473622 From dqu at openjdk.org Fri Jun 23 06:35:24 2023 From: dqu at openjdk.org (Daohan Qu) Date: Fri, 23 Jun 2023 06:35:24 GMT Subject: Integrated: 8310581: retry_class_loading_during_parsing() is not used In-Reply-To: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> References: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> Message-ID: On Thu, 22 Jun 2023 15:48:42 GMT, Daohan Qu wrote: > The failure recording for `retry_class_loading_during_parsing()` is removed in [JDK-8222446](https://bugs.openjdk.org/browse/JDK-8222446). As it is never used to set failure reason in the current code base, this function and the code checking for it should be removed. > > Test `tier1-3` for release build on Linux x86-64 has done. The test failure is irrelevant to this patch (which is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)). This pull request has now been integrated. 
Changeset: 47728931 Author: Daohan Qu Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/47728931274ec7f58b06c463125ef40338aa4fba Stats: 15 lines in 3 files changed: 0 ins; 10 del; 5 mod 8310581: retry_class_loading_during_parsing() is not used Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14615 From thartmann at openjdk.org Fri Jun 23 06:56:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 Jun 2023 06:56:03 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop In-Reply-To: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: On Fri, 16 Jun 2023 15:59:40 GMT, Roland Westrelin wrote: > The test contains a loop nest with 2 loops. The outer loop is an > irreducible loop. The safepoint for that loop is also in the inner > loop. Because `IdealLoopTree::check_safepts()` ignores irreducible > loops, that safepoint is not marked as being required and is > eliminated from the inner loop. The inner loop is then optimized out > and the outer loop becomes an infinite loop with no safepoint (a > single node loop). That, in turn, causes the loop to be eliminated > because it has not use and the assert fires. > > The fix I propose is to make `IdealLoopTree::check_safepts()` work > with irreducible loops. I think > `IdealLoopTree::allpaths_check_safepts()` can be used for that. When > working on this I wondered if that method could be called with a loop > whose head has more than 3 inputs. I couldn't write a test case with > an irreducible loop whose head had more than 3 inputs but I added an > assert in the method and ran some testing. That assert fired so I also > propose to tweak the method so it's robust in that case. Looks reasonable to me. All tests passed. @eme64 Please have a look as well. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14522#pullrequestreview-1494501044 From dqu at openjdk.org Fri Jun 23 06:58:10 2023 From: dqu at openjdk.org (Daohan Qu) Date: Fri, 23 Jun 2023 06:58:10 GMT Subject: RFR: 8310581: retry_class_loading_during_parsing() is not used In-Reply-To: References: <-RIqOOP0dJyH45SaTsWmxU2992gghDLjS73mvqFE57E=.f9297028-4963-47d2-82b8-5be26e8100eb@github.com> Message-ID: On Fri, 23 Jun 2023 06:30:43 GMT, Tobias Hartmann wrote: >> The failure recording for `retry_class_loading_during_parsing()` is removed in [JDK-8222446](https://bugs.openjdk.org/browse/JDK-8222446). As it is never used to set failure reason in the current code base, this function and the code checking for it should be removed. >> >> Test `tier1-3` for release build on Linux x86-64 has done. The test failure is irrelevant to this patch (which is [JDK-8309214](https://bugs.openjdk.org/browse/JDK-8309214)). > > Looks good to me too. Thanks for your review @TobiHartmann ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14615#issuecomment-1603765939 From thartmann at openjdk.org Fri Jun 23 07:07:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 Jun 2023 07:07:07 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 10:35:25 GMT, Johan Sjölen wrote: >> Hi, >> >> `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. >> >> Please consider, thanks. >> >> Johan > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Be conservative in sizing to be close to original behavior Good catch! Looks good to me otherwise.
src/hotspot/share/opto/reg_split.cpp line 575: > 573: // Keep track of DEFS & Phis for later passes > 574: Node_List defs{split_arena, 8}; > 575: Node_List phis{split_arena, 16}; Why do you use aggregate initialization instead of constructor invocation here? ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14530#pullrequestreview-1494513550 PR Review Comment: https://git.openjdk.org/jdk/pull/14530#discussion_r1239398494 From thartmann at openjdk.org Fri Jun 23 07:09:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 Jun 2023 07:09:05 GMT Subject: RFR: 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs [v3] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 08:32:51 GMT, Daniel Skantz wrote: >> src/hotspot/share/c1/c1_ValueMap.cpp line 367: >> >>> 365: bool _valid = true; >>> 366: >>> 367: void visit(Value* vp) { >> >> Since `Value` is already a pointer type, can't we use `Value v` here? > > I am not sure if this is possible without changing the ValueVisitor ([ref](https://github.com/openjdk/jdk/blob/a0595761ef35c4eec8cb84326a869b9473cd5bba/src/hotspot/share/c1/c1_Instruction.hpp#L123)) itself Right, makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14492#discussion_r1239402716 From thartmann at openjdk.org Fri Jun 23 07:17:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 Jun 2023 07:17:16 GMT Subject: RFR: 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs [v3] In-Reply-To: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> References: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> Message-ID: On Tue, 20 Jun 2023 10:21:51 GMT, Daniel Skantz wrote: >> ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. 
We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. >> >> Testing: tier1-tier3. >> >> Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. >> Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. >> >> Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. >> >> ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test #iterations Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14492#pullrequestreview-1494562984 From tobias.hartmann at oracle.com Fri Jun 23 07:26:38 2023 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 23 Jun 2023 09:26:38 +0200 Subject: Question regarding ReplayCompiles and multiple inlining In-Reply-To: References: Message-ID: Should we file an RFE for this or is this already tracked? Thanks, Tobias On 22.06.23 08:35, dean.long at oracle.com wrote: > I noticed this problem before too.? Unfortunately I can't think of a workaround.? It seems like the > right fix is to change the replay file format to record more information. 
>
> dl
>
> On 6/19/23 5:07 AM, Volker Simonis wrote:
>> Hi,
>>
>> I try to reproduce a compiler issue with a ReplayDataFile but
>> unfortunately I can't reproduce the crash.
>>
>> I hacked the VM to print out the inlining tree just before the
>> crashes and realized that the original inlining differs from the
>> inlining done by ReplayCompiles.
>>
>> In my specific case I have the following inlining pattern during the
>> crash (`foo::f1()` gets inlined twice into `foo::f0()`):
>>      .
>>      .
>>    @ 57    foo::f0()    inline (hot)
>>      @ 48    foo::f1()    inline (hot)
>>        @ 2    bar::f2()    inline (hot)
>>          .
>>          .
>>      @ 48    foo::f1()    inline (hot)
>>        @ 2    bar::f2()   NodeCountInliningCutoff
>>
>> In the ReplayDataFile (in the `inline` part of the `compile` line)
>> both, `foo::f1()` and `bar::f2()` are recorded only once (because they
>> have the same bci, name/signature and inlining depth).
>>
>> When running the replay, I get the following inlining pattern:
>>      .
>>      .
>>    @ 57    foo::f0()    force inline by ciReplay
>>      @ 48    foo::f1()    force inline by ciReplay
>>        @ 2    bar::f2()    force inline by ciReplay
>>          .
>>          .
>>      @ 48    foo::f1()    force inline by ciReplay
>>        @ 2    bar::f2()    force inline by ciReplay
>>
>> This is clearly different because in the replay we inline `bar::f2()`
>> a second time (while in the original run it was skipped due to
>> NodeCountInliningCutoff).
>>
>> From looking at `find_ciInlineRecord()` [1], it looks like the replay
>> file only records the bci, inlining depth and method name/signature
>> for an inlinee? How is this supposed to work if a method is inlined
>> differently at the same level like in this example?
>>
>> Notice that I'm currently working with JDK 17 (because my problem
>> doesn't reproduce with HEAD) but it seems the relevant code hasn't
>> changed much in this area since JDK 17.
>> >> Please let me know if this is a known problem and if there's any way >> to work around it? >> >> Thank you and best regards, >> Volker >> >> [1] https://github.com/openjdk/jdk17u-dev/blob/852c26c0/src/hotspot/share/ci/ciReplay.cpp#L992 From duke at openjdk.org Fri Jun 23 07:28:13 2023 From: duke at openjdk.org (Eric Nothum) Date: Fri, 23 Jun 2023 07:28:13 GMT Subject: Integrated: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> References: <681jnmqV4t-mOXbUXmwIkqVfl1g2Gi0ac49zs_TWK5Q=.e6043aaa-77dc-4d23-8658-20f5f41f000c@github.com> Message-ID: <1rAdXohzs4TWozUyNJBvqkBiyAPqqw_dAxdK_OpKOmc=.c31e69d0-7cbf-4c72-a740-a74926da376d@github.com> On Wed, 21 Jun 2023 11:29:51 GMT, Eric Nothum wrote: > Removed TLAB from the IR Framework whitelist. If TLAB allocations are disabled by `-XX:-UseTLAB` the IR verification can fail, therefore `"TLAB"` should not be whitelisted. See [JDK-8295210](https://bugs.openjdk.org/browse/JDK-8295210) for an example of such a failure. This pull request has now been integrated. Changeset: 31dcda5d Author: Eric Nothum Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/31dcda5d67c90ecd571b0a943bcedc0bfe3f1fba Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8295210: IR framework should not whitelist -XX:-UseTLAB Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14583 From epeter at openjdk.org Fri Jun 23 07:37:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 07:37:23 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up Message-ID: Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. The issue was in `Compile::print_ideal_ir`.
While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`.

What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write:

    NoSafepointVerifier nsv;
    ttyLocker ttyl;

The verifier triggered immediately, as expected.
I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the lock and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint.

We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the logs.

I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`.

I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it.

**Question**

Maybe we should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken? Or maybe we just build the verifier into the ttyLocker? Should I file an RFE?

**Testing**

I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (at that hit rate, 1000 clean runs would happen by chance with probability `0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing.
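
For readers outside HotSpot, the "buffer first, then lock and print" idea from the description can be sketched in plain Java. This is only an analogy under stated assumptions: `BufferedDump`, `TTY_LOCK`, `render` and `dump` are hypothetical names, a `StringBuilder` stands in for `stringStream`, and a `synchronized` block stands in for `ttyLocker`. It is not the code from the patch.

```java
import java.io.PrintStream;
import java.util.List;

public class BufferedDump {
    // Hypothetical stand-in for the shared tty lock; not a HotSpot API.
    private static final Object TTY_LOCK = new Object();

    // Build the complete dump without holding the lock. Anything slow or
    // blocking (the analogue of a safepoint) would happen in this phase.
    static String render(int compileId, List<String> nodes) {
        StringBuilder sb = new StringBuilder();
        sb.append("compile ").append(compileId).append('\n');
        for (String node : nodes) {
            sb.append("  ").append(node).append('\n');
        }
        return sb.toString();
    }

    // Hold the lock only for one write of the already finished text, so
    // output from different threads cannot interleave inside a dump.
    static void dump(PrintStream out, int compileId, List<String> nodes) {
        String text = render(compileId, nodes);
        synchronized (TTY_LOCK) {
            out.print(text);
        }
    }

    public static void main(String[] args) {
        dump(System.out, 1, List.of("1 Start", "2 Return"));
    }
}
```

The point of the split is that the locked region contains nothing that can block or call back into other code, which is the property the `NoSafepointVerifier` checks in the VM.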
------------- Commit messages: - renamed _st to _output - make sure CloneMap info is also printed to stream, and not to tty - add locker again, and refactor - write to xtty or tty - add tty locker again - buffer xtty and tty - 8306922: IR verification fails because IR dump is chopped up Changes: https://git.openjdk.org/jdk/pull/14591/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306922 Stats: 168 lines in 8 files changed: 24 ins; 4 del; 140 mod Patch: https://git.openjdk.org/jdk/pull/14591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14591/head:pull/14591 PR: https://git.openjdk.org/jdk/pull/14591 From chagedorn at openjdk.org Fri Jun 23 07:41:06 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jun 2023 07:41:06 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 14:43:18 GMT, Emanuel Peter wrote: > Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. > > The issue was in `Compile::print_ideal_ir`. While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. > > What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write: > > > NoSafepointVerifier nsv; > ttyLocker ttyl; > > > The verifier triggered immediately, as expected. > And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. 
> > We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. > > I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. > > I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. > > **Question** > > Maybe we should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken? Or maybe we just build the verifier into the ttyLocker? Should I file an RFE? > > **Testing** > > I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. Looks good, thanks for cleaning this up! I think it's a good idea to also check the other usages of `ttyLocker` separately. Can you also file an RFE to remove the "safepoint while printing handling" in the IR framework? This is no longer needed with this patch but would exceed the scope of this fix. You can assign that to me. src/hotspot/share/opto/compile.cpp line 559: > 557: // be sure to tag this tty output with the compile ID. > 558: > 559: // Node dumping can cause a safepoint, which can break the ttyLocker. Suggestion: // Node dumping can cause a safepoint, which can break the tty lock. src/hotspot/share/opto/output.cpp line 2084: > 2082: void PhaseOutput::print_scheduling() { > 2083: print_scheduling(tty); > 2084: } New line: Suggestion: } test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 49: > 47: "-XX:+UseCMoveUnconditionally", > 48: "-XX:+UseVectorCmov", > 49: "-XX:CompileCommand=compileonly,compiler.c2.irTests.TestVectorConditionalMove::test*"); `compileonly` is probably not necessary but you can leave it as such. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14591#pullrequestreview-1494608279 PR Review Comment: https://git.openjdk.org/jdk/pull/14591#discussion_r1239467929 PR Review Comment: https://git.openjdk.org/jdk/pull/14591#discussion_r1239471742 PR Review Comment: https://git.openjdk.org/jdk/pull/14591#discussion_r1239473229 From epeter at openjdk.org Fri Jun 23 07:48:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 07:48:14 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v2] In-Reply-To: References: Message-ID: > Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. > > The issue was in `Compile::print_ideal_ir`. While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. > > What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write: > > > NoSafepointVerifier nsv; > ttyLocker ttyl; > > > The verifier triggered immediately, as expected. > And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. > > We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. > > I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. > > I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. 
> > **Question** > > Maybe we should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken? Or maybe we just build the verifier into the ttyLocker? Should I file an RFE? > > **Testing** > > I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14591/files - new: https://git.openjdk.org/jdk/pull/14591/files/b61a0767..1cf4d545 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=00-01 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14591/head:pull/14591 PR: https://git.openjdk.org/jdk/pull/14591 From epeter at openjdk.org Fri Jun 23 07:57:21 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 07:57:21 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v3] In-Reply-To: References: Message-ID: > Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. > > The issue was in `Compile::print_ideal_ir`. While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. > > What good is a lock if it can be broken? Not much. 
Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write: > > > NoSafepointVerifier nsv; > ttyLocker ttyl; > > > The verifier triggered immediately, as expected. > And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. > > We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. > > I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. > > I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. > > **Question** > > Maybe we should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken? Or maybe we just build the verifier into the ttyLocker? Should I file an RFE? > > **Testing** > > I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: - Merge branch 'master' into JDK-8306922 - removed unnecessary flags from TestVectorConditionalMove.java - Apply suggestions from code review Co-authored-by: Christian Hagedorn - renamed _st to _output - make sure CloneMap info is also printed to stream, and not to tty - add locker again, and refactor - write to xtty or tty - add tty locker again - buffer xtty and tty - 8306922: IR verification fails because IR dump is chopped up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14591/files - new: https://git.openjdk.org/jdk/pull/14591/files/1cf4d545..bca07373 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=01-02 Stats: 906 lines in 158 files changed: 199 ins; 203 del; 504 mod Patch: https://git.openjdk.org/jdk/pull/14591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14591/head:pull/14591 PR: https://git.openjdk.org/jdk/pull/14591 From epeter at openjdk.org Fri Jun 23 07:57:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 07:57:23 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v3] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 07:31:39 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8306922 >> - removed unnecessary flags from TestVectorConditionalMove.java >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - renamed _st to _output >> - make sure CloneMap info is also printed to stream, and not to tty >> - add locker again, and refactor >> - write to xtty or tty >> - add tty locker again >> - buffer xtty and tty >> - 8306922: IR verification fails because IR dump is chopped up > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 49: > >> 47: "-XX:+UseCMoveUnconditionally", >> 48: "-XX:+UseVectorCmov", >> 49: "-XX:CompileCommand=compileonly,compiler.c2.irTests.TestVectorConditionalMove::test*"); > > `compileonly` is probably not necessary but you can leave it as such. Also removed the `-XX:-TieredCompilation` :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14591#discussion_r1239491397 From chagedorn at openjdk.org Fri Jun 23 08:05:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 Jun 2023 08:05:05 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v3] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 07:57:21 GMT, Emanuel Peter wrote: >> Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. >> >> The issue was in `Compile::print_ideal_ir`. While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. >> >> What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. 
To verify that, we can write: >> >> >> NoSafepointVerifier nsv; >> ttyLocker ttyl; >> >> >> The verifier triggered immediately, as expected. >> And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. >> >> We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. >> >> I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. >> >> I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. >> >> **Question** >> >> Maybe we should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken? Or maybe we just build the verifier into the ttyLocker? Should I file an RFE? >> >> **Testing** >> >> I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8306922 > - removed unnecessary flags from TestVectorConditionalMove.java > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - renamed _st to _output > - make sure CloneMap info is also printed to stream, and not to tty > - add locker again, and refactor > - write to xtty or tty > - add tty locker again > - buffer xtty and tty > - 8306922: IR verification fails because IR dump is chopped up Thanks for the update, looks good! 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14591#pullrequestreview-1494659586 From thartmann at openjdk.org Fri Jun 23 08:11:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 Jun 2023 08:11:06 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v3] In-Reply-To: References: Message-ID: <-BFA6v9R2TTkkXpEZx902fWAfAnqhLuVdQVtwbpUYxM=.73885f02-8507-4d6f-b72a-9d272ea2db38@github.com> On Fri, 23 Jun 2023 07:57:21 GMT, Emanuel Peter wrote: >> Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. >> >> The issue was in `Compile::print_ideal_ir`. While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. >> >> What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write: >> >> >> NoSafepointVerifier nsv; >> ttyLocker ttyl; >> >> >> The verifier triggered immediately, as expected. >> And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. >> >> We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. >> >> I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. >> >> I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. 
>> >> **Follow-up work** >> >> [JDK-8310712](https://bugs.openjdk.org/browse/JDK-8310712) C2: check for broken tty locks due to SafePoint >> We should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken. Or maybe we just build the verifier into the ttyLocker? >> >> [JDK-8310711](https://bugs.openjdk.org/browse/JDK-8310711) IR Framework: remove safepoint while printing handling >> >> **Testing** >> >> I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into JDK-8306922 > - removed unnecessary flags from TestVectorConditionalMove.java > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - renamed _st to _output > - make sure CloneMap info is also printed to stream, and not to tty > - add locker again, and refactor > - write to xtty or tty > - add tty locker again > - buffer xtty and tty > - 8306922: IR verification fails because IR dump is chopped up Looks good to me otherwise. > Maybe we should in general go through all uses of ttyLocker, and add a NoSafepointVerifier to ensure no such lock is broken? Or maybe we just build the verifier into the ttyLocker? Should I file an RFE? Yes, I think it would be good to file an RFE to investigate the impact of adding a NoSafepointVerifier to the ttyLocker. src/hotspot/share/opto/compile.cpp line 561: > 559: // Node dumping can cause a safepoint, which can break the tty lock. > 560: // Buffer all node dumps, so that all safepoints happen before we lock. 
> 561: stringStream ss; Should we add a `ResourceMark rm;`? src/hotspot/share/opto/output.cpp line 2072: > 2070: if (C->trace_opto_output()) { > 2071: // Buffer and print all at once > 2072: stringStream ss; Should we add a `ResourceMark rm;`? ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14591#pullrequestreview-1494665517 PR Review Comment: https://git.openjdk.org/jdk/pull/14591#discussion_r1239506027 PR Review Comment: https://git.openjdk.org/jdk/pull/14591#discussion_r1239506853 From epeter at openjdk.org Fri Jun 23 08:41:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 08:41:06 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop In-Reply-To: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: On Fri, 16 Jun 2023 15:59:40 GMT, Roland Westrelin wrote: > The test contains a loop nest with 2 loops. The outer loop is an > irreducible loop. The safepoint for that loop is also in the inner > loop. Because `IdealLoopTree::check_safepts()` ignores irreducible > loops, that safepoint is not marked as being required and is > eliminated from the inner loop. The inner loop is then optimized out > and the outer loop becomes an infinite loop with no safepoint (a > single node loop). That, in turn, causes the loop to be eliminated > because it has not use and the assert fires. > > The fix I propose is to make `IdealLoopTree::check_safepts()` work > with irreducible loops. I think > `IdealLoopTree::allpaths_check_safepts()` can be used for that. When > working on this I wondered if that method could be called with a loop > whose head has more than 3 inputs. I couldn't write a test case with > an irreducible loop whose head had more than 3 inputs but I added an > assert in the method and ran some testing. 
That assert fired so I also > propose to tweak the method so it's robust in that case. src/hotspot/share/opto/loopnode.cpp line 3515: > 3513: // Allpaths backwards scan from loop tail, terminating each path at first safepoint > 3514: // encountered. Helper for check_safepts. > 3515: void IdealLoopTree::allpaths_check_safepts(VectorSet &visited, Node_List &stack) { @rwestrel you should update the description here. Suggestion: Allpaths backwards scan. Starting at the head, traversing all backedges, and the body. Terminating each path at first safepoint encountered. Helper for check_safepts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14522#discussion_r1239537606 From epeter at openjdk.org Fri Jun 23 08:44:16 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 08:44:16 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop In-Reply-To: References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: On Fri, 23 Jun 2023 08:38:09 GMT, Emanuel Peter wrote: >> The test contains a loop nest with 2 loops. The outer loop is an >> irreducible loop. The safepoint for that loop is also in the inner >> loop. Because `IdealLoopTree::check_safepts()` ignores irreducible >> loops, that safepoint is not marked as being required and is >> eliminated from the inner loop. The inner loop is then optimized out >> and the outer loop becomes an infinite loop with no safepoint (a >> single node loop). That, in turn, causes the loop to be eliminated >> because it has not use and the assert fires. >> >> The fix I propose is to make `IdealLoopTree::check_safepts()` work >> with irreducible loops. I think >> `IdealLoopTree::allpaths_check_safepts()` can be used for that. When >> working on this I wondered if that method could be called with a loop >> whose head has more than 3 inputs. 
I couldn't write a test case with >> an irreducible loop whose head had more than 3 inputs but I added an >> assert in the method and ran some testing. That assert fired so I also >> propose to tweak the method so it's robust in that case. > > src/hotspot/share/opto/loopnode.cpp line 3515: > >> 3513: // Allpaths backwards scan from loop tail, terminating each path at first safepoint >> 3514: // encountered. Helper for check_safepts. >> 3515: void IdealLoopTree::allpaths_check_safepts(VectorSet &visited, Node_List &stack) { > > @rwestrel you should update the description here. Suggestion: > > Allpaths backwards scan. Starting at the head, traversing all backedges, and the body. Terminating each path at first safepoint encountered. Helper for check_safepts. Also the line above is not accurate enough anymore: `_required_safept->push(n); // save the one closest to the tail` For one: could there not be multiple such SafePoints? If so: what does it mean to take "the closest"? And we may have multiple backedges / tails, now that we allow irreducible loops. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14522#discussion_r1239540658 From epeter at openjdk.org Fri Jun 23 08:44:16 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 08:44:16 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop In-Reply-To: References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: On Fri, 23 Jun 2023 08:41:09 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 3515: >> >>> 3513: // Allpaths backwards scan from loop tail, terminating each path at first safepoint >>> 3514: // encountered. Helper for check_safepts. >>> 3515: void IdealLoopTree::allpaths_check_safepts(VectorSet &visited, Node_List &stack) { >> >> @rwestrel you should update the description here. Suggestion: >> >> Allpaths backwards scan. 
Starting at the head, traversing all backedges, and the body. Terminating each path at first safepoint encountered. Helper for check_safepts. > > Also the line above is not accurate enough anymore: > `_required_safept->push(n); // save the one closest to the tail` > > For one: could there not be multiple such SafePoints? If so: what does it mean to take "the closest"? And we may have multiple backedges / tails, now that we allow irreducible loops. Other than that the fix looks reasonable to me, thanks for the fix @rwestrel ! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14522#discussion_r1239541231 From epeter at openjdk.org Fri Jun 23 08:49:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 08:49:04 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. 
So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. @fg1417 @jatin-bhateja What do you think about the consistency of arm / intel hardware flags for `SVE` and `AVX`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1603944788 From epeter at openjdk.org Fri Jun 23 08:55:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 08:55:03 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v4] In-Reply-To: References: Message-ID: > Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. > > The issue was in `Compile::print_ideal_ir`. While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. > > What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write: > > > NoSafepointVerifier nsv; > ttyLocker ttyl; > > > The verifier triggered immediately, as expected. > And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. 
Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. > > We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. > > I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. > > I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. > > **Follow-up work** > > [JDK-8310712](https://bugs.openjdk.org/browse/JDK-8310712) C2: check for broken tty locks due to SafePoint > We should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken. Or maybe we just build the verifier into the ttyLocker? > > [JDK-8310711](https://bugs.openjdk.org/browse/JDK-8310711) IR Framework: remove safepoint while printing handling > > **Testing** > > I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. 
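
The `0.995^1000 = 0.007` figure in the testing note can be read as the chance that 1000 runs all pass even though each run still fails with probability `0.005`. A small sketch of that arithmetic, assuming the runs are independent (the class and method names are made up for illustration):

```java
public class FlakyRunOdds {
    // Probability that `runs` independent runs all pass when each run
    // fails with probability `failRate`.
    static double allPass(double failRate, int runs) {
        return Math.pow(1.0 - failRate, runs);
    }

    public static void main(String[] args) {
        // 5 failures in 1000 runs on master gives a hit rate of 0.005.
        double p = allPass(5.0 / 1000.0, 1000);
        System.out.printf("chance of 1000 clean runs without a fix: %.4f%n", p);
    }
}
```

A value this small means 1000 clean runs are unlikely to be luck, though it does not prove that the failure rate is exactly zero.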
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: adding Resource marks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14591/files - new: https://git.openjdk.org/jdk/pull/14591/files/bca07373..566a0b54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14591&range=02-03 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14591.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14591/head:pull/14591 PR: https://git.openjdk.org/jdk/pull/14591 From epeter at openjdk.org Fri Jun 23 09:14:07 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 09:14:07 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <8G37g4PRtyp7xkqAE8R9aP8FOqnaA6cQe9j-snEO0DU=.0c8bf019-55cc-4d88-8f11-c57dc76aea96@github.com> On Wed, 21 Jun 2023 08:25:03 GMT, Pengfei Li wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > ## Background & Problems > > Post loop vectorization takes advantage of vector mask (predicate) features of some hardware platforms, such as x86 AVX-512 and AArch64 SVE, to vectorize tail iterations of loops for better performance. The existing implementation in the C2 compiler has a long history. 
It was first implemented in [JDK-8153998](https://bugs.openjdk.org/browse/JDK-8153998) in 2016 under C2's experimental feature PostLoopMultiversioning to support x86 AVX-512 vector masks. Due to insufficient maintenance, it had been broken for a very long time. Last year, we took over [JDK-8183390](https://bugs.openjdk.org/browse/JDK-8183390) to fix and re-enable this feature. Several issues were fixed and AArch64 vector mask support was added at that time. As we proposed to make post loop vectorization non-experimental in future JDK releases, we did some stress tests early this year but found more problems. The problems include stability, maintainability and performance. > > 1. Stability > Multiple C2 crash or mis-compilation issues related to post loop vectorization were filed on JBS, including [JDK-8301657](https://bugs.openjdk.org/browse/JDK-8301657), [JDK-8301904](https://bugs.openjdk.org/browse/JDK-8301904), [JDK-8301944](https://bugs.openjdk.org/browse/JDK-8301944), [JDK-8304774](https://bugs.openjdk.org/browse/JDK-8304774), [JDK-8308949](https://bugs.openjdk.org/browse/JDK-8308949) and perhaps more with recent C2 patches. > > 2. Maintainability > The original implementation is based on multi-versioned post loops and the code is mixed into SuperWord. But post loop vectorization does not actually use the SLP algorithm. So there is a lot of special handling for post loops in current SuperWord code. As more and more features are added in SuperWord, the legacy code is becoming more and more difficult to maintain and extend. > > 3. Performance > Post loop vectorization was expected to bring an obvious performance benefit for loops with few iterations. But JMH tests showed it didn't. A main reason is that the multi-versioned vector post loop is skipped by the main loop's minimum-trip guard if the whole loop has very few iterations (read [JDK-8307084](https://bugs.openjdk.org/browse/JDK-8307084) to learn more). 
The previous implementation also has limited vectorization ability; for example, it can only vectorize loop statements with a single data size. > > ## About this patch > > The main idea of post loop vectorization is widening scalar operations in the post loop and adding vector mask... @pfustc Thanks already for the PR description and graphs! I'm going to look at this today, and give you some preliminary feedback. At first glance it looks quite good, I'm especially happy that you moved things outside of SuperWord. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1603978076 From epeter at openjdk.org Fri Jun 23 09:51:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 09:51:03 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/loopnode.cpp line 2280: > 2278: if (!stride_is_con()) { > 2279: // Stride could be non-constant if a loop is vector masked > 2280: return 0; Could this break the assumption anywhere else that `stride_con != 0`? I fear that it may just silently succeed everywhere, or do checks like: if (stride_con() > 0) { // assume positive } else { // assume negative (now wrong!) } Might it be better to have an assert here, and do the `stride_is_con` checks at the call sites of `stride_con`? 
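To make the concern above concrete, here is a minimal standalone sketch. It is not HotSpot code: `LoopSketch`, `stride_con_or_zero`, `stride_con_checked`, `counts_down_unsafe` and `counts_down_checked` are hypothetical names invented for illustration. It only contrasts "return 0 for a non-constant stride" with "assert in the accessor and check at the call site".

```cpp
#include <cassert>

// Hypothetical stand-in for a counted loop; not the real CountedLoopNode.
struct LoopSketch {
  bool stride_known;  // true if the stride is a compile-time constant
  int  stride_value;  // only meaningful when stride_known is true

  bool stride_is_con() const { return stride_known; }

  // Variant A: silently returns 0 when the stride is not constant.
  int stride_con_or_zero() const { return stride_known ? stride_value : 0; }

  // Variant B: the accessor asserts; callers must check stride_is_con() first.
  int stride_con_checked() const {
    assert(stride_known && "stride must be constant");
    return stride_value;
  }
};

// A caller written under the old assumption that the stride is never 0:
// anything that is not positive is treated as negative.
inline bool counts_down_unsafe(const LoopSketch& l) {
  return !(l.stride_con_or_zero() > 0);  // a 0 is misread as "negative"
}

// The same query with the check moved to the call site. It answers
// "false" when the direction is simply unknown.
inline bool counts_down_checked(const LoopSketch& l) {
  return l.stride_is_con() && l.stride_con_checked() < 0;
}
```

With a loop whose stride is not constant, `counts_down_unsafe` reports a down-counting loop while `counts_down_checked` does not, which is the kind of silent misclassification the question is about.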
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239617763 From epeter at openjdk.org Fri Jun 23 09:56:02 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 09:56:02 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/loopnode.cpp line 4688: > 4686: for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { > 4687: IdealLoopTree* lpt = iter.current(); > 4688: if (lpt->is_counted() && lpt->is_innermost()) { Is this applied to all innermost counted loops? Or only post-loops? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239622374 From epeter at openjdk.org Fri Jun 23 10:40:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 10:40:03 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 09:53:22 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. 
The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/loopnode.cpp line 4688: > >> 4686: for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { >> 4687: IdealLoopTree* lpt = iter.current(); >> 4688: if (lpt->is_counted() && lpt->is_innermost()) { > > Is this applied to all innermost counted loops? Or only post-loops? Ah, you do the check inside. Why not lift it out and assert inside? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239661209 From epeter at openjdk.org Fri Jun 23 10:46:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 10:46:03 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/loopnode.hpp line 143: > 141: if (is_vector_masked()) { > 142: return false; > 143: } Does this mean that the post-loop has a `CountedLoop` node, but it does not adhere to the counted-loop assumptions, such as having a `incr`, `limit`, `phi` etc? With the old post-loop-vectorization, the LoopNode would always fold away, so it would disappear after IGVN. But now it would stick around, right? Could that turn out to be a problem? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239666832 From epeter at openjdk.org Fri Jun 23 10:55:07 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 10:55:07 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <6IkvVTm9e60qXwaID0EihRXlUielrryBWoTmYAp3PuU=.c624b13d-bc6d-4c79-86a6-72bda016b50f@github.com> On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/loopnode.hpp line 775: > 773: > 774: void collect_loop_core_nodes(PhaseIdealLoop* phase, Unique_Node_List& wq) const; > 775: nit: why move it? src/hotspot/share/opto/superword.cpp line 179: > 177: assert(_packset.length() == 0, "packset must be empty"); > 178: success = SLP_extract(); > 179: if (PostLoopMultiversioning) { Could we now have an assert for `cl->is_main_loop()` at the beginning of `SuperWord::transform_loop`, and remove all checks for it in SuperWord? src/hotspot/share/opto/superword.cpp line 632: > 630: cl->set_slp_pack_count(_packset.length()); > 631: } > 632: } else { Again: Could we now have an assert for `cl->is_main_loop()` at the beginning of `SuperWord::SLP_extract`, and remove all checks for it in SuperWord? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239670056 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239672094 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239673798 From epeter at openjdk.org Fri Jun 23 10:59:08 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 10:59:08 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/superword.cpp line 3881: > 3879: } > 3880: > 3881: // Following is used outside superword optimization Could we move the whole SWPointer outside of SuperWord, into some "autovectorization.hpp" maybe? Because the SW of SWPointer means SuperWord, maybe a renaming could be good too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239676542 From epeter at openjdk.org Fri Jun 23 10:59:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 10:59:09 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 10:55:17 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. 
Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/superword.cpp line 3881: > >> 3879: } >> 3880: >> 3881: // Following is used outside superword optimization > > Could we move the whole SWPointer outside of SuperWord, into some "autovectorization.hpp" maybe? Because the SW of SWPointer means SuperWord, maybe a renaming could be good too? If you are going to do that, I'd suggest doing this refactoring in a separate RFE. It would help in general with any future extension to auto-vectorization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239677815 From epeter at openjdk.org Fri Jun 23 11:05:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:05:05 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 10:56:40 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 3881: >> >>> 3879: } >>> 3880: >>> 3881: // Following is used outside superword optimization >> >> Could we move the whole SWPointer outside of SuperWord, into some "autovectorization.hpp" maybe? Because the SW of SWPointer means SuperWord, maybe a renaming could be good too? > > If you are going to do that, I'd suggest doing this refactoring in a separate RFE. It would help in general with any future extension to auto-vectorization. Can we untangle it completely from SuperWord? it seems you have made it optional, so yes. And maybe we can also make the trace flags like `_slp->is_trace_alignment()` independent? 
It would be nice to also be able to trace this for non-SuperWord contexts like post-loop masked vectorization, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239682803 From epeter at openjdk.org Fri Jun 23 11:15:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:15:10 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/superword.cpp line 4297: > 4295: > 4296: bool SWPointer::Tracer::slp_trace_alignment() { > 4297: return _slp && _slp->is_trace_alignment(); Aha, here you wrap it. You have some uses above that could be replaced with this now. But again, even better would be if we had a general trace flag that could trace it for any context, not just SuperWord. src/hotspot/share/opto/superword.hpp line 251: > 249: int count_size(int size) { > 250: return _stats[exact_log2(size)]; > 251: } Add assert from `record_size`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239685755 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239691320 From epeter at openjdk.org Fri Jun 23 11:15:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:15:10 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 11:06:13 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/superword.cpp line 4297: > >> 4295: >> 4296: bool SWPointer::Tracer::slp_trace_alignment() { >> 4297: return _slp && _slp->is_trace_alignment(); > > Aha, here you wrap it. You have some uses above that could be replaced with this now. But again, even better would be if we had a general trace flag that could trace it for any context, not just SuperWord. After all, should the `VectorizeDebug` flag not apply everywhere? See `phase->C->directive()->VectorizeDebugOption`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239687685 From epeter at openjdk.org Fri Jun 23 11:15:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:15:11 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 11:08:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 4297: >> >>> 4295: >>> 4296: bool SWPointer::Tracer::slp_trace_alignment() { >>> 4297: return _slp && _slp->is_trace_alignment(); >> >> Aha, here you wrap it. You have some uses above that could be replaced with this now. But again, even better would be if we had a general trace flag that could trace it for any context, not just SuperWord. > > After all, should the `VectorizeDebug` flag not apply everywhere? See `phase->C->directive()->VectorizeDebugOption`. I'd also move this to some static functions in a potential "autovectorization.hpp", and move `_vector_loop_debug` there, together with all its `is_trace...` accessors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239688884 From epeter at openjdk.org Fri Jun 23 11:18:08 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:18:08 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. 
src/hotspot/share/opto/superword.hpp line 666: > 664: IdealLoopTree* lpt() const { return _lpt; } > 665: PhiNode* iv() const { > 666: return _slp ? _slp->iv() : _lpt->_head->as_CountedLoop()->phi()->as_Phi(); I'd suggest either cache it directly from `_lpt->_head->as_CountedLoop()->phi()->as_Phi()`, or just query it directly. Reduce dependence on `_slp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239693729 From epeter at openjdk.org Fri Jun 23 11:21:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:21:05 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/superword.hpp line 669: > 667: } > 668: > 669: void init(); This is just a helper function for the constructors, right? Maybe move it closer to them? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239697066 From epeter at openjdk.org Fri Jun 23 11:31:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:31:05 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. 
Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vectornode.hpp line 1826: > 1824: class LoopVectorMaskNode : public TypeNode { > 1825: private: > 1826: int _max_trips; Add comment: what is this for exactly? Maybe consider adding more elaborate specification/description above the 3 node classes. General code style: I think we are trying to get away from the `//--------------NodeName/FunctionName-------` tags, so no need to add them anymore. src/hotspot/share/opto/vectornode.hpp line 1839: > 1837: virtual bool cmp(const Node& n) const { > 1838: return TypeNode::cmp(n) && > 1839: _max_trips == ((LoopVectorMaskNode&)n)._max_trips; Is this cast really safe? Can you use `as_LoopVectorMaskNode()` instead, so we at least have an assert if it fails to be true? I fear this may get us into undefined behavior otherwise... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239702895 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239706579 From epeter at openjdk.org Fri Jun 23 11:38:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:38:10 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 11:28:02 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. 
The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vectornode.hpp line 1839: > >> 1837: virtual bool cmp(const Node& n) const { >> 1838: return TypeNode::cmp(n) && >> 1839: _max_trips == ((LoopVectorMaskNode&)n)._max_trips; > > Is this cast really safe? Can you use `as_LoopVectorMaskNode()` instead, so we at least have an assert if it fails to be true? I fear this may get us into undefined behavior otherwise... Oh dear, I just saw the same pattern in: bool TypeNode::cmp(const Node& n) const { return !Type::cmp(_type, ((TypeNode&)n)._type); } We should try to avoid doing that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239710563 From epeter at openjdk.org Fri Jun 23 11:38:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 11:38:10 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 11:33:05 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 1839: >> >>> 1837: virtual bool cmp(const Node& n) const { >>> 1838: return TypeNode::cmp(n) && >>> 1839: _max_trips == ((LoopVectorMaskNode&)n)._max_trips; >> >> Is this cast really safe? Can you use `as_LoopVectorMaskNode()` instead, so we at least have an assert if it fails to be true? I fear this may get us into undefined behavior otherwise... > > Oh dear, I just saw the same pattern in: > > bool TypeNode::cmp(const Node& n) const { > return !Type::cmp(_type, ((TypeNode&)n)._type); > } > > We should try to avoid doing that. Even if all callers currently ensure that `n` has the correct type, I'd say it is still not a great idea to cast without checking, at least in debug. 
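As an illustration of the "cast only after checking, at least in debug" idea, here is a small self-contained sketch. The class names `NodeSketch` and `MaskNodeSketch` and the `cast` helper are hypothetical and only loosely mirror the nodes discussed; this is not the HotSpot `Node` hierarchy or its `as_...()` accessors.

```cpp
#include <cassert>

// Hypothetical miniature node hierarchy, for illustration only.
struct NodeSketch {
  enum Kind { Plain, Typed, Mask };
  Kind kind;
  explicit NodeSketch(Kind k) : kind(k) {}
  virtual ~NodeSketch() {}
  virtual bool cmp(const NodeSketch& n) const { return kind == n.kind; }
};

struct MaskNodeSketch : NodeSketch {
  int max_trips;
  explicit MaskNodeSketch(int t) : NodeSketch(Mask), max_trips(t) {}

  // Checked downcast: in debug builds the assert fires if 'n' is not
  // actually a mask node, instead of silently reading a foreign field.
  static const MaskNodeSketch& cast(const NodeSketch& n) {
    assert(n.kind == Mask && "not a mask node");
    return static_cast<const MaskNodeSketch&>(n);
  }

  bool cmp(const NodeSketch& n) const override {
    // The base comparison runs first, so the cast is only reached
    // when both nodes already have the same kind.
    return NodeSketch::cmp(n) && max_trips == cast(n).max_trips;
  }
};
```

In a release build the assert compiles away and the cost is the same as the unchecked C-style cast; in a debug build a caller that passes the wrong node type fails loudly.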
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239711766 From epeter at openjdk.org Fri Jun 23 12:01:08 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 12:01:08 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <9J-XGP_2qSJT-EefUtvLMt1HzWHWgtvN3RmanPRDt0I=.71bc5695-7eab-4d29-8ff4-b20f28721247@github.com> On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.hpp line 85: > 83: > 84: // Some node check utilities > 85: bool is_loop_iv(Node* n) { return n == _iv; } General code style comment, applies everywhere: add more `const` everywhere. To arguments, and the functions themselves, wherever possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239730798 From epeter at openjdk.org Fri Jun 23 12:11:08 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 12:11:08 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. 
The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.hpp line 95: > 93: } > 94: return false; > 95: } Do you not want to do this sort of implementation in `SWPointer` instead? There are already methods like `scaled_iv_plus_offset`, so it would fit in next to that, right? src/hotspot/share/opto/vmaskloop.hpp line 97: > 95: } > 96: > 97: bool is_memory_phi(Node* n) { Looks like a helper method that could live in `node.hpp` or `cfgnode.hpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239736305 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239739004 From epeter at openjdk.org Fri Jun 23 12:11:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 12:11:10 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 12:07:55 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vmaskloop.hpp line 97: > >> 95: } >> 96: >> 97: bool is_memory_phi(Node* n) { > > Looks like a helper method that could live in `node.hpp` or `cfgnode.hpp`. SuperWord also makes similar checks, you could refactor those too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239739568 From epeter at openjdk.org Fri Jun 23 12:28:08 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 12:28:08 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.cpp line 63: > 61: if (cl->is_vector_masked()) return; > 62: // Skip non-post loop > 63: if (!cl->is_post_loop()) return; Check before entering, and assert here. src/hotspot/share/opto/vmaskloop.cpp line 71: > 69: if (cl->loopexit()->in(0) != cl) return; > 70: // Skip if some loop operations are pinned to the backedge > 71: if (cl->back_control()->outcnt() != 1) return; It would be interesting to have some trace flag that tells us why we bailed out here and did not do the post-loop vectorization. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239751423 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239753834 From epeter at openjdk.org Fri Jun 23 12:40:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 12:40:06 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.cpp line 172: > 170: if (idx != -1) { > 171: trace_msg(nullptr, "Loop has unreachable node while traversing from head"); > 172: return false; Can this ever happen? Or could you add an assert here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239765004 From epeter at openjdk.org Fri Jun 23 12:43:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 12:43:06 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <7yyO-hCumcyr6FV7kfvCBDffgvUY6gAKZzUMawxOzkI=.43755296-4d39-4eec-85af-b8c3e3ac1a92@github.com> On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. 
Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.cpp line 104: > 102: _core_set.clear(); > 103: _body_set.clear(); > 104: _body_nodes.clear(); Would it make sense to somehow reserve the space, so that we do not allocate multiple times when growing these data structures later? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239767741 From epeter at openjdk.org Fri Jun 23 14:12:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 14:12:06 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. Tests are building... I already am getting this, from our build system: `Toolchain: clang (clang/LLVM from Xcode 12.4)`, for the `macosx-aarch64-...` builds. .../src/hotspot/share/opto/vmaskloop.cpp:970:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral] tty->vprint_cr(format, ap); That means we won't get any test coverage on those platforms from this test run. 
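For context on the `-Wformat-nonliteral` error: clang raises it when a `printf`-style call receives a format string it cannot see as a literal, and it stays quiet when the enclosing function is itself declared as `printf`-like, because the check then moves to that function's callers. Below is a minimal standalone sketch of that approach. `SKETCH_PRINTF` and `trace_msg_sketch` are hypothetical names; as far as I know HotSpot has its own `ATTRIBUTE_PRINTF` macro for this purpose, but its use here is an assumption about how the patch could be fixed, not what the patch does.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical wrapper around the GCC/clang format attribute.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PRINTF(fmt_idx, va_idx) \
  __attribute__((format(printf, fmt_idx, va_idx)))
#else
#define SKETCH_PRINTF(fmt_idx, va_idx)
#endif

// Declaring the wrapper as printf-like lets the compiler check the
// arguments at each call site and accept the forwarded 'format' below.
std::string trace_msg_sketch(const char* format, ...) SKETCH_PRINTF(1, 2);

std::string trace_msg_sketch(const char* format, ...) {
  char buf[256];
  va_list ap;
  va_start(ap, format);
  // 'format' is not a literal here, but it is the annotated parameter.
  std::vsnprintf(buf, sizeof(buf), format, ap);
  va_end(ap);
  return std::string(buf);
}
```

Note that for a non-static member function the implicit `this` counts as argument 1, so the indices shift by one compared with this free function.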
src/hotspot/share/opto/vmaskloop.cpp line 269: > 267: Node_List* worklist = new Node_List(_arena); > 268: if (!collect_statements_helper(store, MemNode::ValueIn, stmt, worklist)) { > 269: return false; Why does the `store` need special handling here? Can you not just throw it on the `worklist`? Would be nice to have the code be shorter ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1604335503 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239867900 From aivanov at openjdk.org Fri Jun 23 14:27:08 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 23 Jun 2023 14:27:08 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <4XUr19lXHho95I9rleAZZXwyaIgZGL3pr77HqMRFCVQ=.de59b9ea-c7dd-4056-bebe-019c84a7ac54@github.com> On Fri, 23 Jun 2023 06:10:05 GMT, Daniel Jeliński wrote: >> Julian Waters has updated the pull request incrementally with two additional commits since the last revision: >> >> - Revert wrong Copyright >> - Copyright > > src/java.desktop/windows/native/libawt/windows/awt_Menu.h line 76: > >> 74: /*for multifont menu */ >> 75: BOOL IsTopMenu(); >> 76: virtual AwtMenuItem* GetItem(jobject target, int index); > > Hi @aivanov-jdk are you OK leaving this inconsistent with the definition? > https://github.com/openjdk/jdk/blob/16b5a91461db1765e2e7596ebaaf1299cec9b0c8/src/java.desktop/windows/native/libawt/windows/awt_Menu.cpp#L261 The declaration and implementation have to match.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239886709 From aivanov at openjdk.org Fri Jun 23 14:27:10 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 23 Jun 2023 14:27:10 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v9] In-Reply-To: <42a4Nj_iDfQRh-eOXo4PSl7eag1EVv9JW1y2Uvqt2vg=.1dac1318-9637-46f5-9c40-f090c9e2640e@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <8DgisSGbTfEW8SIvgeweqpmQz3xbVFDKsInXCBPklRI=.368ca26b-eaaa-4620-8c71-20b72deb931f@github.com> <42a4Nj_iDfQRh-eOXo4PSl7eag1EVv9JW1y2Uvqt2vg=.1dac1318-9637-46f5-9c40-f090c9e2640e@github.com> Message-ID: <2kn5RYAWrZokkv1fHoUC2FO0fS7Z1R4DQH2ws9Mrw88=.f8d7bd70-1a1c-4421-91d2-5296812b5561@github.com> On Fri, 23 Jun 2023 00:16:45 GMT, Julian Waters wrote: >> src/java.desktop/windows/native/libawt/windows/awt_MenuBar.cpp line 148: >> >>> 146: } >>> 147: >>> 148: AwtMenuItem* AwtMenuBar::GetItem(jobject target, jint index) >> >> What is the reason for using `jint` instead of `int`? >> >> The member function is used in for-loop which iterates with `int` loop variable. Yet the implementation of `GetItem` up-calls into Java. > > I had it as a jint since it upcalls into Java I am fine with either way as long as it compiles without warning and menu works as expected. I had to ask the question. Which way conveys the intention clearer? I don't know. I incline to using `int` as is in the latest version. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239883713 From aivanov at openjdk.org Fri Jun 23 14:34:17 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 23 Jun 2023 14:34:17 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: <4XUr19lXHho95I9rleAZZXwyaIgZGL3pr77HqMRFCVQ=.de59b9ea-c7dd-4056-bebe-019c84a7ac54@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4XUr19lXHho95I9rleAZZXwyaIgZGL3pr77HqMRFCVQ=.de59b9ea-c7dd-4056-bebe-019c84a7ac54@github.com> Message-ID: On Fri, 23 Jun 2023 14:24:44 GMT, Alexey Ivanov wrote: >> src/java.desktop/windows/native/libawt/windows/awt_Menu.h line 76: >> >>> 74: /*for multifont menu */ >>> 75: BOOL IsTopMenu(); >>> 76: virtual AwtMenuItem* GetItem(jobject target, int index); >> >> Hi @aivanov-jdk are you OK leaving this inconsistent with the definition? >> https://github.com/openjdk/jdk/blob/16b5a91461db1765e2e7596ebaaf1299cec9b0c8/src/java.desktop/windows/native/libawt/windows/awt_Menu.cpp#L261 > > The declaration and implementation have to match. To minimise the number of changes, we can go for using `jint` in `AwtMenu::GetItem`. What do you think, @djelinski and @TheShermanTanker? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239891473 From jwaters at openjdk.org Fri Jun 23 14:34:18 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 14:34:18 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4XUr19lXHho95I9rleAZZXwyaIgZGL3pr77HqMRFCVQ=.de59b9ea-c7dd-4056-bebe-019c84a7ac54@github.com> Message-ID: On Fri, 23 Jun 2023 14:28:51 GMT, Alexey Ivanov wrote: >> The declaration and implementation have to match.
> > To minimise the number of changes, we can go for using `jint` in `AwtMenu::GetItem`. > > What do you thing, @djelinski and @TheShermanTanker? Hmm, I lean towards jint as I feel it conveys the fact that it is a Java parameter clearer, intuitively to me it makes sense that a Java integer type would still work in a C++ for loop in native code ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1239893754 From epeter at openjdk.org Fri Jun 23 14:42:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 14:42:11 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.cpp line 214: > 212: } > 213: } else if (in->is_Phi()) { > 214: // 2) We don't support phi nodes except the iv phi of the loop Add: and memory phi's cannot be reached. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239903232 From epeter at openjdk.org Fri Jun 23 14:48:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 14:48:09 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.cpp line 223: > 221: return true; > 222: } else { > 223: trace_msg(in, "Found unsupported memory load input"); This is a bit generic. Would be nice to have more specific info why it is "unsupported". See my example that hit it. src/hotspot/share/opto/vmaskloop.cpp line 548: > 546: // Check supported memory access via SWPointer. It's not supported if > 547: // 1) The constructed SWPointer is invalid > 548: // 2) Address is growing down (index scale * loop stride < 0) Is that a limitation that could be removed in the future? src/hotspot/share/opto/vmaskloop.cpp line 549: > 547: // 1) The constructed SWPointer is invalid > 548: // 2) Address is growing down (index scale * loop stride < 0) > 549: // 3) Memory access scale is different from data size I guess this could also be relaxed for strided accesses in the future? 
src/hotspot/share/opto/vmaskloop.cpp line 550: > 548: // 2) Address is growing down (index scale * loop stride < 0) > 549: // 3) Memory access scale is different from data size > 550: // 4) The loop increment node is on the SWPointer's node stack Why should the `incr` not be on the node stack? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239908943 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239911117 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239911846 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239910656 From epeter at openjdk.org Fri Jun 23 14:53:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 14:53:10 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <2TrGfJR8KnACIXUhFz95B5vxvIyveHOnN1CqcJAjPmw=.d8ce4661-7315-4132-a82f-023536c34234@github.com> On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. src/hotspot/share/opto/vmaskloop.cpp line 337: > 335: // For load node, check if it has the same vector element size with > 336: // the bottom type of the statement > 337: if (!same_element_size(mem_type, stmt_bottom_type)) { Can this limitation be removed in the future? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239917583 From epeter at openjdk.org Fri Jun 23 14:59:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 14:59:11 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 14:53:59 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vmaskloop.cpp line 317: > >> 315: >> 316: // Find element basic type for each vectorization candidate node >> 317: bool VectorMaskedLoop::find_vector_element_types() { > > This is very similar to `SuperWord::compute_vector_element_type`. It would be nice to extract it from both and have some shared utility, right? Or is there a clear reason why the two are too different? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239921855 From cslucas at openjdk.org Fri Jun 23 15:03:19 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 23 Jun 2023 15:03:19 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Wed, 14 Jun 2023 20:19:58 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: >> >> - Merge branch 'openjdk:master' into rematerialization-of-merges >> - Rome minor refactorings. >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> Catching up with master. >> - Address PR review 6: debug format output & some refactoring. >> - Catching up with master branch. >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address PR review 6: refactoring around rematerialization & improve test cases. >> - Address PR review 5: refactor on rematerialization & add tests. >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - ... and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe > > src/hotspot/share/opto/c2_globals.hpp line 473: > >> 471: " register allocation.") \ >> 472: \ >> 473: product(bool, ReduceAllocationMerges, true, \ > > I suggest to turn the flag into diagnostic one. There are much stricter requirements for product flags, so better to avoid introducing new ones. @iwanowww - I'm confused by what a "Diagnostic" flag is. According to [this documentation](https://wiki.openjdk.org/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process) "Diagnostic flags are not meant for VM tuning or for product modes. They are to be used for VM quality assurance or field diagnosis of VM bugs [...]" I believe the patch I'm proposing is a VM tuning optimization, so should it really be a diagnostic flag? Besides, I think we'll try _at a later moment_ to make this a product flag. Do you think an experimental flag is more appropriate? Thank you. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1239928841 From epeter at openjdk.org Fri Jun 23 15:05:26 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 15:05:26 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: <2TrGfJR8KnACIXUhFz95B5vxvIyveHOnN1CqcJAjPmw=.d8ce4661-7315-4132-a82f-023536c34234@github.com> References: <2TrGfJR8KnACIXUhFz95B5vxvIyveHOnN1CqcJAjPmw=.d8ce4661-7315-4132-a82f-023536c34234@github.com> Message-ID: <3MS1QpICe_TjmJUb1_lznk4vrep4iJz-hHLOoMpO0OM=.1596c7f7-864c-4631-8145-e258da08282a@github.com> On Fri, 23 Jun 2023 14:50:32 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vmaskloop.cpp line 337: > >> 335: // For load node, check if it has the same vector element size with >> 336: // the bottom type of the statement >> 337: if (!same_element_size(mem_type, stmt_bottom_type)) { > > Can this limitation be removed in the future? Write: Vector element size does not match of the store in the statement. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1239928841 From epeter at openjdk.org Fri Jun 23 15:05:26 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 15:05:26 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: <2TrGfJR8KnACIXUhFz95B5vxvIyveHOnN1CqcJAjPmw=.d8ce4661-7315-4132-a82f-023536c34234@github.com> References: <2TrGfJR8KnACIXUhFz95B5vxvIyveHOnN1CqcJAjPmw=.d8ce4661-7315-4132-a82f-023536c34234@github.com> Message-ID: <3MS1QpICe_TjmJUb1_lznk4vrep4iJz-hHLOoMpO0OM=.1596c7f7-864c-4631-8145-e258da08282a@github.com> On Fri, 23 Jun 2023 14:50:32 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vmaskloop.cpp line 337: > >> 335: // For load node, check if it has the same vector element size with >> 336: // the bottom type of the statement >> 337: if (!same_element_size(mem_type, stmt_bottom_type)) { > > Can this limitation be removed in the future? Write: Vector element size does not match that of the store in the statement.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239931034 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1239931583 From roland at openjdk.org Fri Jun 23 15:17:05 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 23 Jun 2023 15:17:05 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> <5i3F6c867jOllvaioB2BiIEGb4L3GU3eIYeiYLDM0Hk=.32ebec08-24c2-4844-802f-6a4d74fdd5a0@github.com> Message-ID: <2CgmghpavsohtMBm2C6YfZwIo38JDH9TC3jw6DkZhrU=.3ff57c1f-9b4d-4141-986a-205559ffdbde@github.com> On Thu, 22 Jun 2023 17:59:00 GMT, Vladimir Ivanov wrote: > > It felt easier in terms of memory management. If we have some extra data embedded in the SubTypeCheck node, is it a pointer or the full data structure? > > `ciCallProfile` has fixed size and is passed by value. Embedding the whole structure inside `SubTypeCheck` doesn't look problematic. It refers to CI entities which should be kept alive for the duration of the compilation. Some `SubTypeCheck` nodes have no profile data associated with them. It doesn't seem right that all of them have to carry around `ciCallProfile`. That's maybe a small overhead at this point but we could profile more than 2 receivers in the future so that overhead could change. > I'd prefer to see `SubTypeCheck` to have control input which is explicitly relaxed to accommodate commoning. I don't see how that would simplify things. `SubTypeCheck` is a `Cmp` node. As a consequence some optimizations that apply to `Cmp` nodes apply to it. `Cmp` nodes have no control. Setting the control to one of them will break things and require extra logic to accommodate the extra control (split if will likely break for instance).
Beyond that, we'll need logic to find `SubTypeCheck` nodes that can be commoned, which I expect would be in the same locations where I already added extra code. I don't understand what the control edge would make simpler. > Overall, I'm fine with late expansion of profile-guided type checks for now, but embedding profile data into SubTypeCheck should significantly simplify the patch without compromising the benefits. At this point, I fail to see how. I'm not saying the solution I propose is great but it seems to me the problem it solves needs to be solved in any case. > Also, enhancing profiling support separately may be a viable tradeoff as well. Inaccuracies in code shape classification don't look like a critical issue when the guards are introduced during macro expansion. Is that out of concern that getting the code done on all platforms will be too complicated? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1604428685 From epeter at openjdk.org Fri Jun 23 15:18:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 Jun 2023 15:18:14 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. I'm in the middle of reviewing, but have to end it here for the week now. For now it's a lot of detail-feedback. I'll give a more overall-feedback once I'm done reading through, and reflecting on it. Still: this is good work.
We will have to discuss the performance benefits vs the code complexity. And maybe we first need to refactor some things to reduce code duplication. But this looks much better than the previous post-loop vectorization. Have a great weekend, Emanuel ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1604429147 From jbhateja at openjdk.org Fri Jun 23 16:52:09 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 Jun 2023 16:52:09 GMT Subject: RFR: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation Message-ID: Backing out shuffle related overhaul done with [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw significant performance degradation with VectorAPI JMH and our internal benchmarks. Following two issues were filed on this recently. 1/ [JDK-8310459](https://bugs.openjdk.org/browse/JDK-8310459): We observed significant performance drop in VectorAPI slice / unslice performance w.r.t to JDK-20. 2/ [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373): Performance drop in Vector-API based kernel with JDK-21. A follow-up JBS [JDK-8310691](https://bugs.openjdk.org/browse/JDK-8310691) is created to address this in JDK-22. 
Best Regards, Jatin ------------- Commit messages: - 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation Changes: https://git.openjdk.org/jdk/pull/14629/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14629&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310459 Stats: 3895 lines in 64 files changed: 1169 ins; 1819 del; 907 mod Patch: https://git.openjdk.org/jdk/pull/14629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14629/head:pull/14629 PR: https://git.openjdk.org/jdk/pull/14629 From aivanov at openjdk.org Fri Jun 23 16:56:10 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 23 Jun 2023 16:56:10 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4XUr19lXHho95I9rleAZZXwyaIgZGL3pr77HqMRFCVQ=.de59b9ea-c7dd-4056-bebe-019c84a7ac54@github.com> Message-ID: On Fri, 23 Jun 2023 14:30:49 GMT, Julian Waters wrote: >> To minimise the number of changes, we can go for using `jint` in `AwtMenu::GetItem`. >> >> What do you think, @djelinski and @TheShermanTanker? > > Hmm, I lean towards jint as I feel it conveys the fact that it is a Java parameter clearer, intuitively to me it makes sense that a Java integer type would still work in a C++ for loop in native code You're right, it gives a hint it'll be an upcall into Java. Let's go for `jint` then. I don't think there's a need to change the type of the for-loop variable to `jint`.
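The point about the declaration and the definition having to match can be illustrated with a small sketch. It is an assumption-laden illustration, not code from the PR: `sketch_old_jint`, `sketch_new_jint`, `SketchMenu` and `sketch_kind` are hypothetical names, and the two typedefs only mirror how the thread describes `jint` (`long` before the change, `int` after it). Because `int` and `long` are distinct types even when both are 32 bits wide, an `int` declaration paired with a `long` definition would not be the same member function, whereas with `jint` as `int` the two spellings name the same type and the choice is about readability only.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: these typedefs only mirror the thread's description of jint;
// they are not copied from any JDK header.
typedef long sketch_old_jint;
typedef int  sketch_new_jint;

// int and long stay distinct types even where both are 32 bits wide.
static_assert(!std::is_same<int, sketch_old_jint>::value, "int and long differ");
static_assert(std::is_same<int, sketch_new_jint>::value, "plain alias of int");

// Hypothetical stand-in for a class such as AwtMenu.
struct SketchMenu {
  int GetItem(sketch_new_jint index);
};

// This definition matches the declaration only because sketch_new_jint is
// int. Had the declaration used sketch_old_jint, the compiler would reject
// this as an out-of-class definition that matches no declared member.
int SketchMenu::GetItem(int index) { return index; }

// Overload resolution tells the two types apart as well.
inline char sketch_kind(int)  { return 'i'; }
inline char sketch_kind(long) { return 'l'; }

// Small self-check that exercises the pieces above.
inline void sketch_self_check() {
  assert(SketchMenu().GetItem(3) == 3);
  assert(sketch_kind(sketch_new_jint(1)) == 'i');
  assert(sketch_kind(sketch_old_jint(1)) == 'l');
}
```

An `int` loop variable passed to a `jint` parameter needs no conversion once `jint` is `int`, which is consistent with leaving the for-loop variable unchanged.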
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1240041248 From jwaters at openjdk.org Fri Jun 23 16:59:07 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 16:59:07 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <4XUr19lXHho95I9rleAZZXwyaIgZGL3pr77HqMRFCVQ=.de59b9ea-c7dd-4056-bebe-019c84a7ac54@github.com> Message-ID: On Fri, 23 Jun 2023 16:53:01 GMT, Alexey Ivanov wrote: >> Hmm, I lean towards jint as I feel it conveys the fact that it is a Java parameter clearer, intuitively to me it makes sense that a Java integer type would still work in a C++ for loop in native code > > You're right, it gives a hint it'll be an upcall into Java. Let's go for `jint` then. > > I don't think there's a need to change the type of the for-loop variable to `jint`. Oh no, I didn't mean to change the loop variable, rather that leaving the jint as is should be fine in the for loop ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1240044495 From jwaters at openjdk.org Fri Jun 23 17:09:12 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 17:09:12 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v12] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled.
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with three additional commits since the last revision: - Swap to jint in awt_MenuBar.h - Swap to jint in awt_MenuBar.cpp - Swap to jint in awt_Menu.h ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/16b5a914..d5f74cc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=10-11 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Fri Jun 23 17:09:14 2023 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 23 Jun 2023 17:09:14 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v11] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: <6LhofTh1RYX22niDjY1cPgArtX7E5Ghm4880ulFgUmI=.7326b3e2-8dd6-4e49-bf88-c91e4b93bbed@github.com> On Fri, 23 Jun 2023 02:38:13 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. 
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Revert wrong Copyright > - Copyright Alright, waiting for you to do the honours :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1604563651 From aivanov at openjdk.org Fri Jun 23 18:05:13 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 23 Jun 2023 18:05:13 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v12] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Fri, 23 Jun 2023 17:09:12 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with three additional commits since the last revision: > > - Swap to jint in awt_MenuBar.h > - Swap to jint in awt_MenuBar.cpp > - Swap to jint in awt_Menu.h Changes requested by aivanov (Reviewer). src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp line 1126: > 1124: > 1125: // Free bitmap buffers if they were allocated > 1126: if (colorBits != nullptr) { Let's revert `nullptr` to `NULL`. The `NULL` value is used consistently inside `_Win32ShellFolder2_getIconBits` function as well as throughout the file, so `nullptr` is out of place. ------------- PR Review: https://git.openjdk.org/jdk/pull/14125#pullrequestreview-1495623168 PR Review Comment: https://git.openjdk.org/jdk/pull/14125#discussion_r1240121736 From vlivanov at openjdk.org Fri Jun 23 18:51:08 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 Jun 2023 18:51:08 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Mon, 19 Jun 2023 12:22:56 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ±
2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - more test failures > - Merge branch 'master' into JDK-8308869 > - whitespaces > - test failures > - review > - 32 bit fix > - white spaces > - fix & test Overall, I'd prefer to leave commoning considerations for a separate enhancement. Embedding `ciCallProfile` looks to me much cleaner than exploding its content into node inputs. Having profile info explicitly fed into `SubTypeCheck` as node inputs in practice defeats any possible sharing unless the nodes are constructed from the very same profile data. The types, their order, and frequencies have to perfectly match in order for commoning to happen. You already have `IfNode::same_condition()` to alleviate some of the effects of broken sharing. When you embed profiling info you are left with a choice how to common nodes (whether to take profiling info into account or not). But if you simply ignore it until macro expansion, the behavior will stay the same as it is now. I prefer the patch to be focused on slow path case (reduce the frequency of secondary super cache checks & updates) and leave the rest for future considerations. As an example, it's still an open question for me should `IfNode::search_identical()` take profile info into account. Current patch ignores profile-related info (`IfNode::same_condition()` check), but maybe it is worth merging the profiles instead? > Some SubTypeCheck nodes have no profile data associated with them. I don't consider footprint as an issue here. `SubTypeCheck`s are relatively rare and `ciCallProfile` size is quite small for any practical morphism limits. Additional profiling may introduce more about 1-2 additional slots (rather than 10s or 100s) and the main footprint hit will be on runtime side (in MDOs). > Is that out of concern that getting the code done on all platforms will be too complicated? It does look like an excessive requirement, but I'm not too much concerned about it. 
If you think it's better to get the full support all at once, I'm perfectly fine with that. It just seems cleaner to refine profiling part separately. There are open questions which may be well out of scope for the proposed enhancement. For example, while `checkcast`/`aastore` behave very similarly to `invokevirtual`/`invokeinterface` (very low rate of failures), `instanceof` is different and can expose very high rates of failures (esp. in case of chained `instanceof` checks). Should we continue profiling for that? (Can C2 benefit from such info? I believe so: we could skip SSC check if failure rate is too high.) Also, I refrained from commenting on naming, but `ciCallProfile` does look confusing when it comes to `checkcast`, `aastore`, and `instanceof` cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1604712395 From dean.long at oracle.com Fri Jun 23 20:27:56 2023 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 23 Jun 2023 13:27:56 -0700 Subject: Question regarding ReplayCompiles and multiple inlining In-Reply-To: References: Message-ID: <4877cd8d-c7cd-8edc-1222-0bfde69eaaec@oracle.com> I think we need to file one. dl On 6/23/23 12:26 AM, Tobias Hartmann wrote: > Should we file an RFE for this or is this already tracked? > > Thanks, > Tobias > > On 22.06.23 08:35, dean.long at oracle.com wrote: >> I noticed this problem before too. Unfortunately I can't think of a workaround. It seems like the >> right fix is to change the replay file format to record more information. >> >> dl >> >> On 6/19/23 5:07 AM, Volker Simonis wrote: >>> Hi, >>> >>> I try to reproduce a compiler issue with a ReplayDataFile but >>> unfortunately I can't reproduce the crash. >>> >>> I hacked the VM to print out the inlining tree just before the >>> crashes and realized that the original inlining differs from the >>> inlining done by ReplayCompiles.
>>>
>>> In my specific case I have the following inlining pattern during the
>>> crash (`foo::f1()` gets inlined twice into `foo::f0()`):
>>>      .
>>>      .
>>>    @ 57    foo::f0()    inline (hot)
>>>      @ 48    foo::f1()    inline (hot)
>>>        @ 2    bar::f2()    inline (hot)
>>>          .
>>>          .
>>>      @ 48    foo::f1()    inline (hot)
>>>        @ 2    bar::f2()   NodeCountInliningCutoff
>>>
>>> In the ReplayDataFile (in the `inline` part of the `compile` line)
>>> both, `foo::f1()` and `bar::f2()` are recorded only once (because they
>>> have the same bci, name/signature and inlining depth).
>>>
>>> When running the replay, I get the following inlining pattern:
>>>      .
>>>      .
>>>    @ 57    foo::f0()    force inline by ciReplay
>>>      @ 48    foo::f1()    force inline by ciReplay
>>>        @ 2    bar::f2()    force inline by ciReplay
>>>          .
>>>          .
>>>      @ 48    foo::f1()    force inline by ciReplay
>>>        @ 2    bar::f2()    force inline by ciReplay
>>>
>>> This is clearly different because in the replay we inline `bar::f2()`
>>> a second time (while in the original run it was skipped due to
>>> NodeCountInliningCutoff).
>>>
>>> From looking at `find_ciInlineRecord()` [1], it looks like the replay
>>> file only records the bci, inlining depth and method name/signature
>>> for an inlinee? How is this supposed to work if a method is inlined
>>> differently at the same level like in this example?
>>>
>>> Notice that I'm currently working with JDK 17 (because my problem
>>> doesn't reproduce with HEAD) but it seems the relevant code hasn't
>>> changed much in this area since JDK 17.
>>>
>>> Please let me know if this is a known problem and if there's any way
>>> to workaround it?
>>> >>> Thank you and best regards, >>> Volker >>> >>> [1] https://github.com/openjdk/jdk17u-dev/blob/852c26c0/src/hotspot/share/ci/ciReplay.cpp#L992 From duke at openjdk.org Fri Jun 23 21:14:20 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 23 Jun 2023 21:14:20 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v8] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~13x improvement for 32-bit datatypes (int, float) and upto 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas 
Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: replace multiple intrinsics with one general intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/3bd12ec5..53a5309d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=06-07 Stats: 84 lines in 6 files changed: 12 ins; 40 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From vlivanov at openjdk.org Fri Jun 23 21:28:17 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 Jun 2023 21:28:17 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <3G-J7S82KT6w5oWaxYK-3soNIQDfcR65ESTRLA_LfDc=.bdde8aa7-4044-44de-9c01-951013d7707d@github.com> On Fri, 23 Jun 2023 15:00:15 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/opto/c2_globals.hpp line 473: >> >>> 471: " register allocation.") \ >>> 472: \ >>> 473: product(bool, ReduceAllocationMerges, true, \ >> >> I suggest to turn the flag into diagnostic one. There are much stricter requirements for product flags, so better to avoid introducing new ones. > > @iwanowww - I'm confused by what a "Diagnostic" flag is. According to [this documentation](https://wiki.openjdk.org/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process) "Diagnostic flags are not meant for VM tuning or for product modes. They are to be used for VM quality assurance or field diagnosis of VM bugs [...]" I believe the patch I'm proposing is a VM tuning optimization, so should it really be a diagnostic flag? 
Besides, I think we'll try _at a later moment_ to make this a product flag. Do you think an experimental flag is more appropriate? Thank you. You can look at it in the following way: since the flag is set to true by default, the feature is unconditionally available in product binaries. The only reason to explicitly specify the flag is to turn the optimization off, which may be needed to diagnose VM crashes or performance regressions. As an afterthought, maybe C2 should check a compiler directive (and not a global flag) to be able to control the optimization up to per-method granularity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1240415979 From duke at openjdk.org Fri Jun 23 21:34:22 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 23 Jun 2023 21:34:22 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v9] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains ten commits: - merge master - replace multiple intrinsics with one general intrinsic - Merge branch 'openjdk:master' into avx512sort - fix license in one file - Update test/micro/org/openjdk/bench/java/util/ArraysSort.java Co-authored-by: Andrew Haley - fix license - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort - remove libstdc++ - 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) ------------- Changes: https://git.openjdk.org/jdk/pull/14227/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=08 Stats: 2889 lines in 18 files changed: 2880 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From jwaters at openjdk.org Sat Jun 24 01:29:26 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 24 Jun 2023 01:29:26 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v13] In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code Julian Waters has updated the pull request incrementally with two additional commits since the last revision: - Leave nullptr for another day - src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp Co-authored-by: Alexey Ivanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14125/files - new: https://git.openjdk.org/jdk/pull/14125/files/d5f74cc2..3fe2a894 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14125&range=11-12 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14125.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14125/head:pull/14125 PR: https://git.openjdk.org/jdk/pull/14125 From jwaters at openjdk.org Sat Jun 24 01:29:26 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 24 Jun 2023 01:29:26 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v12] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Fri, 23 Jun 2023 17:09:12 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with three additional commits since the last revision: > > - Swap to jint in awt_MenuBar.h > - Swap to jint in awt_MenuBar.cpp > - Swap to jint in awt_Menu.h Alright, done ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1605222563 From duke at openjdk.org Sun Jun 25 02:28:34 2023 From: duke at openjdk.org (Chang Peng) Date: Sun, 25 Jun 2023 02:28:34 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v4] In-Reply-To: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: > This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. > > VectorMask.firstTrue() should return VLENGTH when the vector mask is all false [1]. The current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. > > This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, the maximum value calculated in such a case will be VLENGTH (2 or 4). > > Test: > All vector and vectorapi tests passed. > > Performance: > The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies the above benchmark to measure mask operations' performance more effectively. > > The following data is collected on a 128-bit Neon machine.
> > Benchmark (inputs) Mode Before After Units > MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms > > [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() > [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 > [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- Chang Peng has updated the pull request incrementally with one additional commit since the last revision: Reset MaskQueryOperationsBenchmark.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14373/files - new: https://git.openjdk.org/jdk/pull/14373/files/62a6522c..8ce4ba84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14373&range=02-03 Stats: 196 lines in 1 file changed: 35 ins; 104 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/14373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14373/head:pull/14373 PR: https://git.openjdk.org/jdk/pull/14373 From duke at openjdk.org Sun Jun 25 02:48:17 2023 From: duke at openjdk.org (Chang Peng) Date: Sun, 25 Jun 2023 02:48:17 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v3] In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: <7PXeowOvPKugvDUDBfnrzwPJ6gXPPlqjnNqSETO1h-Q=.7a335264-0b6d-48b7-aa7a-ed24b119264c@github.com> On Wed, 21 Jun 2023 14:25:57 
GMT, Andrew Haley wrote: > Something is wrong with your setup. You should be seeing this: `# Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)` > > not this: `# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)` @theRealAph Sorry for the delay, I was on holiday last week. I found that we can avoid the effects of blackhole by using ``` -Djmh.blackhole.autoDetect=true ```, so I have reset this benchmark. Following is the performance of ``` testFirstTrueInt ``` and ``` testFirstTrueLong ``` before and after this patch: Benchmark bits (inputs) Mode Before After Units MaskQueryOperationsBenchmark.testFirstTrueInt 128 1 thrpt 520650.354 580091.081 ops/ms MaskQueryOperationsBenchmark.testFirstTrueInt 128 2 thrpt 520677.937 580391.661 ops/ms MaskQueryOperationsBenchmark.testFirstTrueInt 128 3 thrpt 519967.269 580705.642 ops/ms MaskQueryOperationsBenchmark.testFirstTrueLong 128 1 thrpt 518563.126 575941.490 ops/ms MaskQueryOperationsBenchmark.testFirstTrueLong 128 2 thrpt 517329.190 578848.383 ops/ms MaskQueryOperationsBenchmark.testFirstTrueLong 128 3 thrpt 517987.339 577601.752 ops/ms And following are the corresponding JMH output: before my patch: https://gist.github.com/changpeng1997/3ebe4b7beea93716d9f29d4ef71641af after my patch: https://gist.github.com/changpeng1997/d306957370eb0bdbb8e71b601440cdaa We can see the C2 code of firstTrue(). 
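For readers following this thread, the sentinel-bit idea from the pull request description can be modeled in scalar Java. This is only a sketch under stated assumptions, not the HotSpot code: it assumes the boolean mask is held one byte per lane in a 64-bit value with lane 0 in the lowest byte, and `FirstTrueSketch`, `packedLanes` and `vlength` are hypothetical names chosen for illustration. `Long.numberOfTrailingZeros` stands in for the rbit + clz pair.

```java
public class FirstTrueSketch {
    /**
     * Scalar model of firstTrue() for a boolean mask.
     * Assumed layout (illustration only): one byte per lane,
     * 0x00 = false, 0x01 = true, lane 0 in the lowest byte.
     */
    static int firstTrue(long packedLanes, int vlength) {
        if (vlength < 8) {
            // Plant a sentinel bit just above the last lane, so an all-false
            // mask stops counting at vlength instead of running to 8.
            packedLanes |= 1L << (8 * vlength);
        }
        // Counting trailing zero bits models rbit followed by clz;
        // dividing by 8 converts a bit position into a lane index.
        return Long.numberOfTrailingZeros(packedLanes) >>> 3;
    }

    public static void main(String[] args) {
        System.out.println(firstTrue(0L, 4));       // all false, 4 lanes: 4
        System.out.println(firstTrue(0L, 2));       // all false, 2 lanes: 2
        System.out.println(firstTrue(0x10000L, 4)); // only lane 2 set: 2
        System.out.println(firstTrue(0L, 8));       // all false, 8 lanes: 8
    }
}
```

With the sentinel in place, the result can never exceed `vlength`, which is why a separate conditional select is no longer needed for the 2-lane and 4-lane cases in this model.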
------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1605831986 From duke at openjdk.org Sun Jun 25 02:48:19 2023 From: duke at openjdk.org (Chang Peng) Date: Sun, 25 Jun 2023 02:48:19 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> <35V2f2S9VV-P24H4gtZCx_d7FD3qY4u10jW8e60xgOw=.5d3e8013-9914-4229-b8a7-dc81725ea5d7@github.com> Message-ID: <-GIXHLdCeX7NZq2ioZdQyAP0ijWbWpvj1EFrLGceRV4=.e6d5f7fa-c4fc-4b1b-aebc-357af144d8a8@github.com> On Wed, 21 Jun 2023 14:49:08 GMT, Aleksey Shipilev wrote: > > Output before this patch: https://gist.github.com/changpeng1997/734aa176577bfff56f5a87db9c8db69a > > Output after this patch: https://gist.github.com/changpeng1997/73098069b8f814310d6606dfd7dc56c5 > > Blackhole mode autodetection was added in JMH 1.33, and enabled in JMH 1.34. The logs above say they run with JMH 1.33. Current version is 1.36, you need to upgrade, @changpeng1997. > > Also, I notice that your before/after logs use different JVM modes, one uses `release`, and another uses `fastdebug`. These are not comparable. @shipilev Thanks! Sorry for this mistake. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14373#issuecomment-1605832732 From xgong at openjdk.org Sun Jun 25 03:15:19 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Sun, 25 Jun 2023 03:15:19 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com> <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com> Message-ID: <9MxeDJvxUo6ibLhZY52TR1o4OAQKC_jxdIJKoeNzRdY=.dc4a4b9c-80ea-4aca-bd65-59a33a5521d4@github.com> On Wed, 21 Jun 2023 09:45:40 GMT, Xiaohong Gong wrote: >> @XiaohongGong Thanks for looking into this. But it seems to me this is not the same approach as we are taking with x86 SSE and AVX, where the `UseAVX` and `UseSSE` flags affect both the VM features and also the `applyIfCPUFeature` from the IR framework. We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? >> >> https://github.com/openjdk/jdk/blob/8d899925dc281c5dabbef14d85a6df807f8d300e/src/hotspot/cpu/x86/vm_version_x86.cpp#L954-L955 >> >> Can you do a similar thing in `src/hotspot/cpu/aarch64/vm_version_aarch64.cpp` ? >> >> It would be nice not to have to check for the flag and the features in every test, but just for the features. And the features should depend on what is present on the hardware, minus the restrictions by the flags. > >> @XiaohongGong Thanks for looking into this. But it seems to me this is not the same approach as we are taking with x86 SSE and AVX, where the `UseAVX` and `UseSSE` flags affect both the VM features and also the `applyIfCPUFeature` from the IR framework. 
We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? >> >> https://github.com/openjdk/jdk/blob/8d899925dc281c5dabbef14d85a6df807f8d300e/src/hotspot/cpu/x86/vm_version_x86.cpp#L954-L955 >> >> Can you do a similar thing in `src/hotspot/cpu/aarch64/vm_version_aarch64.cpp` ? >> >> It would be nice not to have to check for the flag and the features in every test, but just for the features. And the features should depend on what is present on the hardware, minus the restrictions by the flags. > > Thanks for looking at this PR @eme64 ! Yes, that's the main difference between aarch64 and x86 platforms. It actually makes things simpler that changing the CPU features based on the vm option. But per my understanding, CPU features are the hardware's feature which is the objective fact, while the `UseSVE` are the JVM's option that people can set different values. And they cannot be mixed. Besides, x86 just mask off the CPU features for JVM instead of really changing the hardware's features. I'm not sure, but I'm afraid doing such changes like x86 may have some risks in current aarch64's backend. > >> We check in all sorts of places for `sve` feature, so why do you now only change it for this particular test? > > For each SVE test, we have tried to add flag `UseSVE=1` in the test's `main` function to make sure this option is not changed by others, and current test is run with the expected sve feature. For example: > > public static void main(String[] args) { > TestFramework.runWithFlags("--add-modules=jdk.incubator.vector", > "-XX:UseSVE=1"); > } > > For this test, we cannot add such an option in the test file, since it is also used to test other platforms like x86. > @XiaohongGong I see, you are worried that it would take a lot of work in the aarch64 code? So in the backend you are using the UseSVE flag instead of feature support? Yes, that may need more overhead on AArch64 code. 
I think the changing should not only limit to `UseSVE` flag and the `sve` cpu feature, but also to all other flags. We have to keep the design consistent. And yes, for the current backend, we're almost using `UseSVE` flag instead of the cpu feature check. > test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.AND_VB, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.AND_VS, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.AND_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.AND_VL, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.AND_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.OR_VB, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.OR_VB, applyIfCPUFeatureAnd = {"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.OR_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.OR_VL, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(counts = {IRNode.OR_VI, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) test/hotspot/jtreg/compiler/vectorapi/VectorLogicalOpIdentityTest.java: @IR(failOn = IRNode.XOR_VS, applyIfCPUFeatureAnd = 
{"asimd", "true", "sve", "false"}) test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopArrayIndexComputeTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx512dq", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx512dq", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "sse4.1", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "sse4.1", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/BasicLongOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "sse4.1", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, 
test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/ArrayIndexFillTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopReductionOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"}, test/hotspot/jtreg/compiler/vectorization/runner/LoopReductionOpTest.java: @IR(applyIfCPUFeatureOr = {"sve", "true", "avx2", "true"} I'm afraid we have to modify all these tests, and what I know is my colleague @pfustc will take charge of the vectorization tests once this PR is merged. Besides, we have to only care about the rules that `sve` is `true` like `"@IR(applyIfCPUFeatureOr = {"sve", "true", ...}`. If the `sve` feature is `false`, `UseSVE` is always `0` in the VM, which is synced. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1605839870 From pli at openjdk.org Sun Jun 25 07:05:04 2023 From: pli at openjdk.org (Pengfei Li) Date: Sun, 25 Jun 2023 07:05:04 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com> <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com> Message-ID: <2es4SfAl2gbVRwe7NL9bMsi6Q_mTQyr0xsIOgjyJCjk=.cbdc06e0-7047-4094-9e35-8e6c40430ccf@github.com> On Wed, 21 Jun 2023 10:31:00 GMT, Emanuel Peter wrote: > I discussed it with @chhagedorn and @TobiHartmann . They also think it is better to have flags and cpu features in sync. Because the flags really should restrict what the VM uses everywhere. We want to make sure the restrictions apply in the VM, if we check for features or flags. Hi @eme64, Having flags and cpu features in sync sounds like a good idea. However, of all platforms supported by HotSpot, only x86 does it this way. Even on x86, only AVX & SSE related features have such sync at present. There is no sync for other feature strings and flags, like `UseSHA`, `UseAES`, etc. That's the reason we think the current approach of x86 is more like a workaround for IR tests only. If you believe keeping this kind of sync is a better approach, I'd suggest doing this for all CPU features on all platforms in another RFE.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1605902739 From pli at openjdk.org Sun Jun 25 09:32:09 2023 From: pli at openjdk.org (Pengfei Li) Date: Sun, 25 Jun 2023 09:32:09 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 15:15:02 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > I'm in the middle of reviewing, but have to end it here for the week now. > > For now it's a lot of detail-feedback. I'll give a more overall-feedback once I'm done reading through, and reflecting on it. > > Still: this is good work. We will have to discuss the performance benefits vs the code complexity. And maybe we first need to refactor some things to reduce code duplication. But this looks much better than the previous post-loop vectorization. > > Have a great weekend, > Emanuel Hi @eme64, Thank you so much for so many detailed suggestions. As it may take time to address all your comments, I would like to say something about our general thoughts first, corresponding to your preliminary feedback. - Regarding duplicated code, it is actually our biggest concern while refactoring the post loop vectorization out from SuperWord. That's also the reason we chose to reuse `SWPointer`. Your suggestion of moving `SWPointer` out is good. But for others, we cannot yet conclude that we should combine similar logic in `SuperWord` and `VectorMaskedLoop` at the moment. 
Current code does look duplicated as I referred to SuperWord code while working on this patch. But their logic is not exactly the same and may diverge as we extend them in the future. For example, `VectorMaskedLoop` is supposed to be able to vectorize loops with some "if-else" conditions (SVE has the ability). If we try to add that support, we may need to change the current RPO traversal in `VectorMaskedLoop` because the similar logic in SuperWord only supports single basic block loops. For the same reason, we may need to change the way of finding vector element types if we add vectorization support in `VectorMaskedLoop` for type conversions. Considering that doing such code combination is involved, I hope to leave some duplication for now and decide whether to combine them later. - Regarding the "experimental", we agree that keeping a feature experimental for a long time is bad. Of course, we don't want this re-implementation to be abandoned several years later just like the previous `PostLoopMultiversioning` due to lack of testing. Feeling unsafe is not the only reason we want to keep it experimental. Today, there are various kinds of CPU hardware that support vector masks (predicates) and their performance on masked vector operations is not consistent. The performance data we showed above are just from the latest generations of x86 and AArch64 CPUs. We are also testing on other hardware and seeing different results. In the end, we may only enable this feature for some CPU micro-architectures where it's beneficial. So in the short term, we propose to keep it experimental and turned off by default. In addition, only a small portion of today's CPUs in the world can get performance benefit from vector masks. So we don't need to rush to make this non-experimental. I think one or two JDK releases is a reasonable time period for such "experimental". 
- Regarding the performance, I apologize that I cannot answer all your questions at the moment because we just started the performance evaluation work. But we will get you back later once we get more conclusions. - Regarding the testing, adding more IR rules is on the way. But as you might have realized, this depends on some other on-going JBS tasks. And how to add new rules depends on the result of our discussions about syncing flags and cpu features in another PR. The original jtreg `TestRangeCheckEliminationDisabled.java` is too simple and just checks the compatibility of two VM options. So I suspect it is no longer necessary after `PostLoopMultiversioning` is removed. Thanks again for all your feedback. I will address the detailed comments one by one later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1605988026 From aph at openjdk.org Sun Jun 25 10:31:03 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 25 Jun 2023 10:31:03 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Fri, 23 Jun 2023 08:46:32 GMT, Emanuel Peter wrote: >> This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. 
Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> >> This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. >> >> Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. > > @fg1417 @jatin-bhateja What do you think about the consistency of arm / intel hardware flags for `SVE` and `AVX`? > Thanks for looking at this PR @eme64 ! Yes, that's the main difference between aarch64 and x86 platforms. It actually makes things simpler that changing the CPU features based on the vm option. But per my understanding, CPU features are the hardware's feature which is the objective fact, while the `UseSVE` are the JVM's option that people can set different values. And they cannot be mixed. Besides, x86 just mask off the CPU features for JVM instead of really changing the hardware's features. I'm not sure, but I'm afraid doing such changes like x86 may have some risks in current aarch64's backend. It might, but it sounds like it's the right thing to do as soon as possible after the JDK 21 fork. 
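The point about checking `applyIfCPUFeature` before `applyIf` can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: `IrRuleOrderSketch`, `Platform`, `shouldApply` and `shouldApplyFlagFirst` are hypothetical names invented here, and the code does not reproduce the IR framework's actual implementation. It only assumes, as the PR description says, that looking up a VM option that does not exist on the running platform throws.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model, not IR framework code.
final class IrRuleOrderSketch {

    // A platform is modeled as a set of CPU feature strings plus the VM flags it knows.
    static final class Platform {
        final Set<String> cpuFeatures;
        final Map<String, Integer> vmFlags;

        Platform(Set<String> cpuFeatures, Map<String, Integer> vmFlags) {
            this.cpuFeatures = cpuFeatures;
            this.vmFlags = vmFlags;
        }

        // Assumption: querying a flag the platform does not define is an error.
        int flag(String name) {
            Integer value = vmFlags.get(name);
            if (value == null) {
                throw new IllegalArgumentException("unknown VM flag: " + name);
            }
            return value;
        }
    }

    // Feature first: the flag is only queried on hardware that has the feature.
    static boolean shouldApply(Platform p, String feature, String flagName) {
        if (!p.cpuFeatures.contains(feature)) {
            return false;
        }
        return flagName == null || p.flag(flagName) > 0;
    }

    // Flag first: throws on platforms where the flag does not exist.
    static boolean shouldApplyFlagFirst(Platform p, String feature, String flagName) {
        boolean flagOk = flagName == null || p.flag(flagName) > 0;
        return flagOk && p.cpuFeatures.contains(feature);
    }

    public static void main(String[] args) {
        Platform x86 = new Platform(Set.of("avx512"), Map.of("UseAVX", 3));
        Platform sveOff = new Platform(Set.of("sve"), Map.of("UseSVE", 0));
        System.out.println(shouldApply(x86, "sve", "UseSVE"));    // rule skipped, no exception
        System.out.println(shouldApply(sveOff, "sve", "UseSVE")); // rule skipped because UseSVE=0
    }
}
```

In this model the feature-first order lets a rule that mentions `UseSVE` be skipped quietly on x86, while the flag-first order would abort with an exception there.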
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1606015394 From aph at openjdk.org Sun Jun 25 10:34:03 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 25 Jun 2023 10:34:03 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2es4SfAl2gbVRwe7NL9bMsi6Q_mTQyr0xsIOgjyJCjk=.cbdc06e0-7047-4094-9e35-8e6c40430ccf@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1TsQHR9tbJ1D4o7bFysmiVi427dU_5F_WBC3RdbQblY=.f0238315-2675-4e18-9e7f-88ff26aa2ae4@github.com> <_UD4dypm2ww27V8MgRev76u4bk32ws82WYsEoTXLV_o=.7982db64-a653-4a08-ba39-f8a579b7b7f9@github.com> <2es4SfAl2gbVRwe7NL9bMsi6Q_mTQyr0xsIOgjyJCjk=.cbdc06e0-7047-4094-9e35-8e6c40430ccf@github.com> Message-ID: On Sun, 25 Jun 2023 07:02:07 GMT, Pengfei Li wrote: > > I discussed it with @chhagedorn and @TobiHartmann . They also think it is better to have flags and cpu features in sync. Because the flags really should restrict what the VM uses everywhere. We want to make sure the restrictions apply in the VM, if we check for features or flags. I agree. > Having flags and cpu features in sync sounds a good idea. However, of all platforms supported by HotSpot, only x86 does in this way. Even on x86, only AVX & SSE related features have such sync at present. There is no sync for other feature strings and flags, like `UseSHA`, `UseAES` and etc. That's the reason we think current approach of x86 is more like a workaround for IR tests only. If you believe keeping this kind of sync is a better approach, I'd suggest doing this for all CPU features on all platforms in another RFE. That's an interesting suggestion, but having to keep all of the back ends in lock-step is an intolerable constraint. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1606016037 From jwaters at openjdk.org Sun Jun 25 15:57:08 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 25 Jun 2023 15:57:08 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 8 Jun 2023 11:20:05 GMT, Alexey Ivanov wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the code that is actually warning > > I'll take a look? hopefully next week. @aivanov-jdk Is the final change ok with you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1606140949 From aivanov at openjdk.org Sun Jun 25 16:27:07 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Sun, 25 Jun 2023 16:27:07 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v13] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Sat, 24 Jun 2023 01:29:26 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. 
It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Leave nullptr for another day > - src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp > > Co-authored-by: Alexey Ivanov Marked as reviewed by aivanov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14125#pullrequestreview-1497128460 From aivanov at openjdk.org Sun Jun 25 16:27:10 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Sun, 25 Jun 2023 16:27:10 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v4] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> <3VAC9z-WjItzbkXeNreX7PGi18ypbaom-RjBJgHb9L4=.1e90295d-c012-47dd-b5fe-fa7889ce2c84@github.com> Message-ID: On Thu, 8 Jun 2023 11:20:05 GMT, Alexey Ivanov wrote: >> Julian Waters has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the code that is actually warning > > I'll take a look? hopefully next week. > @aivanov-jdk Is the final change ok with you? Looks good now. Thanks! I've run client tests, all is green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1606149816 From jwaters at openjdk.org Sun Jun 25 23:42:09 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 25 Jun 2023 23:42:09 GMT Subject: RFR: 8308780: Fix the Java Integer types on Windows [v13] In-Reply-To: References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Sat, 24 Jun 2023 01:29:26 GMT, Julian Waters wrote: >> On Windows, the basic Java Integer types are defined as long and __int64 respectively. 
In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Leave nullptr for another day > - src/java.desktop/windows/native/libawt/windows/ShellFolder2.cpp > > Co-authored-by: Alexey Ivanov Haha, thanks Alexey! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14125#issuecomment-1606329100 From jwaters at openjdk.org Sun Jun 25 23:46:17 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 25 Jun 2023 23:46:17 GMT Subject: Integrated: 8308780: Fix the Java Integer types on Windows In-Reply-To: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> References: <-mAb3g-jnRbOa0PFdM8gVP-8zI8CRGwxPvSdXd3uPm8=.2e6ab688-3ca1-4b8a-8674-5245c3f7557f@github.com> Message-ID: On Wed, 24 May 2023 13:56:05 GMT, Julian Waters wrote: > On Windows, the basic Java Integer types are defined as long and __int64 respectively. In particular, the former is rather problematic since it breaks compilation as the Visual C++ becomes stricter and more compliant with every release, which means the way Windows code treats long as a typedef for int is no longer correct, especially with -permissive- enabled. 
Instead of changing every piece of broken code to match the jint = long typedef, which is far too time consuming, we can instead change jint to an int (which is still the same 32 bit number type as long), as there are far fewer problems caused by this definition. It's better to get this over and done with sooner than later when a future version of Visual C++ finally starts to break on existing code This pull request has now been integrated. Changeset: c92b049d Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/c92b049db7853a061ce05cebdc1fd73205ed0c83 Stats: 28 lines in 10 files changed: 0 ins; 5 del; 23 mod 8308780: Fix the Java Integer types on Windows Reviewed-by: dholmes, djelinski, aivanov ------------- PR: https://git.openjdk.org/jdk/pull/14125 From xgong at openjdk.org Mon Jun 26 02:31:17 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 Jun 2023 02:31:17 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Fri, 23 Jun 2023 08:46:32 GMT, Emanuel Peter wrote: >> This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. 
So the right IR rules should be: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> >> This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. >> >> Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. > > @fg1417 @jatin-bhateja What do you think about the consistency of arm / intel hardware flags for `SVE` and `AVX`? > > Thanks for looking at this PR @eme64 ! Yes, that's the main difference between aarch64 and x86 platforms. It actually makes things simpler that changing the CPU features based on the vm option. But per my understanding, CPU features are the hardware's feature which is the objective fact, while the `UseSVE` are the JVM's option that people can set different values. And they cannot be mixed. Besides, x86 just mask off the CPU features for JVM instead of really changing the hardware's features. I'm not sure, but I'm afraid doing such changes like x86 may have some risks in current aarch64's backend. > > It might, but it sounds like it's the right thing to do as soon as possible after the JDK 21 fork. Thanks for looking at this issue @theRealAph. Sounds a good suggestion! I will take an investigation for this, and try to make a change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1606469933 From thartmann at openjdk.org Mon Jun 26 05:08:02 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jun 2023 05:08:02 GMT Subject: RFR: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: <1vmj0h4iaeclaAWNDkCrtUsF0iKVTHzdenMYBK4KNgI=.833866fa-4df2-49cc-a421-e191d97b52d2@github.com> On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote: > Backing out shuffle related overhaul done with [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw significant performance degradation in VectorAPI JMH micros and some of our internal benchmarks. Following two issues were filed on this recently. > > 1/ [JDK-8310459](https://bugs.openjdk.org/browse/JDK-8310459): Performance drop in VectorAPI slice / unslice performance w.r.t. JDK-20. > 2/ [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373): Performance drop in Vector-API based kernel with JDK-21. > > A follow-up JBS [JDK-8310691](https://bugs.openjdk.org/browse/JDK-8310691) is created to address this in JDK-22. > > Best Regards, > Jatin Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14629#pullrequestreview-1497576693 From eliu at openjdk.org Mon Jun 26 06:07:05 2023 From: eliu at openjdk.org (Eric Liu) Date: Mon, 26 Jun 2023 06:07:05 GMT Subject: RFR: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 [v4] In-Reply-To: References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: On Sun, 25 Jun 2023 02:28:34 GMT, Chang Peng wrote: >> This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. >> >> VectorMask.firstTrue() should return VLENGTH when vector mask is all false [1]. 
Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. >> >> This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). >> >> Test: >> All vector and vectorapi test passed. >> >> Performance: >> The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies above benchmark to measure mask operations' performance more effectively. >> >> Following data is collected on a 128-bit Neon machine. >> >> Benchmark (inputs) Mode Before After Units >> MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms >> MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms >> >> [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() >> [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 >> [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- > > Chang Peng has updated the pull request incrementally with one additional commit since the last revision: > > Reset MaskQueryOperationsBenchmark.java Marked as reviewed by eliu (Committer). 
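The sentinel-bit idea described in the firstTrue() patch above can be mimicked in plain Java. The following is a minimal sketch, not the AArch64 code from the patch: it assumes the mask is packed one byte per lane into a `long` with a true lane stored as the value 1, uses `Long.numberOfTrailingZeros` in place of the `rbit` + `clz` pair, and uses `Math.min` in place of `csel`. The class and method names are made up for illustration.

```java
// Hypothetical illustration; names and the packing layout are assumptions.
final class FirstTrueSketch {

    // Pack boolean lanes into a long, one byte per lane, lane 0 in the lowest byte.
    static long pack(boolean[] mask) {
        long packed = 0L;
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                packed |= 1L << (8 * lane);
            }
        }
        return packed;
    }

    // Baseline: count trailing zero bits (what rbit followed by clz computes),
    // turn the bit position into a lane index, then clamp to vlength.
    static int firstTrueWithClamp(long packed, int vlength) {
        int lane = Long.numberOfTrailingZeros(packed) >>> 3;
        return Math.min(lane, vlength); // stands in for csel
    }

    // Sentinel variant: set one bit just past the last lane so that the count
    // can never exceed 8 * vlength, which removes the need for the clamp.
    static int firstTrueWithSentinel(long packed, int vlength) {
        if (vlength < 1 || vlength > 7) {
            throw new IllegalArgumentException("sketch only covers 1..7 lanes");
        }
        long guarded = packed | (1L << (8 * vlength));
        return Long.numberOfTrailingZeros(guarded) >>> 3;
    }

    public static void main(String[] args) {
        boolean[] allFalse = new boolean[4];
        boolean[] thirdSet = {false, false, true, false};
        System.out.println(firstTrueWithSentinel(pack(allFalse), 4)); // prints 4
        System.out.println(firstTrueWithSentinel(pack(thirdSet), 4)); // prints 2
    }
}
```

For 4 lanes the sentinel sits at bit 32 and for 2 lanes at bit 16, which matches the "16th or 32nd bit" mentioned in the description.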
------------- PR Review: https://git.openjdk.org/jdk/pull/14373#pullrequestreview-1497629253 From epeter at openjdk.org Mon Jun 26 06:14:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jun 2023 06:14:14 GMT Subject: RFR: 8306922: IR verification fails because IR dump is chopped up [v3] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 08:02:20 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8306922 >> - removed unnecessary flags from TestVectorConditionalMove.java >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - renamed _st to _output >> - make sure CloneMap info is also printed to stream, and not to tty >> - add locker again, and refactor >> - write to xtty or tty >> - add tty locker again >> - buffer xtty and tty >> - 8306922: IR verification fails because IR dump is chopped up > > Thanks for the update, looks good! Thanks @chhagedorn for all the discussions and help! And thanks @TobiHartmann for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14591#issuecomment-1606714380 From epeter at openjdk.org Mon Jun 26 06:14:16 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jun 2023 06:14:16 GMT Subject: Integrated: 8306922: IR verification fails because IR dump is chopped up In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 14:43:18 GMT, Emanuel Peter wrote: > Some IR tests were failing because the logs were messed up (badly interleaved), due to multiple threads writing at the same time. > > The issue was in `Compile::print_ideal_ir`. 
While we did lock the `tty`, this lock was broken because the node dumping sometimes calls into the VM, which can trigger Safepoints, which in turn can explicitly break the locks, with `ttyLocker::break_tty_lock_for_safepoint`. > > What good is a lock if it can be broken? Not much. Though it is only broken if there is a Safepoint, so if we can avoid safepointing while holding the lock we should be safe. To verify that, we can write: > > > NoSafepointVerifier nsv; > ttyLocker ttyl; > > > The verifier triggered immediately, as expected. > And I now gather the whole dump into a separate `stringStream` first, so that all the Safepoints can happen before I even acquire the lock. Then, I acquire the log and just print the `stringStream` to `tty / xtty`, without triggering any Safepoint. > > We still need to acquire a lock, else other threads may be printing also, and we get a bad interleaving in the locks. > > I had to enable `dump_bfs` to print to an arbitrary stream, instead of just `tty`. > > I un-problemlisted `compiler/c2/irTests/TestVectorConditionalMove.java`, and fixed a CompileCommand issue for it. > > **Follow-up work** > > [JDK-8310712](https://bugs.openjdk.org/browse/JDK-8310712) C2: check for broken tty locks due to SafePoint > We should in general go through all uses of `ttyLocker`, and add a `NoSafepointVerifier` to ensure no such lock is broken. Or maybe we just build the verifier into the ttyLocker? > > [JDK-8310711](https://bugs.openjdk.org/browse/JDK-8310711) IR Framework: remove safepoint while printing handling > > **Testing** > > I ran `TestVectorConditionalMove.java` 1000x on master, just with the CompileCommand issue fixed. It triggered an IR issue 5 times (hit rate `0.005`). With the fix, it never triggers (`0.995^1000 = 0.007`). I also ran up to tier6 and stress-testing. This pull request has now been integrated. 
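The fix described above (collect the whole dump into a private buffer first, then take the lock only for the final write) follows a general pattern that can be sketched in Java. This is an analogy under stated assumptions, not HotSpot code: the `StringBuilder` named `sink` stands in for `tty`, the `lock` object stands in for `ttyLocker`, and `render` plays the role of the node dumping that may be interrupted. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogy to "dump into a stringStream, then print under the lock".
final class BufferedDumpSketch {

    // Build one thread's complete dump without holding any lock.
    static String render(String name, int lines) {
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            buffer.append(name).append(" line ").append(i).append('\n');
            Thread.yield(); // stand-in for a point where the thread may be paused
        }
        return buffer.toString();
    }

    // Run several writers; each appends its finished block under the lock.
    static String runConcurrently(int threads, int linesPerThread) {
        StringBuilder sink = new StringBuilder(); // stands in for tty
        Object lock = new Object();               // stands in for ttyLocker
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            String name = "T" + t;
            Thread worker = new Thread(() -> {
                String block = render(name, linesPerThread);
                synchronized (lock) {
                    sink.append(block); // short critical section, no pauses inside
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(runConcurrently(3, 2));
    }
}
```

Because each block is complete before the lock is taken, blocks from different threads may appear in any order but are never interleaved with each other.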
Changeset: 9057b350 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/9057b3503349ead7d995b1a705317324830eabb2 Stats: 173 lines in 8 files changed: 26 ins; 6 del; 141 mod 8306922: IR verification fails because IR dump is chopped up Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14591 From duke at openjdk.org Mon Jun 26 06:15:21 2023 From: duke at openjdk.org (sid8606) Date: Mon, 26 Jun 2023 06:15:21 GMT Subject: RFR: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch. Message-ID: A return statement is missing after the call to the jump_to_native_invoker method in generate_method_handle_dispatch, which leads to assert(is_valid()) failed: invalid register. Tier1 test cases pass with release, fastdebug and slowdebug builds. ------------- Commit messages: - 8309889: Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch Changes: https://git.openjdk.org/jdk/pull/14647/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14647&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309889 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14647/head:pull/14647 PR: https://git.openjdk.org/jdk/pull/14647 From duke at openjdk.org Mon Jun 26 07:04:15 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 26 Jun 2023 07:04:15 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call Message-ID: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. 
Also generally null_check_receiver() should be combined with stopped(), which was not the case here. ------------- Commit messages: - 8307625: remove redundant calls of null_check_receiver() and replace them with an assert Changes: https://git.openjdk.org/jdk/pull/14542/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14542&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307625 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14542.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14542/head:pull/14542 PR: https://git.openjdk.org/jdk/pull/14542 From duke at openjdk.org Mon Jun 26 07:36:10 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 26 Jun 2023 07:36:10 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call [v2] In-Reply-To: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> Message-ID: > The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. > Also generally null_check_receiver() should be combined with stopped(), which was not the case here. Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: 8307625: moved declaration of t inside the assert to avoid dead code in the product version. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14542/files - new: https://git.openjdk.org/jdk/pull/14542/files/a6292a6a..5763fde8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14542&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14542&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14542.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14542/head:pull/14542 PR: https://git.openjdk.org/jdk/pull/14542 From epeter at openjdk.org Mon Jun 26 08:39:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jun 2023 08:39:04 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v2] In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 12:58:23 GMT, Emanuel Peter wrote: >> Removed a spurious assert before optimization bailout. >> >> I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect, partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so a optimization bailout is appropriate. >> >> I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have lead to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain. >> >> I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue. >> Testing up to tier6 and stress testing. 
**Running, passing except for some IR rules I had to fix, rerunning...** > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > removed unnecessary flags from test Just saw a x86 (32bit) IR rule failure for `compiler.loopopts.superword.TestUnorderedReductionPartialVectorization.test1`. Will investigate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14494#issuecomment-1606977614 From epeter at openjdk.org Mon Jun 26 08:51:27 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jun 2023 08:51:27 GMT Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v3] In-Reply-To: References: Message-ID: > Removed a spurious assert before optimization bailout. > > I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect, partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so a optimization bailout is appropriate. > > I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have lead to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain. > > I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue. > Testing up to tier6 and stress testing. 
**Running, passing except for some IR rules I had to fix, rerunning...** Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: require 64 bit for test with OR_REDUCTION_V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14494/files - new: https://git.openjdk.org/jdk/pull/14494/files/53e8913e..056f00af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14494&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14494&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14494.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14494/head:pull/14494 PR: https://git.openjdk.org/jdk/pull/14494 From thartmann at openjdk.org Mon Jun 26 09:12:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jun 2023 09:12:05 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call [v2] In-Reply-To: References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> Message-ID: On Mon, 26 Jun 2023 07:36:10 GMT, Eric Nothum wrote: >> The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. >> Also generally null_check_receiver() should be combined with stopped(), which was not the case here. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > 8307625: moved declaration of t inside the assert to avoid dead code in the product version. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14542#pullrequestreview-1498024884 From jsjolen at openjdk.org Mon Jun 26 10:58:05 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 26 Jun 2023 10:58:05 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 07:03:49 GMT, Tobias Hartmann wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Be conservative in sizing to be close to original behavior > > src/hotspot/share/opto/reg_split.cpp line 575: > >> 573: // Keep track of DEFS & Phis for later passes >> 574: Node_List defs{split_arena, 8}; >> 575: Node_List phis{split_arena, 16}; > > Why do you use aggregate initialization instead of constructor invocation here? This will call the constructor, and has since C++11 I believe. However, I'm clearly being inconsistent here and with the VectorSet change above. There's no reason that I picked brace initializer other than it being the 'modern' way: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-list I'll revert this change and use regular parens to be more stylistically consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14530#discussion_r1242001442 From duke at openjdk.org Mon Jun 26 10:59:19 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 26 Jun 2023 10:59:19 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s Message-ID: The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in s. Therefore I changed the values and units in the example to ms.
------------- Commit messages: - Update README.md - change s to ms in README.md Changes: https://git.openjdk.org/jdk/pull/14649/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295191 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14649/head:pull/14649 PR: https://git.openjdk.org/jdk/pull/14649 From jsjolen at openjdk.org Mon Jun 26 11:04:09 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 26 Jun 2023 11:04:09 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 10:54:58 GMT, Johan Sjölen wrote: >> src/hotspot/share/opto/reg_split.cpp line 575: >> >>> 573: // Keep track of DEFS & Phis for later passes >>> 574: Node_List defs{split_arena, 8}; >>> 575: Node_List phis{split_arena, 16}; >> >> Why do you use aggregate initialization instead of constructor invocation here? > > This will call the constructor, and has since C++11 I believe. However, I'm clearly being inconsistent here and with the VectorSet change above. There's no reason that I picked brace initializer other than it being the 'modern' way: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-list > > I'll revert this change and use regular parens to be more stylistically consistent. Looking at https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md "uniform initialization", it's OK to use.
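Editor's sketch for the brace-initialization question in the 8310264 thread: the review asks whether `Node_List defs{split_arena, 8};` is aggregate initialization or a constructor call. The snippet below is a minimal illustration only. `Arena` and `NodeList` are hypothetical stand-ins, not HotSpot's real classes, and it assumes the real `Node_List` likewise declares a constructor and has no `std::initializer_list` overload.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for HotSpot's Arena and Node_List (not the real classes).
struct Arena {};

class NodeList {
 public:
  NodeList(Arena* arena, std::size_t initial_capacity)
      : _arena(arena), _capacity(initial_capacity) {}
  Arena* arena() const { return _arena; }
  std::size_t capacity() const { return _capacity; }

 private:
  Arena* _arena;
  std::size_t _capacity;
};

// NodeList has a user-declared constructor, so it is not an aggregate:
// braces and parentheses both call NodeList(Arena*, std::size_t).
inline bool braces_match_parens(Arena* arena) {
  NodeList with_braces{arena, 8};
  NodeList with_parens(arena, 8);
  return with_braces.arena() == with_parens.arena() &&
         with_braces.capacity() == with_parens.capacity();
}

// The two forms diverge once a std::initializer_list constructor exists.
inline std::size_t vector_size_with_braces() {
  std::vector<int> v{3, 7};  // initializer_list constructor: elements 3 and 7
  return v.size();
}

inline std::size_t vector_size_with_parens() {
  std::vector<int> v(3, 7);  // count/value constructor: three elements, each 7
  return v.size();
}
```

The `std::vector` pair shows why some reviewers find braces surprising: the same argument list can select a different constructor. That is a readability argument, separate from whether the style guide permits the syntax.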
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14530#discussion_r1242007985 From chagedorn at openjdk.org Mon Jun 26 11:13:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jun 2023 11:13:02 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 08:37:31 GMT, Eric Nothum wrote: > The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in s. Therefore I changed the values and units in the example to ms. I think it's better to actually change the flags to accept seconds instead of milliseconds. I don't think we need a more fine-grained control than seconds and it makes it easier to use these flags. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14649#issuecomment-1607237024 From thartmann at openjdk.org Mon Jun 26 11:43:02 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jun 2023 11:43:02 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 11:01:11 GMT, Johan Sjölen wrote: >> This will call the constructor, and has since C++11 I believe. However, I'm clearly being inconsistent here and with the VectorSet change above. There's no reason that I picked brace initializer other than it being the 'modern' way: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Res-list >> >> I'll revert this change and use regular parens to be more stylistically consistent. > > Looking at https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md "uniform initialization", it's OK to use. Thanks for the background, I wasn't aware of that. I don't have a strong opinion but consistency in the same area would be nice.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14530#discussion_r1242063030 From duke at openjdk.org Mon Jun 26 13:05:13 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 26 Jun 2023 13:05:13 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call [v3] In-Reply-To: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> Message-ID: > The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. > Also generally null_check_receiver() should be combined with stopped(), which was not the case here. Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: 8307625: adding null_check_receiver() for the uninitialized case, as else only argument(1) is null checked ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14542/files - new: https://git.openjdk.org/jdk/pull/14542/files/5763fde8..47c05fd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14542&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14542&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14542.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14542/head:pull/14542 PR: https://git.openjdk.org/jdk/pull/14542 From chagedorn at openjdk.org Mon Jun 26 13:07:06 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 Jun 2023 13:07:06 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call [v3] In-Reply-To: References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> 
Message-ID: On Mon, 26 Jun 2023 13:05:13 GMT, Eric Nothum wrote: >> The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. >> Also generally null_check_receiver() should be combined with stopped(), which was not the case here. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > 8307625: adding null_check_receiver() for the uninitialized case, as else only argument(1) is null checked Thanks for adding the null check as discussed offline together with @TobiHartmann. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14542#pullrequestreview-1498475996 From gcao at openjdk.org Mon Jun 26 13:15:06 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 26 Jun 2023 13:15:06 GMT Subject: RFR: 8310192: RISC-V: Merge vector min & max instructs with similar match rules In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 10:45:03 GMT, Ludovic Henry wrote: >> Hi, We merged vector min and max instructions with similar matching rules in this PR, and modified some comments of the copy_memory function in stubGenerator_riscv.cpp. >> We can use Float256VectorTests.java Double256VectorTests.java to emit vmax_fp/vmin_fp nodes and the compilation log is as follows: >> >> 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 >> 13e loadV V2, [R17] # vector (rvv) >> 146 vmax_fp V3, V1, V2 >> 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 >> >> >> 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 >> 13e loadV V2, [R17] # vector (rvv) >> 146 vmin_fp V3, V1, V2 >> 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 >> >> Please take a look and have some reviews. Thanks a lot. 
>> >> ## Testing: >> - [x] Tier1 tests (release) >> - [x] Tier2 tests (release) >> - [x] Tier3 tests (release) >> - [x] test/jdk/jdk/incubator/vector (fastdebug) > > Marked as reviewed by luhenry (Committer). @luhenry @RealFYang Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14510#issuecomment-1607444898 From gcao at openjdk.org Mon Jun 26 13:23:16 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 26 Jun 2023 13:23:16 GMT Subject: Integrated: 8310192: RISC-V: Merge vector min & max instructs with similar match rules In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 09:27:39 GMT, Gui Cao wrote: > Hi, We merged vector min and max instructions with similar matching rules in this PR, and modified some comments of the copy_memory function in stubGenerator_riscv.cpp. > We can use Float256VectorTests.java Double256VectorTests.java to emit vmax_fp/vmin_fp nodes and the compilation log is as follows: > > 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 > 13e loadV V2, [R17] # vector (rvv) > 146 vmax_fp V3, V1, V2 > 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 > > > 13e B22: # out( B50 B23 ) <- in( B21 ) Freq: 76.431 > 13e loadV V2, [R17] # vector (rvv) > 146 vmin_fp V3, V1, V2 > 15e bgeu R9, R13, B50 #@cmpU_branch P=0.000001 C=-1.000000 > > Please take a look and have some reviews. Thanks a lot. > > ## Testing: > - [x] Tier1 tests (release) > - [x] Tier2 tests (release) > - [x] Tier3 tests (release) > - [x] test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. 
Changeset: 24abd105 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/24abd1054e236118877860dd8b89d52d499c261b Stats: 134 lines in 4 files changed: 12 ins; 97 del; 25 mod 8310192: RISC-V: Merge vector min & max instructs with similar match rules Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14510 From thartmann at openjdk.org Mon Jun 26 14:16:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 Jun 2023 14:16:05 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call [v3] In-Reply-To: References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> Message-ID: On Mon, 26 Jun 2023 13:05:13 GMT, Eric Nothum wrote: >> The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. >> Also generally null_check_receiver() should be combined with stopped(), which was not the case here. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > 8307625: adding null_check_receiver() for the uninitialized case, as else only argument(1) is null checked Just FTR: The intrinsified `Unsafe::allocateUninitializedArray0` can currently not be called with a null receiver because it's private and only ever called from `Unsafe::allocateUninitializedArray` which would throw a NPE. But to be on the safe side in case this ever changes, we should emit a null check. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14542#issuecomment-1607583619 From epeter at openjdk.org Mon Jun 26 14:59:25 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 Jun 2023 14:59:25 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <-UengrhToQL0qKVGetApNHkjRfUPMo8pEte_gtvCK5g=.b9b70067-9a40-445a-b37b-6a4ddee35be5@github.com> On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. A few more comments, did not have much time today. More tomorrow ;) src/hotspot/share/opto/vmaskloop.cpp line 403: > 401: int opc = node->Opcode(); > 402: BasicType bt = elem_bt(node); > 403: int vlen = Matcher::max_vector_size(bt); Theoretically, different `bt` can have different `Matcher::vector_width_in_bytes`. So `vlen` would not always correspond to `MaxVectorSize / element_size`. It just means that here you would end up checking for a shorter length than maybe expected? But maybe that is intended, it depends on how you generate the nodes later. src/hotspot/share/opto/vmaskloop.cpp line 424: > 422: int vopc = 0; > 423: if (node->is_Mem()) { > 424: vopc = node->is_Store() ? Op_StoreVectorMasked : Op_LoadVectorMasked; Maybe just for good measure: add an assert that it can only be a Load or a Store. src/hotspot/share/opto/vmaskloop.cpp line 429: > 427: } > 428: if (vopc == 0 || > 429: !Matcher::match_rule_supported_vector_masked(vopc, vlen, bt)) { Do all nodes need to be maskable?
Or is it enough if only load/store are maskable? src/hotspot/share/opto/vmaskloop.cpp line 442: > 440: // nodes to bail out for complex loops > 441: bool VectorMaskedLoop::analyze_loop_body_nodes() { > 442: VectorSet tracked(_arena); This is probably a good case where you could use `ResourceMark rm;` and just put the `VectorSet` on the default resource arena. src/hotspot/share/opto/vmaskloop.cpp line 465: > 463: for (int idx = 0; idx < n_nodes; idx++) { > 464: Node* node = _body_nodes.at(idx); > 465: if ((node->is_Mem() && node->as_Mem()->is_Store())) { Suggestion: if ((node->is_Mem() && node->is_Store())) { src/hotspot/share/opto/vmaskloop.cpp line 474: > 472: if (!in_body(out)) { > 473: trace_msg(node, "Node has out-of-loop user found"); > 474: return false; Can this be handled in the future with a extract node? I guess you would have to extract it from a variable element, as the last iteration is not always the same. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14581#pullrequestreview-1497718513 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1241683613 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1241665519 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1241668030 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1242324850 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1242330208 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1242333503 From volker.simonis at gmail.com Mon Jun 26 15:05:52 2023 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 26 Jun 2023 17:05:52 +0200 Subject: Question regarding ReplayCompiles and multiple inlining In-Reply-To: <4877cd8d-c7cd-8edc-1222-0bfde69eaaec@oracle.com> References: <4877cd8d-c7cd-8edc-1222-0bfde69eaaec@oracle.com> Message-ID: Thanks for the confirmation of the problem. 
I've created "8310899: Provide more accurate inlining information in ReplayDataFile" [1]. [1] https://bugs.openjdk.org/browse/JDK-8310899 On Fri, Jun 23, 2023 at 10:28 PM wrote: > > I think we need to file one. > > dl > > On 6/23/23 12:26 AM, Tobias Hartmann wrote: > > Should we file an RFE for this or is this already tracked? > > > > Thanks, > > Tobias > > > > On 22.06.23 08:35, dean.long at oracle.com wrote: > >> I noticed this problem before too. Unfortunately I can't think of a workaround. It seems like the > >> right fix is to change the replay file format to record more information. > >> > >> dl > >> > >> On 6/19/23 5:07 AM, Volker Simonis wrote: > >>> Hi, > >>> > >>> I try to reproduce a compiler issue with a ReplayDataFile but > >>> unfortunately I can't reproduce the crash. > >>> > >>> I hacked the VM to print out the inlining tree just before the > >>> crashes and realized that the original inlining differs from the > >>> inlining done by ReplayCompiles. > >>> > >>> In my specific case I have the following inlining pattern during the > >>> crash (`foo::f1()` gets inlined twice into `foo::f0()`): > >>> . > >>> . > >>> @ 57 foo::f0() inline (hot) > >>> @ 48 foo::f1() inline (hot) > >>> @ 2 bar::f2() inline (hot) > >>> . > >>> . > >>> @ 48 foo::f1() inline (hot) > >>> @ 2 bar::f2() NodeCountInliningCutoff > >>> > >>> In the ReplayDataFile (in the `inline` part of the `compile` line) > >>> both, `foo::f1()` and `bar::f2()` are recorded only once (because they > >>> have the same bci, name/signature and inlining depth). > >>> > >>> When running the replay, I get the following inlining pattern: > >>> . > >>> . > >>> @ 57 foo::f0() force inline by ciReplay > >>> @ 48 foo::f1() force inline by ciReplay > >>> @ 2 bar::f2() force inline by ciReplay > >>> . > >>> .
> >>> @ 48 foo::f1() force inline by ciReplay > >>> @ 2 bar::f2() force inline by ciReplay > >>> > >>> This is clearly different because in the replay we inline `bar::f2()` > >>> a second time (while in the original run it was skipped due to > >>> NodeCountInliningCutoff). > >>> > >>> From looking at `find_ciInlineRecord()` [1], it looks like the replay > >>> file only records the bci, inlining depth and method name/signature > >>> for an inlinee? How is this supposed to work if a method is inlined > >>> differently at the same level like in this example? > >>> > >>> Notice that I'm currently working with JDK 17 (because my problem > >>> doesn't reproduce with HEAD) but it seems the relevant code hasn't > >>> changed much in this area since JDK 17. > >>> > >>> Please let me know if this is a known problem and if there's any way > >>> to workaround it? > >>> > >>> Thank you and best regards, > >>> Volker > >>> > >>> [1] https://github.com/openjdk/jdk17u-dev/blob/852c26c0/src/hotspot/share/ci/ciReplay.cpp#L992 From never at openjdk.org Mon Jun 26 16:47:03 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 26 Jun 2023 16:47:03 GMT Subject: RFR: JDK-8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:26:19 GMT, David Leopoldseder wrote: > Fix JVMCI handling of null dynamic constants and dynamic constant errors. > > For null dynamic constants, the JVMCI code was checking for `Universe::the_null_sentinel` but `ConstantPool::resolve_possibly_cached_constant_at` returns `nullptr` for null constants. > > In the case of errors thrown by a bootstrap method, `ConstantPool::resolve_possibly_cached_constant_at` already propagates the `BootstrapMethodError` so there's no need to check for `tag.is_dynamic_constant_in_error()`. Marked as reviewed by never (Reviewer). 
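Editor's sketch for the ReplayCompiles thread quoted above: the report says an inlinee is identified in the replay file only by bci, inlining depth and method name/signature, so two call sites sharing all three cannot be told apart. The snippet below is a deliberately simplified model of that keying problem, not HotSpot's `ciReplay` code. The types, the function names and the concrete depth values are all assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Simplified, hypothetical model of an inlining decision; not HotSpot's ciReplay code.
struct InlineDecision {
  int depth;           // inlining depth of the call site (values assumed)
  int bci;             // bytecode index of the call in the caller
  std::string method;  // callee name/signature
  bool inlined;        // decision taken in the original compilation
};

using RecordKey = std::tuple<int, int, std::string>;

inline RecordKey key_of(const InlineDecision& d) {
  return RecordKey(d.depth, d.bci, d.method);
}

// "Recording": one entry per (depth, bci, method) for every inlined call.
inline std::set<RecordKey> record(const std::vector<InlineDecision>& decisions) {
  std::set<RecordKey> records;
  for (const InlineDecision& d : decisions) {
    if (d.inlined) {
      records.insert(key_of(d));
    }
  }
  return records;
}

// "Replay": force inlining whenever a matching record exists.
inline bool replay_would_inline(const std::set<RecordKey>& records,
                                const InlineDecision& d) {
  return records.count(key_of(d)) > 0;
}

// The pattern from the report: foo::f1() is inlined twice at the same bci and
// depth, but bar::f2() is inlined only under the first copy.
inline std::vector<InlineDecision> reported_pattern() {
  return {
      {1, 57, "foo::f0()", true},
      {2, 48, "foo::f1()", true},
      {3, 2, "bar::f2()", true},
      {2, 48, "foo::f1()", true},
      {3, 2, "bar::f2()", false},  // NodeCountInliningCutoff in the original run
  };
}
```

In this model the second `bar::f2()` call collides with the first one's record, so replay inlines it even though the original compilation did not. Telling the two apart would need a key that also carries the position in the inline tree, which is consistent with the suggestion in the thread to record more information in the replay file.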
------------- PR Review: https://git.openjdk.org/jdk/pull/14582#pullrequestreview-1498969428 From sviswanathan at openjdk.org Mon Jun 26 17:33:03 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 26 Jun 2023 17:33:03 GMT Subject: RFR: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: <2X2ET6_aDFY_R_AsCbN0Y7YHXUyVRs9WoirObnV-zrs=.d1bce1c7-9c94-4224-9d3e-0c828276aa38@github.com> On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote: > Backing out shuffle related overhaul done with [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw significant performance degradation in VectorAPI JMH micros and some of our internal benchmarks. Following two issues were filed on this recently. > > 1/ [JDK-8310459](https://bugs.openjdk.org/browse/JDK-8310459): Performance drop in VectorAPI slice / unslice performance w.r.t to JDK-20. > 2/ [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373): Performance drop in Vector-API based kernel with JDK-21. > > A follow-up JBS [JDK-8310691](https://bugs.openjdk.org/browse/JDK-8310691) is created to address this in JDK-22. > > Best Regards, > Jatin Marked as reviewed by sviswanathan (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14629#pullrequestreview-1499048603 From jbhateja at openjdk.org Mon Jun 26 18:38:12 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 Jun 2023 18:38:12 GMT Subject: Integrated: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote: > Backing out shuffle related overhaul done with [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw significant performance degradation in VectorAPI JMH micros and some of our internal benchmarks. Following two issues were filed on this recently. 
> > 1/ [JDK-8310459](https://bugs.openjdk.org/browse/JDK-8310459): Performance drop in VectorAPI slice / unslice performance w.r.t to JDK-20. > 2/ [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373): Performance drop in Vector-API based kernel with JDK-21. > > A follow-up JBS [JDK-8310691](https://bugs.openjdk.org/browse/JDK-8310691) is created to address this in JDK-22. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: ff9a7541 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/ff9a7541097bd853306a8594c97774f36877a0f9 Stats: 3895 lines in 64 files changed: 1169 ins; 1819 del; 907 mod 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation Reviewed-by: thartmann, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/14629 From kvn at openjdk.org Mon Jun 26 19:37:07 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 26 Jun 2023 19:37:07 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 11:40:20 GMT, Tobias Hartmann wrote: >> Looking at https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md "uniform initialization", it's OK to use. > > Thanks for the background, I wasn't aware of that. I don't have a strong opinion but consistency in the same area would be nice. Please, change to normal `()`. Using '{}' is very confusing for not modern C++ experts and affects maintainability of this code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14530#discussion_r1242671748 From haosun at openjdk.org Tue Jun 27 01:16:23 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 27 Jun 2023 01:16:23 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 02:55:31 GMT, Hao Sun wrote: >> `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. In JDK-8297092, the intrinsic was evaluated on existing hardware with the support of SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was auto-enabled by default on Apple silicon only. See the code [1]. >> >> As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message: >> >> >> JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value >> Option 'UseSHA3Intrinsics' should be enabled by default >> >> >> The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` are designed to verify that, option `UseXXXIntrinsics` should be enabled by default if the underlying hardware supports the corresponding CPU feature. >> >> Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796. >> >> Fix: One `@requires` condition is added in this patch to limit that this test case is only run on macOS on Apple silicon. Note that SHA3 feature is currently supported by AArch64 only. >> >> Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon.
>> >> [1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Add comment Ping? Can anyone else help review this patch? Thanks in advance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14551#issuecomment-1608557214 From fgao at openjdk.org Tue Jun 27 03:18:21 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 27 Jun 2023 03:18:21 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v2] In-Reply-To: References: Message-ID: > Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: > > > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > > > Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
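Editor's sketch for the 8308340 description above: the patch relies on the two multiplicands of a fused multiply-add being interchangeable, so `Math.fma(-a, b, c)` can be rewritten as `Math.fma(b, -a, c)` and a single match rule suffices. The snippet below models that canonicalization with hypothetical types and uses `std::fma` in place of a C2 `Fma` node. It is an illustration of the idea, not the actual GVN transformation.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical, simplified model of the multiplicand inputs of an Fma node.
struct Multiplicand {
  double value;
  bool negated;  // true if the input is wrapped in a negation
};

struct FmaInputs {
  Multiplicand in1;
  Multiplicand in2;
  double addend;
};

// Canonical form: if only the first multiplicand is negated, swap the two
// so that the negation always sits in the second slot.
inline FmaInputs canonicalize(FmaInputs f) {
  if (f.in1.negated && !f.in2.negated) {
    std::swap(f.in1, f.in2);
  }
  return f;
}

// Evaluates in1 * in2 + addend with a single rounding.
inline double evaluate(const FmaInputs& f) {
  double x = f.in1.negated ? -f.in1.value : f.in1.value;
  double y = f.in2.negated ? -f.in2.value : f.in2.value;
  return std::fma(x, y, f.addend);
}
```

Because the exact product does not depend on operand order, swapping cannot change the rounded result. After canonicalization a backend only needs the rule that matches a negation on the second multiplicand, which is the rule reduction the description mentions.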
The pull request contains three additional commits since the last revision: - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files - Merge branch 'master' into fg8308340 - 8308340: C2: Idealize Fma nodes Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: ``` match(Set dst (FmaF src3 (Binary (NegF src1) src2))); match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); ``` Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14576/files - new: https://git.openjdk.org/jdk/pull/14576/files/8239531e..a22814d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=00-01 Stats: 8127 lines in 396 files changed: 3853 ins; 1754 del; 2520 mod Patch: https://git.openjdk.org/jdk/pull/14576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576 PR: https://git.openjdk.org/jdk/pull/14576 From fgao at openjdk.org Tue Jun 27 03:20:15 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 27 Jun 2023 03:20:15 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 18:10:26 GMT, Vladimir Kozlov wrote: > Check for `UseFMA` should be moved from `c2compiler.cpp` to `Matcher::match_rule_supported` in `.ad` files.
I see we have such check for Fma vectors in `x86.ad` but not for scalars. Similar issues exist for other platforms. @vnkozlov Thanks for your review! I updated it in the latest commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1608699766 From fyang at openjdk.org Tue Jun 27 03:34:14 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Jun 2023 03:34:14 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 [v2] In-Reply-To: References: Message-ID: <-poSjVbiuBUNsibn9gs6i27icIiE398A3SGWwvVgwPI=.b5c1dfc3-c256-4c27-beb7-3151fa6e988f@github.com> On Wed, 21 Jun 2023 02:55:31 GMT, Hao Sun wrote: >> `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. In JDK-8297092, the intrinsic was evaluated on existing hardware with the support of SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was auto-enabled by default on Apple silicon only. See the code [1]. >> >> As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message: >> >> >> JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value >> Option 'UseSHA3Intrinsics' should be enabled by default >> >> >> The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` are designed to verify that, option `UseXXXIntrinsics` should be enabled by default if the underlying hardware supports the corresponding CPU feature. >> >> Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796. >> >> Fix: One `@requires` condition is added in this patch to limit that this test case is only run on macOS on Apple silicon. Note that SHA3 feature is currently supported by AArch64 only.
>> >> Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon. >> >> [1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345 > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Add comment LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14551#pullrequestreview-1499881688 From haosun at openjdk.org Tue Jun 27 03:59:02 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 27 Jun 2023 03:59:02 GMT Subject: RFR: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 [v2] In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:17:27 GMT, Andrew Haley wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > Marked as reviewed by aph (Reviewer). Thanks for your reviews, @theRealAph and @RealFYang Two tests in GHA failed because it failed to "Get bundles". I suppose it's due to network issue. I tried to re-run the failed tests several times but it still didn't pass. I don't think it's related to this patch. Hence, I will integrate it tomorrow if there is no more comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14551#issuecomment-1608753810 From duke at openjdk.org Tue Jun 27 04:13:27 2023 From: duke at openjdk.org (Chang Peng) Date: Tue, 27 Jun 2023 04:13:27 GMT Subject: Integrated: 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 In-Reply-To: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> References: <36ZTwqFZNbTRc5Xuc7LmVCoMJiueIp-lMgY0LMqczUA=.1841f16a-9104-4c9a-bbf0-f73ef01786a2@github.com> Message-ID: On Thu, 8 Jun 2023 02:44:08 GMT, Chang Peng wrote: > This patch optimizes VectorMask.firstTrue() on Neon when there are 2 or 4 elements in vector registers. > > VectorMask.firstTrue() should return VLENGTH when vector mask is all false [1]. Current implementation uses rbit and then clz [2] to count leading zeros, then uses csel [3] (conditional select) to get the smaller value between VLENGTH and the number of unset lanes to ensure correctness. > > This patch sets the 16th or 32nd bit as 1, when there are only 2 or 4 elements in boolean masks, before rbit and clz. With this trick, maximum value calculated in such case will be VLENGTH (2 or 4). > > Test: > All vector and vectorapi tests passed. > > Performance: > The benchmark functions are in MaskQueryOperationsBenchmark.java [4]. This patch also modifies above benchmark to measure mask operations' performance more effectively. > > Following data is collected on a 128-bit Neon machine. 
> > Benchmark (inputs) Mode Before After Units > MaskQueryOperationsBenchmark.testFirstTrueInt 1 thrpt 5952.670 7298.491 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueInt 2 thrpt 5951.513 7297.620 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueInt 3 thrpt 5953.048 7298.072 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 1 thrpt 3496.990 4003.188 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 2 thrpt 3497.755 4002.577 ops/ms > MaskQueryOperationsBenchmark.testFirstTrueLong 3 thrpt 3500.085 4002.471 ops/ms > > [1]: https://docs.oracle.com/en/java/javase/20/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#firstTrue() > [2]: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L5540 > [3]: https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/CSEL--Conditional-Select- This pull request has now been integrated. Changeset: 45b581b7 Author: changpeng1997 Committer: Eric Liu URL: https://git.openjdk.org/jdk/commit/45b581b7d53a3181be0e3f324d599797981f530f Stats: 84 lines in 2 files changed: 14 ins; 58 del; 12 mod 8309583: AArch64: Optimize firstTrue() when amount of elements < 8 Reviewed-by: aph, eliu ------------- PR: https://git.openjdk.org/jdk/pull/14373 From dholmes at openjdk.org Tue Jun 27 06:19:49 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Jun 2023 06:19:49 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" Message-ID: This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. 
Testing so far is Aarch64 only: - Tiers 1-3 - 50x the closed stackoverflow test that failed previously As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. Thanks. ------------- Commit messages: - 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" Changes: https://git.openjdk.org/jdk/pull/14669/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309209 Stats: 9 lines in 2 files changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From aph at openjdk.org Tue Jun 27 06:44:06 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 27 Jun 2023 06:44:06 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 06:12:47 GMT, David Holmes wrote: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14669#pullrequestreview-1500047562 From duke at openjdk.org Tue Jun 27 06:56:06 2023 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 27 Jun 2023 06:56:06 GMT Subject: RFR: 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs [v3] In-Reply-To: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> References: <9mkVs8-I1sj8D2pXeMDtCgdhMuHNMKhLcByn4GBv7Tc=.7f6c0e29-f2d6-4d9f-aa5b-4eacf16e4fde@github.com> Message-ID: On Tue, 20 Jun 2023 10:21:51 GMT, Daniel Skantz wrote: >> ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. >> >> Testing: tier1-tier3. >> >> Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. >> Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. >> >> Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. >> >> ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test #iterations Thanks Roberto and Tobias for review. 
Thanks Roberto for additional help with PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14492#issuecomment-1608899765 From tobias.hartmann at oracle.com Tue Jun 27 07:03:12 2023 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 27 Jun 2023 09:03:12 +0200 Subject: Question regarding ReplayCompiles and multiple inlining In-Reply-To: References: <4877cd8d-c7cd-8edc-1222-0bfde69eaaec@oracle.com> Message-ID: <897af8d4-ca76-71c4-2bf5-3351d9a7afdb@oracle.com> Thanks, Volker. Do you have a reproducer that you could share in the bug report? Best regards, Tobias On 26.06.23 17:05, Volker Simonis wrote: > Thanks for the confirmation of the problem. > > I've created "8310899: Provide more accurate inlining information in > ReplayDataFile" [1]. > > [1] https://bugs.openjdk.org/browse/JDK-8310899 > > On Fri, Jun 23, 2023 at 10:28 PM wrote: >> >> I think we need to file one. >> >> dl >> >> On 6/23/23 12:26 AM, Tobias Hartmann wrote: >>> Should we file an RFE for this or is this already tracked? >>> >>> Thanks, >>> Tobias >>> >>> On 22.06.23 08:35, dean.long at oracle.com wrote: >>>> I noticed this problem before too. Unfortunately I can't think of a workaround. It seems like the >>>> right fix is to change the replay file format to record more information. >>>> >>>> dl >>>> >>>> On 6/19/23 5:07 AM, Volker Simonis wrote: >>>>> Hi, >>>>> >>>>> I try to reproduce a compiler issue with a ReplayDataFile but >>>>> unfortunately I can't reproduce the crash. >>>>> >>>>> I hacked the VM to print out the inlining tree just before the >>>>> crashes and realized that the original inlining differs from the >>>>> inlining done by ReplayCompiles. >>>>> >>>>> In my specific case I have the following inlining pattern during the >>>>> crash (`foo::f1()` gets inlined twice into `foo::f0() `): >>>>> . >>>>> . >>>>> @ 57 foo::f0() inline (hot) >>>>> @ 48 foo::f1() inline (hot) >>>>> @ 2 bar::f2() inline (hot) >>>>> . >>>>> . 
>>>>> @ 48 foo::f1() inline (hot) >>>>> @ 2 bar::f2() NodeCountInliningCutoff >>>>> >>>>> In the ReplayDataFile (in the `inline` part of the `compile` line) >>>>> both, `foo::f1()` and `bar::f2()` are recorded only once (because they >>>>> have the same bci, name/signature and inlining depth). >>>>> >>>>> When running the replay, I get the following inlining pattern: >>>>> . >>>>> . >>>>> @ 57 foo::f0() force inline by ciReplay >>>>> @ 48 foo::f1() force inline by ciReplay >>>>> @ 2 bar::f2() force inline by ciReplay >>>>> . >>>>> . >>>>> @ 48 foo::f1() force inline by ciReplay >>>>> @ 2 bar::f2() force inline by ciReplay >>>>> >>>>> This is clearly different because in the replay we inline `bar::f2()` >>>>> a second time (while in the original run it was skipped due to >>>>> NodeCountInliningCutoff). >>>>> >>>>> From looking at `find_ciInlineRecord()` [1], it looks like the replay >>>>> file only records the bci, inlining depth and method name/signature >>>>> for an inlinee? How is this supposed to work if a method is inlined >>>>> differently at the same level like in this example? >>>>> >>>>> Notice that I'm currently working with JDK 17 (because my problem >>>>> doesn't reproduce with HEAD) but it seems the relevant code hasn't >>>>> changed much in this area since JDK 17. >>>>> >>>>> Please let me know if this is a known problem and if there's any way >>>>> to workaround it? 
>>>>> >>>>> Thank you and best regards, >>>>> Volker >>>>> >>>>> [1] https://github.com/openjdk/jdk17u-dev/blob/852c26c0/src/hotspot/share/ci/ciReplay.cpp#L992 From duke at openjdk.org Tue Jun 27 07:09:16 2023 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 27 Jun 2023 07:09:16 GMT Subject: Integrated: 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 11:13:02 GMT, Daniel Skantz wrote: > ShortLoopOptimizer might lift instructions before their inputs on some graph shapes. We propose adding a check that the insertion point for an instruction that is a candidate for hoisting should not be higher up the dominator tree than any inputs to the instruction. > > Testing: tier1-tier3. > > Additional testing: observed that `(cur_invariant && !v.is_valid())` never occurs on tier1-tier3 before the added test case. > Also verified that the depth check is equivalent to `(*vp->block() == _insert->block()) || dominates(*vp, _insert)` on all of tier1-tier3. > > Failure case: in the attached image the `arraylength` instruction from B10 is lifted to B0, as the dominator of B10 is calculated as B0. This is based on the logic in [`ComputeLinearScanOrder::compute_dominator_impl`](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/c1/c1_IR.cpp#L801). But the array input is in Block 3. This is later spotted in `c1_LIRAssembler.cpp` with `Error: ShouldNotReachHere()`. We can reproduce the error on other instructions too -- the reader may refer to the test case provided. > > ![image](https://github.com/openjdk/jdk/assets/111436254/9130516c-073d-45d1-a38a-af776e5e6672) This pull request has now been integrated. 
Changeset: 73d7aa1d Author: Daniel Skantz Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/73d7aa1d2cb037fed69263a1990258866333664d Stats: 144 lines in 2 files changed: 143 ins; 0 del; 1 mod 8301489: C1: ShortLoopOptimizer might lift instructions before their inputs Reviewed-by: thartmann, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/14492 From dholmes at openjdk.org Tue Jun 27 07:20:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Jun 2023 07:20:03 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 06:41:01 GMT, Andrew Haley wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > Marked as reviewed by aph (Reviewer). Thanks for the review @theRealAph ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1608932528 From thartmann at openjdk.org Tue Jun 27 07:32:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 27 Jun 2023 07:32:03 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 06:12:47 GMT, David Holmes wrote: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. Looks good to me as well. Maybe @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32) could help with the implementation / testing on the other platforms. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14669#pullrequestreview-1500134198 From davleopo at openjdk.org Tue Jun 27 08:33:12 2023 From: davleopo at openjdk.org (David Leopoldseder) Date: Tue, 27 Jun 2023 08:33:12 GMT Subject: Integrated: JDK-8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:26:19 GMT, David Leopoldseder wrote: > Fix JVMCI handling of null dynamic constants and dynamic constant errors. 
> > For null dynamic constants, the JVMCI code was checking for `Universe::the_null_sentinel` but `ConstantPool::resolve_possibly_cached_constant_at` returns `nullptr` for null constants. > > In the case of errors thrown by a bootstrap method, `ConstantPool::resolve_possibly_cached_constant_at` already propagates the `BootstrapMethodError` so there's no need to check for `tag.is_dynamic_constant_in_error()`. This pull request has now been integrated. Changeset: 15878360 Author: David Leopoldseder Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/15878360bf22c88a6e4038f05efa6db08d72b309 Stats: 14 lines in 2 files changed: 10 ins; 0 del; 4 mod 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null Reviewed-by: dnsimon, never ------------- PR: https://git.openjdk.org/jdk/pull/14582 From fyang at openjdk.org Tue Jun 27 09:23:03 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 Jun 2023 09:23:03 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 06:12:47 GMT, David Holmes wrote: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. Changes requested by fyang (Reviewer). 
src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 770: > 768: ld(t0, Address(rthread, JavaThread::stack_guard_state_offset())); > 769: beq(t0, StackOverflow::stack_guard_enabled, no_reserved_zone_enabling); > 770: Hi, Thanks for handling RISC-V. But this won't build on RISC-V. We would need something like: diff --git a/src/hotspot/cpu/riscv/interp_masm_riscv.cpp b/src/hotspot/cpu/riscv/interp_masm_riscv.cpp index edec2e08c83..99906c0fea6 100644 --- a/src/hotspot/cpu/riscv/interp_masm_riscv.cpp +++ b/src/hotspot/cpu/riscv/interp_masm_riscv.cpp @@ -764,6 +764,11 @@ void InterpreterMacroAssembler::remove_activation( // testing if reserved zone needs to be re-enabled Label no_reserved_zone_enabling; + // check if already enabled - if so no re-enabling needed + ld(t0, Address(xthread, JavaThread::stack_guard_state_offset())); + sub(t0, t0, StackOverflow::stack_guard_enabled); + beqz(t0, no_reserved_zone_enabling); + ld(t0, Address(xthread, JavaThread::reserved_stack_activation_offset())); ble(t1, t0, no_reserved_zone_enabling); ------------- PR Review: https://git.openjdk.org/jdk/pull/14669#pullrequestreview-1500358057 PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1243413208 From roland at openjdk.org Tue Jun 27 10:11:18 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 Jun 2023 10:11:18 GMT Subject: RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 Message-ID: The crash happens after the following steps: 1- pre/main/post loops are created with assert predicates above the main loop. 
2- the main loop is peeled 3- as a consequence, the `OpaqueZeroTripGuard` for the main loop is removed 4- That allows narrowing of the type of the CastII that was added right after the zero trip guard during pre/main/post loops creation 5- The CastII feeds into a range check CastII for the peeled iteration that becomes top because the narrowed type of the first CastII conflicts with the type recorded in the range check CastII. 6- The assert predicate that should fold to protect the range check CastII doesn't because of the fix for JDK-8282592: on assert predicate updates, the CastII at the zero trip guard is skipped. So the range check CastII sees the narrowing of the type of the CastII at the zero trip guard but the assert predicate doesn't. The fix I propose is to revert that part of the change from JDK-8282592 so both the range check CastII and the assert predicate have the CastII at the zero trip guard as input and observe its type updates. I went back to that bug and tried to reproduce the failure again but couldn't. Reverting JDK-8281429 causes the bug to reproduce again. I tried tweaking the test so the crash reproduces with JDK-8281429 applied but couldn't. This is caused by JDK-8305189 because step 3- happens because of it. Before JDK-8305189, 3- happened after loop opts are over. I think what happened then was that a template assertion predicate that was in the process of having its `OpaqueLoopInit` and `OpaqueLoopStride` removed constant folded so the crash wouldn't reproduce. 
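Steps 4 and 5 of the description above come down to two integer type ranges that no longer overlap. The following is a minimal sketch of that idea only, assuming a simplified closed-interval model; `RangeJoinSketch` and `IntRange` are hypothetical names for illustration and are not C2 classes, and an empty result stands in for what C2 calls top.

```java
import java.util.Optional;

// Hypothetical illustration only: models an int type as a closed range [lo, hi].
public class RangeJoinSketch {
    public static final class IntRange {
        public final int lo;
        public final int hi;

        public IntRange(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        // Intersect two ranges. An empty Optional plays the role of "top":
        // no value satisfies both constraints, so the node is dead.
        public Optional<IntRange> join(IntRange other) {
            int newLo = Math.max(lo, other.lo);
            int newHi = Math.min(hi, other.hi);
            return newLo <= newHi
                    ? Optional.of(new IntRange(newLo, newHi))
                    : Optional.empty();
        }

        @Override
        public String toString() {
            return "[" + lo + ".." + hi + "]";
        }
    }

    public static void main(String[] args) {
        // Made-up numbers: the type recorded in a range check cast ...
        IntRange rangeCheckType = new IntRange(0, 9);
        // ... and the input type before and after it is narrowed.
        IntRange inputBefore = new IntRange(0, 100);
        IntRange inputAfter = new IntRange(10, 100);

        System.out.println(rangeCheckType.join(inputBefore)); // still overlaps
        System.out.println(rangeCheckType.join(inputAfter));  // empty, i.e. "top"
    }
}
```

In this model, the data path dies as soon as the input is narrowed. The proposed fix ensures that the assert predicate sees the same narrowed input, so the control path can fold at the same time.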
------------- Commit messages: - test & fix Changes: https://git.openjdk.org/jdk/pull/14672/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14672&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309902 Stats: 64 lines in 2 files changed: 58 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14672.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14672/head:pull/14672 PR: https://git.openjdk.org/jdk/pull/14672 From mdoerr at openjdk.org Tue Jun 27 10:54:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 Jun 2023 10:54:02 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 06:12:47 GMT, David Holmes wrote: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. PPC64 implementation: --- a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp +++ b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp @@ -888,6 +888,11 @@ void InterpreterMacroAssembler::remove_activation(TosState state, // Test if reserved zone needs to be enabled. 
Label no_reserved_zone_enabling; + // check if already enabled - if so no re-enabling needed + lwz(R0, in_bytes(JavaThread::stack_guard_state_offset()), R16_thread); + cmpwi(CCR0, R0, StackOverflow::stack_guard_enabled); + beq_predict_taken(CCR0, no_reserved_zone_enabling); + // Compare frame pointers. There is no good stack pointer, as with stack // frame compression we can get different SPs when we do calls. A subsequent // call could have a smaller SP, so that this compare succeeds for an Is the RISCV version correct? `StackGuardState` is an `enum` and should typically have 4 Bytes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1609261546 From epeter at openjdk.org Tue Jun 27 11:56:27 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jun 2023 11:56:27 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes Message-ID: For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. **Motivation** I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. **How to use it** All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` which would match with IR nodes dumped like that: `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. 
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. Some examples: 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (useful for tests where the `LoopMaxUnroll` is artificially lowered, which sometimes prevents the maximal filling of vectors). 6. `@IR(counts = {IRNode.VECTOR_CAST_I2F, IRNode.VECTOR_SIZE + "min(max_int, max_float)", ">0"})` -> find at least one `VectorCastI2X` node that casts to type `float`, and where the size is exactly equal to the smaller maximal size for `ints` and `floats`. This is helpful when there are multiple types in the loop, and the number of elements is limited by the sizes of multiple types. 
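To make the size tags in examples 4 to 6 concrete, here is a rough sketch of how a `min(...)` specifier could be resolved to an element count. It is not the IR Framework's parser: `VectorSizeSketch`, `resolveSize` and `resolveTag` are hypothetical names, the element byte widths are the standard Java ones, and the sketch assumes the maximum is simply `MaxVectorSize` divided by the element width, ignoring the CPU-feature limits that the real computation also takes into account.

```java
import java.util.Map;

// Hypothetical illustration only; not part of compiler.lib.ir_framework.
public class VectorSizeSketch {
    // Element widths in bytes for the Java primitive types.
    private static final Map<String, Integer> BYTES = Map.of(
            "byte", 1, "short", 2, "char", 2, "int", 4,
            "float", 4, "long", 8, "double", 8);

    // Assumed rule: as many elements as fit into MaxVectorSize bytes.
    public static int maxElements(String type, int maxVectorSizeBytes) {
        Integer width = BYTES.get(type);
        if (width == null) {
            throw new IllegalArgumentException("unknown type: " + type);
        }
        return maxVectorSizeBytes / width;
    }

    // A tag is "max_<type>", "LoopMaxUnroll", or a plain number.
    public static int resolveTag(String tag, int maxVectorSizeBytes, int loopMaxUnroll) {
        String t = tag.trim();
        if (t.startsWith("max_")) {
            return maxElements(t.substring("max_".length()), maxVectorSizeBytes);
        }
        if (t.equals("LoopMaxUnroll")) {
            return loopMaxUnroll;
        }
        return Integer.parseInt(t);
    }

    // A specifier is either a single tag or "min(tag, tag, ...)".
    public static int resolveSize(String spec, int maxVectorSizeBytes, int loopMaxUnroll) {
        String s = spec.trim();
        if (s.startsWith("min(") && s.endsWith(")")) {
            int result = Integer.MAX_VALUE;
            String inner = s.substring("min(".length(), s.length() - 1);
            for (String tag : inner.split(",")) {
                result = Math.min(result, resolveTag(tag, maxVectorSizeBytes, loopMaxUnroll));
            }
            return result;
        }
        return resolveTag(s, maxVectorSizeBytes, loopMaxUnroll);
    }

    public static void main(String[] args) {
        // With 32-byte vectors a double vector holds 4 elements, with 16-byte vectors only 2.
        System.out.println(resolveSize("min(4, max_double)", 32, 16));
        System.out.println(resolveSize("min(4, max_double)", 16, 16));
        System.out.println(resolveSize("min(LoopMaxUnroll, max_float)", 32, 4));
        System.out.println(resolveSize("min(max_int, max_float)", 32, 16));
    }
}
```

Under these assumptions, `min(4, max_double)` resolves to 4 on a machine with 32-byte vectors and to 2 with 16-byte vectors, which is the behaviour example 4 describes.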
I had to change lots of occurrences, hence you can find many more examples in the tests. **Details** Vector nodes that should be tested for `type` and `size` now are to be created with `VECTOR_PREFIX` and `vectorNode`, see `IRNode.java`. When specifying such a `vectorNode` in an IR rule, one first uses the `irNodePlaceholder` (eg `LOAD_VI`), and following it one can optionally add a `IRNode.VECTOR_SIZE` specifier, which is then parsed by `parseVectorNodeSize`. This allows either naming a concrete size (eg `IRNode.VECTOR_SIZE_8`), a tag (`IRNode.VECTOR_SIZE + ""`) where the tag can be one of the tags listed in `parseVectorNodeSizeTag`, or a `min(...)` clause which computes the minimum value of a comma separated list of tags. As a last resort one can match for any size (`IRNode.VECTOR_SIZE_ANY`). The maximal vector size for any type is computed in `getMaxElementsForType`, under consideration of the CPU features and the `MaxVectorSize`. **Changes to tests** Unfortunately, I had to change a lot of IR rules, though not substantially. Most changes are because we usually had nodes like `MAX_V` or `LOAD_VECTOR` which matched for any type, and I had to create one node per type now (eg `MAX_VF, MAX_VD`, or `LOAD_VI, LOAD_VL, LOAD_VF, ...`). While this was a lot of work, it is still good to know that we are generating the nodes with the correct types. In the VectorAPI tests there were many which required concrete sizes due to the concrete size of the vector species. This is nice to test, since it guarantees that the vector species indeed generate the expected vector sizes. A few tests required more attention, where I had to use patterns like `IRNode.VECTOR_SIZE + "min(...)"`. These are especially interesting, as they test cases like mixed types (eg casting between types). **Future Work** There are a few nodes that I did not yet handle with `vectorNode` (eg `VECTOR_REINTERPRET`, `OR_V_MASK`, `MACRO_LOGIC_V`, `LOAD_VECTOR_GATHER(_MASKED)`). 
Some of these only have very few tests and are all from the Vector API which was not my priority here. They can easily be converted should the need arise in the future. While looking at lots of IR tests I also came up with these RFE's: [JDK-8310891](https://bugs.openjdk.org/browse/JDK-8310891) C2 SuperWord tests: move platform requirements to IR rules [JDK-8310523](https://bugs.openjdk.org/browse/JDK-8310523) Add IR tests for nodes that have too few IR tests yet [JDK-8310533](https://bugs.openjdk.org/browse/JDK-8310533) [IR Framework] Add possibility to automatically verify that a test method always returns the same result **Testing** tier1-tier6 and stress-testing **Running**. ------------- Commit messages: - fix whitespace - fix 3 tests with old IRNode names - vector cast - VECTOR_CAST_I2X - vector mask cmp and blend - remove some remaining any size cases - implement vector node min(...) tag parsing - small refactoring - cmove and bad format test - Merge branch 'master' into JDK-8310308 - ... 
and 32 more: https://git.openjdk.org/jdk/compare/9057b350...1466083f Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310308 Stats: 3003 lines in 62 files changed: 943 ins; 16 del; 2044 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From dholmes at openjdk.org Tue Jun 27 12:18:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Jun 2023 12:18:04 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 07:29:21 GMT, Tobias Hartmann wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > Looks good to me as well. Maybe @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32) could help with the implementation / testing on the other platforms. Thanks @TobiHartmann for the review. Note there is no arm32 version here as for some reason it does not have the reserved stack access support, at least in this area. @RealFYang - thanks for that I will apply your change. 
@TheRealMDoerr - thanks for PPC code. I will look into the issue with using the enum as for x64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1609381942 From simonis at openjdk.org Tue Jun 27 12:49:36 2023 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 27 Jun 2023 12:49:36 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) [v2] In-Reply-To: References: Message-ID: > This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). It could reproduce it with JDK 17, 18 and 21 and results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details): > > > # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124 > # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer > # > # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk) > ... > Current CompileTask: > C2: 91009 8214 ! 
4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes) > > Stack: [0x00007fff1306b000,0x00007fff1316c000], sp=0x00007fff13166fe0, free space=1007k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x61d636] Type::is_oopptr() const+0x4e (type.hpp:2059) > V [libjvm.so+0x14cee55] SubTypeCheckNode::sub(Type const*, Type const*) const+0x53 (subtypenode.cpp:37) > V [libjvm.so+0x14c66b0] SubNode::Value(PhaseGVN*) const+0xa6 (subnode.cpp:107) > V [libjvm.so+0xcdacb3] split_if(IfNode*, PhaseIterGVN*)+0x2ce (ifnode.cpp:111) > V [libjvm.so+0xce044c] IfNode::Ideal_common(PhaseGVN*, bool)+0x128 (ifnode.cpp:1438) > V [libjvm.so+0xce0496] IfNode::Ideal(PhaseGVN*, bool)+0x30 (ifnode.cpp:1448) > V [libjvm.so+0x1298244] PhaseGVN::apply_ideal(Node*, bool)+0x70 (phaseX.cpp:667) > V [libjvm.so+0x129a0fd] PhaseIterGVN::transform_old(Node*)+0x12d (phaseX.cpp:1196) > V [libjvm.so+0x12998df] PhaseIterGVN::optimize()+0x16b (phaseX.cpp:1045) > V [libjvm.so+0x93f89e] Compile::Optimize()+0xce0 (compile.cpp:2378) > V [libjvm.so+0x9385fa] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16ca (compile.cpp:842) > V [libjvm.so+0x806ab4] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a0 (c2compiler.cpp:118) > V [libjvm.so+0x958bc8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa04 (compileBroker.cpp:2265) > V [libjvm.so+0x9576fa] CompileBroker::compiler_thread_loop()+0x462 (compileBroker.cpp:1944) > V [libjvm.so+0x97b14a] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:58) > V [libjvm.so+0xd434ce] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:719) > V [libjvm.so+0xd43368] JavaThread::run()+0x258 (javaThread.cpp:704) > V [libjvm.so+0x15481ea] Thread::call_run()+0x1a8 (thread.cpp:217) > V [libjvm.so+0x1230036] thread_native_entry(Thread*)+0x1a5 (os_linux.cpp:778) > ... > ``` > > `SubTypeC... 
Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Added Roland's new test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14600/files - new: https://git.openjdk.org/jdk/pull/14600/files/9e8d10a4..7a73d39c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14600&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14600&range=00-01 Stats: 89 lines in 1 file changed: 89 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14600.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14600/head:pull/14600 PR: https://git.openjdk.org/jdk/pull/14600 From simonis at openjdk.org Tue Jun 27 12:50:21 2023 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 27 Jun 2023 12:50:21 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 16:21:18 GMT, Roland Westrelin wrote: >> This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). It could reproduce it with JDK 17, 18 and 21 and results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details): >> >> >> # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124 >> # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer >> # >> # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk) >> ... >> Current CompileTask: >> C2: 91009 8214 ! 
4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes) >> >> Stack: [0x00007fff1306b000,0x00007fff1316c000], sp=0x00007fff13166fe0, free space=1007k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x61d636] Type::is_oopptr() const+0x4e (type.hpp:2059) >> V [libjvm.so+0x14cee55] SubTypeCheckNode::sub(Type const*, Type const*) const+0x53 (subtypenode.cpp:37) >> V [libjvm.so+0x14c66b0] SubNode::Value(PhaseGVN*) const+0xa6 (subnode.cpp:107) >> V [libjvm.so+0xcdacb3] split_if(IfNode*, PhaseIterGVN*)+0x2ce (ifnode.cpp:111) >> V [libjvm.so+0xce044c] IfNode::Ideal_common(PhaseGVN*, bool)+0x128 (ifnode.cpp:1438) >> V [libjvm.so+0xce0496] IfNode::Ideal(PhaseGVN*, bool)+0x30 (ifnode.cpp:1448) >> V [libjvm.so+0x1298244] PhaseGVN::apply_ideal(Node*, bool)+0x70 (phaseX.cpp:667) >> V [libjvm.so+0x129a0fd] PhaseIterGVN::transform_old(Node*)+0x12d (phaseX.cpp:1196) >> V [libjvm.so+0x12998df] PhaseIterGVN::optimize()+0x16b (phaseX.cpp:1045) >> V [libjvm.so+0x93f89e] Compile::Optimize()+0xce0 (compile.cpp:2378) >> V [libjvm.so+0x9385fa] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16ca (compile.cpp:842) >> V [libjvm.so+0x806ab4] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a0 (c2compiler.cpp:118) >> V [libjvm.so+0x958bc8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa04 (compileBroker.cpp:2265) >> V [libjvm.so+0x9576fa] CompileBroker::compiler_thread_loop()+0x462 (compileBroker.cpp:1944) >> V [libjvm.so+0x97b14a] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:58) >> V [libjvm.so+0xd434ce] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:719) >> V [libjvm.so+0xd43368] JavaThread::run()+0x258 (javaThread.cpp:704) >> V [libjvm.so+0x15481ea] Thread::call_run()+0x1a8 (thread.cpp:217) >> V [libjvm.so+0x1230036] thread_na... > > @simonis I reproduced it and I'm taking a closer look. 
I've added @rwestrel's test to the PR and verified that the code generated for the new test with this fix is the same as the code that was generated before [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691) and also the same as the code generated if we run with `-XX:+StressIGVN` but a different `StressSeed`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14600#issuecomment-1609436447

From roland at openjdk.org Tue Jun 27 13:20:04 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 27 Jun 2023 13:20:04 GMT
Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jun 2023 12:49:36 GMT, Volker Simonis wrote:

>> This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). It could reproduce it with JDK 17, 18 and 21 and results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details):
>>
>>
>> # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124
>> # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer
>> #
>> # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk)
>> ...
>> Current CompileTask:
>> C2: 91009 8214 !
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Added Roland's new test FWIW, I propose an alternate fix here. https://github.com/openjdk/jdk/compare/master...rwestrel:jdk:JDK-8303279 Seeing null or a nullable value at a `SubTypeCheck` could be a bug as the expectation is that inputs are null checked and the implementation of `SubTypeCheck` would crash with a null input. So I added an assert to `SubTypeCheckNode::sub` to catch a nullable input. The assert fires with the test because split if runs with a non yet fully collapsed dead path. So I tweak split if so it's delayed until the path is collapsed. When running testing I found that the assert would fire in other cases because of values known to be non null be not marked as such. The end result is a bigger patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14600#issuecomment-1609490054 From simonis at openjdk.org Tue Jun 27 14:00:05 2023 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 27 Jun 2023 14:00:05 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 12:49:36 GMT, Volker Simonis wrote: >> This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). It could reproduce it with JDK 17, 18 and 21 and results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details): >> >> >> # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124 >> # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer >> # >> # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk) >> ... >> Current CompileTask: >> C2: 91009 8214 ! 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Added Roland's new test > FWIW, I propose an alternate fix here. [master...rwestrel:jdk:JDK-8303279](https://github.com/openjdk/jdk/compare/master...rwestrel:jdk:JDK-8303279) Seeing null or a nullable value at a `SubTypeCheck` could be a bug as the expectation is that inputs are null checked and the implementation of `SubTypeCheck` would crash with a null input. So I added an assert to `SubTypeCheckNode::sub` to catch a nullable input. The assert fires with the test because split if runs with a non yet fully collapsed dead path. So I tweak split if so it's delayed until the path is collapsed. When running testing I found that the assert would fire in other cases because of values known to be non null be not marked as such. The end result is a bigger patch. Thanks @rwestrel. I'm fine with your patch. Do you want to take JDK-8303279 and propose your fix as PR? I will then close mine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14600#issuecomment-1609565121 From roland at openjdk.org Tue Jun 27 14:16:05 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 Jun 2023 14:16:05 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 13:17:38 GMT, Roland Westrelin wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Added Roland's new test > > FWIW, I propose an alternate fix here. https://github.com/openjdk/jdk/compare/master...rwestrel:jdk:JDK-8303279 > Seeing null or a nullable value at a `SubTypeCheck` could be a bug as the expectation is that inputs are null checked and the implementation of `SubTypeCheck` would crash with a null input. So I added an assert to `SubTypeCheckNode::sub` to catch a nullable input. 
The assert fires with the test because split if runs with a not yet fully collapsed dead path. So I tweaked split if so that it is delayed until the path is collapsed. When running tests, I found that the assert would also fire in other cases because values known to be non-null were not marked as such. The end result is a bigger patch.

> Thanks @rwestrel. I'm fine with your patch. Do you want to take JDK-8303279 and propose your fix as PR? I will then close mine.

Let me open the PR. I will be away for a week though, starting later today.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14600#issuecomment-1609593728

From simonis at openjdk.org Tue Jun 27 14:25:19 2023
From: simonis at openjdk.org (Volker Simonis)
Date: Tue, 27 Jun 2023 14:25:19 GMT
Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jun 2023 14:13:06 GMT, Roland Westrelin wrote:

>> FWIW, I propose an alternate fix here. https://github.com/openjdk/jdk/compare/master...rwestrel:jdk:JDK-8303279
>> Seeing null or a nullable value at a `SubTypeCheck` could be a bug as the expectation is that inputs are null checked and the implementation of `SubTypeCheck` would crash with a null input. So I added an assert to `SubTypeCheckNode::sub` to catch a nullable input. The assert fires with the test because split if runs with a not yet fully collapsed dead path. So I tweaked split if so that it is delayed until the path is collapsed. When running tests, I found that the assert would also fire in other cases because values known to be non-null were not marked as such. The end result is a bigger patch.
>
>> Thanks @rwestrel. I'm fine with your patch. Do you want to take JDK-8303279 and propose your fix as PR? I will then close mine.
>
> Let me open the PR. I will be away for a week though, starting later today.

Closing this PR in favour of @rwestrel's.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14600#issuecomment-1609611389 From simonis at openjdk.org Tue Jun 27 14:25:20 2023 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 27 Jun 2023 14:25:20 GMT Subject: Withdrawn: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) In-Reply-To: References: Message-ID: <0GiI0QdgrTBUy-BdcnwT3lqhb3wJmn3iCLWsWnD8peU=.01bbbab7-44b7-4817-a2c4-975a3b0a6d1d@github.com> On Wed, 21 Jun 2023 17:25:38 GMT, Volker Simonis wrote: > This is a problem probably introduced by [JDK-8238691](https://bugs.openjdk.org/browse/JDK-8238691). It could reproduce it with JDK 17, 18 and 21 and results in the following crash (see [JBS-issue](https://bugs.openjdk.org/browse/JDK-8303279) for more details): > > > # Internal Error (/priv/simonisv/OpenJDK/Git/jdk/src/hotspot/share/opto/type.hpp:2059), pid=1152816, tid=1154124 > # assert(_base >= OopPtr && _base <= AryPtr) failed: Not a Java pointer > # > # JRE version: OpenJDK Runtime Environment (21.0) (slowdebug build 21-internal-adhoc.simonisv.jdk) > ... > Current CompileTask: > C2: 91009 8214 ! 
4 io.grpc.kotlin.ServerCalls$serverCallListener$requests$1::invokeSuspend (285 bytes) > > Stack: [0x00007fff1306b000,0x00007fff1316c000], sp=0x00007fff13166fe0, free space=1007k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x61d636] Type::is_oopptr() const+0x4e (type.hpp:2059) > V [libjvm.so+0x14cee55] SubTypeCheckNode::sub(Type const*, Type const*) const+0x53 (subtypenode.cpp:37) > V [libjvm.so+0x14c66b0] SubNode::Value(PhaseGVN*) const+0xa6 (subnode.cpp:107) > V [libjvm.so+0xcdacb3] split_if(IfNode*, PhaseIterGVN*)+0x2ce (ifnode.cpp:111) > V [libjvm.so+0xce044c] IfNode::Ideal_common(PhaseGVN*, bool)+0x128 (ifnode.cpp:1438) > V [libjvm.so+0xce0496] IfNode::Ideal(PhaseGVN*, bool)+0x30 (ifnode.cpp:1448) > V [libjvm.so+0x1298244] PhaseGVN::apply_ideal(Node*, bool)+0x70 (phaseX.cpp:667) > V [libjvm.so+0x129a0fd] PhaseIterGVN::transform_old(Node*)+0x12d (phaseX.cpp:1196) > V [libjvm.so+0x12998df] PhaseIterGVN::optimize()+0x16b (phaseX.cpp:1045) > V [libjvm.so+0x93f89e] Compile::Optimize()+0xce0 (compile.cpp:2378) > V [libjvm.so+0x9385fa] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x16ca (compile.cpp:842) > V [libjvm.so+0x806ab4] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1a0 (c2compiler.cpp:118) > V [libjvm.so+0x958bc8] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xa04 (compileBroker.cpp:2265) > V [libjvm.so+0x9576fa] CompileBroker::compiler_thread_loop()+0x462 (compileBroker.cpp:1944) > V [libjvm.so+0x97b14a] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:58) > V [libjvm.so+0xd434ce] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:719) > V [libjvm.so+0xd43368] JavaThread::run()+0x258 (javaThread.cpp:704) > V [libjvm.so+0x15481ea] Thread::call_run()+0x1a8 (thread.cpp:217) > V [libjvm.so+0x1230036] thread_native_entry(Thread*)+0x1a5 (os_linux.cpp:778) > ... > ``` > > `SubTypeC... 
This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/14600

From roland at openjdk.org Tue Jun 27 14:48:33 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 27 Jun 2023 14:48:33 GMT
Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10)
Message-ID:

The crash occurs because at split if during IGVN, a `SubTypeCheck` is created with null as input. That happens because the control path the `SubTypeCheck` is cloned for is dead. To fix that I propose delaying split if until dead paths are collapsed.

I added an assert to check for a nullable first input to `SubTypeCheck` nodes (which should be impossible because it should be null checked). When I ran the tests, a number of cases showed up with known non-null values not properly marked as non-null. I fixed them.

-------------

Commit messages:
 - fix & test

Changes: https://git.openjdk.org/jdk/pull/14678/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303279
  Stats: 103 lines in 6 files changed: 89 ins; 1 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/14678.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14678/head:pull/14678

PR: https://git.openjdk.org/jdk/pull/14678

From cslucas at openjdk.org Tue Jun 27 15:02:04 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 27 Jun 2023 15:02:04 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v19]
In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

> Can I please get reviews for this PR?
>
> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information.
The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested.
>
> With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N:
>
> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png)
>
> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>
> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>
> This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list).
>
> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following.
> 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode;
> 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge;
> 3) Add a new Class to support the rematerialization of SR objects that are part of a merge;
> 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges;
> 5) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>
> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
>
> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions.
I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Merge branch 'openjdk:master' into rematerialization-of-merges - Rome minor refactorings. - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges Catching up with master. - Address PR review 6: debug format output & some refactoring. - Catching up with master branch. Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address PR review 6: refactoring around rematerialization & improve test cases. - Address PR review 5: refactor on rematerialization & add tests. - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address part of PR review 4 & fix a bug setting only_candidate - ... 
and 10 more: https://git.openjdk.org/jdk/compare/5ca4cdd2...d7cf00af ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=18 Stats: 2732 lines in 26 files changed: 2484 ins; 108 del; 140 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From kvn at openjdk.org Tue Jun 27 16:09:21 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 27 Jun 2023 16:09:21 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 03:18:21 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision:
>
> - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files
> - Merge branch 'master' into fg8308340
> - 8308340: C2: Idealize Fma nodes
>
>   Some platforms, like aarch64, ppc, and riscv, support fusing
>   `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating
>   partially symmetric match rules like:
>
>   ```
>   match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
>   match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
>   ```
>
>   Since `Fma` is partially commutative, the patch is to convert
>   `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase,
>   making node patterns canonical. Then we can remove redundant
>   rules.
>
>   Also, we should guarantee that C2 generates `Fma` nodes only on
>   platforms supporting `Fma` instructions before matcher, so we
>   can remove all `predicate(UseFMA)` for all `Fma` rules.
>
>   After the patch, the code size of libjvm.so on aarch64 platform
>   decreased by 63.4k.
>
>   The patch passed all tier 1 - 3 on aarch64 and x86 platforms.

Looks good to me. You need a second review.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14576#pullrequestreview-1501356956

From kvn at openjdk.org Tue Jun 27 16:11:22 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 27 Jun 2023 16:11:22 GMT
Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 26 Jun 2023 08:51:27 GMT, Emanuel Peter wrote:

>> Removed a spurious assert before optimization bailout.
>>
>> I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. But that is incorrect, partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time.
In those cases, we cannot move the UnorderedReductions out of the loop, so an optimization bailout is appropriate.
>>
>> I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have led to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, so that would be a bailout from the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain.
>>
>> I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue.
>> Testing up to tier6 and stress testing. **Running, passing except for some IR rules I had to fix, rerunning...**
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   require 64 bit for test with OR_REDUCTION_V

Update is good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14494#pullrequestreview-1501363492

From kvn at openjdk.org Tue Jun 27 16:34:05 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 27 Jun 2023 16:34:05 GMT
Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10)
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jun 2023 14:40:49 GMT, Roland Westrelin wrote:

> The crash occurs because at split if during IGVN, a `SubTypeCheck` is
> created with null as input. That happens because the control path the
> `SubTypeCheck` is cloned for is dead. To fix that I propose delaying
> split if until dead paths are collapsed.
>
> I added an assert to check for a nullable first input to `SubTypeCheck`
> nodes (which should be impossible because it should be null
> checked). When I ran the tests, a number of cases showed up with known
> non-null values not properly marked as non-null.
I fixed them.

Looks reasonable.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14678#pullrequestreview-1501407243

From duke at openjdk.org Tue Jun 27 16:59:28 2023
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Tue, 27 Jun 2023 16:59:28 GMT
Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v10]
In-Reply-To:
References:
Message-ID:

> The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>
> This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
>
> **Arrays.sort performance data using JMH benchmarks**
>
> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
> | --- | --- | --- | --- | --- |
> | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x |
> | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x |
> | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** |
> | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** |
> | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x |
> | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x |
> | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** |
> | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** |
> | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x |
> | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x |
> | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** |
> | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** |
> | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x |
> | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x |
> | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** |
> | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** |

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  minor cleanups

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14227/files
  - new: https://git.openjdk.org/jdk/pull/14227/files/25fa86e9..2bd04191

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=08-09

  Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/14227.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227

PR: https://git.openjdk.org/jdk/pull/14227

From epeter at openjdk.org Tue Jun 27 17:52:15 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 27 Jun 2023 17:52:15 GMT
Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization
In-Reply-To:
References:
Message-ID:

On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote:

> ## TL;DR
>
> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request.

General question: Do you have any tests with varying loop limit, and check that you stop exactly at the right iteration? Would be even more interesting with mixed type examples. Just to see that you do not over/under duplicate the vectors.

src/hotspot/share/opto/vmaskloop.cpp line 595:

> 593:   uint tree_depth = exact_log2(large) - exact_log2(small) + 1;
> 594:   // All vector masks construct a perfect binary tree of "2 ^ depth - 1" nodes
> 595:   // We create a list of "2 ^ depth" nodes for easier computation.
Assume we have a small and a large type (byte and long). Size 1 and 8. `tree_depth = log2(8) - log2(1) + 1 = 3 - 0 + 1 = 4`. Then you generate a tree with `2^4-1 = 15` nodes. Did I calculate this right? That seems a bit excessive. Would be interesting to see benchmarks for mixed type cases. src/hotspot/share/opto/vmaskloop.cpp line 735: > 733: vnode = new StoreVectorMaskedNode(ctrl, mem, addr, val, at, mask); > 734: } > 735: } else if (VectorNode::is_convert_opcode(opc)) { Ok, this does work for same size conversions: `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java` public class Test { static int RANGE = 1024; public static void main(String[] strArr) { double a[] = new double[RANGE]; long b[] = new long[RANGE]; test0(a, b); } static void test0(double[] a, long[] b) { for (int i = 0; i < RANGE; i++) { b[i] = (long)a[i]; } } } Good to see some conversion is possible. But if I replace double with float, I get `Vector element size does not match`. Can that limitation be lifted? src/hotspot/share/opto/vmaskloop.cpp line 785: > 783: } > 784: > 785: // Duplicate vectorized operations with given vector element size Got to here today. There should probably be some comment higher up that you first replace scalars with one vector each, and then duplicate them for the larger types that need multiple vectors. I'm also concerned that there may be some platforms where the max vector width in bytes is not the same for all types. But maybe all platforms that support masked register ops also all have the same vector width in bytes for all types? 
------------- PR Review: https://git.openjdk.org/jdk/pull/14581#pullrequestreview-1501451796 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244088279 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244114831 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244126073 From epeter at openjdk.org Tue Jun 27 17:52:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jun 2023 17:52:18 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Fri, 23 Jun 2023 14:44:15 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vmaskloop.cpp line 550: > >> 548: // 2) Address is growing down (index scale * loop stride < 0) >> 549: // 3) Memory access scale is different from data size >> 550: // 4) The loop increment node is on the SWPointer's node stack > > Why should the `incr` not be on the node stack? Does that not prevent `a[i+1]` from being accepted? > src/hotspot/share/opto/vmaskloop.cpp line 595: > >> 593: uint tree_depth = exact_log2(large) - exact_log2(small) + 1; >> 594: // All vector masks construct a perfect binary tree of "2 ^ depth - 1" nodes >> 595: // We create a list of "2 ^ depth" nodes for easier computation. 
>
> Assume we have a small and a large type (byte and long). Size 1 and 8. `tree_depth = log2(8) - log2(1) + 1 = 3 - 0 + 1 = 4`. Then you generate a tree with `2^4-1 = 15` nodes. Did I calculate this right? That seems a bit excessive. Would be interesting to see benchmarks for mixed type cases.

Can there be cases where creating the masks makes vectorization unprofitable?

> src/hotspot/share/opto/vmaskloop.cpp line 785:
>
>> 783: }
>> 784:
>> 785: // Duplicate vectorized operations with given vector element size
>
> Got to here today. There should probably be some comment higher up that you first replace scalars with one vector each, and then duplicate them for the larger types that need multiple vectors.
>
> I'm also concerned that there may be some platforms where the max vector width in bytes is not the same for all types. But maybe all platforms that support masked register ops also all have the same vector width in bytes for all types?

Assume we only allow `32` byte registers for `int`, but `64` bytes for doubles. Now you'd be assuming that there need to be twice as many `double` vectors as `int` vectors. But actually, they need the same number of vectors, because vectors of both sizes fit exactly `8` elements.
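The arithmetic behind this concern can be checked with a small sketch. It assumes the two widths are meant in bytes (32 bytes hold 8 `int`s, 64 bytes hold 8 `double`s); `VectorCount`, `lanes` and `vectorsNeeded` are hypothetical names used only for illustration and are not HotSpot code.

```java
// Illustration only: hypothetical helper, not part of vmaskloop.cpp.
final class VectorCount {
    // Number of elements of the given size that fit in one vector register.
    static int lanes(int maxVectorBytes, int elementBytes) {
        return maxVectorBytes / elementBytes;
    }

    // Number of vectors needed to cover `elements` scalar iterations (rounded up).
    static int vectorsNeeded(int elements, int maxVectorBytes, int elementBytes) {
        int l = lanes(maxVectorBytes, elementBytes);
        return (elements + l - 1) / l;
    }

    public static void main(String[] args) {
        int elements = 16;
        // Uniform 64-byte width: double needs twice as many vectors as int.
        System.out.println("uniform, int:        " + vectorsNeeded(elements, 64, 4));
        System.out.println("uniform, double:     " + vectorsNeeded(elements, 64, 8));
        // Assumed non-uniform widths: 32 bytes for int, 64 bytes for double.
        System.out.println("non-uniform, int:    " + vectorsNeeded(elements, 32, 4));
        System.out.println("non-uniform, double: " + vectorsNeeded(elements, 64, 8));
    }
}
```

With a uniform width the larger type needs proportionally more vectors, while with the assumed non-uniform widths both types end up with the same lane count and therefore the same number of vectors, which is the case a fixed "duplicate by size ratio" rule would get wrong.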
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244068613 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244093283 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244130010 From epeter at openjdk.org Tue Jun 27 17:52:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 Jun 2023 17:52:20 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Tue, 27 Jun 2023 17:16:11 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 595: >> >>> 593: uint tree_depth = exact_log2(large) - exact_log2(small) + 1; >>> 594: // All vector masks construct a perfect binary tree of "2 ^ depth - 1" nodes >>> 595: // We create a list of "2 ^ depth" nodes for easier computation. >> >> Assume we have a small and a large type (byte and long). Size 1 and 8. `tree_depth = log2(8) - log2(1) + 1 = 3 - 0 + 1 = 4`. Then you generate a tree with `2^4-1 = 15` nodes. Did I calculate this right? That seems a bit excessive. Would be interesting to see benchmarks for mixed type cases. > > Can there be cases where creating the masks makes vectorization unprofitable? 
I have an example here:

    public class Test {
        static int RANGE = 1024;

        public static void main(String[] strArr) {
            byte a[] = new byte[RANGE];
            long b[] = new long[RANGE];
            test0(a, b);
        }

        static void test0(byte[] a, long[] b) {
            for (int i = 0; i < RANGE; i++) {
                a[i]++;
                b[i]++;
            }
        }
    }

`./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 Test.java`

These are the masks:

    Generated vector masks in vmask tree
    Lane_size = 1
      3710 LoopVectorMask === _ 367 26 [[ 3711 3712 ]] #vectormask[64]:{byte}
    Lane_size = 2
      3711 ExtractLowMask === _ 3710 [[ 3713 3714 ]] #vectormask[32]:{short}
      3712 ExtractHighMask === _ 3710 [[ 3715 3716 ]] #vectormask[32]:{short}
    Lane_size = 4
      3713 ExtractLowMask === _ 3711 [[ 3717 3718 ]] #vectormask[16]:{int}
      3714 ExtractHighMask === _ 3711 [[ 3719 3720 ]] #vectormask[16]:{int}
      3715 ExtractLowMask === _ 3712 [[ 3721 3722 ]] #vectormask[16]:{int}
      3716 ExtractHighMask === _ 3712 [[ 3723 3724 ]] #vectormask[16]:{int}
    Lane_size = 8
      3717 ExtractLowMask === _ 3713 [[ ]] #vectormask[8]:{long}
      3718 ExtractHighMask === _ 3713 [[ ]] #vectormask[8]:{long}
      3719 ExtractLowMask === _ 3714 [[ ]] #vectormask[8]:{long}
      3720 ExtractHighMask === _ 3714 [[ ]] #vectormask[8]:{long}
      3721 ExtractLowMask === _ 3715 [[ ]] #vectormask[8]:{long}
      3722 ExtractHighMask === _ 3715 [[ ]] #vectormask[8]:{long}
      3723 ExtractLowMask === _ 3716 [[ ]] #vectormask[8]:{long}
      3724 ExtractHighMask === _ 3716 [[ ]] #vectormask[8]:{long}

That is indeed `15` masks. Hmm. Maybe that is the best one can do. And maybe it is not all that bad. But again, would be interesting to see the benchmarks for that case.
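The count of 15 follows from the `tree_depth` formula quoted from `vmaskloop.cpp`. A minimal sketch that reproduces the per-level numbers of the dump above, assuming a 64-byte vector (as the `#vectormask[64]:{byte}` entry suggests); `MaskTree` and its methods are hypothetical names for illustration, not HotSpot code:

```java
// Illustration only: recomputes the shape of the vmask tree, not the tree itself.
final class MaskTree {
    // tree_depth = log2(large) - log2(small) + 1, for power-of-two element sizes.
    static int treeDepth(int smallBytes, int largeBytes) {
        return Integer.numberOfTrailingZeros(largeBytes)
             - Integer.numberOfTrailingZeros(smallBytes) + 1;
    }

    // A perfect binary tree of the given depth has 2^depth - 1 nodes.
    static int totalMasks(int depth) {
        return (1 << depth) - 1;
    }

    public static void main(String[] args) {
        int vectorBytes = 64;      // assumption: 64-byte vector registers
        int small = 1, large = 8;  // byte and long
        int masksAtLevel = 1;
        for (int laneSize = small; laneSize <= large; laneSize *= 2) {
            System.out.println("Lane_size = " + laneSize + ": " + masksAtLevel
                + " mask(s) with " + (vectorBytes / laneSize) + " lanes each");
            masksAtLevel *= 2;     // each mask splits into a low and a high half
        }
        int depth = treeDepth(small, large);
        System.out.println("depth = " + depth + ", total masks = " + totalMasks(depth));
    }
}
```

Each level doubles the number of masks while halving the lane count, so the total grows with the ratio between the largest and smallest element size rather than with the number of distinct types used in the loop.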
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244104209 From cslucas at openjdk.org Tue Jun 27 18:41:31 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 27 Jun 2023 18:41:31 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v20] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Addressing PR feedback. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/d7cf00af..4acfcbcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=18-19 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Tue Jun 27 18:41:32 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 27 Jun 2023 18:41:32 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: <3G-J7S82KT6w5oWaxYK-3soNIQDfcR65ESTRLA_LfDc=.bdde8aa7-4044-44de-9c01-951013d7707d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <3G-J7S82KT6w5oWaxYK-3soNIQDfcR65ESTRLA_LfDc=.bdde8aa7-4044-44de-9c01-951013d7707d@github.com> Message-ID: <9deZcQk5phYkxtQtINTCxZ3gX4_jwN8L0gfqyjwtmho=.8fb89c9a-7577-4378-a5ca-6a1bcc356587@github.com> On Fri, 23 Jun 2023 21:24:20 GMT, Vladimir Ivanov wrote: >> @iwanowww - I'm confused by what a "Diagnostic" flag is. According to [this documentation](https://wiki.openjdk.org/display/HotSpot/Hotspot+Command-line+Flags%3A+Kinds%2C+Lifecycle+and+the+CSR+Process) "Diagnostic flags are not meant for VM tuning or for product modes. They are to be used for VM quality assurance or field diagnosis of VM bugs [...]" I believe the patch I'm proposing is a VM tuning optimization, so should it really be a diagnostic flag? Besides, I think we'll try _at a later moment_ to make this a product flag. Do you think an experimental flag is more appropriate? Thank you. > > You can look at it in the following way: since the flag is set to true by default, the feature is unconditionally available in product binaries. 
The only reason to explicitly specify the flag is to turn the optimization off and it may be needed to diagnose VM crashes or performance regressions.
>
> As an afterthought, maybe C2 should check a compiler directive (and not a global flag) to be able to control the optimization up to per-method granularity.

Thank you @iwanowww for clarifying. Now I understand this better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1244190081

From cslucas at openjdk.org Tue Jun 27 18:56:16 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Tue, 27 Jun 2023 18:56:16 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18]
In-Reply-To:
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

On Wed, 14 Jun 2023 20:48:36 GMT, Vladimir Ivanov wrote:

>> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits:
>>
>>  - Merge branch 'openjdk:master' into rematerialization-of-merges
>>  - Rome minor refactorings.
>>  - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>>    Catching up with master.
>>  - Address PR review 6: debug format output & some refactoring.
>>  - Catching up with master branch.
>>
>>    Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>>  - Address PR review 6: refactoring around rematerialization & improve test cases.
>>  - Address PR review 5: refactor on rematerialization & add tests.
>>  - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>>  - Address part of PR review 4 & fix a bug setting only_candidate
>>  - Catching up with master
>>
>>    Merge remote-tracking branch 'origin/master' into rematerialization-of-merges
>>  - ...
and 9 more: https://git.openjdk.org/jdk/compare/57b82512...939dcffe > > src/hotspot/share/opto/c2compiler.cpp line 150: > >> 148: if (C.failure_reason_is(retry_no_reduce_allocation_merges())) { >> 149: assert(do_reduce_allocation_merges, "must make progress"); >> 150: do_reduce_allocation_merges = false; > > I consider the check here as a safety net which is intended to provide graceful degradation in performance if RAM optimization misbehaves for some reason. But bailing out an optimization is better than bailing out the whole compilation. I suggest to introduce new diagnostic flag (e.g., `VerifyReduceAllocationMerges`) and add a guarantee call here which signals whenever we encounter a problematic case. I'm fine with handling that as a separate enhancement (it makes sense to dump additional diagnostic info at the place where such bail outs are triggered ). I created a new [work item](https://bugs.openjdk.org/browse/JDK-8310980) to track this work. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1244208611 From vlivanov at openjdk.org Tue Jun 27 20:31:05 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 27 Jun 2023 20:31:05 GMT Subject: RFR: 8303279: C2 Compiler crash (triggered by Kotlin 1.8.10) In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 14:40:49 GMT, Roland Westrelin wrote: > The crash occurs because at split if during IGVN, a `SubTypeCheck` is > created with null as input. That happens because the control path the > `SubTypeCheck` is cloned for is dead. To fix that I propose delaying > split if until dead paths are collapsed. > > I added an assert to check a nullable first input to `SubTypeCheck` > nodes (which should be impossible because it should be null > checked). When I ran testing, a number of cases showed up with known > non null values non properly marked as non null. I fixed them. Proposed fix looks good. 
The testing revealed a failure in the newly introduced assertion (attached logs to the bug). Also, the bug summary is way too generic and lacks any details about the actual problem. Please, update it. src/hotspot/share/opto/ifnode.cpp line 95: > 93: uint i4; > 94: RegionNode* phi_region = phi->region(); > 95: for(i4 = 1; i4 < phi->req(); i4++ ) { Missing space: `for (i4`. src/hotspot/share/opto/subtypenode.cpp line 37: > 35: const Type* SubTypeCheckNode::sub(const Type* sub_t, const Type* super_t) const { > 36: const TypeKlassPtr* superk = super_t->isa_klassptr(); > 37: assert(sub_t != Type::TOP && !TypePtr::NULL_PTR->higher_equal(sub_t), "should be not null"); There's a failure observed during testing. I attached logs to the bug. test/hotspot/jtreg/compiler/splitif/TestCrashAtIGVNSplitIfSubType.java line 28: > 26: * @bug 8303279 > 27: * @summary C2 Compiler crash (triggered by Kotlin 1.8.10) > 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:+PrintCompilation -XX:CompileOnly=TestCrashAtIGVNSplitIfSubType::test -XX:CompileCommand=quiet -XX:+StressIGVN -XX:StressSeed=598200189 TestCrashAtIGVNSplitIfSubType Missing flag: `-XX:+StressIGVN` requires `-XX:+UnlockDiagnosticVMOptions`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/14678#pullrequestreview-1501774702 PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1244299611 PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1244307870 PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1244307228 From dholmes at openjdk.org Tue Jun 27 23:29:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Jun 2023 23:29:16 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v2] In-Reply-To: References: Message-ID: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. 
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Updated RISC-V code, and new PPC code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14669/files - new: https://git.openjdk.org/jdk/pull/14669/files/1d001c3b..23318e06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=00-01 Stats: 8 lines in 2 files changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From dholmes at openjdk.org Tue Jun 27 23:51:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 27 Jun 2023 23:51:05 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 10:51:05 GMT, Martin Doerr wrote: > The other implementation don't look correct to me. StackGuardState is an enum and should typically have 4 Bytes. @TheRealMDoerr could you elaborate please. We have the following: - x86: we use `void cmpl(Address dst, int32_t imm32);` - Aarch64: we use `cmp(Register Rd, unsigned char imm8)` and cast to `u1` - RISC-V: we use `sub (Register Rd, Register Rn, int64_t decrement, Register temp = t0);` (perhaps should be `subw` for 32-bit? @RealFYang ?) 
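Why the access width matters can be illustrated outside of HotSpot. A minimal sketch, assuming a 4-byte enum field followed by unrelated data; the layout, the constant value and the class name `FieldWidth` are made up for illustration and do not reflect the real `JavaThread` layout or any of the platform code listed above. A 64-bit read of the field also picks up the neighbouring bytes, so a comparison against the enum constant can fail even though the field itself matches.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration only: models a 4-byte field with a neighbour, not real VM state.
final class FieldWidth {
    static final int GUARD_ENABLED = 3; // hypothetical enum value

    // Layout: 4-byte state at offset 0, then 4 bytes belonging to another field.
    static ByteBuffer layout(int state, int neighbour) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, state);
        buf.putInt(4, neighbour);
        return buf;
    }

    // 32-bit access: reads exactly the field.
    static boolean matches32(ByteBuffer buf) {
        return buf.getInt(0) == GUARD_ENABLED;
    }

    // 64-bit access: the upper half comes from the neighbouring field.
    static boolean matches64(ByteBuffer buf) {
        return buf.getLong(0) == GUARD_ENABLED;
    }

    public static void main(String[] args) {
        ByteBuffer buf = layout(GUARD_ENABLED, 0x1234);
        System.out.println("32-bit compare matches: " + matches32(buf));
        System.out.println("64-bit compare matches: " + matches64(buf));
    }
}
```

The 64-bit comparison only succeeds when the neighbouring bytes happen to be zero, which is the kind of rarely failing behaviour that is hard to catch by testing alone.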
------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1610371177 From fyang at openjdk.org Wed Jun 28 01:18:23 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 28 Jun 2023 01:18:23 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 23:48:40 GMT, David Holmes wrote: >> PPC64 implementation: >> >> --- a/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp >> +++ b/src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp >> @@ -888,6 +888,11 @@ void InterpreterMacroAssembler::remove_activation(TosState state, >> // Test if reserved zone needs to be enabled. >> Label no_reserved_zone_enabling; >> >> + // check if already enabled - if so no re-enabling needed >> + lwz(R0, in_bytes(JavaThread::stack_guard_state_offset()), R16_thread); >> + cmpwi(CCR0, R0, StackOverflow::stack_guard_enabled); >> + beq_predict_taken(CCR0, no_reserved_zone_enabling); >> + >> // Compare frame pointers. There is no good stack pointer, as with stack >> // frame compression we can get different SPs when we do calls. A subsequent >> // call could have a smaller SP, so that this compare succeeds for an >> >> >> The other implementation don't look correct to me. `StackGuardState` is an `enum` and should typically have 4 Bytes. > >> The other implementation don't look correct to me. StackGuardState is an enum and should typically have 4 Bytes. > > @TheRealMDoerr could you elaborate please. We have the following: > - x86: we use `void cmpl(Address dst, int32_t imm32);` > - Aarch64: we use `cmp(Register Rd, unsigned char imm8)` and cast to `u1` > - RISC-V: we use `sub (Register Rd, Register Rn, int64_t decrement, Register temp = t0);` (perhaps should be `subw` for 32-bit? @RealFYang ?) @dholmes-ora : Yes, that makes sense to me. And I will also need to change to use 32-bit load (lw) to get the guard state. 
diff --git a/src/hotspot/cpu/riscv/interp_masm_riscv.cpp b/src/hotspot/cpu/riscv/interp_masm_riscv.cpp index edec2e08c83..1e981498fcc 100644 --- a/src/hotspot/cpu/riscv/interp_masm_riscv.cpp +++ b/src/hotspot/cpu/riscv/interp_masm_riscv.cpp @@ -764,6 +764,11 @@ void InterpreterMacroAssembler::remove_activation( // testing if reserved zone needs to be re-enabled Label no_reserved_zone_enabling; + // check if already enabled - if so no re-enabling needed + lw(t0, Address(xthread, JavaThread::stack_guard_state_offset())); + subw(t0, t0, StackOverflow::stack_guard_enabled); + beqz(t0, no_reserved_zone_enabling); + ld(t0, Address(xthread, JavaThread::reserved_stack_activation_offset())); ble(t1, t0, no_reserved_zone_enabling); ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1610439335 From haosun at openjdk.org Wed Jun 28 02:03:10 2023 From: haosun at openjdk.org (Hao Sun) Date: Wed, 28 Jun 2023 02:03:10 GMT Subject: Integrated: 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 In-Reply-To: References: Message-ID: <2Su6HEBpsfShXTDNLxVuq_8j9XU9pFVJRIQpuh_kfKk=.71445bc6-859d-4237-b205-ff093643f56a@github.com> On Tue, 20 Jun 2023 06:35:15 GMT, Hao Sun wrote: > `UseSHA3Intrinsics` was introduced in JDK-8252204, but it was not auto-enabled due to the lack of real hardware. In JDK-8297092, the intrinsic was evaluated on existing hardware with the support of SHA3 feature (including Neoverse N2/V1 and Apple silicon), and it was auto-enabled by default on Apple silicon only. See the code [1]. 
>
> As a result, test case `TestUseSHA3IntrinsicsOptionOnSupportedCPU.java` fails on Neoverse N2 and V1 with the following error message:
>
>
> JavaTest Message: Test threw exception: java.lang.AssertionError: Option 'UseSHA3Intrinsics' is expected to have 'true' value
> Option 'UseSHA3Intrinsics' should be enabled by default
>
>
> The group of test cases `TestUseXXXIntrinsicsOptionOnSupportedCPU.java` is designed to verify that option `UseXXXIntrinsics` is enabled by default if the underlying hardware supports the corresponding CPU feature.
>
> Apparently this check condition doesn't work for `UseSHA3Intrinsics`. The other exception case is `UseSHA512Intrinsics`. See JDK-8257796.
>
> Fix: One `@requires` condition is added in this patch so that this test case only runs on macOS on Apple silicon. Note that the SHA3 feature is currently supported by AArch64 only.
>
> Test: this test case passed on Linux/Neoverse N2, Linux/Neoverse V1 and macOS on Apple silicon.
>
> [1] https://github.com/openjdk/jdk/pull/11382/files#diff-a87e260510f34ca7d9b0feb089ad982be8268c5c8aa5a71221f6738b051ea488R338-R345

This pull request has now been integrated.
Changeset: afdaa2a3 Author: Hao Sun URL: https://git.openjdk.org/jdk/commit/afdaa2a3305461538f3a36de2b0b540fe2da9b37 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8309109: AArch64: [TESTBUG] compiler/intrinsics/sha/cli/TestUseSHA3IntrinsicsOptionOnSupportedCPU.java fails on Neoverse N2 and V1 Reviewed-by: aph, fyang ------------- PR: https://git.openjdk.org/jdk/pull/14551 From dholmes at openjdk.org Wed Jun 28 02:38:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 02:38:25 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v3] In-Reply-To: References: Message-ID: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. 
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Update to 32-bit load and sub ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14669/files - new: https://git.openjdk.org/jdk/pull/14669/files/23318e06..e11d4b8c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From dholmes at openjdk.org Wed Jun 28 02:38:26 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 02:38:26 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 23:29:16 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Updated RISC-V code, and new PPC code All modified platforms are building correctly - see GHA results. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1610557738 From fyang at openjdk.org Wed Jun 28 03:20:07 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 28 Jun 2023 03:20:07 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 02:38:25 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update to 32-bit load and sub src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 768: > 766: > 767: // check if already enabled - if so no re-enabling needed > 768: ldw(t0, Address(xthread, JavaThread::stack_guard_state_offset())); It's `lw` instead of `ldw`. See my previous comment. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1244616187 From duke at openjdk.org Wed Jun 28 04:31:03 2023 From: duke at openjdk.org (sid8606) Date: Wed, 28 Jun 2023 04:31:03 GMT Subject: RFR: 8309889: [s390] Missing return statement after calling jump_to_native_invoker mrthod in generate_method_handle_dispatch. 
In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 06:05:12 GMT, sid8606 wrote: > Missing return statement after calling jump_to_native_invoker mrthod in generate_method_handle_dispatch, it leads to assert(is_valid()) failed: invalid register. > > Ran tier1 test cases passing with release, fastdebug and slowdebug. @RealLucy @TheRealMDoerr Please review this PR as per your availability. Thank you ------------- PR Comment: https://git.openjdk.org/jdk/pull/14647#issuecomment-1610695341 From amitkumar at openjdk.org Wed Jun 28 05:39:04 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 28 Jun 2023 05:39:04 GMT Subject: RFR: 8309889: [s390] Missing return statement after calling jump_to_native_invoker mrthod in generate_method_handle_dispatch. In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 06:05:12 GMT, sid8606 wrote: > Missing return statement after calling jump_to_native_invoker mrthod in generate_method_handle_dispatch, it leads to assert(is_valid()) failed: invalid register. > > Ran tier1 test cases passing with release, fastdebug and slowdebug. LGTM and test passed, Thanks for keeping it short. I have changed `mrthod` to `method` in title. Probably you want to do the same in our ZenHub. ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/14647#pullrequestreview-1502313412 From epeter at openjdk.org Wed Jun 28 05:55:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jun 2023 05:55:18 GMT Subject: Integrated: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction In-Reply-To: References: Message-ID: On Thu, 15 Jun 2023 14:42:14 GMT, Emanuel Peter wrote: > Removed a spurious assert before optimization bailout. > > I assumed that the "scalar input" of a Reduction node must always be either a Phi or another Reduction node. 
But that is incorrect: partial vectorization can lead to a reduction node chain where we have vector reductions and scalar reductions at the same time. In those cases, we cannot move the UnorderedReductions out of the loop, so an optimization bailout is appropriate.
> 
> I assessed the other asserts in `PhaseIdealLoop::move_unordered_reduction_out_of_loop`, and I think they are all justified. However, one assert would have led to a `continue` in production, which would not break out of the nested loop correctly. I changed it to a `return`, which bails out of the optimization. This assert should not be triggered because in `SuperWord::mark_reductions` we forbid that a reduction node has any uses inside the loop except for the successor node in the reduction chain.
> 
> I have one regression test delivered by the fuzzer, and one that I constructed myself after understanding the issue.
> Testing up to tier6 and stress testing.

This pull request has now been integrated.

Changeset: 526dba1a
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/526dba1a2942e444bf11d03d8eaf014b5ef20ccf
Stats:     145 lines in 3 files changed: 141 ins; 0 del; 4 mod

8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction

Reviewed-by: kvn, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/14494

From epeter at openjdk.org Wed Jun 28 05:55:17 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 28 Jun 2023 05:55:17 GMT
Subject: RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 27 Jun 2023 16:08:36 GMT, Vladimir Kozlov wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   require 64 bit for test with OR_REDUCTION_V
> 
> Update is good

Thanks @vnkozlov @chhagedorn for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/14494#issuecomment-1610785723 From dholmes at openjdk.org Wed Jun 28 06:04:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 06:04:22 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v4] In-Reply-To: References: Message-ID: <4xnxYtLtS7E49LcDhtH_0CIGc8kiv-OCi5R8zDcqJBQ=.f5dbb2c6-7c83-4dfa-8364-d39eaf35a451@github.com> > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. 
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14669/files - new: https://git.openjdk.org/jdk/pull/14669/files/e11d4b8c..7c21f1c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From amitkumar at openjdk.org Wed Jun 28 06:14:11 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 28 Jun 2023 06:14:11 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v4] In-Reply-To: <4xnxYtLtS7E49LcDhtH_0CIGc8kiv-OCi5R8zDcqJBQ=.f5dbb2c6-7c83-4dfa-8364-d39eaf35a451@github.com> References: <4xnxYtLtS7E49LcDhtH_0CIGc8kiv-OCi5R8zDcqJBQ=.f5dbb2c6-7c83-4dfa-8364-d39eaf35a451@github.com> Message-ID: On Wed, 28 Jun 2023 06:04:22 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. 
> > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Here is s390x port: diff --git a/src/hotspot/cpu/s390/interp_masm_s390.cpp b/src/hotspot/cpu/s390/interp_masm_s390.cpp index 79c758f8b11..7555eb05b1f 100644 --- a/src/hotspot/cpu/s390/interp_masm_s390.cpp +++ b/src/hotspot/cpu/s390/interp_masm_s390.cpp @@ -952,6 +952,11 @@ void InterpreterMacroAssembler::remove_activation(TosState state, // Test if reserved zone needs to be enabled. Label no_reserved_zone_enabling; + // check if already enabled - if so no re-enabling needed + guarantee(sizeof(StackOverflow::StackGuardState) == 4, "unexptected size"); + z_ly(Z_R0, Address(Z_thread, JavaThread::stack_guard_state_offset())); + compare32_and_branch(Z_R0, StackOverflow::stack_guard_enabled, bcondEqual, no_reserved_zone_enabling); + // Compare frame pointers. There is no good stack pointer, as with stack // frame compression we can get different SPs when we do calls. A subsequent // call could have a smaller SP, so that this compare succeeds for an @TheRealMDoerr I have a 2nd patch as well, Would you please confirm which will be better: diff --git a/src/hotspot/cpu/s390/interp_masm_s390.cpp b/src/hotspot/cpu/s390/interp_masm_s390.cpp index 79c758f8b11..a6774326286 100644 --- a/src/hotspot/cpu/s390/interp_masm_s390.cpp +++ b/src/hotspot/cpu/s390/interp_masm_s390.cpp @@ -952,6 +952,12 @@ void InterpreterMacroAssembler::remove_activation(TosState state, // Test if reserved zone needs to be enabled. Label no_reserved_zone_enabling; + // check if already enabled - if so no re-enabling needed + guarantee(sizeof(StackOverflow::StackGuardState) == 4, "unexptected size"); + z_cli(Address(Z_thread, JavaThread::stack_guard_state_offset() + in_ByteSize(sizeof(StackOverflow::StackGuardState) - 1)), + StackOverflow::stack_guard_enabled); + z_bre(no_reserved_zone_enabling); + // Compare frame pointers. 
There is no good stack pointer, as with stack // frame compression we can get different SPs when we do calls. A subsequent // call could have a smaller SP, so that this compare succeeds for an ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1610806669 From duke at openjdk.org Wed Jun 28 07:04:06 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 07:04:06 GMT Subject: RFR: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call [v3] In-Reply-To: References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> Message-ID: On Mon, 26 Jun 2023 13:05:13 GMT, Eric Nothum wrote: >> The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. >> Also generally null_check_receiver() should be combined with stopped(), which was not the case here. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > 8307625: adding null_check_receiver() for the uninitialized case, as else only argument(1) is null checked Thanks for the clarifications and reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14542#issuecomment-1610877642 From duke at openjdk.org Wed Jun 28 07:28:10 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 07:28:10 GMT Subject: Integrated: 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call In-Reply-To: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> References: <2r2lmJ5VIvU0oRomDEN5U8Kv-YFsiiRH9gYnfUBLlao=.bfe05962-f755-49f9-ba60-7b506491bada@github.com> Message-ID: On Mon, 19 Jun 2023 14:13:38 GMT, Eric Nothum wrote: > The null_check_receiver() calls in generate_method_call are redundant as all callers of generate_method_call already perform this check. For future uses of generate_method_call a new assert is introduced that fails if the caller does not null check the receiver. > Also generally null_check_receiver() should be combined with stopped(), which was not the case here. This pull request has now been integrated. Changeset: c3f10e84 Author: Eric Nothum Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/c3f10e847999ec254893de5a1a5de32fd07f715a Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8307625: Redundant receiver null check in LibraryCallKit::generate_method_call Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14542 From dholmes at openjdk.org Wed Jun 28 08:05:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 08:05:19 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: References: Message-ID: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. 
I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: S390 code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14669/files - new: https://git.openjdk.org/jdk/pull/14669/files/7c21f1c5..a6bb4a47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=03-04 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From dholmes at openjdk.org Wed Jun 28 08:05:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 08:05:22 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v4] In-Reply-To: References: <4xnxYtLtS7E49LcDhtH_0CIGc8kiv-OCi5R8zDcqJBQ=.f5dbb2c6-7c83-4dfa-8364-d39eaf35a451@github.com> Message-ID: On Wed, 28 Jun 2023 06:10:55 GMT, Amit Kumar wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typo > > Here is s390x port: > > diff --git a/src/hotspot/cpu/s390/interp_masm_s390.cpp b/src/hotspot/cpu/s390/interp_masm_s390.cpp > index 79c758f8b11..7555eb05b1f 100644 > --- a/src/hotspot/cpu/s390/interp_masm_s390.cpp > +++ b/src/hotspot/cpu/s390/interp_masm_s390.cpp > @@ -952,6 +952,11 @@ void InterpreterMacroAssembler::remove_activation(TosState state, 
> // Test if reserved zone needs to be enabled. > Label no_reserved_zone_enabling; > > + // check if already enabled - if so no re-enabling needed > + guarantee(sizeof(StackOverflow::StackGuardState) == 4, "unexptected size"); > + z_ly(Z_R0, Address(Z_thread, JavaThread::stack_guard_state_offset())); > + compare32_and_branch(Z_R0, StackOverflow::stack_guard_enabled, bcondEqual, no_reserved_zone_enabling); > + > // Compare frame pointers. There is no good stack pointer, as with stack > // frame compression we can get different SPs when we do calls. A subsequent > // call could have a smaller SP, so that this compare succeeds for an > > > > @TheRealMDoerr I have a 2nd patch as well, Would you please confirm which will be better: > > > diff --git a/src/hotspot/cpu/s390/interp_masm_s390.cpp b/src/hotspot/cpu/s390/interp_masm_s390.cpp > index 79c758f8b11..a6774326286 100644 > --- a/src/hotspot/cpu/s390/interp_masm_s390.cpp > +++ b/src/hotspot/cpu/s390/interp_masm_s390.cpp > @@ -952,6 +952,12 @@ void InterpreterMacroAssembler::remove_activation(TosState state, > // Test if reserved zone needs to be enabled. > Label no_reserved_zone_enabling; > > + // check if already enabled - if so no re-enabling needed > + guarantee(sizeof(StackOverflow::StackGuardState) == 4, "unexptected size"); > + z_cli(Address(Z_thread, JavaThread::stack_guard_state_offset() + in_ByteSize(sizeof(StackOverflow::StackGuardState) - 1)), > + StackOverflow::stack_guard_enabled); > + z_bre(no_reserved_zone_enabling); > + > // Compare frame pointers. There is no good stack pointer, as with stack > // frame compression we can get different SPs when we do calls. A subsequent > // call could have a smaller SP, so that this compare succeeds for an Thanks @offamitkumar ! I've applied the first variant for now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1610946742 From dholmes at openjdk.org Wed Jun 28 08:05:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 08:05:23 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 03:16:52 GMT, Fei Yang wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Update to 32-bit load and sub > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 768: > >> 766: >> 767: // check if already enabled - if so no re-enabling needed >> 768: ldw(t0, Address(xthread, JavaThread::stack_guard_state_offset())); > > It's `lw` instead of `ldw`. See my previous comment. Thanks. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1244835112 From duke at openjdk.org Wed Jun 28 08:07:33 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 08:07:33 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v2] In-Reply-To: References: Message-ID: > The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. 
Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: Changed TestCompilationTimeout and WaitForCompilationTimeout to expect seconds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14649/files - new: https://git.openjdk.org/jdk/pull/14649/files/22fbeedd..c43697d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=00-01 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14649/head:pull/14649 PR: https://git.openjdk.org/jdk/pull/14649 From duke at openjdk.org Wed Jun 28 08:16:55 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 08:16:55 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v3] In-Reply-To: References: Message-ID: > The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. Eric Nothum has updated the pull request incrementally with two additional commits since the last revision: - Revert "change s to ms in README.md" This reverts commit 4dc035f4b694645f9582d412a03632d4447b74e7. - Revert "Update README.md" This reverts commit 22fbeedd1633366a0de1b673bcd57a89bf3931d3. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14649/files - new: https://git.openjdk.org/jdk/pull/14649/files/c43697d4..bee55417 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14649/head:pull/14649 PR: https://git.openjdk.org/jdk/pull/14649 From duke at openjdk.org Wed Jun 28 08:17:14 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 08:17:14 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:07:33 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > Changed TestCompilationTimeout and WaitForCompilationTimeout to expect seconds Reverted the changes in README. Changed the code to now expect a value in seconds from the user. The timing itself is still implemented in milliseconds, which is why I added MS to the internal variable name. 
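
As a rough illustration of the change described above (the user passes seconds, the timing stays in milliseconds internally), here is a minimal sketch. The class `TimeoutFlags` and the method `timeoutMillis` are hypothetical names made up for this example and are not the IR framework's actual code; only the property name `TestCompilationTimeout` and the 10 s default come from the discussion.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not taken from the IR framework sources.
class TimeoutFlags {
    // Interprets rawSeconds as a timeout in seconds and returns it in milliseconds.
    // A missing or blank value falls back to defaultSeconds.
    public static long timeoutMillis(String rawSeconds, long defaultSeconds) {
        long seconds = (rawSeconds == null || rawSeconds.isBlank())
                ? defaultSeconds
                : Long.parseLong(rawSeconds.trim());
        if (seconds < 0) {
            throw new IllegalArgumentException("timeout must be non-negative: " + seconds);
        }
        return TimeUnit.SECONDS.toMillis(seconds);
    }

    public static void main(String[] args) {
        // Example: java -DTestCompilationTimeout=20 TimeoutFlags.java
        // The "MS" suffix marks the internal unit, as mentioned in the comment above.
        long testCompilationTimeoutMS =
                timeoutMillis(System.getProperty("TestCompilationTimeout"), 10);
        System.out.println("TestCompilationTimeout = " + testCompilationTimeoutMS + " ms");
    }
}
```

Keeping the unit in the variable name makes it harder to mix up the user-facing value (seconds) with the value used for waiting (milliseconds).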
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14649#issuecomment-1610968912

From fgao at openjdk.org Wed Jun 28 08:17:37 2023
From: fgao at openjdk.org (Fei Gao)
Date: Wed, 28 Jun 2023 08:17:37 GMT
Subject: RFR: 8308340: C2: Idealize Fma nodes [v3]
In-Reply-To:
References:
Message-ID:

> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like:
> 
> 
> match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
> match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
> 
> 
> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules.
> 
> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules.
> 
> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k.
> 
> The patch passed all tier 1 - 3 on aarch64 and x86 platforms.

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into fg8308340
 - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files
 - Merge branch 'master' into fg8308340
 - 8308340: C2: Idealize Fma nodes
   
   Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like:
   
   ```
   match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
   match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
   ```
   
   Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical.
Then we can remove redundant rules. Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14576/files - new: https://git.openjdk.org/jdk/pull/14576/files/a22814d8..06162d88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=01-02 Stats: 7098 lines in 250 files changed: 2631 ins; 2299 del; 2168 mod Patch: https://git.openjdk.org/jdk/pull/14576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576 PR: https://git.openjdk.org/jdk/pull/14576 From fgao at openjdk.org Wed Jun 28 08:21:08 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 28 Jun 2023 08:21:08 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v3] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 16:05:54 GMT, Vladimir Kozlov wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision:
>> 
>>  - Merge branch 'master' into fg8308340
>>  - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files
>>  - Merge branch 'master' into fg8308340
>>  - 8308340: C2: Idealize Fma nodes
>>    
>>    Some platforms, like aarch64, ppc, and riscv, support fusing
>>    `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating
>>    partially symmetric match rules like:
>>    
>>    ```
>>    match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
>>    match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
>>    ```
>>    
>>    Since `Fma` is partially commutative, the patch is to convert
>>    `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase,
>>    making node patterns canonical. Then we can remove redundant
>>    rules.
>>    
>>    Also, we should guarantee that C2 generates `Fma` nodes only on
>>    platforms supporting `Fma` instructions before matcher, so we
>>    can remove all `predicate(UseFMA)` for all `Fma` rules.
>>    
>>    After the patch, the code size of libjvm.so on aarch64 platform
>>    decreased by 63.4k.
>>    
>>    The patch passed all tier 1 - 3 on aarch64 and x86 platforms.
> 
> Looks good to me.
> You need second review.

Thanks for your review @vnkozlov. I would appreciate it very much if some expert on ppc or riscv could help review it! Perhaps @RealFYang @reinrich

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1610974224

From epeter at openjdk.org Wed Jun 28 08:31:37 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 28 Jun 2023 08:31:37 GMT
Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2]
In-Reply-To:
References:
Message-ID:

> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework.
> > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. 
`@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number).
> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  revert to ANY for TestAutoVectorization2DArray.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14539/files
  - new: https://git.openjdk.org/jdk/pull/14539/files/1466083f..c04d5164

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=00-01

  Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/14539.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539

PR: https://git.openjdk.org/jdk/pull/14539

From yyang at openjdk.org Wed Jun 28 09:09:15 2023
From: yyang at openjdk.org (Yi Yang)
Date: Wed, 28 Jun 2023 09:09:15 GMT
Subject: RFR: 8311010: C1 array access causes SIGSEGV due to lack of range check
Message-ID:

    int[] a = { 11 };
    for (int i = -1; i <= 0; i++) {
        for (int j = -3; j <= 2147483646 * i - 3; j++) {
            b += a[j + 3];
        }
    }

C1 eliminates the range check before the array access because it makes the following deduction:

    lower - const <= x <= upper - const
    lower <= x + const <= upper

This is wrong, because (lower - const + const) and (upper - const + const) may overflow or underflow, e.g.

    -3 <= x <= min_jint - 3
    0 <= x + 3 <= min_jint (wrong)

The proposed change is to assume the worst case whenever upper or lower is found, which may be somewhat conservative.
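
To make the overflow argument above concrete, here is a small sketch of the arithmetic only. It is not C1's range check elimination code and not the proposed patch; the class `RangeCheckOverflow` and its methods are hypothetical names introduced for this illustration.

```java
// Illustration of why "x <= upper - c" does not imply "x + c <= upper"
// when "upper - c" is computed with wrapping 32-bit arithmetic.
class RangeCheckOverflow {
    // The inner loop bound from the example: 2147483646 * i - 3, with int wraparound.
    public static int loopBound(int i) {
        return 2147483646 * i - 3;
    }

    // Returns true if, for these concrete values, the premise "x <= upper - c"
    // really implies the conclusion "x + c <= upper".
    public static boolean deductionHolds(int x, int upper, int c) {
        boolean premise = x <= upper - c;     // wrapping subtraction, as the compiled code sees it
        boolean conclusion = x + c <= upper;
        return !premise || conclusion;
    }

    // A conservative guard (an assumption of this sketch, not the actual fix):
    // only trust the shifted bound when "bound - c" does not overflow.
    public static boolean boundShiftIsSafe(int bound, int c) {
        try {
            Math.subtractExact(bound, c);
            return true;
        } catch (ArithmeticException overflow) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("loopBound(-1) = " + loopBound(-1));
        System.out.println("deductionHolds(0, MIN_VALUE, 3) = "
                + deductionHolds(0, Integer.MIN_VALUE, 3));
        System.out.println("boundShiftIsSafe(MIN_VALUE, 3) = "
                + boundShiftIsSafe(Integer.MIN_VALUE, 3));
    }
}
```

With `i = -1` the loop bound wraps around to `Integer.MAX_VALUE`, so `j + 3` can exceed the array length even though the naive deduction suggests a small upper bound.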
------------- Commit messages: - 8311010 C1 array access causes SIGSEGV due to lack of range check Changes: https://git.openjdk.org/jdk/pull/14689/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14689&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311010 Stats: 71 lines in 2 files changed: 69 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14689.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14689/head:pull/14689 PR: https://git.openjdk.org/jdk/pull/14689 From mdoerr at openjdk.org Wed Jun 28 10:00:08 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 Jun 2023 10:00:08 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: References: Message-ID: <9B6TIpzjhLnUVqFo82q8hwdJIpO1o_fCpCz8fVIlu0M=.7c3415c9-5633-4c4b-8932-1a54b47be07c@github.com> On Wed, 28 Jun 2023 08:05:19 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. 
> > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > S390 code src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 698: > 696: // check if already enabled - if so no re-enabling needed > 697: ldr(rscratch1, Address(rthread, JavaThread::stack_guard_state_offset())); > 698: cmp(rscratch1, (u1)StackOverflow::stack_guard_enabled); Not ldrw + cmpw? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1244983783 From chagedorn at openjdk.org Wed Jun 28 10:38:04 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 Jun 2023 10:38:04 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:16:55 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. > > Eric Nothum has updated the pull request incrementally with two additional commits since the last revision: > > - Revert "change s to ms in README.md" > > This reverts commit 4dc035f4b694645f9582d412a03632d4447b74e7. > - Revert "Update README.md" > > This reverts commit 22fbeedd1633366a0de1b673bcd57a89bf3931d3. That looks good but you also need to update the new variable names here: https://github.com/openjdk/jdk/blob/c3f10e847999ec254893de5a1a5de32fd07f715a/test/hotspot/jtreg/compiler/lib/ir_framework/test/CustomRunTest.java#L136-L140 ------------- Changes requested by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14649#pullrequestreview-1502793519 From epeter at openjdk.org Wed Jun 28 11:09:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jun 2023 11:09:23 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 08:24:19 GMT, Pengfei Li wrote: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. Here are a few more detailed comments. I'll now go over to more overall feedback (https://github.com/openjdk/jdk/pull/14581#issuecomment-1603978076). src/hotspot/share/opto/vmaskloop.cpp line 89: > 87: cl->mark_loop_vectorized(); > 88: cl->mark_vector_masked(); > 89: _phase->C->set_max_vector_size(MaxVectorSize); What is this for? src/hotspot/share/opto/vmaskloop.cpp line 531: > 529: if (!addp->is_AddP() || !operates_on_array_of_type(addp, mem_type)) { > 530: return nullptr; > 531: } I guess this prevents you from having `Unsafe` use type mismatched loads/stores. But it also prevents vectorization in cases where one might just store shorts into an int array using `Unsafe`. This saves you a lot of headaches. You probably don't lose too much for not vectorizing those cases. src/hotspot/share/opto/vmaskloop.cpp line 642: > 640: > 641: // Helper method for finding or creating a vector input at specified index > 642: Node* VectorMaskedLoop::get_vector_input(Node* node, uint idx) { This is analogous to `SuperWord::vector_opd`. Can we not refactor things so that we can share the code?
src/hotspot/share/opto/vmaskloop.cpp line 790: > 788: // Compute vector duplication count and the vmask tree level > 789: int dup_cnt = lane_size / _size_stats.smallest_size(); > 790: int level = exact_log2(dup_cnt); Rename `level` to something more expressive. Maybe just `vmask_tree_level`. Also in all other methods. Otherwise it is not quite clear what it is supposed to be. src/hotspot/share/opto/vmaskloop.cpp line 798: > 796: if (type2aelembytes(statement_bottom_type(stmt)) != lane_size) { > 797: continue; > 798: } You could assert here, that the max vector size for bt is as expected. src/hotspot/share/opto/vmaskloop.cpp line 854: > 852: } > 853: } > 854: } What happens if you have a int and a float slice? You don't seem to separate them here but just thread them together. `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java` public class Test { static int RANGE = 1024; public static void main(String[] strArr) { float a[] = new float[RANGE]; int b[] = new int[RANGE]; short c[] = new short[RANGE]; test0(a, b, c); } static void test0(float[] a, int[] b, short[] c) { for (int i = 0; i < RANGE; i++) { a[i] ++; b[i] ++; c[i] ++; } } } It seems the memory state is now passed between the int and float `StoreVectorMasked`: Duplicated vector nodes with lane size = 4 Offset = 0 3524 StoreVectorMasked === 479 484 475 3525 3510 [[ 3527 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=3519,[472],[151],829 !jvms: Test::test0 @ bci:15 (line 13) 3525 AddVF === _ 3526 3517 [[ 3524 ]] #vectorz[16]:{float} !orig=3518,[473],[130] !jvms: Test::test0 @ bci:14 (line 13) 3526 LoadVectorMasked === 502 484 475 3510 [[ 3525 ]] @float[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz[16]:{float} !orig=3516,[474],[128] !jvms: Test::test0 @ bci:12 (line 13) 3527 StoreVectorMasked === 479 3524 471 3528 3510 [[ 3519 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; !orig=3523,[461],[207],823 !jvms: Test::test0 @ bci:22 (line 14) 3528 AddVI === _ 3529 3521 [[ 3527 ]] #vectorz[16]:{int} !orig=3522,[462],[186] !jvms: Test::test0 @ bci:21 (line 14) 3529 LoadVectorMasked === 502 480 471 3510 [[ 3528 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched #vectorz[16]:{int} !orig=3520,[470],[185] !jvms: Test::test0 @ bci:19 (line 14) Offset = 1 3519 StoreVectorMasked === 479 3527 3530 3518 3511 [[ 493 484 3523 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=[472],[151],829 !jvms: Test::test0 @ bci:15 (line 13) 3518 AddVF === _ 3516 3517 [[ 3519 ]] #vectorz[16]:{float} !orig=[473],[130] !jvms: Test::test0 @ bci:14 (line 13) 3516 LoadVectorMasked === 502 484 3531 3511 [[ 3518 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz[16]:{float} !orig=[474],[128] !jvms: Test::test0 @ bci:12 (line 13) 3523 StoreVectorMasked === 479 3519 3532 3522 3511 [[ 491 480 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; !orig=[461],[207],823 !jvms: Test::test0 @ bci:22 (line 14) 3522 AddVI === _ 3520 3521 [[ 3523 ]] #vectorz[16]:{int} !orig=[462],[186] !jvms: Test::test0 @ bci:21 (line 14) 3520 LoadVectorMasked === 502 480 3533 3511 [[ 3522 ]] 
@int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched #vectorz[16]:{int} !orig=[470],[185] !jvms: Test::test0 @ bci:19 (line 14) src/hotspot/share/opto/vmaskloop.cpp line 874: > 872: void VectorMaskedLoop::adjust_vector_node(Node* vn, Node_List* vmask_tree, > 873: int level, int mask_off) { > 874: Node* vmask = vmask_tree->at((1 << level) + mask_off); Again, rename `level`. Maybe it could be `vmask_tree_level` and `vmask_tree_level_offset`? Here I finally understood what you mean by the two variables `level` and `mask_off`. src/hotspot/share/opto/vmaskloop.cpp line 876: > 874: Node* vmask = vmask_tree->at((1 << level) + mask_off); > 875: int lane_size = type2aelembytes(Matcher::vector_element_basic_type(vmask)); > 876: uint vector_size_in_bytes = Matcher::max_vector_size(T_BYTE); Can you add an assert that this is the same as `Matcher::vector_width_in_bytes(Matcher::vector_element_basic_type(vmask))` ? src/hotspot/share/opto/vmaskloop.cpp line 884: > 882: Node* ptr = vn->in(MemNode::Address); > 883: Node* base = ptr->in(AddPNode::Base); > 884: int mem_scale = Matcher::max_vector_size(T_BYTE); Duplicate of `vector_size_in_bytes`? src/hotspot/share/opto/vmaskloop.cpp line 893: > 891: // 2) For populate index, update start index for non-zero mask offset > 892: if (mask_off != 0) { > 893: int v_stride = vector_size_in_bytes / lane_size * _cl->stride_con(); Is there any test for PopulateIndex with stride that is not `1`? For now I guess only `-1` would even be allowed. src/hotspot/share/opto/vmaskloop.cpp line 939: > 937: Node* root_vmask = vmask_tree->at(1); > 938: > 939: // Replace vectorization candidate nodes to vector nodes Expand explanation. Say that you are for now only generating a single vector node per scalar node. And that the duplication afterwards makes sure that all scalar nodes are "widened" to the same number of elements. 
The smallest type uses a single vector, larger types use multiple (duplicated) vectors per scalar node. test/hotspot/jtreg/compiler/vectorization/runner/ArrayCopyTest.java line 82: > 80: @IR(applyIfCPUFeature = {"sve", "true"}, > 81: applyIf = {"UseMaskedLoop", "true"}, > 82: counts = {IRNode.LOOP_VECTOR_MASK, ">0"}) We could also do this: If the CPU features do not support the features for `UseMaskedLoop`, then just put it back to `false`. That way, we do not have to check for the required CPU features, because when the flag is `true`, we know the platform must also support the corresponding masked instructions. test/hotspot/jtreg/compiler/vectorization/runner/ArrayInvariantFillTest.java line 69: > 67: @Test > 68: @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, > 69: applyIf = {"OptimizeFill", "false"}, This seems unrelated. Why did you have to add this? test/hotspot/jtreg/compiler/vectorization/runner/VectorizationTestRunner.java line 84: > 82: TestFramework irTest = new TestFramework(klass); > 83: // Add extra VM options to enable more auto-vectorization chances > 84: irTest.addFlags("-XX:-OptimizeFill"); Aha, you removed this too. Fair enough. But since the runner currently requires everything to be `flagless`, I cannot actually force `-XX:-OptimizeFill` from the outside. That means the tests are potentially never run with `OptimizeFill` off, and we never actually check the IR rules. We lose test coverage. That makes me a bit nervous. Suggestion: if tests actually require the flag off to execute the IR rule, then we should have two scenarios, one where the flag is on, and one where it is off.
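A minimal sketch of the indexing discussed in the comments on `vmaskloop.cpp` lines 790 and 874 above, assuming the vmask tree is laid out like a binary heap with its root at slot 1. That is only how `vmask_tree->at(1)` and `vmask_tree->at((1 << level) + mask_off)` read in the quoted snippets; it is not confirmed by the patch author. All Java names below are hypothetical, and `exactLog2` merely stands in for HotSpot's `exact_log2`.

```java
public class VmaskTreeIndexDemo {
    // Stand-in for HotSpot's exact_log2: only defined for positive powers of two.
    static int exactLog2(int x) {
        if (x <= 0 || (x & (x - 1)) != 0) {
            throw new IllegalArgumentException("not a power of two: " + x);
        }
        return Integer.numberOfTrailingZeros(x);
    }

    // Mirrors "dup_cnt = lane_size / smallest_size; level = exact_log2(dup_cnt)".
    static int treeLevel(int laneSizeBytes, int smallestLaneSizeBytes) {
        return exactLog2(laneSizeBytes / smallestLaneSizeBytes);
    }

    // Mirrors "vmask_tree->at((1 << level) + mask_off)" under the heap-layout assumption.
    static int treeIndex(int level, int maskOffset) {
        return (1 << level) + maskOffset;
    }

    // Number of masks in a full tree from the root down to the given level.
    static int masksDownToLevel(int level) {
        return (1 << (level + 1)) - 1;
    }

    public static void main(String[] args) {
        int smallest = Byte.BYTES;
        for (int laneSize : new int[] {Byte.BYTES, Short.BYTES, Integer.BYTES, Long.BYTES}) {
            int level = treeLevel(laneSize, smallest);
            int first = treeIndex(level, 0);
            int last = treeIndex(level, (1 << level) - 1);
            System.out.println("lane size " + laneSize + ": level " + level
                    + ", tree slots " + first + ".." + last);
        }
        System.out.println("masks for byte + long: "
                + masksDownToLevel(treeLevel(Long.BYTES, smallest)));
    }
}
```

Under this assumption, `level` selects a row of the tree and `mask_off` a position within that row, which is why more descriptive names such as `vmask_tree_level` would make the call sites easier to read.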
------------- PR Review: https://git.openjdk.org/jdk/pull/14581#pullrequestreview-1502701701 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245029651 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245036824 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245039774 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245007902 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244964446 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1244994154 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245010353 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245018819 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245020257 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245022534 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245041760 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245046129 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245047803 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245052883 From epeter at openjdk.org Wed Jun 28 11:09:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jun 2023 11:09:23 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Tue, 27 Jun 2023 17:25:11 GMT, Emanuel Peter wrote: >> Can there be cases where creating the masks makes vectorization unprofitable? 
> > I have an example here: > > public class Test { > static int RANGE = 1024; > > public static void main(String[] strArr) { > byte a[] = new byte[RANGE]; > long b[] = new long[RANGE]; > test0(a, b); > } > > static void test0(byte[] a, long[] b) { > for (int i = 0; i < RANGE; i++) { > a[i]++; > b[i]++; > } > } > } > > `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 Test.java` > This are the masks: > > Generated vector masks in vmask tree > Lane_size = 1 > 3710 LoopVectorMask === _ 367 26 [[ 3711 3712 ]] #vectormask[64]:{byte} > Lane_size = 2 > 3711 ExtractLowMask === _ 3710 [[ 3713 3714 ]] #vectormask[32]:{short} > 3712 ExtractHighMask === _ 3710 [[ 3715 3716 ]] #vectormask[32]:{short} > Lane_size = 4 > 3713 ExtractLowMask === _ 3711 [[ 3717 3718 ]] #vectormask[16]:{int} > 3714 ExtractHighMask === _ 3711 [[ 3719 3720 ]] #vectormask[16]:{int} > 3715 ExtractLowMask === _ 3712 [[ 3721 3722 ]] #vectormask[16]:{int} > 3716 ExtractHighMask === _ 3712 [[ 3723 3724 ]] #vectormask[16]:{int} > Lane_size = 8 > 3717 ExtractLowMask === _ 3713 [[ ]] #vectormask[8]:{long} > 3718 ExtractHighMask === _ 3713 [[ ]] #vectormask[8]:{long} > 3719 ExtractLowMask === _ 3714 [[ ]] #vectormask[8]:{long} > 3720 ExtractHighMask === _ 3714 [[ ]] #vectormask[8]:{long} > 3721 ExtractLowMask === _ 3715 [[ ]] #vectormask[8]:{long} > 3722 ExtractHighMask === _ 3715 [[ ]] #vectormask[8]:{long} > 3723 ExtractLowMask === _ 3716 [[ ]] #vectormask[8]:{long} > 3724 ExtractHighMask === _ 3716 [[ ]] #vectormask[8]:{long} > > That is indeed `15` masks. Hmm. Maybe that is the best one can do. And maybe it is not all that bad. But again, would be interesting to see the benchmarks for that case. Aha, maybe here we could just get away with 1 vmask for `byte`, and then directly extract 8 vmasks for `long`, since we do not need the ones in the middle? 
You'd have to generalize your `Extract(High/Low)Mask`. >> ![image](https://github.com/openjdk/jdk/assets/32593061/a00e4973-2faf-428e-9794-48abb945e815) >> >> That indeed looks like a mixup in the int/float memory slices. Not sure if there are any bad consequences, but that should be fixed. > > I just added some shorts, so that the int and float would be duplicated ;) Suggested solution: track the last memory state per slice, just like I recently did in `SuperWord::schedule_reorder_memops` with `current_state_in_slice`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245012558 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245004049 From epeter at openjdk.org Wed Jun 28 11:09:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jun 2023 11:09:24 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 10:06:45 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > src/hotspot/share/opto/vmaskloop.cpp line 854: > >> 852: } >> 853: } >> 854: } > > What happens if you have a int and a float slice? You don't seem to separate them here but just thread them together. 
> > `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java` > > > > public class Test { > static int RANGE = 1024; > > public static void main(String[] strArr) { > float a[] = new float[RANGE]; > int b[] = new int[RANGE]; > short c[] = new short[RANGE]; > test0(a, b, c); > } > > static void test0(float[] a, int[] b, short[] c) { > for (int i = 0; i < RANGE; i++) { > a[i] ++; > b[i] ++; > c[i] ++; > } > } > } > > > It seems the memory state is now passed between the int and float `StoreVectorMasked`: > > > Duplicated vector nodes with lane size = 4 > Offset = 0 > 3524 StoreVectorMasked === 479 484 475 3525 3510 [[ 3527 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=3519,[472],[151],829 !jvms: Test::test0 @ bci:15 (line 13) > 3525 AddVF === _ 3526 3517 [[ 3524 ]] #vectorz[16]:{float} !orig=3518,[473],[130] !jvms: Test::test0 @ bci:14 (line 13) > 3526 LoadVectorMasked === 502 484 475 3510 [[ 3525 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz[16]:{float} !orig=3516,[474],[128] !jvms: Test::test0 @ bci:12 (line 13) > 3527 StoreVectorMasked === 479 3524 471 3528 3510 [[ 3519 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; !orig=3523,[461],[207],823 !jvms: Test::test0 @ bci:22 (line 14) > 3528 AddVI === _ 3529 3521 [[ 3527 ]] #vectorz[16]:{int} !orig=3522,[462],[186] !jvms: Test::test0 @ bci:21 (line 14) > 3529 LoadVectorMasked === 502 480 471 3510 [[ 3528 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; 
mismatched #vectorz[16]:{int} !orig=3520,[470],[185] !jvms: Test::test0 @ bci:19 (line 14) > Offset = 1 > 3519 StoreVectorMasked === 479 3527 3530 3518 3511 [[ 493 484 3523 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serial... ![image](https://github.com/openjdk/jdk/assets/32593061/a00e4973-2faf-428e-9794-48abb945e815) That indeed looks like a mixup in the int/float memory slices. Not sure if there are any bad consequences, but that should be fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245001527 From epeter at openjdk.org Wed Jun 28 11:09:25 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 Jun 2023 11:09:25 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 10:13:53 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 854: >> >>> 852: } >>> 853: } >>> 854: } >> >> What happens if you have a int and a float slice? You don't seem to separate them here but just thread them together. 
>> >> `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java` >> >> >> >> public class Test { >> static int RANGE = 1024; >> >> public static void main(String[] strArr) { >> float a[] = new float[RANGE]; >> int b[] = new int[RANGE]; >> short c[] = new short[RANGE]; >> test0(a, b, c); >> } >> >> static void test0(float[] a, int[] b, short[] c) { >> for (int i = 0; i < RANGE; i++) { >> a[i] ++; >> b[i] ++; >> c[i] ++; >> } >> } >> } >> >> >> It seems the memory state is now passed between the int and float `StoreVectorMasked`: >> >> >> Duplicated vector nodes with lane size = 4 >> Offset = 0 >> 3524 StoreVectorMasked === 479 484 475 3525 3510 [[ 3527 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched Memory: @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; !orig=3519,[472],[151],829 !jvms: Test::test0 @ bci:15 (line 13) >> 3525 AddVF === _ 3526 3517 [[ 3524 ]] #vectorz[16]:{float} !orig=3518,[473],[130] !jvms: Test::test0 @ bci:14 (line 13) >> 3526 LoadVectorMasked === 502 484 475 3510 [[ 3525 ]] @float[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=6; mismatched #vectorz[16]:{float} !orig=3516,[474],[128] !jvms: Test::test0 @ bci:12 (line 13) >> 3527 StoreVectorMasked === 479 3524 471 3528 3510 [[ 3519 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; !orig=3523,[461],[207],823 !jvms: Test::test0 @ bci:22 (line 14) >> 3528 AddVI === _ 3529 3521 [[ 3527 ]] #vectorz[16]:{int} !orig=3522,[462],[186] !jvms: Test::test0 @ bci:21 (line 14) >> 3529 LoadVectorMasked === 502 480 471 3510 [[ 3528 ]] @int[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=7; mismatched #vectorz[16]:{int} !orig=3520,[470],[185] !jvms: Test::test0 @ bci:19 (line 14) >> Offset = 1 >> 3519 StoreVectorMasked === 479 3527 ... > > ![image](https://github.com/openjdk/jdk/assets/32593061/a00e4973-2faf-428e-9794-48abb945e815) > > That indeed looks like a mixup in the int/float memory slices. Not sure if there are any bad consequences, but that should be fixed. I just added some shorts, so that the int and float would be duplicated ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1245002118 From duke at openjdk.org Wed Jun 28 11:10:27 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 11:10:27 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v4] In-Reply-To: References: Message-ID: > The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. 
Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: Changed variable name for CustomRunTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14649/files - new: https://git.openjdk.org/jdk/pull/14649/files/bee55417..dc2b46fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14649&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14649.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14649/head:pull/14649 PR: https://git.openjdk.org/jdk/pull/14649 From chagedorn at openjdk.org Wed Jun 28 11:12:03 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 Jun 2023 11:12:03 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v4] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 11:10:27 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > Changed variable name for CustomRunTest.java Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14649#pullrequestreview-1502848268 From duke at openjdk.org Wed Jun 28 11:12:05 2023 From: duke at openjdk.org (Eric Nothum) Date: Wed, 28 Jun 2023 11:12:05 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:16:55 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. 
The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. > > Eric Nothum has updated the pull request incrementally with two additional commits since the last revision: > > - Revert "change s to ms in README.md" > > This reverts commit 4dc035f4b694645f9582d412a03632d4447b74e7. > - Revert "Update README.md" > > This reverts commit 22fbeedd1633366a0de1b673bcd57a89bf3931d3. True, good catch, thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14649#issuecomment-1611209283 From yyang at openjdk.org Wed Jun 28 11:23:13 2023 From: yyang at openjdk.org (Yi Yang) Date: Wed, 28 Jun 2023 11:23:13 GMT Subject: Withdrawn: 8311010: C1 array access causes SIGSEGV due to lack of range check In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 09:02:10 GMT, Yi Yang wrote: > int[] a = { 11 }; > for (int i = -1; i <= 0; i++) { > // Insert deopt check > if (2147483646 * i >= 1) { goto deopt_stub;} > for (int j = -3; j <= 2147483646 * i - 3; j++) { > b += a[j + 3]; > } > } > > C1 eliminates the range check before accessing the array and inserts a deoptimization check before the loop header, because it makes the following deduction: > > lower - const <= x <= upper - const > lower <= x + const <= upper > > This is wrong, because (lower - const + const) and (upper - const + const) may overflow/underflow, e.g. > > -3 <= x <= min_jint - 3 > 0 <= x + 3 <= min_jint (wrong) > > The proposed change is to assume the worst case whenever upper or lower is found, which may be somewhat conservative. This pull request has been closed without being integrated.
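The wraparound described above can be reproduced with plain 32-bit `int` arithmetic. This is a minimal sketch and not C1 code: the class and method names are hypothetical, and it only evaluates the loop bound from the example for `i = -1` and then shifts it by the constant `3`.

```java
public class RangeCheckOverflowDemo {
    // Upper bound of the inner loop in the example, evaluated with wrapping int arithmetic.
    static int innerLoopUpperBound(int i) {
        return 2147483646 * i - 3;
    }

    // Naive shift of a bound by a constant; wraps silently on overflow.
    static int shiftBoundUnchecked(int bound, int constant) {
        return bound + constant;
    }

    // Conservative check: is the shifted bound still representable as an int?
    static boolean shiftIsSafe(int bound, int constant) {
        long exact = (long) bound + (long) constant;
        return exact >= Integer.MIN_VALUE && exact <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int upper = innerLoopUpperBound(-1);
        // -2147483646 - 3 does not fit in an int, so the bound wraps to a large positive value.
        System.out.println("upper bound for i = -1: " + upper);
        // Shifting that bound by +3 wraps again, so it no longer bounds j + 3 from above.
        System.out.println("upper + 3 (wrapped):    " + shiftBoundUnchecked(upper, 3));
        System.out.println("shift representable?    " + shiftIsSafe(upper, 3));
    }
}
```

A check along the lines of `shiftIsSafe` is one way to read "assume the worst case": when the shifted bound is not representable, the bound is treated as unknown rather than trusted.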
------------- PR: https://git.openjdk.org/jdk/pull/14689 From jsjolen at openjdk.org Wed Jun 28 12:26:18 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 28 Jun 2023 12:26:18 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v5] In-Reply-To: References: Message-ID: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. > > Johan Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Revert to list init ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14530/files - new: https://git.openjdk.org/jdk/pull/14530/files/7035bb4e..52e3acfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14530&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14530/head:pull/14530 PR: https://git.openjdk.org/jdk/pull/14530 From jsjolen at openjdk.org Wed Jun 28 12:38:03 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 28 Jun 2023 12:38:03 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v4] In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 19:34:16 GMT, Vladimir Kozlov wrote: >> Thanks for the background, I wasn't aware of that. I don't have a strong opinion but consistency in the same area would be nice. > > Please, change to normal `()`. Using '{}' is very confusing for not modern C++ experts and affects maintainability of this code. Thanks for the input, reverted to `()`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14530#discussion_r1245142713 From mbaesken at openjdk.org Wed Jun 28 12:57:07 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 28 Jun 2023 12:57:07 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 14:20:30 GMT, Alan Bateman wrote: >> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. > > src/java.sql/share/classes/java/sql/DriverManager.java line 658: > >> 656: * (which is invoking this class indirectly) >> 657: * classloader, so that the JDBC driver class outside the image >> 658: * can be loaded from here. > > This code should probably be changed to use VM.isSystemDomainLoader(callerCL). > > I think the comment should be replaced because it doesn't match what the code actually does, and it has nothing to do with whether the JDBC driver is in the run-time image or not. How about: > > "If the caller is defined to the bootstrap or platform class loader then use the Thread CCL as the initiating class loader so that a JDBC driver on the class path, or bundled with an application, is found." Hi Alan, regarding usage of class VM, I get 'package jdk.internal.misc is declared in module java.base, which does not export it to module java.sql'. Is there any concern about exporting it to module java.sql as well? And by the way, did you mean to use it like this, in the if?
` if (callerCL == null || VM.isSystemDomainLoader(callerCL)) { callerCL = Thread.currentThread().getContextClassLoader(); }` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1245164604 From simonis at openjdk.org Wed Jun 28 13:08:03 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 28 Jun 2023 13:08:03 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 14:40:49 GMT, Roland Westrelin wrote: > The crash occurs because at split if during IGVN, a `SubTypeCheck` is > created with null as input. That happens because the control path the > `SubTypeCheck` is cloned for is dead. To fix that I propose delaying > split if until dead paths are collapsed. > > I added an assert to check a nullable first input to `SubTypeCheck` > nodes (which should be impossible because it should be null > checked). When I ran testing, a number of cases showed up with known > non null values not properly marked as non null. I fixed them. src/hotspot/share/opto/library_call.hpp line 183: > 181: return generate_method_call(method_id, false, true, res_not_null); > 182: } > 183: CallJavaNode* generate_method_call_virtual(vmIntrinsics::ID method_id) { `generate_method_call_virtual()` doesn't seem to be used anywhere in the code base so maybe we can drop it instead of updating it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1245179117 From chagedorn at openjdk.org Wed Jun 28 13:09:34 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 Jun 2023 13:09:34 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:31:37 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework.
>> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. 
`@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` o... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > revert to ANY for TestAutoVectorization2DArray.java This is a great enhancement! Thanks for working on that. I have left some comments in the IR framework code (will also have a look at the test updates later) but here are some more general comments: - We should provide some better description when misusing the new features. Example: @IR(counts = {IRNode.LOAD_VL, IRNode.VECTOR_SIZE + "min(4)", ">0"}) @IR(counts = {IRNode.LOAD_VL, IRNode.VECTOR_SIZE + "min()", ">0"}) Output: - Provided invalid value "_ at min(4)" after comparator "=", node IRNode.LOAD_VL, in count string "_ at min(4)" for IR rule 2 at private static long compiler.loopopts.superword.TestGeneralizedReductions.testReductionOnPartiallyUnrolledLoopWithSwappedInputs(long[]). - Provided invalid value "_ at min()" after comparator "=", node IRNode.LOAD_VL, in count string "_ at min()" for IR rule 3 at private static long compiler.loopopts.superword.TestGeneralizedReductions.testReductionOnPartiallyUnrolledLoopWithSwappedInputs(long[]). We could give the user some more information about what's wrong here. You might want to play around with other wrong usages of the new features and check if the format violation is precise enough. You could also add these wrong usages to `TestBadFormat.java`. - We should have a (sanity) test that explicitly uses `IRNode.VECTOR_SIZE_ANY` and `IRNode.VECTOR_SIZE_MAX`.
- We should also make sure to have some sanity tests for all the different variations that are now possible with the new features (if not already covered by your updated tests). test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java line 64: > 62: // should not generate vectorized Min/Max nodes for them. > 63: @Test > 64: @IR(failOn = {IRNode.MIN_VI, IRNode.MIN_VF, IRNode.MIN_VD}) We could think about keeping generic vector nodes that match any type and restrict their usage to `failOn` constraints. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 77: > 75: * be directly specified in {@link #vectorNode}. For IR rules that are looking for a > 76: * non-zero count of this node, the size is assumed to be the maximal number of elements > 77: * that can fit in a vector of the specified type. This depends on the VM flag MaxVectorSize Suggestion: When using `{@link IR#counts()}` with a non-zero count, the size is assumed... test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 79: > 77: * that can fit in a vector of the specified type. This depends on the VM flag MaxVectorSize > 78: * and CPU features. For IR rules that are looking for zero such nodes, or use failOn, > 79: * there we match for any {@link #VECTOR_SIZE_ANY} size. This should be helpful in most cases Suggestion: When using `{@link IR#failOn} or `{@link IR#counts}` with a zero count, we match... test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 84: > 82: * to specify the size with {@link #VECTOR_SIZE}, followed by a size tag or comma separated > 83: * list of sizes. > 84: * Can you also add a description of the new matching features to the [README](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md)? Maybe you can also add some examples there and/or at [IRExample.java](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/IRExample.java). 
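To make the type-and-size matching from the PR description concrete, here is a minimal, self-contained sketch of how the vector annotation in a dumped IR line could be checked with a regex. The class and method names (`VectorNodeRegexSketch`, `vectorTypeRegex`, `matches`) are invented for illustration and are not part of the IR framework. Only the sample dump line and the general `vector[A-Za-z][size]:{type}` shape come from the thread. Note that the square brackets and braces have to be escaped in an actual Java regex.

```java
import java.util.regex.Pattern;

public class VectorNodeRegexSketch {
    // Hypothetical helper: regex for the "vectorX[size]:{type}" part of a dumped node.
    // sizeRegex is inserted as-is, so it may be a literal such as "8" or a pattern such as "\\d+".
    public static String vectorTypeRegex(String sizeRegex, String type) {
        return "vector[A-Za-z]\\[" + sizeRegex + "\\]:\\{" + Pattern.quote(type) + "\\}";
    }

    // Hypothetical helper: true if the dump line contains the node with the given size and type.
    public static boolean matches(String dumpLine, String nodeName, String sizeRegex, String type) {
        String regex = "\\d+\\s+" + Pattern.quote(nodeName) + "\\s+===.*"
                + vectorTypeRegex(sizeRegex, type);
        return Pattern.compile(regex).matcher(dumpLine).find();
    }

    public static void main(String[] args) {
        String line = "1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...";
        System.out.println(matches(line, "VectorCastF2X", "8", "int"));     // exact size
        System.out.println(matches(line, "VectorCastF2X", "4", "int"));     // different size
        System.out.println(matches(line, "VectorCastF2X", "\\d+", "int"));  // any size
    }
}
```

Passing `"\\d+"` as the size corresponds loosely to the "any size" behaviour described for `failOn` rules, while a literal size corresponds to an explicit `IRNode.VECTOR_SIZE` constraint.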
test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 167: > 165: public static final String ABS_VB = VECTOR_PREFIX + "ABS_VB" + POSTFIX; > 166: static { > 167: vectorNode(ABS_VB, "AbsVB", "byte"); I suggest to use private String constants for all the primitive types used with vector nodes. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 249: > 247: public static final String ADD_REDUCTION_VD = PREFIX + "ADD_REDUCTION_VD" + POSTFIX; > 248: static { > 249: beforeMatchingNameRegex(ADD_REDUCTION_VD, "AddReductionVD"); Shouldn't add reduction nodes only be created in Superword? test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 683: > 681: } > 682: > 683: public static final String LOAD_VB = VECTOR_PREFIX + "LOAD_VB" + POSTFIX; I suggest to keep `VECTOR` (i.e. `LOAD_VECTOR_B`) because the IR node is called `LoadVector` (while it is okay to abbreviate `AND_V` because the IR node is also called `AndV`). This makes it easier to find this `IRNode` entry when someone wants to write an IR rule with a `LoadVector`. Same for `VectorBlend`, `VectorMaskCmp` etc. below. 
test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1591: > 1589: public static final String VECTOR_CAST_B2S = VECTOR_PREFIX + "VECTOR_CAST_B2S" + POSTFIX; > 1590: static { > 1591: vectorNode(VECTOR_CAST_B2S, "VectorCastB2", "short"); Since we match substrings it's not wrong but I think we should specify the complete IR node name here (was "wrong" before): Suggestion: vectorNode(VECTOR_CAST_B2S, "VectorCastB2", "short"); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1596: > 1594: public static final String VECTOR_CAST_B2I = VECTOR_PREFIX + "VECTOR_CAST_B2I" + POSTFIX; > 1595: static { > 1596: vectorNode(VECTOR_CAST_B2I, "VectorCastB2", "int"); Suggestion: vectorNode(VECTOR_CAST_B2I, "VectorCastB2X", "int"); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1601: > 1599: public static final String VECTOR_CAST_B2L = VECTOR_PREFIX + "VECTOR_CAST_B2L" + POSTFIX; > 1600: static { > 1601: vectorNode(VECTOR_CAST_B2L, "VectorCastB2", "long"); Suggestion: vectorNode(VECTOR_CAST_B2L, "VectorCastB2X", "long"); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1606: > 1604: public static final String VECTOR_CAST_B2F = VECTOR_PREFIX + "VECTOR_CAST_B2F" + POSTFIX; > 1605: static { > 1606: vectorNode(VECTOR_CAST_B2F, "VectorCastB2", "float"); Suggestion: vectorNode(VECTOR_CAST_B2F, "VectorCastB2X", "float"); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1611: > 1609: public static final String VECTOR_CAST_B2D = VECTOR_PREFIX + "VECTOR_CAST_B2D" + POSTFIX; > 1610: static { > 1611: vectorNode(VECTOR_CAST_B2D, "VectorCastB2", "double"); Suggestion: vectorNode(VECTOR_CAST_B2D, "VectorCastB2X", "double"); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1974: > 1972: } > 1973: > 1974: public static final String COMPRESS_BITSVI = VECTOR_PREFIX + "COMPRESS_BITSVI" + POSTFIX; Was inconsistent before but I suggest to add a `_`: Suggestion: public static final String COMPRESS_BITSVI = VECTOR_PREFIX + 
"COMPRESS_BITS_VI" + POSTFIX; Same below and for `EXPAND_BITSV`. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2099: > 2097: */ > 2098: private static void vectorNode(String irNodePlaceholder, String irNodeRegex, String typeString) { > 2099: TestFramework.check(isVectorIRNode(irNodePlaceholder), "vectorNode: failed prefix check for irNodePlaceholder " + irNodePlaceholder + " -> did you use VECTOR_PREFIX?"); You might want to wrap this long line. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2248: > 2246: String regex = ""; > 2247: for (int i = 0; i < sizes.length; i++) { > 2248: int s = 0; Suggestion: int size = 0; test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2250: > 2248: int s = 0; > 2249: try { > 2250: s = Integer.parseInt(sizes[i]); We should also check if the size is a reasonable number (i.e. a positive multiple of 2 and maybe an upper limit(?)) and report a format violation if that is not the case as for example in: @IR(counts = {IRNode.ADD_VI, IRNode.VECTOR_SIZE + "3,-2", "1"}) test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2269: > 2267: if (sizeTagString.startsWith("min(")) { > 2268: TestFormat.checkNoReport(sizeTagString.endsWith(")"), "Vector node size \"min(...)\" must end with \")\" \"" + sizeTagString + "\""); > 2269: String[] tags = sizeTagString.substring(4,sizeTagString.length() - 1).split(","); Suggestion: String[] tags = sizeTagString.substring(4, sizeTagString.length() - 1).split(","); test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2271: > 2269: String[] tags = sizeTagString.substring(4,sizeTagString.length() - 1).split(","); > 2270: TestFormat.checkNoReport(tags.length > 1, "Vector node size \"min(...)\" must have at least 2 comma separated arguments, got \"" + sizeTagString + "\""); > 2271: int min_val = 1024; Suggestion: int minVal = 1024; test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2290: > 2288: return 
String.valueOf(getMaxElementsForType(typeString, vmInfo)); > 2289: case "max_byte": > 2290: return parseVectorNodeSizeTag("max_for_type", "byte", vmInfo); Can't we directly return `String.valueOf(getMaxElementsForType("byte", vmInfo))` here? Also above on L2273 because it does not seem that we allow nested `min(min(..))`. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2307: > 2305: default: > 2306: return sizeTagString; > 2307: } Might be better to extract this and the `min()` parsing to separate methods. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2334: > 2332: if (avx512 && (typeString.equals("byte") || typeString.equals("short") || typeString.equals("char"))) { > 2333: maxBytes = avx512bw ? 64 : 32; > 2334: } Could be guarded with `Platform.isX64() || Platform.isX86()` and extracted to a separate method. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/IRRule.java line 51: > 49: this.ruleId = ruleId; > 50: this.irAnno = irAnno; > 51: this.matcher = new MatchableMatcher(new CompilePhaseIRRuleBuilder(irAnno, compilation).build(vmInfo)); For consistency, I would move `vmInfo` directly into the `CompilePhaseIRRuleBuilder` constructor. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/CheckAttributeReader.java line 72: > 70: return userPostfix; > 71: } else if (IRNode.isVectorIRNode(node)) { > 72: String irNode = IRNode.getIRNodeAccessString(node); Unused and can be removed. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/CheckAttributeReader.java line 74: > 72: String irNode = IRNode.getIRNodeAccessString(node); > 73: if (iterator.hasNext()) { > 74: String maybe_vt = iterator.next(); In Java, we should avoid underlines for local variables and non-static-final fields and use camal-case, i.e. `maybeVt` or `maybeVT`. There are some other places in this patch where you should change that. 
And it might be better to avoid abbreviations and use "vector type" since it might not be evidently clear in the context of this class. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/CheckAttributeReader.java line 81: > 79: iterator.previous(); > 80: } > 81: return CheckAttributeString.invalid(); I suggest to move this to a separate method and the composite reading as well. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 62: > 60: if (IRNode.isVectorIRNode(node)) { > 61: String type = IRNode.getVectorNodeType(node); > 62: TestFormat.checkNoReport(IRNode.getTypeSizeInBytes(type) > 0, "Vector node's type must have valid type, got \"" + type + "\" for \"" + node + "\""); You might want to move the error message to a new line to avoid long lines. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 71: > 69: size = vectorSizeDefault; > 70: } > 71: String size_regex = IRNode.parseVectorNodeSize(size, type, vmInfo); Suggestion: String sizeRegex = IRNode.parseVectorNodeSize(size, type, vmInfo); test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java line 72: > 70: } > 71: String size_regex = IRNode.parseVectorNodeSize(size, type, vmInfo); > 72: nodeRegex = nodeRegex.replaceAll(IRNode.IS_REPLACED, "vector[A-Za-z]\\\\[" + size_regex + "\\\\]:\\\\{" + type + "\\\\}"); You could move this to a separate method. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 59: > 57: private boolean expectMaxSizeForVectorNode() { > 58: switch (comparison.getComparator()) { > 59: case "<": You should indent the `case` statements. You can use the [enhanced switch](https://openjdk.org/jeps/361) style with arrows: case "<" -> { ... } Same for other places where you used `switch`. 
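As a rough illustration of the enhanced switch style suggested above, the comparator handling could be written as a switch expression. This is only a sketch under assumptions: the class name `ComparatorSwitchSketch`, the method signature, and the use of `IllegalArgumentException` are placeholders (the real code uses the framework's own format checks), and the mapping of comparators to "max size" versus "any size" is inferred from the PR description rather than copied from the patch.

```java
public class ComparatorSwitchSketch {
    // Returns true if a counts rule with this comparator should expect the maximal
    // vector size, false if it should match any size (assumed mapping, see lead-in).
    public static boolean expectMaxSize(String comparator, int givenValue) {
        return switch (comparator) {
            case "<" -> {
                if (givenValue <= 1) {
                    throw new IllegalArgumentException(
                            "Node count comparison \"<" + givenValue + "\" should be rewritten as \"=0\"");
                }
                yield false; // upper bound: match nodes of any size
            }
            case "<=" -> false;              // upper bound: match nodes of any size
            case "=" -> givenValue != 0;     // "=0" behaves like failOn: any size
            case ">", ">=" -> true;          // lower bound: expect maximal size
            case "!=" -> throw new IllegalArgumentException(
                    "Not-equal comparator not supported for node count. Please rewrite the rule.");
            default -> throw new IllegalArgumentException("Unknown comparator: " + comparator);
        };
    }
}
```

Compared with the colon form, the arrow form has no fall-through, and the compiler requires every branch of the expression to either yield a value or throw.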
test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 60: > 58: switch (comparison.getComparator()) { > 59: case "<": > 60: TestFormat.checkNoReport(comparison.getGivenValue() > 1, "Node count comparison \"<" + comparison.getGivenValue() + "\" should be rewritten as \"=0\""); Generally, you should only use `checkNoReport()` if it would not make sense to continue at this point because the state is broken (i.e. reading a value that does not exist and then operating on it). But for these kind of violations, where the user just passed something meaningless that has no further impact on the processing of this and other values, you can use `checkNoThrow()` which records the failure. After all the `Matchable` objects are created, we will report all collected violations with `throwIfAnyFailures()` here: https://github.com/openjdk/jdk/blob/c3f10e847999ec254893de5a1a5de32fd07f715a/test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/TestClassParser.java#L69-L74 Maybe you want to check all usages of `checkNoReport()` again and change to `checkNoThrow()` whenever possible (when bulk reporting the violations, it makes it easier for the user to fix them all at once). Sometimes, I also throw and catch `TestFormatExceptions` because I want to continue to check for more violations. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 75: > 73: return true; // max > 74: case "!=": > 75: TestFormat.checkNoReport(false, "Not-equal comparator not supported for node count: \"" + comparison.getComparator() + "\". Please rewrite the rule."); You can directly use `throw new TestFormatException(string)`. 
`checkNoReport()` is just a shortcut for if (!condition) { throw new TestFormatException(string) } test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 86: > 84: public Constraint parse(CompilePhase compilePhase, String compilationOutput, VMInfo vmInfo) { > 85: TestFramework.check(compilePhase != CompilePhase.DEFAULT, "must not be default"); > 86: String vectorSizeTag = expectMaxSizeForVectorNode() ? "max_for_type" : "any"; We could think about using IR framework internal static final fields for `max_for_type` and `any` as these strings are used at other places as well. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/IREncodingParser.java line 49: > 47: private static final Pattern IR_ENCODING_PATTERN = > 48: Pattern.compile("(?<=" + IREncodingPrinter.START + "\r?\n).*\\R([\\s\\S]*)(?=" + IREncodingPrinter.END + ")"); > 49: private static final Pattern VMINFO_PATTERN = Suggestion: private static final Pattern VM_INFO_PATTERN = test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 45: > 43: } > 44: > 45: public String getString(String key, String otherwise) { Suggestion: public String getStringValue(String key, String otherwise) { test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 52: > 50: } > 51: > 52: public long getLong(String key, long otherwise) { Suggestion: public long getLongValue(String key, long otherwise) { test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 62: > 60: > 61: public boolean hasCPUFeature(String feature) { > 62: TestFramework.check(isKey("cpuFeatures"), "VMInfo does not contain cpuFeatures"); I suggest to add this verification to the constructor and also verify that you find `MaxVectorSize` and `LoopMaxUnroll` since you are always emitting those. 
By doing so, I think you can also remove the `otherwise` values above because you should always find a valid value for the keys. test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java line 48: > 46: public static final String END = "----- END -----"; > 47: public static final int NO_RULE_APPLIED = -1; > 48: public static final String START_VMINFO = "##### IRMatchingVMInfo - used by TestFramework #####"; Suggestion: public static final String START_VM_INFO = "##### IRMatchingVMInfo - used by TestFramework #####"; test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java line 49: > 47: public static final int NO_RULE_APPLIED = -1; > 48: public static final String START_VMINFO = "##### IRMatchingVMInfo - used by TestFramework #####"; > 49: public static final String END_VMINFO = "----- END VMInfo -----"; Suggestion: public static final String END_VM_INFO = "----- END VMInfo -----"; test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java line 383: > 381: vmInfo.append("LoopMaxUnroll:" + loopMaxUnroll).append(System.lineSeparator()); > 382: vmInfo.append(END_VMINFO); > 383: TestFrameworkSocket.write(vmInfo.toString(), "VMInfo"); I suggest to move the VM info encoding printing to a separate class `VMInfoPrinter`. 
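The suggestion to verify the required keys in the `VMInfo` constructor, so that the getters no longer need `otherwise` defaults, might look roughly like the sketch below. It is a standalone approximation, not the framework class: `VMInfoSketch` is a hypothetical name, `IllegalArgumentException` stands in for the framework's own checks, and the assumption that `cpuFeatures` is a comma-separated string is mine, not something the thread states.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class VMInfoSketch {
    // Keys the thread says are always emitted, so they can be verified up front.
    private static final List<String> REQUIRED_KEYS =
            List.of("cpuFeatures", "MaxVectorSize", "LoopMaxUnroll");

    private final Map<String, String> keyValueMap;

    public VMInfoSketch(Map<String, String> keyValueMap) {
        for (String key : REQUIRED_KEYS) {
            if (!keyValueMap.containsKey(key)) {
                throw new IllegalArgumentException("VMInfo does not contain " + key);
            }
        }
        this.keyValueMap = Map.copyOf(keyValueMap);
    }

    // No "otherwise" parameter: a missing key is treated as an error.
    public String getStringValue(String key) {
        String value = keyValueMap.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Unknown VMInfo key: " + key);
        }
        return value;
    }

    public long getLongValue(String key) {
        return Long.parseLong(getStringValue(key));
    }

    // Assumes a comma-separated feature list, e.g. "avx, avx2, sse4.1".
    public boolean hasCPUFeature(String feature) {
        return Arrays.stream(getStringValue("cpuFeatures").split(","))
                .map(String::trim)
                .anyMatch(feature::equals);
    }
}
```

Failing early in the constructor means a malformed VM info block is reported once, at parse time, rather than surfacing later as a silently applied default.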
------------- PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1502511643 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245058120 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245062957 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245063866 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245065891 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245054479 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245070302 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245073737 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245088533 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245088670 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245088952 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245087849 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245089379 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245092449 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245093404 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245139571 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245142927 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245098394 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245098919 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245162632 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245161059 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245164715 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244839933 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244904824 PR Review Comment: 
https://git.openjdk.org/jdk/pull/14539#discussion_r1244848292 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244842478 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244896029 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245096303 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244907381 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244889074 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244870683 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244854900 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244909322 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244911325 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245036479 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245036586 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245034003 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244911665 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244911898 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1244932302 From chagedorn at openjdk.org Wed Jun 28 13:09:35 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 Jun 2023 13:09:35 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:59:17 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to ANY for TestAutoVectorization2DArray.java > > test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/IREncodingParser.java line 49: > >> 47: private static final Pattern IR_ENCODING_PATTERN = >> 48: Pattern.compile("(?<=" + IREncodingPrinter.START + 
"\r?\n).*\\R([\\s\\S]*)(?=" + IREncodingPrinter.END + ")"); >> 49: private static final Pattern VMINFO_PATTERN = > > Suggestion: > > private static final Pattern VM_INFO_PATTERN = As for splitting the VM info printing to a separate class, I also suggest to split the VM info parsing into a separate class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1245045543 From alanb at openjdk.org Wed Jun 28 13:19:03 2023 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 28 Jun 2023 13:19:03 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 12:54:10 GMT, Matthias Baesken wrote: > Hi Alan, regarding usage of class VM I get 'package jdk.internal.misc is declared in module java.base, which does not export it to module java.sql' Is there any concern to export it as well to module java.sql ? And btw did you mean to use it like this, in the if ? > > `if (callerCL == null || VM.isSystemDomainLoader(callerCL)) { callerCL = Thread.currentThread().getContextClassLoader(); }` It was just a passing comment, I didn't meant to suggest changing it as part of this PR. We have always think twice before adding qualified exports from java.base and this is case where java.sql is very "non-core", we don't want to give it any access to java.base internals. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1245197137 From mbaesken at openjdk.org Wed Jun 28 13:25:07 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 28 Jun 2023 13:25:07 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 13:16:30 GMT, Alan Bateman wrote: >> Hi Alan, regarding usage of class VM I get >> 'package jdk.internal.misc is declared in module java.base, which does not export it to module java.sql' >> Is there any concern to export it as well to module java.sql ? 
>> And btw did you mean to use it like this, in the if ?
>>
>> `
>> if (callerCL == null || VM.isSystemDomainLoader(callerCL)) {
>>     callerCL = Thread.currentThread().getContextClassLoader();
>> }
>> `
>
>> Hi Alan, regarding usage of class VM I get 'package jdk.internal.misc is declared in module java.base, which does not export it to module java.sql' Is there any concern to export it as well to module java.sql ? And btw did you mean to use it like this, in the if ?
>>
>> `if (callerCL == null || VM.isSystemDomainLoader(callerCL)) { callerCL = Thread.currentThread().getContextClassLoader(); }`
>
> It was just a passing comment, I didn't meant to suggest changing it as part of this PR. We have always think twice before adding qualified exports from java.base and this is case where java.sql is very "non-core", we don't want to give it any access to java.base internals.

Hi Alan, thanks for clarifying. So I should only adjust the comment, correct?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1245204791

From jsjolen at openjdk.org Wed Jun 28 13:34:05 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 28 Jun 2023 13:34:05 GMT
Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v5]
In-Reply-To:
References:
Message-ID:

On Mon, 19 Jun 2023 06:20:38 GMT, Christian Hagedorn wrote:

>> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Revert to list init
>
> Looks good! I'm wondering, why we don't stack allocate both `Node_List` instead of `new Node_List()`. But regardless of that, we should indeed add a `ResourceMark`.

@chhagedorn , @TobiHartmann , would you mind re-approving this PR? Thank you.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14530#issuecomment-1611419407

From chagedorn at openjdk.org Wed Jun 28 14:09:07 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 28 Jun 2023 14:09:07 GMT
Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v5]
In-Reply-To:
References:
Message-ID: <6vuKiOWNFi8CsWPbJUrsmEWSnkXI0gtOlmrmzfSxyFI=.5117beb2-eff1-471e-acea-2098f613bcfc@github.com>

On Wed, 28 Jun 2023 12:26:18 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak.
>>
>> Please consider, thanks.
>>
>> Johan
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert to list init

Update looks good!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14530#pullrequestreview-1503190404

From thartmann at openjdk.org Wed Jun 28 14:18:06 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 28 Jun 2023 14:18:06 GMT
Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v5]
In-Reply-To:
References:
Message-ID: <0gnln43021vYwAS8E-bXS0CrPXII9yA3Yg_EFfx4k4M=.310023eb-b1db-4837-92c1-0b32f2eb32b7@github.com>

On Wed, 28 Jun 2023 12:26:18 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak.
>>
>> Please consider, thanks.
>>
>> Johan
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert to list init

Looks good to me too.
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14530#pullrequestreview-1503215193 From thartmann at openjdk.org Wed Jun 28 14:45:26 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 28 Jun 2023 14:45:26 GMT Subject: [jdk21] RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction Message-ID: Backport of [JDK-8310130](https://bugs.openjdk.java.net/browse/JDK-8310130). Applies cleanly. Thanks, Tobias ------------- Commit messages: - 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction Changes: https://git.openjdk.org/jdk21/pull/77/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=77&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310130 Stats: 145 lines in 3 files changed: 141 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk21/pull/77.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/77/head:pull/77 PR: https://git.openjdk.org/jdk21/pull/77 From ecaspole at openjdk.org Wed Jun 28 14:55:04 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 28 Jun 2023 14:55:04 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v3] In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 17:28:16 GMT, Aleksey Shipilev wrote: >> Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright header and apply Alekseys comments > > I think the benchmark code needs massaging for style and other issues. > See e.g. the cursory review: @shipilev - ping, Aleksey is this version OK with you? 
Thanks, Eric ------------- PR Comment: https://git.openjdk.org/jdk/pull/14521#issuecomment-1611593885 From shade at openjdk.org Wed Jun 28 16:38:21 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 Jun 2023 16:38:21 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v3] In-Reply-To: References: Message-ID: On Tue, 20 Jun 2023 20:13:19 GMT, Eric Caspole wrote: >> Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. >> This allows to create a simulation of a large application with arbitrary java heap and code cache footprint, and take advantage of the benefits of JMH at the same time. >> The defaults are set very low by default and the intent is that they would be customized for any given study. > > Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright header and apply Alekseys comments Some more things that are visible from here are below. I don't think I captured _all_ the problematic things that are there, please take a closer look at the file from the style perspective. test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 82: > 80: String[] classNames; > 81: > 82: int index = 0; Why this `index` is here? It seems not be used except that within the `setupClasses`. 
test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 196: > 194: + " }" > 195: + " " > 196: + " public Integer get2( Map m, String k, Integer depth) { " Here and later `get2( ` has redundant spaces after the parenthesis. test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 321: > 319: > 320: final String k = "key"; > 321: final Integer v = 1000; What are these fields? Do they need to be here? Do they need to be `static final` and be on top? test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 325: > 323: final String methodNames[] = { > 324: "get" > 325: }; Does it have to be an array? There is only a single element, should it be a single `static final` field? ------------- PR Review: https://git.openjdk.org/jdk/pull/14521#pullrequestreview-1503528418 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245484664 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245483757 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245482734 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245483221 From ecaspole at openjdk.org Wed Jun 28 17:48:56 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 28 Jun 2023 17:48:56 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 16:33:35 GMT, Aleksey Shipilev wrote: >> Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright header and apply Alekseys comments > > test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 82: > >> 80: String[] classNames; >> 81: >> 82: int index = 0; > > Why this `index` is here? It seems not be used except that within the `setupClasses`. 
Using index in this way to coordinate creating classes named like "B"+index is a convention that we have used in earlier tests such as hotspot/jtreg/serviceability/jvmti/RedefineClasses/TestMultipleClasses.java > test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 196: > >> 194: + " }" >> 195: + " " >> 196: + " public Integer get2( Map m, String k, Integer depth) { " > > Here and later `get2( ` has redundant spaces after the parenthesis. Thanks, fixed in the next rev. > test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 321: > >> 319: >> 320: final String k = "key"; >> 321: final Integer v = 1000; > > What are these fields? Do they need to be here? Do they need to be `static final` and be on top? They are used both for filling the maps passed as arguments and the key is passed later as a parameter when calling these generated methods. They could be static, yes. > test/micro/org/openjdk/bench/vm/compiler/CodeCacheStress.java line 325: > >> 323: final String methodNames[] = { >> 324: "get" >> 325: }; > > Does it have to be an array? There is only a single element, should it be a single `static final` field? In earlier versions it called more than 1 method in the generated classes, and I would prefer to keep that capability. It could be static, yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245565544 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245565263 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245564598 PR Review Comment: https://git.openjdk.org/jdk/pull/14521#discussion_r1245564910 From ecaspole at openjdk.org Wed Jun 28 17:59:13 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 28 Jun 2023 17:59:13 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v4] In-Reply-To: References: Message-ID: > Most benchmarks have a relatively small code footprint compared to enterprise applications. 
While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. > This allows creating a simulation of a large application with arbitrary java heap and code cache footprint, and taking advantage of the benefits of JMH at the same time. > The defaults are set very low and the intent is that they would be customized for any given study. Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: Cleanups from Alekseys suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14521/files - new: https://git.openjdk.org/jdk/pull/14521/files/8fdb3c96..a81f0d4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14521&range=02-03 Stats: 25 lines in 1 file changed: 6 ins; 7 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/14521.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14521/head:pull/14521 PR: https://git.openjdk.org/jdk/pull/14521 From jbhateja at openjdk.org Wed Jun 28 18:08:07 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 28 Jun 2023 18:08:07 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. Message-ID: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. 1) Disable intrinsification if effective index do not lie within byte value range. 2) Use GT predicate while computing comparison mask for all the indices above vector length.
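The first of the two points above can be pictured with a scalar model. The following is only a sketch under stated assumptions: `IotaIndexSketch`, `iota` and `fitsInByte` are hypothetical names, the model computes lane indices as `start + lane * step`, and it does not reproduce the Vector API or the C2 inline expander touched by the patch.

```java
import java.util.Arrays;

public class IotaIndexSketch {
    // Hypothetical scalar model: lane i holds start + i * step.
    // With wrap == true the index is reduced into [0, length - 1].
    static int[] iota(int start, int step, int length, boolean wrap) {
        int[] idx = new int[length];
        for (int lane = 0; lane < length; lane++) {
            int raw = start + lane * step;
            idx[lane] = wrap ? Math.floorMod(raw, length) : raw;
        }
        return idx;
    }

    // True if every index can be represented as a byte, the condition
    // the patch description names for keeping the intrinsic enabled.
    static boolean fitsInByte(int[] idx) {
        for (int v : idx) {
            if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(iota(2, 1, 4, true)));
        System.out.println(fitsInByte(iota(100, 10, 8, false)));
        System.out.println(fitsInByte(iota(100, 10, 8, true)));
    }
}
```

In this model an unwrapped sequence starting at 100 with step 10 leaves the byte range after a few lanes, while the wrapped sequence stays small, which is one way to read why only the unwrapped case needs the extra check.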
No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations with JDK-8310691. Kindly review and share feedback. Best Regards, Jatin ------------- Commit messages: - Some code refactoring. - 8309531: Incorrect result with unwrapped iotaShuffle. Changes: https://git.openjdk.org/jdk/pull/14700/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309531 Stats: 131 lines in 2 files changed: 103 ins; 5 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From kvn at openjdk.org Wed Jun 28 18:07:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 Jun 2023 18:07:52 GMT Subject: [jdk21] RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 14:38:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310130](https://bugs.openjdk.java.net/browse/JDK-8310130). Applies cleanly. > > Thanks, > Tobias Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/77#pullrequestreview-1503688771 From kvn at openjdk.org Wed Jun 28 18:08:58 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 Jun 2023 18:08:58 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v5] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 12:26:18 GMT, Johan Sjölen wrote: >> Hi, >> >> `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. >> >> Please consider, thanks.
>> >> Johan > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Revert to list init Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14530#pullrequestreview-1503690220 From kvn at openjdk.org Wed Jun 28 18:11:55 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 Jun 2023 18:11:55 GMT Subject: RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 08:56:26 GMT, Roland Westrelin wrote: > Before 8275201, loading the element klass of an array returned: > > > TypeKlassPtr::make(tkls->ptr(), elem, 0/*offset*/); > > > that is exact if the array type is exact. I changed it to: > > > tkls->is_aryklassptr()->elem(); > > > When the array type is exact (newly allocated array for instance) but > the element class has subclasses, this doesn't return an exact class > (so the logic is different from the one that was there before). That > affects array store checks that no longer constant fold. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14536#pullrequestreview-1503694021 From kvn at openjdk.org Wed Jun 28 18:17:55 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 Jun 2023 18:17:55 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v4] In-Reply-To: References: Message-ID: <8XwFfkUAP5eHwmKUkWYpYfzIEYTMMrtMbM8jl5LG508=.d31a8100-8e4b-4374-8f47-5a16a817d6f8@github.com> On Wed, 28 Jun 2023 11:10:27 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms.
> > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > Changed variable name for CustomRunTest.java Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14649#pullrequestreview-1503702350 From kvn at openjdk.org Wed Jun 28 18:26:54 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 Jun 2023 18:26:54 GMT Subject: RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 10:02:37 GMT, Roland Westrelin wrote: > The crash happens after the following steps: > > 1- pre/main/post loops are created with assert predicates above the main loop. > > 2- the main loop is peeled > > 3- as a consequence, the `OpaqueZeroTripGuard` for the main loop is removed > > 4- That allows narrowing of the type of the CastII that was added right > after the zero trip guard during pre/main/post loops creation > > 5- The CastII feeds into a range check CastII for the peeled iteration > that becomes top because the narrowed type of the first CastII > conflicts with the type recorded in the range check CastII. > > 6- The assert predicate that should fold to protect the range check > CastII doesn't because of the fix for JDK-8282592: on assert > predicate updates, the CastII at the zero trip guard is skipped. So > the range check CastII sees the narrowing of the type of the CastII > at the zero trip guard but the assert predicate doesn't. > > The fix I propose is to revert that part of the change from > JDK-8282592 so both the range check CastII and the assert predicate > have the CastII at the zero trip guard as input and observe its type > updates. I went back to that bug and tried to reproduce the failure > again but couldn't. Reverting JDK-8281429 causes the bug to reproduce > again. I tried tweaking the test so the crash reproduces with > JDK-8281429 applied but couldn't. 
> > This is caused by JDK-8305189 because step 3- happens because of > it. Before JDK-8305189, 3- happened after loop opts are over. I think > what happened then was that a template assertion predicate that was in > the process of having its `OpaqueLoopInit` and `OpaqueLoopStride` > removed constant folded so the crash wouldn't reproduce. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14672#pullrequestreview-1503715507 From thartmann at openjdk.org Wed Jun 28 18:30:10 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 28 Jun 2023 18:30:10 GMT Subject: [jdk21] RFR: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 14:38:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310130](https://bugs.openjdk.java.net/browse/JDK-8310130). Applies cleanly. > > Thanks, > Tobias Thanks, Vladimir. ------------- PR Comment: https://git.openjdk.org/jdk21/pull/77#issuecomment-1611885103 From thartmann at openjdk.org Wed Jun 28 18:30:12 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 28 Jun 2023 18:30:12 GMT Subject: [jdk21] Integrated: 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 14:38:43 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310130](https://bugs.openjdk.java.net/browse/JDK-8310130). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 687863d9 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/687863d9003c58d45841906a9fd5674eeeb132a1 Stats: 145 lines in 3 files changed: 141 ins; 0 del; 4 mod 8310130: C2: assert(false) failed: scalar_input is neither phi nor a matchin reduction Reviewed-by: kvn Backport-of: 526dba1a2942e444bf11d03d8eaf014b5ef20ccf ------------- PR: https://git.openjdk.org/jdk21/pull/77 From thartmann at openjdk.org Wed Jun 28 18:35:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 28 Jun 2023 18:35:58 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v4] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 11:10:27 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > Changed variable name for CustomRunTest.java Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14649#pullrequestreview-1503729729 From dcubed at openjdk.org Wed Jun 28 20:14:55 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 28 Jun 2023 20:14:55 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:05:19 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. 
I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > S390 code S390 typo: s/unexptected/unexpected/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1612038887 From dholmes at openjdk.org Wed Jun 28 21:16:56 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 21:16:56 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 20:12:24 GMT, Daniel D. Daugherty wrote: > S390 typo: s/unexptected/unexpected/ Thanks @dcubed-ojdk, I spotted that before committing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1612112667 From dholmes at openjdk.org Wed Jun 28 21:17:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 28 Jun 2023 21:17:00 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: <9B6TIpzjhLnUVqFo82q8hwdJIpO1o_fCpCz8fVIlu0M=.7c3415c9-5633-4c4b-8932-1a54b47be07c@github.com> References: <9B6TIpzjhLnUVqFo82q8hwdJIpO1o_fCpCz8fVIlu0M=.7c3415c9-5633-4c4b-8932-1a54b47be07c@github.com> Message-ID: On Wed, 28 Jun 2023 09:57:07 GMT, Martin Doerr wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> S390 code > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 698: > >> 696: // check if already enabled - if so no re-enabling needed >> 697: ldr(rscratch1, Address(rthread, JavaThread::stack_guard_state_offset())); >> 698: cmp(rscratch1, (u1)StackOverflow::stack_guard_enabled); > > Not ldrw + cmpw? I've no idea, I used the code below as the pattern here. @theRealAph reviewed this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1245777845 From jsjolen at openjdk.org Wed Jun 28 21:24:03 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 28 Jun 2023 21:24:03 GMT Subject: RFR: 8310264: In PhaseChaitin::Split defs and phis are leaked [v5] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 12:26:18 GMT, Johan Sjölen wrote: >> Hi, >> >> `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. >> >> Please consider, thanks.
>> >> Johan > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Revert to list init Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14530#issuecomment-1612118347 From jsjolen at openjdk.org Wed Jun 28 21:24:05 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 28 Jun 2023 21:24:05 GMT Subject: Integrated: 8310264: In PhaseChaitin::Split defs and phis are leaked In-Reply-To: References: Message-ID: On Sat, 17 Jun 2023 16:08:53 GMT, Johan Sjölen wrote: > Hi, > > `defs` and `phis` are leaked as they are resource allocated but not protected by a `ResourceMark`. The intention might have been for these to also live in the `split_arena`. This change is the most conservative one, however, and does fix the memory leak. > > Please consider, thanks. > > Johan This pull request has now been integrated. Changeset: 02b17d79 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/02b17d793bfcea611c654049c9ab680b70fb5685 Stats: 22 lines in 1 file changed: 6 ins; 4 del; 12 mod 8310264: In PhaseChaitin::Split defs and phis are leaked Reviewed-by: thartmann, chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14530 From dlong at openjdk.org Thu Jun 29 01:04:57 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 29 Jun 2023 01:04:57 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: References: <9B6TIpzjhLnUVqFo82q8hwdJIpO1o_fCpCz8fVIlu0M=.7c3415c9-5633-4c4b-8932-1a54b47be07c@github.com> Message-ID: On Wed, 28 Jun 2023 21:13:08 GMT, David Holmes wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 698: >> >>> 696: // check if already enabled - if so no re-enabling needed >>> 697: ldr(rscratch1, Address(rthread, JavaThread::stack_guard_state_offset())); >>> 698: cmp(rscratch1, (u1)StackOverflow::stack_guard_enabled); >> >> Not ldrw + cmpw?
> > I've no idea, I used the code below as the pattern here. @theRealAph reviewed this. ldrw + cmpw does seem more correct, plus an assert that sizeof _stack_guard_state == 4. ldr+cmp is only going to work for little-endian, and only as long as the alignment padding between _stack_guard_state and _stack_overflow_limit is all zeroes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1245971331 From dholmes at openjdk.org Thu Jun 29 01:56:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Jun 2023 01:56:20 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v6] In-Reply-To: References: Message-ID: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks.
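Dean Long's remark on this PR about `ldr` + `cmp` versus `ldrw` + `cmpw` can be illustrated outside the VM. The sketch below is an assumption-laden model, not HotSpot code: `LoadWidthSketch` and its 8-byte buffer (a 4-byte state field followed by 4 bytes standing in for alignment padding) are hypothetical, with `getLong` playing the role of the 64-bit load and `getInt` the 32-bit one.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LoadWidthSketch {
    // Hypothetical layout: 4-byte state at offset 0, 4 bytes of padding at offset 4.
    static ByteBuffer layout(ByteOrder order, int state, int padding) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        buf.putInt(0, state);
        buf.putInt(4, padding);
        return buf;
    }

    // True if a 64-bit read at the field's address yields the same value
    // as the 32-bit read (state is assumed to be a small non-negative value).
    static boolean wideLoadMatches(ByteOrder order, int state, int padding) {
        ByteBuffer buf = layout(order, state, padding);
        long wide = buf.getLong(0);  // stands in for a 64-bit load
        int narrow = buf.getInt(0);  // stands in for a 32-bit load
        return narrow == state && wide == narrow;
    }

    public static void main(String[] args) {
        System.out.println(wideLoadMatches(ByteOrder.LITTLE_ENDIAN, 1, 0));
        System.out.println(wideLoadMatches(ByteOrder.LITTLE_ENDIAN, 1, 7));
        System.out.println(wideLoadMatches(ByteOrder.BIG_ENDIAN, 1, 0));
    }
}
```

Only the first combination (little-endian with zeroed padding) makes the wide read agree with the narrow one, which matches the two conditions named in the review comment; a load of the field's actual width avoids depending on either.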
David Holmes has updated the pull request incrementally with two additional commits since the last revision: - Change guarantee to assert - Dean's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14669/files - new: https://git.openjdk.org/jdk/pull/14669/files/a6bb4a47..2070db9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=04-05 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From dholmes at openjdk.org Thu Jun 29 01:56:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 29 Jun 2023 01:56:20 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v5] In-Reply-To: References: <9B6TIpzjhLnUVqFo82q8hwdJIpO1o_fCpCz8fVIlu0M=.7c3415c9-5633-4c4b-8932-1a54b47be07c@github.com> Message-ID: On Thu, 29 Jun 2023 01:02:25 GMT, Dean Long wrote: >> I've no idea, I used the code below as the pattern here. @theRealAph reviewed this. > > ldrw + cmpw does seem more correct, plus an assert that sizeof _stack_guard_state == 4. ldr+cmp is only going to work for little-endian, and only as long as the alignment padding between _stack_guard_state and _stack_overflow_limit is all zeroes. Changed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14669#discussion_r1245998685 From jbhateja at openjdk.org Thu Jun 29 03:39:09 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 Jun 2023 03:39:09 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle.
[v2] In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: > Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if effective index do not lie within byte value range. > 2) Use GT predicate while computing comparison mask for all the indices above vector length. > > No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fix GHA sanity. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14700/files - new: https://git.openjdk.org/jdk/pull/14700/files/8a15c0c7..a6aae353 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From bulasevich at openjdk.org Thu Jun 29 05:36:06 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 29 Jun 2023 05:36:06 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v6] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 12:15:40 GMT, David Holmes wrote: > Note there is no arm32 version here as for some reason it does not have the reserved stack access support, at least in this area. 
Yes, 'JEP 270: Reserved Stack Areas for Critical Sections' is not implemented for arm32, so the current change is not applicable to the platform. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1612456209 From duke at openjdk.org Thu Jun 29 06:54:54 2023 From: duke at openjdk.org (Eric Nothum) Date: Thu, 29 Jun 2023 06:54:54 GMT Subject: RFR: 8295191: IR framework timeout options expect ms instead of s [v4] In-Reply-To: References: Message-ID: <_QGKwjeURa8leCvfQhuABuqpJtwGbIfegdqPybXqlxs=.7882fa71-c6b3-4a9a-976e-68e25ef26463@github.com> On Wed, 28 Jun 2023 11:10:27 GMT, Eric Nothum wrote: >> The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. > > Eric Nothum has updated the pull request incrementally with one additional commit since the last revision: > > Changed variable name for CustomRunTest.java Thanks for the reviews everyone! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14649#issuecomment-1612515153 From duke at openjdk.org Thu Jun 29 07:02:01 2023 From: duke at openjdk.org (Eric Nothum) Date: Thu, 29 Jun 2023 07:02:01 GMT Subject: Integrated: 8295191: IR framework timeout options expect ms instead of s In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 08:37:31 GMT, Eric Nothum wrote: > The flags `-DTestCompilationTimeout` and `-DWaitForCompilationTimeout` expect values in ms. The example in the README was misleading as one could infer from "default: 10s" that the flag expects values in ms. Therefore I changed the values and units in the example to ms. This pull request has now been integrated. 
Changeset: b2eae16c Author: Eric Nothum Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/b2eae16c4504fb13bd06c999ef97f2faf0ad4932 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod 8295191: IR framework timeout options expect ms instead of s Reviewed-by: chagedorn, kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14649 From roland at openjdk.org Thu Jun 29 07:29:15 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:29:15 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v2] In-Reply-To: References: Message-ID: > The crash occurs because at split if during IGVN, a `SubTypeCheck` is > created with null as input. That happens because the control path the > `SubTypeCheck` is cloned for is dead. To fix that I propose delaying > split if until dead paths are collapsed. > > I added an assert to check a nullable first input to `SubTypeCheck` > nodes (which should be impossible because it should be null > checked). When I ran testing, a number of cases showed up with known > non null values non properly marked as non null. I fixed them. 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - review - review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14678/files - new: https://git.openjdk.org/jdk/pull/14678/files/cb3a04f0..0cf15596 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=00-01 Stats: 7 lines in 3 files changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14678.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14678/head:pull/14678 PR: https://git.openjdk.org/jdk/pull/14678 From roland at openjdk.org Thu Jun 29 07:38:15 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:38:15 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3] In-Reply-To: References: Message-ID: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com> > The crash occurs because at split if during IGVN, a `SubTypeCheck` is > created with null as input. That happens because the control path the > `SubTypeCheck` is cloned for is dead. To fix that I propose delaying > split if until dead paths are collapsed. > > I added an assert to check a nullable first input to `SubTypeCheck` > nodes (which should be impossible because it should be null > checked). When I ran testing, a number of cases showed up with known > non null values non properly marked as non null. I fixed them. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14678/files - new: https://git.openjdk.org/jdk/pull/14678/files/0cf15596..017d60b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14678.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14678/head:pull/14678 PR: https://git.openjdk.org/jdk/pull/14678 From roland at openjdk.org Thu Jun 29 07:38:16 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:38:16 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v2] In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 07:29:15 GMT, Roland Westrelin wrote: >> The crash occurs because at split if during IGVN, a `SubTypeCheck` is >> created with null as input. That happens because the control path the >> `SubTypeCheck` is cloned for is dead. To fix that I propose delaying >> split if until dead paths are collapsed. >> >> I added an assert to check a nullable first input to `SubTypeCheck` >> nodes (which should be impossible because it should be null >> checked). When I ran testing, a number of cases showed up with known >> non null values non properly marked as non null. I fixed them. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - review > - review Thanks for reviewing this and running tests. > The testing revealed a failure in the newly introduced assertion (attached logs to the bug). I pushed a fix. The problem is that the return type of a boxing method is marked not null but if the method is inlined late, the result from inlining may not have a non null type. 
> Also, the bug summary is way too generic and lacks any details about the actual problem. Please, update it. Does the new one look ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14678#issuecomment-1612555774 From roland at openjdk.org Thu Jun 29 07:38:18 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:38:18 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 20:14:39 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/ifnode.cpp line 95: > >> 93: uint i4; >> 94: RegionNode* phi_region = phi->region(); >> 95: for(i4 = 1; i4 < phi->req(); i4++ ) { > > Missing space: `for (i4`. Done. > test/hotspot/jtreg/compiler/splitif/TestCrashAtIGVNSplitIfSubType.java line 28: > >> 26: * @bug 8303279 >> 27: * @summary C2 Compiler crash (triggered by Kotlin 1.8.10) >> 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:+PrintCompilation -XX:CompileOnly=TestCrashAtIGVNSplitIfSubType::test -XX:CompileCommand=quiet -XX:+StressIGVN -XX:StressSeed=598200189 TestCrashAtIGVNSplitIfSubType > > Missing flag: `-XX:+StressIGVN` requires `-XX:+UnlockDiagnosticVMOptions`. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1246232169 PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1246232289 From roland at openjdk.org Thu Jun 29 07:38:19 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:38:19 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 13:05:07 GMT, Volker Simonis wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/library_call.hpp line 183: > >> 181: return generate_method_call(method_id, false, true, res_not_null); >> 182: } >> 183: CallJavaNode* generate_method_call_virtual(vmIntrinsics::ID method_id) { > > `generate_method_call_virtual()` doesn't seem to be used anywhere in the code base so maybe we can drop it instead of updating it? Thanks for looking at this. Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1246232028 From roland at openjdk.org Thu Jun 29 07:44:08 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:44:08 GMT Subject: RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 10:58:22 GMT, Tobias Hartmann wrote: >> Before 8275201, loading the element klass of an array returned: >> >> >> TypeKlassPtr::make(tkls->ptr(), elem, 0/*offset*/); >> >> >> that is exact if the array type is exact. I changed it to: >> >> >> tkls->is_aryklassptr()->elem(); >> >> >> When the array type is exact (newly allocated array for instance) but >> the element class has subclasses, this doesn't return an exact class >> (so the logic is different from the one that was there before). That >> affects array store checks that no longer constant fold. > > Looks good to me. 
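For readers less familiar with array store checks, the situation quoted above (an exact, newly allocated array whose element class has subclasses) looks roughly like this at the Java level. This is a minimal sketch: `ArrayStoreCheckSketch`, `A` and `B` are hypothetical names, and whether C2 actually folds the check for this exact shape is an assumption, not something the thread confirms.

```java
public class ArrayStoreCheckSketch {
    public static class A {}
    public static class B extends A {}

    // The array is allocated here, so its runtime type is exactly A[].
    // Any A (including a B) can be stored, so the store check can never fail.
    static boolean storeIntoExactArray(A value) {
        A[] arr = new A[1];
        arr[0] = value;
        return arr[0] == value;
    }

    // The runtime type of arr is unknown here (it may be B[]),
    // so the store check has to be performed and may fail.
    static boolean storeSucceeds(A[] arr, A value) {
        try {
            arr[0] = value;
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeIntoExactArray(new B()));
        System.out.println(storeSucceeds(new B[1], new A()));
        System.out.println(storeSucceeds(new B[1], new B()));
    }
}
```

The first method is the kind of code where knowing the element class exactly lets a compiler drop the check; the second shows why the check exists at all.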
@TobiHartmann @vnkozlov thanks for the reviews @sviswa7 thanks for running some testing with the patch ------------- PR Comment: https://git.openjdk.org/jdk/pull/14536#issuecomment-1612566933 From roland at openjdk.org Thu Jun 29 07:44:10 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:44:10 GMT Subject: Integrated: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Mon, 19 Jun 2023 08:56:26 GMT, Roland Westrelin wrote: > Before 8275201, loading the element klass of an array returned: > > > TypeKlassPtr::make(tkls->ptr(), elem, 0/*offset*/); > > > that is exact if the array type is exact. I changed it to: > > > tkls->is_aryklassptr()->elem(); > > > When the array type is exact (newly allocated array for instance) but > the element class has subclasses, this doesn't return an exact class > (so the logic is different from the one that was there before). That > affects array store checks that no longer constant fold. This pull request has now been integrated. 
Changeset: be64d3ac Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/be64d3ac3cf9da2658038d64233f080da8011dc8 Stats: 64 lines in 3 files changed: 63 ins; 0 del; 1 mod 8310299: C2: 8275201 broke constant folding of array store check in some cases Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/14536 From epeter at openjdk.org Thu Jun 29 07:49:56 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 07:49:56 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop [v2] In-Reply-To: <6dGgFqEqlnBcJGqRZjuM8jnrkWfgfo_QGkQs0ehUOJ0=.a421e9fb-be9a-49a0-bb81-89fa5d0154ca@github.com> References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> <6dGgFqEqlnBcJGqRZjuM8jnrkWfgfo_QGkQs0ehUOJ0=.a421e9fb-be9a-49a0-bb81-89fa5d0154ca@github.com> Message-ID: On Thu, 29 Jun 2023 07:44:01 GMT, Roland Westrelin wrote: >> The test contains a loop nest with 2 loops. The outer loop is an >> irreducible loop. The safepoint for that loop is also in the inner >> loop. Because `IdealLoopTree::check_safepts()` ignores irreducible >> loops, that safepoint is not marked as being required and is >> eliminated from the inner loop. The inner loop is then optimized out >> and the outer loop becomes an infinite loop with no safepoint (a >> single node loop). That, in turn, causes the loop to be eliminated >> because it has not use and the assert fires. >> >> The fix I propose is to make `IdealLoopTree::check_safepts()` work >> with irreducible loops. I think >> `IdealLoopTree::allpaths_check_safepts()` can be used for that. When >> working on this I wondered if that method could be called with a loop >> whose head has more than 3 inputs. I couldn't write a test case with >> an irreducible loop whose head had more than 3 inputs but I added an >> assert in the method and ran some testing. That assert fired so I also >> propose to tweak the method so it's robust in that case. 
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8307927 > - test Thanks for the changes @rwestrel , looks good now! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14522#pullrequestreview-1504735398 From roland at openjdk.org Thu Jun 29 07:49:58 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:49:58 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop In-Reply-To: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: On Fri, 16 Jun 2023 15:59:40 GMT, Roland Westrelin wrote: > The test contains a loop nest with 2 loops. The outer loop is an > irreducible loop. The safepoint for that loop is also in the inner > loop. Because `IdealLoopTree::check_safepts()` ignores irreducible > loops, that safepoint is not marked as being required and is > eliminated from the inner loop. The inner loop is then optimized out > and the outer loop becomes an infinite loop with no safepoint (a > single node loop). That, in turn, causes the loop to be eliminated > because it has not use and the assert fires. > > The fix I propose is to make `IdealLoopTree::check_safepts()` work > with irreducible loops. I think > `IdealLoopTree::allpaths_check_safepts()` can be used for that. When > working on this I wondered if that method could be called with a loop > whose head has more than 3 inputs. I couldn't write a test case with > an irreducible loop whose head had more than 3 inputs but I added an > assert in the method and ran some testing. 
That assert fired so I also > propose to tweak the method so it's robust in that case. @eme64 thanks for reviewing this (and making suggestions). Does the updated change look ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14522#issuecomment-1612565143 From roland at openjdk.org Thu Jun 29 07:49:55 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:49:55 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop [v2] In-Reply-To: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: <6dGgFqEqlnBcJGqRZjuM8jnrkWfgfo_QGkQs0ehUOJ0=.a421e9fb-be9a-49a0-bb81-89fa5d0154ca@github.com> > The test contains a loop nest with 2 loops. The outer loop is an > irreducible loop. The safepoint for that loop is also in the inner > loop. Because `IdealLoopTree::check_safepts()` ignores irreducible > loops, that safepoint is not marked as being required and is > eliminated from the inner loop. The inner loop is then optimized out > and the outer loop becomes an infinite loop with no safepoint (a > single node loop). That, in turn, causes the loop to be eliminated > because it has not use and the assert fires. > > The fix I propose is to make `IdealLoopTree::check_safepts()` work > with irreducible loops. I think > `IdealLoopTree::allpaths_check_safepts()` can be used for that. When > working on this I wondered if that method could be called with a loop > whose head has more than 3 inputs. I couldn't write a test case with > an irreducible loop whose head had more than 3 inputs but I added an > assert in the method and ran some testing. That assert fired so I also > propose to tweak the method so it's robust in that case. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into JDK-8307927 - test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14522/files - new: https://git.openjdk.org/jdk/pull/14522/files/3bb72120..bbf4d581 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14522&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14522&range=00-01 Stats: 65999 lines in 1460 files changed: 44160 ins; 15886 del; 5953 mod Patch: https://git.openjdk.org/jdk/pull/14522.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14522/head:pull/14522 PR: https://git.openjdk.org/jdk/pull/14522 From roland at openjdk.org Thu Jun 29 07:49:59 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:49:59 GMT Subject: RFR: 8307927: C2: "malformed control flow" with irreducible loop [v2] In-Reply-To: References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: On Fri, 23 Jun 2023 06:52:57 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8307927 >> - test > > Looks reasonable to me. All tests passed. > > @eme64 Please have a look as well. @TobiHartmann @eme64 thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14522#issuecomment-1612569853 From roland at openjdk.org Thu Jun 29 07:50:01 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 07:50:01 GMT Subject: Integrated: 8307927: C2: "malformed control flow" with irreducible loop In-Reply-To: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> References: <9LdnPN14vMLLgYozirlIKNz06WqQgjXhwKeg6dDjapA=.6dc88f94-2956-4b93-aa93-3bcb37899ad9@github.com> Message-ID: <8Ey_wRxrH8dl7PR0AF6JkMuG2Dvc8iBj-9wdFrnVjUc=.1bce1f64-358a-4979-9230-919044c13aee@github.com> On Fri, 16 Jun 2023 15:59:40 GMT, Roland Westrelin wrote: > The test contains a loop nest with 2 loops. The outer loop is an > irreducible loop. The safepoint for that loop is also in the inner > loop. Because `IdealLoopTree::check_safepts()` ignores irreducible > loops, that safepoint is not marked as being required and is > eliminated from the inner loop. The inner loop is then optimized out > and the outer loop becomes an infinite loop with no safepoint (a > single node loop). That, in turn, causes the loop to be eliminated > because it has not use and the assert fires. > > The fix I propose is to make `IdealLoopTree::check_safepts()` work > with irreducible loops. I think > `IdealLoopTree::allpaths_check_safepts()` can be used for that. When > working on this I wondered if that method could be called with a loop > whose head has more than 3 inputs. I couldn't write a test case with > an irreducible loop whose head had more than 3 inputs but I added an > assert in the method and ran some testing. That assert fired so I also > propose to tweak the method so it's robust in that case. This pull request has now been integrated. 
Changeset: 690d6269 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/690d6269954cbacea5f0e64653a49e4fcf986bd9 Stats: 192 lines in 3 files changed: 143 ins; 3 del; 46 mod 8307927: C2: "malformed control flow" with irreducible loop Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/14522 From chagedorn at openjdk.org Thu Jun 29 08:17:55 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 29 Jun 2023 08:17:55 GMT Subject: RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: <7SuMbMI6L5aBD3ktTAZhdUZqo0DPuJcC9lFxMhmPFFI=.7392c54b-aab1-4a1d-a79c-05cf66996582@github.com> On Tue, 27 Jun 2023 10:02:37 GMT, Roland Westrelin wrote: > The crash happens after the following steps: > > 1- pre/main/post loops are created with assert predicates above the main loop. > > 2- the main loop is peeled > > 3- as a consequence, the `OpaqueZeroTripGuard` for the main loop is removed > > 4- That allows narrowing of the type of the CastII that was added right > after the zero trip guard during pre/main/post loops creation > > 5- The CastII feeds into a range check CastII for the peeled iteration > that becomes top because the narrowed type of the first CastII > conflicts with the type recorded in the range check CastII. > > 6- The assert predicate that should fold to protect the range check > CastII doesn't because of the fix for JDK-8282592: on assert > predicate updates, the CastII at the zero trip guard is skipped. So > the range check CastII sees the narrowing of the type of the CastII > at the zero trip guard but the assert predicate doesn't. > > The fix I propose is to revert that part of the change from > JDK-8282592 so both the range check CastII and the assert predicate > have the CastII at the zero trip guard as input and observe its type > updates. I went back to that bug and tried to reproduce the failure > again but couldn't. 
Reverting JDK-8281429 causes the bug to reproduce > again. I tried tweaking the test so the crash reproduces with > JDK-8281429 applied but couldn't. > > This is caused by JDK-8305189 because step 3- happens because of > it. Before JDK-8305189, 3- happened after loop opts are over. I think > what happened then was that a template assertion predicate that was in > the process of having its `OpaqueLoopInit` and `OpaqueLoopStride` > removed constant folded so the crash wouldn't reproduce. Looks reasonable. I will also remove any skipping of cast nodes in my patch for JDK-8288981. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14672#pullrequestreview-1504787172 From pli at openjdk.org Thu Jun 29 09:31:11 2023 From: pli at openjdk.org (Pengfei Li) Date: Thu, 29 Jun 2023 09:31:11 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 15:15:02 GMT, Emanuel Peter wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > I'm in the middle of reviewing, but have to end it here for the week now ? > > For now it's a lot of detail-feedback. I'll give a more overall-feedback once I'm done reading through, and reflecting on it. > > Still: this is good work. We will have to discuss the performance benefits vs the code complexity. And maybe we first need to refactor some things to reduce code duplication. But this looks much better than the previous post-loop vectorization. 
> > Have a great weekend, > Emanuel Hi @eme64, I guess you have done your first round of review. @fg1417 and I really appreciate all your constructive inputs. By reading your comments, I believe you have reviewed this patch in great detail. Thanks again! What I am doing now: - I'm trying to fix the issues which I think can be fixed immediately. - I'm trying to answer all your simple questions ASAP. For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. We may need some discussion about it. But it's great to know more about your "hybrid vectorizer" plan from your feedback. It looks like a grand plan, and requires significant effort and cooperation. I strongly agree that we need some conversation to discuss where we should move forward and how we can cooperate. Could you give us a moment to digest your idea before we schedule a conversation? BTW: What's your preferred time for a conversation? We are based in Shanghai (GMT+8) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1612703985 From jsjolen at openjdk.org Thu Jun 29 10:50:03 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 29 Jun 2023 10:50:03 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early Message-ID: Hi, please consider this PR. Instead of continuing the loop we break after setting `delay = true`. I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so. This makes the code a bit longer, but clearer imho. I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1. Thanks, Johan ------------- Commit messages: - Do it.
Changes: https://git.openjdk.org/jdk/pull/14707/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14707&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311087 Stats: 24 lines in 1 file changed: 4 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/14707.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14707/head:pull/14707 PR: https://git.openjdk.org/jdk/pull/14707 From epeter at openjdk.org Thu Jun 29 10:57:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 10:57:05 GMT Subject: RFR: JDK-8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: <8KPkr2loby3RVIrYQBiXWv3Ph2E0saSLVDBMFHi88LQ=.b1ffb28d-54a8-4dcc-9472-e53b055a72ee@github.com> On Thu, 29 Jun 2023 09:28:20 GMT, Pengfei Li wrote: >> I'm in the middle of reviewing, but have to end it here for the week now. >> >> For now it's a lot of detail-feedback. I'll give a more overall-feedback once I'm done reading through, and reflecting on it. >> >> Still: this is good work. We will have to discuss the performance benefits vs the code complexity. And maybe we first need to refactor some things to reduce code duplication. But this looks much better than the previous post-loop vectorization. >> >> Have a great weekend, >> Emanuel > > Hi @eme64, > > I guess you have done your first round of review. @fg1417 and I really appreciate all your constructive inputs. By reading your comments, I believe you have reviewed this patch in great detail. Thanks again! > > What I am doing now: > > - I'm trying to fix the issues which I think can be fixed immediately. > - I'm trying to answer all your simple questions ASAP. > > For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. We may need some discussion about it. But it's great to know more about your "hybrid vectorizer" plan from your feedback.
It looks like a grand plan, and requires significant effort and cooperation. I strongly agree that we need some conversation to discuss where we should move forward and how we can cooperate. Could you give us a moment to digest your idea before we schedule a conversation? > > BTW: What's your preferred time for a conversation? We are based in Shanghai (GMT+8) Hi @pfustc ! I'm glad you appreciate my review. > For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. Are you under some time constraint? No pressure from my side, take the time you need. I would very much love to have a conversation over a video call with you. I think that would be beneficial for all of us. The problem from our side (Oracle) is intellectual property concerns. OpenJDK emails and PRs are all under the Oracle Contributor Agreement. So there I'm free to have conversations. I'm trying to figure out if we can have a similar frame for a video call; sadly it may take a few weeks or months to get that sorted, as many people are on summer vacation. Please take some time to digest the feedback. This is a big change set, it will take a while to be ready for integration at any rate. And again, I would really urge you to consider some refactoring of SuperWord in a separate RFE before this change here. I'm looking forward to more collaboration - over PR comments, emails, and hopefully eventually video calls as well. Emanuel ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1612877985 From thartmann at openjdk.org Thu Jun 29 11:23:53 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 29 Jun 2023 11:23:53 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 10:44:45 GMT, Johan Sjölen wrote: > Hi, please consider this PR. > > Instead of continuing the loop we break after setting `delay = true`.
I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so. This makes the code a bit longer, but clearer imho. > > I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1. > > Thanks, > Johan Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14707#pullrequestreview-1505073276 From thartmann at openjdk.org Thu Jun 29 11:39:23 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 29 Jun 2023 11:39:23 GMT Subject: [jdk21] RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases Message-ID: Backport of [JDK-8310299](https://bugs.openjdk.java.net/browse/JDK-8310299). Applies cleanly. Thanks, Tobias ------------- Commit messages: - 8310299: C2: 8275201 broke constant folding of array store check in some cases Changes: https://git.openjdk.org/jdk21/pull/81/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=81&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310299 Stats: 64 lines in 3 files changed: 63 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk21/pull/81.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/81/head:pull/81 PR: https://git.openjdk.org/jdk21/pull/81 From chagedorn at openjdk.org Thu Jun 29 12:02:01 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 29 Jun 2023 12:02:01 GMT Subject: [jdk21] RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 11:32:13 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310299](https://bugs.openjdk.java.net/browse/JDK-8310299). Applies cleanly. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk21/pull/81#pullrequestreview-1505140789 From thartmann at openjdk.org Thu Jun 29 12:05:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 29 Jun 2023 12:05:01 GMT Subject: [jdk21] RFR: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 11:32:13 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310299](https://bugs.openjdk.java.net/browse/JDK-8310299). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/81#issuecomment-1613021417 From epeter at openjdk.org Thu Jun 29 12:43:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 12:43:14 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v3] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. 
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... 
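The type-and-size matching described in the quoted text can be tried out with plain `java.util.regex`. The sketch below is not the IR Framework's code: the class and the helper `vectorNodeRegex` are hypothetical, and it assumes that the brackets and braces in the quoted regex are escaped in the real pattern and that the node dump separates fields with two spaces (both appear to have been lost when the mail was flattened).

```java
import java.util.regex.Pattern;

final class VectorNodeRegexSketch {
    // Hypothetical helper: builds a regex in the shape quoted above for a given
    // node name, element type and element count. '[', ']', '{' and '}' are
    // escaped here because Java regex would otherwise treat them specially.
    static String vectorNodeRegex(String nodeName, String type, int size) {
        return "(\\d+(\\s){2}(" + nodeName + ".*)+(\\s){2}===.*vector[A-Za-z]\\["
                + size + "\\]:\\{" + type + "\\})";
    }

    // True if the dump line contains a node of that name, type and size.
    static boolean matches(String dumpLine, String nodeName, String type, int size) {
        return Pattern.compile(vectorNodeRegex(nodeName, type, size))
                      .matcher(dumpLine)
                      .find();
    }

    public static void main(String[] args) {
        // Assumed dump format with two-space field separators.
        String line = "1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectory[8]:{int}";
        System.out.println(matches(line, "VectorCastF2X", "int", 8));
        System.out.println(matches(line, "VectorCastF2X", "int", 4));
    }
}
```

With these assumptions, the same dump line matches for size 8 and type `int` but not for size 4, which is the regression the stricter matching is meant to catch.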
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: WIP for new tests, still fail ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/c04d5164..08d10732 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=01-02 Stats: 98 lines in 3 files changed: 94 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Thu Jun 29 12:56:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 12:56:06 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v4] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. 
In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... 
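The size expressions in the examples above, such as `min(4, max_double)`, come down to simple arithmetic once the maximal element count is known. The following sketch assumes that the maximal count is just `MaxVectorSize` (in bytes) divided by the element size; it ignores the CPU-feature limits the quoted text also mentions, and the class and method names are hypothetical.

```java
final class VectorSizeSketch {
    // Assumption: the maximal element count is the vector width in bytes
    // divided by the element size in bytes (CPU feature limits are ignored).
    static int maxElements(int maxVectorSizeBytes, int elementBytes) {
        return maxVectorSizeBytes / elementBytes;
    }

    // Evaluates an expression of the form min(requested, max_<type>).
    static int minWithMax(int requested, int maxVectorSizeBytes, int elementBytes) {
        return Math.min(requested, maxElements(maxVectorSizeBytes, elementBytes));
    }

    public static void main(String[] args) {
        int maxVectorSize = 32; // bytes, e.g. a 256-bit vector
        System.out.println("max_double = " + maxElements(maxVectorSize, Double.BYTES));
        System.out.println("max_float  = " + maxElements(maxVectorSize, Float.BYTES));
        System.out.println("min(4, max_double) at 16 bytes = "
                + minWithMax(4, 16, Double.BYTES));
    }
}
```

Under this assumption a 16-byte vector holds only two doubles, so `min(4, max_double)` evaluates to 2 there and to 4 on wider hardware.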
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Fixed TestBadFormat.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/08d10732..70b3b53c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=02-03 Stats: 40 lines in 1 file changed: 20 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From thartmann at openjdk.org Thu Jun 29 13:04:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 29 Jun 2023 13:04:08 GMT Subject: [jdk21] Integrated: 8310299: C2: 8275201 broke constant folding of array store check in some cases In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 11:32:13 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310299](https://bugs.openjdk.java.net/browse/JDK-8310299). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 91598a94 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/91598a94f20128e65e31a57273d7952bbf9cae7b Stats: 64 lines in 3 files changed: 63 ins; 0 del; 1 mod 8310299: C2: 8275201 broke constant folding of array store check in some cases Reviewed-by: chagedorn Backport-of: be64d3ac3cf9da2658038d64233f080da8011dc8 ------------- PR: https://git.openjdk.org/jdk21/pull/81 From epeter at openjdk.org Thu Jun 29 13:55:16 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 13:55:16 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v5] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. 
> > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. 
`@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: TestBadFormat.java and TestVectorNode.java are in good state now ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/70b3b53c..0d590e13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=03-04 Stats: 99 lines in 3 files changed: 86 ins; 7 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Thu Jun 29 13:55:54 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 13:55:54 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v5] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 08:01:20 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> TestBadFormat.java and TestVectorNode.java are in good state now > > test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/IRRule.java line 51: > >> 49: this.ruleId = ruleId; >> 50: this.irAnno = irAnno; >> 51: this.matcher = new MatchableMatcher(new CompilePhaseIRRuleBuilder(irAnno, compilation).build(vmInfo)); > > For 
consistency, I would move `vmInfo` directly into the `CompilePhaseIRRuleBuilder` constructor. For consistency with what exactly? How would I pass it on/down from the `CompilePhaseIRRuleBuilder` constructor? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1246655137 From epeter at openjdk.org Thu Jun 29 13:55:19 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 13:55:19 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 11:10:37 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to ANY for TestAutoVectorization2DArray.java > > test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java line 64: > >> 62: // should not generate vectorized Min/Max nodes for them. >> 63: @Test >> 64: @IR(failOn = {IRNode.MIN_VI, IRNode.MIN_VF, IRNode.MIN_VD}) > > We could think about keeping generic vector nodes that match any type and restrict their usage to `failOn` constraints. I'm not sure it is worth it now. There are very few use cases, this is one of them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1246650871 From epeter at openjdk.org Thu Jun 29 14:38:29 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 14:38:29 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v6] In-Reply-To: References: Message-ID: <5_fRLZZoXBzIxPhEe8R4sT1WNltaTu3oDQJpoYN5UD8=.26f9ca8e-3ede-4c19-a011-97dadc884642@github.com> > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: part 2 of review updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/0d590e13..eeb54a32 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=04-05 Stats: 139 lines in 4 files changed: 47 ins; 47 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From chagedorn at openjdk.org Thu Jun 29 14:49:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 29 Jun 2023 14:49:02 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 13:50:16 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java line 64: >> >>> 62: // should not generate vectorized Min/Max nodes for them. >>> 63: @Test >>> 64: @IR(failOn = {IRNode.MIN_VI, IRNode.MIN_VF, IRNode.MIN_VD}) >> >> We could think about keeping generic vector nodes that match any type and restrict their usage to `failOn` constraints. > > I'm not sure it is worth it now. There are very few use cases, this is one of them. Okay, if there are only few cases, it's fine. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1246726774 From roland at openjdk.org Thu Jun 29 14:54:30 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 14:54:30 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7] In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: > In this simple micro benchmark: > > https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 > > Performance drops sharply with polluted profile: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us > > > to: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us > > > The test has 2 type checks to 2 different interfaces so caching with > `secondary_super_cache` doesn't help. > > The micro-benchmark only uses 2 different concrete classes > (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded > in profile data at the type checks. But C2 only takes advantage of > profile data at type checks if they report a single class. > > What I propose is that the full-blown type check expanded in > `Phase::gen_subtype_check()` takes advantage of profile data. So in > the case of the micro benchmark, before checking the > `secondary_super_cache`, generated code checks whether the object > being type checked is a `DuplicatedContext` or a > `NonDuplicatedContext`. > > This works fairly well on this micro benchmark: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ±
20.750 ops/us > > > It also scales much better if there are multiple threads running the > same test (`secondary_super_cache` doesn't scale well: see > JDK-8180450). > > Now if the micro-benchmark is changed according to the comment: > > https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 > > so the type check hits in the `secondary_super_cache`, the current > code performs much better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > but leveraging profiling as explained above performs even better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplic... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - whitespace - reworked change - Merge branch 'master' into JDK-8308869 - more test failures - Merge branch 'master' into JDK-8308869 - whitespaces - test failures - review - 32 bit fix - white spaces - ...
and 1 more: https://git.openjdk.org/jdk/compare/591890fc...101399eb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/684f7520..101399eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=05-06 Stats: 10540 lines in 525 files changed: 5401 ins; 2092 del; 3047 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From chagedorn at openjdk.org Thu Jun 29 14:55:04 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 29 Jun 2023 14:55:04 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v6] In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 13:53:31 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/IRRule.java line 51: >> >>> 49: this.ruleId = ruleId; >>> 50: this.irAnno = irAnno; >>> 51: this.matcher = new MatchableMatcher(new CompilePhaseIRRuleBuilder(irAnno, compilation).build(vmInfo)); >> >> For consistency, I would move `vmInfo` directly into the `CompilePhaseIRRuleBuilder` constructor. > > For consistency with what exactly? How would I pass it on/down from the `CompilePhaseIRRuleBuilder` constructor? In the IR framework, I'm usually passing the required information to the constructor of the builder class and then just call `build()`. You could also do the same here since you otherwise need to pass the `vmInfo` around inside the `CompilePhaseIRRuleBuilder` class to the different private methods. 
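The two builder styles under discussion can be contrasted in a minimal sketch. This is not the IR framework's code: `VmInfo`, `BuildArgStyle` and `CtorArgStyle` are hypothetical stand-ins, and the "rule" is reduced to a string so the example stays self-contained.

```java
import java.util.Map;

public class BuilderStyles {
    /** Hypothetical stand-in for the VM information object; not the framework's class. */
    public static final class VmInfo {
        private final Map<String, String> flags;
        public VmInfo(Map<String, String> flags) { this.flags = flags; }
        public String flag(String name) { return flags.get(name); }
    }

    /** Style A: the context is handed to build() and threaded through every helper. */
    public static final class BuildArgStyle {
        private final String node;
        public BuildArgStyle(String node) { this.node = node; }
        public String build(VmInfo vmInfo) { return describe(vmInfo); }
        private String describe(VmInfo vmInfo) {
            return node + "[MaxVectorSize=" + vmInfo.flag("MaxVectorSize") + "]";
        }
    }

    /** Style B: everything goes into the constructor; build() takes no arguments. */
    public static final class CtorArgStyle {
        private final String node;
        private final VmInfo vmInfo;
        public CtorArgStyle(String node, VmInfo vmInfo) {
            this.node = node;
            this.vmInfo = vmInfo;
        }
        public String build() { return describe(); }
        private String describe() {
            return node + "[MaxVectorSize=" + vmInfo.flag("MaxVectorSize") + "]";
        }
    }

    public static void main(String[] args) {
        VmInfo vmInfo = new VmInfo(Map.of("MaxVectorSize", "32"));
        System.out.println(new BuildArgStyle("LoadVector").build(vmInfo));
        System.out.println(new CtorArgStyle("LoadVector", vmInfo).build());
    }
}
```

Both styles produce the same result. The difference is that in style B the private helpers read the context from a field, whereas in style A each helper needs an extra parameter.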
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1246733841 From roland at openjdk.org Thu Jun 29 15:03:00 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 15:03:00 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Thu, 29 Jun 2023 14:54:30 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But C2 only takes advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full-blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ±
20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - whitespace > - reworked change > - Merge branch 'master' into JDK-8308869 > - more test failures > - Merge branch 'master' into JDK-8308869 > - whitespaces > - test failures > - review > - 32 bit fix > - white spaces > - ... and 1 more: https://git.openjdk.org/jdk/compare/5e3c5c26...101399eb I just pushed a new commit that reworks the change. In principle, it's very similar, but rather than attaching profile data to a `SubTypeCheck` node with extra inputs, I added method/bci fields that can be used to retrieve profile data. To prevent commoning of `SubTypeCheck` nodes with different profiles, the `hash` method returns `NO_HASH` if profile data is attached to the node. I also found a case I had missed where not commoning `SubTypeCheck` nodes can cause an optimization to be missed (split if). I still think it's important to change profile collection to have data that is as accurate as possible.
If the change needs to be split, I think profile collection changes should go in first. I also still think that preventing commoning is important so some path doesn't end up with profile data from some other path. I don't see many ways to deal with that issue so I would rather see it go in with the rest of the change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1613338016 From roland at openjdk.org Thu Jun 29 15:09:06 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 15:09:06 GMT Subject: RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: <34xlEw-3LlV8u2kuNg6l93oL746t78y-7lFPJQlasuw=.6d09e110-e3ff-43d0-ab23-0497d2df578e@github.com> On Wed, 28 Jun 2023 18:24:31 GMT, Vladimir Kozlov wrote: >> The crash happens after the following steps: >> >> 1- pre/main/post loops are created with assert predicates above the main loop. >> >> 2- the main loop is peeled >> >> 3- as a consequence, the `OpaqueZeroTripGuard` for the main loop is removed >> >> 4- That allows narrowing of the type of the CastII that was added right >> after the zero trip guard during pre/main/post loops creation >> >> 5- The CastII feeds into a range check CastII for the peeled iteration >> that becomes top because the narrowed type of the first CastII >> conflicts with the type recorded in the range check CastII. >> >> 6- The assert predicate that should fold to protect the range check >> CastII doesn't because of the fix for JDK-8282592: on assert >> predicate updates, the CastII at the zero trip guard is skipped. So >> the range check CastII sees the narrowing of the type of the CastII >> at the zero trip guard but the assert predicate doesn't. >> >> The fix I propose is to revert that part of the change from >> JDK-8282592 so both the range check CastII and the assert predicate >> have the CastII at the zero trip guard as input and observe its type >> updates. 
I went back to that bug and tried to reproduce the failure >> again but couldn't. Reverting JDK-8281429 causes the bug to reproduce >> again. I tried tweaking the test so the crash reproduces with >> JDK-8281429 applied but couldn't. >> >> This is caused by JDK-8305189 because step 3- happens because of >> it. Before JDK-8305189, 3- happened after loop opts are over. I think >> what happened then was that a template assertion predicate that was in >> the process of having its `OpaqueLoopInit` and `OpaqueLoopStride` >> removed constant folded so the crash wouldn't reproduce. > > Looks good. @vnkozlov @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/14672#issuecomment-1613344510 From roland at openjdk.org Thu Jun 29 15:09:08 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 29 Jun 2023 15:09:08 GMT Subject: Integrated: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 10:02:37 GMT, Roland Westrelin wrote: > The crash happens after the following steps: > > 1- pre/main/post loops are created with assert predicates above the main loop. > > 2- the main loop is peeled > > 3- as a consequence, the `OpaqueZeroTripGuard` for the main loop is removed > > 4- That allows narrowing of the type of the CastII that was added right > after the zero trip guard during pre/main/post loops creation > > 5- The CastII feeds into a range check CastII for the peeled iteration > that becomes top because the narrowed type of the first CastII > conflicts with the type recorded in the range check CastII. > > 6- The assert predicate that should fold to protect the range check > CastII doesn't because of the fix for JDK-8282592: on assert > predicate updates, the CastII at the zero trip guard is skipped. So > the range check CastII sees the narrowing of the type of the CastII > at the zero trip guard but the assert predicate doesn't. 
> > The fix I propose is to revert that part of the change from > JDK-8282592 so both the range check CastII and the assert predicate > have the CastII at the zero trip guard as input and observe its type > updates. I went back to that bug and tried to reproduce the failure > again but couldn't. Reverting JDK-8281429 causes the bug to reproduce > again. I tried tweaking the test so the crash reproduces with > JDK-8281429 applied but couldn't. > > This is caused by JDK-8305189 because step 3- happens because of > it. Before JDK-8305189, 3- happened after loop opts are over. I think > what happened then was that a template assertion predicate that was in > the process of having its `OpaqueLoopInit` and `OpaqueLoopStride` > removed constant folded so the crash wouldn't reproduce. This pull request has now been integrated. Changeset: 26efff75 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/26efff758684b9c5615fb3b087538d713e6eca10 Stats: 64 lines in 2 files changed: 58 ins; 6 del; 0 mod 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14672 From epeter at openjdk.org Thu Jun 29 15:14:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 15:14:23 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v7] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. 
> > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: part 3 review updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/eeb54a32..daa9f2a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=05-06 Stats: 135 lines in 6 files changed: 85 ins; 33 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Thu Jun 29 15:31:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 15:31:18 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v8] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. 
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... 
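The regex matching from the "How to use it" part above can be tried out in plain Java. This is a sketch under two assumptions: the brackets and braces in the printed regex are meant literally and are escaped here (the escapes appear to have been lost in the mail rendering), and the dumped node line uses two spaces between columns, which is what `(\s){2}` requires. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class VectorNodeRegexDemo {
    // Assumed escaped form of the regex quoted above: '[', ']', '{' and '}' are literals.
    static final Pattern VECTOR_CAST_INT_8 = Pattern.compile(
        "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z]\\[8\\]:\\{int\\})");

    /** Returns true if the dumped IR line contains a VectorCastF2X with 8 int elements. */
    static boolean matches(String dumpLine) {
        return VECTOR_CAST_INT_8.matcher(dumpLine).find();
    }

    public static void main(String[] args) {
        // Assumed dump layout with two spaces between the columns.
        String dump = "1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectory[8]:{int}";
        System.out.println(matches(dump));
        // The same node with only 4 elements no longer satisfies the size check.
        System.out.println(matches(dump.replace("[8]", "[4]")));
    }
}
```

The second call shows the point of the change: a node of the right kind and type but a shorter vector width is not counted as a match.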
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: part 4 review updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/daa9f2a4..3f1076ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=06-07 Stats: 80 lines in 5 files changed: 62 ins; 16 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Thu Jun 29 15:50:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 15:50:22 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v9] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. 
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... 
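To make size expressions such as `min(4, max_double)` and `min(LoopMaxUnroll, max_float)` concrete, here is a small evaluator sketch. It is not how the IR framework implements them. It assumes that `max_<type>` is simply `MaxVectorSize` in bytes divided by the element size (the real limit also depends on CPU features), that any other name is looked up as an integer VM flag, and that `min` needs at least two arguments. `VectorSizeExpr` and its methods are hypothetical names.

```java
import java.util.Map;

public class VectorSizeExpr {
    /** Size in bytes of a Java primitive element type. */
    static int elementBytes(String type) {
        switch (type) {
            case "byte": return 1;
            case "short": case "char": return 2;
            case "int": case "float": return 4;
            case "long": case "double": return 8;
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    /** Resolves one term: max_<type>, a flag name, or an integer literal. */
    static int resolve(String term, Map<String, Integer> flags) {
        String t = term.trim();
        if (t.startsWith("max_")) {
            // Simplifying assumption: only MaxVectorSize limits the element count.
            return flags.get("MaxVectorSize") / elementBytes(t.substring("max_".length()));
        }
        if (flags.containsKey(t)) {
            return flags.get(t);
        }
        return Integer.parseInt(t);
    }

    /** Evaluates "min(a, b, ...)" or a single term. */
    static int eval(String expr, Map<String, Integer> flags) {
        String e = expr.trim();
        if (!e.startsWith("min(") || !e.endsWith(")")) {
            return resolve(e, flags);
        }
        String[] parts = e.substring("min(".length(), e.length() - 1).split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("min needs at least two arguments: " + expr);
        }
        int result = Integer.MAX_VALUE;
        for (String part : parts) {
            result = Math.min(result, resolve(part, flags));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> flags = Map.of("MaxVectorSize", 16, "LoopMaxUnroll", 16);
        System.out.println(eval("min(4, max_double)", flags));
        System.out.println(eval("min(LoopMaxUnroll, max_float)", flags));
    }
}
```

With `MaxVectorSize=16`, only two `double` elements fit, so `min(4, max_double)` resolves to 2 under these assumptions. With `MaxVectorSize=32` or more it resolves to 4.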
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: part 5 review updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/3f1076ae..b3a623f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=07-08 Stats: 20 lines in 2 files changed: 5 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Thu Jun 29 15:59:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 15:59:31 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v10] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. 
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... 
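The defaulting rule described above (maximal size when asking for the presence of nodes, any size for `failOn` and for `<`, `<=` or `=0` comparisons) can be written down as a small decision function. This is only a sketch of the rule as stated in the description. `DefaultVectorSize`, `Size` and both methods are hypothetical names, not framework API.

```java
public class DefaultVectorSize {
    /** The two implicit size choices named in the description. */
    public enum Size { MAX, ANY }

    /** failOn rules: no node of any size may appear. */
    public static Size forFailOn() {
        return Size.ANY;
    }

    /** counts rules: the default depends on the comparison in the count constraint. */
    public static Size forCount(String constraint) {
        String c = constraint.replaceAll("\\s+", "");
        if (c.startsWith("<")) {
            return Size.ANY;   // covers both "<" and "<="
        }
        if (c.equals("=0")) {
            return Size.ANY;   // asserting absence
        }
        return Size.MAX;       // e.g. ">0" or ">=1": expect full-width vectors
    }

    public static void main(String[] args) {
        System.out.println(forCount(" >0 "));
        System.out.println(forCount("<= 2"));
        System.out.println(forCount("=0"));
    }
}
```

An explicit `IRNode.VECTOR_SIZE` constraint, as in examples 3 to 5, would take precedence over either default.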
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: part 6 review update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/b3a623f4..8958d71e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=08-09 Stats: 159 lines in 1 file changed: 8 ins; 0 del; 151 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Thu Jun 29 15:59:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 29 Jun 2023 15:59:31 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 13:05:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to ANY for TestAutoVectorization2DArray.java > > This is a great enhancement! Thanks for working on that. > > I have left some comments in the IR framework code (will also have a look at the test updates later) but here are some more general comments: > > - We should provide some better description when misusing the new features. Example: > > @IR(counts = {IRNode.LOAD_VL, IRNode.VECTOR_SIZE + "min(4)", ">0"}) > @IR(counts = {IRNode.LOAD_VL, IRNode.VECTOR_SIZE + "min()", ">0"}) > > Output: > > - Provided invalid value "_@min(4)" after comparator "=", node IRNode.LOAD_VL, in count string "_@min(4)" for IR rule 2 at private static long compiler.loopopts.superword.TestGeneralizedReductions.testReductionOnPartiallyUnrolledLoopWithSwappedInputs(long[]). 
> - Provided invalid value "_@min()" after comparator "=", node IRNode.LOAD_VL, in count string "_@min()" for IR rule 3 at private static long compiler.loopopts.superword.TestGeneralizedReductions.testReductionOnPartiallyUnrolledLoopWithSwappedInputs(long[]). > > We could give the user some more information about what's wrong here. You might want to play around with other wrong usages of the new features and check if the format violation is precise enough. You could also add these wrong usages to `TestBadFormat.java`. > - We should have a (sanity) test that explicitly uses `IRNode.VECTOR_SIZE_ANY` and `IRNode.VECTOR_SIZE_MAX`. > - We should also make sure to have some sanity tests for all the different variations that are now possible with the new features (if not already covered by your updated tests). Thanks @chhagedorn, I addressed some of your feedback already. More to come tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14539#issuecomment-1613452657 From jbhateja at openjdk.org Thu Jun 29 17:23:47 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 29 Jun 2023 17:23:47 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v3] In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: > This patch fixes the following two issues in the iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if the effective indices do not lie within the byte value range. > 2) Use the GT predicate while computing the comparison mask for all the indices above the vector length. > > No performance degradation is seen with the existing slice/unslice operations, which internally call the wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle until we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding string overflow range check, NULL to nullptr replacements. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14700/files - new: https://git.openjdk.org/jdk/pull/14700/files/a6aae353..cde4bf53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From xliu at openjdk.org Thu Jun 29 22:54:11 2023 From: xliu at openjdk.org (Xin Liu) Date: Thu, 29 Jun 2023 22:54:11 GMT Subject: RFR: 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation Message-ID: There are 2 overloads of AllocateNode::Ideal_allocation() in graphkit.cpp. One of them never uses 'phase' in the pattern-matching effort. A C++ compiler may emit a warning for the unused parameter. We will need to take care of it if we treat warnings as errors. It also unnecessarily couples CheckCastPP with PhaseValue. In some places, we have to obtain an instance just for it. I would like to remove 'phase' as a parameter. This is a pure clean-up. The other Ideal_allocation() does use PhaseValue* phase to get constant nodes, so I leave it alone. 
------------- Commit messages: - 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation Changes: https://git.openjdk.org/jdk/pull/14719/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14719&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311125 Stats: 36 lines in 11 files changed: 0 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/14719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14719/head:pull/14719 PR: https://git.openjdk.org/jdk/pull/14719 From sviswanathan at openjdk.org Thu Jun 29 22:58:56 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 29 Jun 2023 22:58:56 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v3] In-Reply-To: References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: <1CPsc0Ak2X2vE10y0GK1V37-FEVdSotQ7Eb-RK-Bwq0=.5fa1a134-73c5-432c-a47a-f5c37d662989@github.com> On Thu, 29 Jun 2023 17:23:47 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding string overflow range check, NULL to nullptr replacements. Thanks for looking into this. I have couple of comments. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 616: > 614: int effective_min_index = start_val->get_con(); > 615: int effective_max_index = start_val->get_con() + step_val->get_con() * (num_elem - 1); > 616: effective_indices_in_range = effective_max_index > effective_min_index && effective_min_index >= -128 && effective_max_index <= 127; effective_max_index >= effective_min_index (step_val could be zero). src/hotspot/share/opto/vectorIntrinsics.cpp line 688: > 686: > 687: // Make the indices greater than lane count as -ve values to match the java side implementation. > 688: res = gvn().transform(VectorNode::make(Op_AndV, res, bcast_mod, vt)); Is it correct that here we are setting the mask to be true for within range good lane indices. What happens if the index is -ve? The BoolTest:gt would not catch that as it would still be true. We could instead check for equality of indices before and after the AndV at line 688 below. If not equal then value was out of range. May be I am missing something here. ------------- PR Review: https://git.openjdk.org/jdk/pull/14700#pullrequestreview-1506214378 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247230251 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247239408 From xgong at openjdk.org Fri Jun 30 02:32:54 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 30 Jun 2023 02:32:54 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. 
Here is one of the failing IR rules: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, the JVM disables the SVE feature for the compiler, while the reported CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, the rule has to add another condition like `applyIf = {"UseSVE", ">0"}`. Since `UseSVE` is an ARM-specific option, it can be used only on ARM CPUs. So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the order in which the conditions of an IR rule are checked. It's better to check `applyIfCPUFeature` before `applyIf`, in case the VM option is invalid on the running hardware, which would make the test throw an exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and on SVE systems with `UseSVE=0/1`. Hi there, I've filed the VM option and CPU feature sync issue here for AArch64: https://bugs.openjdk.org/browse/JDK-8311130, and will address the comment with it. Thanks again for the advice! Hi @eme64, besides the sync issue, does the change to the IR framework make sense to you? Currently, if we use an architecture-specific VM option with `applyIf` for an IR check, and run the test on a different architecture, the whole test will fail by throwing exceptions, even if we add `applyIfCPUFeature` to do the CPU check. The changes in the IR framework can fix this issue. If that part seems fine to you, maybe we can let this PR in first, since the test failure adds noise to our internal CI testing? WDYT? Thanks! 
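The ordering argument in this message can be illustrated with a small sketch. It only models the idea that a CPU-feature precondition is evaluated before a flag precondition, so that a flag unknown on the current platform is never looked up. The class, method, and parameter names are hypothetical, and the sets and maps are stand-ins for the VM state, not the IR framework's real data structures.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the IR framework's implementation.
class PreconditionOrderSketch {
    // Returns true if a rule guarded by a CPU feature and a "flag > bound"
    // condition should be applied on the described machine.
    static boolean shouldApply(Set<String> cpuFeatures, Map<String, Integer> vmFlags,
                               String requiredFeature, String flagName, int exclusiveLowerBound) {
        // Step 1: CPU feature check first. On a machine without the feature
        // the rule is skipped before the flag is ever touched.
        if (!cpuFeatures.contains(requiredFeature)) {
            return false;
        }
        // Step 2: only now look up the flag. Looking up a flag that does not
        // exist on this platform is modeled as an exception.
        Integer value = vmFlags.get(flagName);
        if (value == null) {
            throw new IllegalArgumentException("Unknown VM flag: " + flagName);
        }
        return value > exclusiveLowerBound;
    }

    public static void main(String[] args) {
        // x86-like machine: no "sve" feature and no "UseSVE" flag; rule is skipped quietly.
        System.out.println(shouldApply(Set.of("avx512"), Map.of("UseAVX", 3), "sve", "UseSVE", 0));
        // SVE machine run with UseSVE=0: feature present, flag says disabled.
        System.out.println(shouldApply(Set.of("sve"), Map.of("UseSVE", 0), "sve", "UseSVE", 0));
        // SVE machine run with UseSVE=1: rule applies.
        System.out.println(shouldApply(Set.of("sve"), Map.of("UseSVE", 1), "sve", "UseSVE", 0));
    }
}
```

If the two steps were swapped, the first call would throw instead of returning false, which corresponds to the cross-architecture test abort described above.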
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1614028972 From dholmes at openjdk.org Fri Jun 30 06:23:56 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 30 Jun 2023 06:23:56 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v6] In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 01:56:20 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > David Holmes has updated the pull request incrementally with two additional commits since the last revision: > > - Change guarantee to assert > - Dean's comments If there are no further comments I will integrate on Monday. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1614184165 From chagedorn at openjdk.org Fri Jun 30 06:32:55 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 Jun 2023 06:32:55 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 10:44:45 GMT, Johan Sjölen wrote: > Hi, please consider this PR. > > Instead of continuing the loop we break after setting `delay = true`. 
I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so. This makes the code a bit longer, but clearer imho. > > I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1. > > Thanks, > Johan Marked as reviewed by chagedorn (Reviewer). src/hotspot/share/opto/cfgnode.cpp line 1944: > 1942: Node* n = in(j); > 1943: > 1944: if (rc == nullptr || !rc->is_Proj()) continue; Maybe you could put braces around the `continue` statements. src/hotspot/share/opto/cfgnode.cpp line 1966: > 1964: delay = true; > 1965: break; > 1966: } Just an idea, how about putting this into a separate method `should_delay()` (or something like that) and replacing `continue` with `return false` and `break` with `return true`? If `should_delay()` is true at some point, we can push `this` to the worklist and return true. But looks good either way. ------------- PR Review: https://git.openjdk.org/jdk/pull/14707#pullrequestreview-1505271452 PR Review Comment: https://git.openjdk.org/jdk/pull/14707#discussion_r1247486085 PR Review Comment: https://git.openjdk.org/jdk/pull/14707#discussion_r1246605085 From chagedorn at openjdk.org Fri Jun 30 06:54:55 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 Jun 2023 06:54:55 GMT Subject: RFR: 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 22:46:57 GMT, Xin Liu wrote: > There are 2 overloaded AllocateNode::Idea_allocation() in graphkit.cpp. > One of them never uses 'phase' in the pattern-matching effort. > > C++ compiler may emit a warning for the unused parameter. We will need to take care of it if we treat > warning as error. It also unnecessarily couple CheckCastPP with PhaseValue. In some places, we have to > gain the instance for it. > > I would like to remove 'phase' as parameter. This is a pure clean-up. 
The other Idea_allocation() does > use PhaseValue* phase to get constant nodes, so leave it alone. Otherwise, the cleanup looks good. src/hotspot/share/opto/memnode.cpp line 632: > 630: > 631: if (ac != nullptr && ac->is_clonebasic()) { > 632: AllocateNode* alloc = AllocateNode::Ideal_allocation(ac->in(ArrayCopyNode::Dest)); You can also remove the `phase` parameter of this method since this was the only usage. src/hotspot/share/opto/memnode.cpp line 1716: > 1714: } > 1715: > 1716: AllocateNode* LoadNode::is_new_object_mark_load(PhaseGVN *phase) const { You can also remove the `phase` parameter of this method since there is now no usage left. src/hotspot/share/opto/parse1.cpp line 1014: > 1012: // then barrier introduced by allocation node can be removed. > 1013: if (DoEscapeAnalysis && alloc_with_final()) { > 1014: AllocateNode *alloc = AllocateNode::Ideal_allocation(alloc_with_final()); Suggestion: AllocateNode* alloc = AllocateNode::Ideal_allocation(alloc_with_final()); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14719#pullrequestreview-1506605427 PR Review Comment: https://git.openjdk.org/jdk/pull/14719#discussion_r1247498636 PR Review Comment: https://git.openjdk.org/jdk/pull/14719#discussion_r1247497126 PR Review Comment: https://git.openjdk.org/jdk/pull/14719#discussion_r1247500189 From xgong at openjdk.org Fri Jun 30 06:58:00 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 30 Jun 2023 06:58:00 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v3] In-Reply-To: References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: On Fri, 30 Jun 2023 06:52:51 GMT, Xiaohong Gong wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding string overflow range check, NULL to nullptr replacements. 
> > src/hotspot/share/opto/vectorIntrinsics.cpp line 617: > >> 615: int effective_max_index = start_val->get_con() + step_val->get_con() * (num_elem - 1); >> 616: effective_indices_in_range = effective_min_index >= -128 && effective_max_index <= 127; >> 617: } > > May I ask why we need to fall-back to java implementation if the indices are in-effective for constant vals? While if the `start_val` and `step_val` are not all constants, for in-effective-indices, it subs to the lanecount? Is there any difference for constant and variable inputs? An alternative is moving this effective indice checking for constant values in java level. C2 compiler may optimize out it for most cases? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247504088 From xgong at openjdk.org Fri Jun 30 06:58:00 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 30 Jun 2023 06:58:00 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v3] In-Reply-To: References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: On Thu, 29 Jun 2023 17:23:47 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding string overflow range check, NULL to nullptr replacements. 
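For readers following the review comments on the constant-index range check in this thread, here is a minimal Java sketch of the arithmetic involved. It mirrors the shape of the quoted C++ lines but is not the HotSpot code: the class and method names are hypothetical, it uses the `>=` comparison suggested in review (so that a zero step is accepted), and it computes in `long` to sidestep `int` overflow, which is an added assumption.

```java
// Hypothetical illustration only; not the HotSpot implementation.
class IotaIndexRangeSketch {
    // For constant start and step, decides whether every lane index
    // start + step * i (0 <= i < numElem) fits into a signed byte,
    // assuming a non-negative step so that the last lane is the maximum.
    static boolean effectiveIndicesInRange(int start, int step, int numElem) {
        long effectiveMin = start;
        long effectiveMax = (long) start + (long) step * (numElem - 1);
        return effectiveMax >= effectiveMin   // also true when step == 0
                && effectiveMin >= -128
                && effectiveMax <= 127;
    }

    public static void main(String[] args) {
        System.out.println(effectiveIndicesInRange(0, 1, 16));    // lanes 0..15
        System.out.println(effectiveIndicesInRange(0, 0, 16));    // all lanes 0
        System.out.println(effectiveIndicesInRange(120, 1, 16));  // last lane is 135
        System.out.println(effectiveIndicesInRange(-129, 1, 4));  // first lane below -128
    }
}
```

With a strict `>` in the first comparison, the zero-step case would be rejected even though all of its indices are in range, which is the corner case raised in the review.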
src/hotspot/share/opto/vectorIntrinsics.cpp line 617: > 615: int effective_max_index = start_val->get_con() + step_val->get_con() * (num_elem - 1); > 616: effective_indices_in_range = effective_min_index >= -128 && effective_max_index <= 127; > 617: } May I ask why we need to fall-back to java implementation if the indices are in-effective for constant vals? While if the `start_val` and `step_val` are not all constants, for in-effective-indices, it subs to the lanecount? Is there any difference for constant and variable inputs? src/hotspot/share/opto/vectorIntrinsics.cpp line 626: > 624: } > 625: > 626: bool step_multiply = !step_val->is_con() || !is_power_of_2(step_val->get_con()); Is it better moving this definition above its usage (e.g. line-634)? src/hotspot/share/opto/vectorIntrinsics.cpp line 683: > 681: if(do_wrap) { > 682: // Wrap the indices greater than lane count. > 683: res = gvn().transform(VectorNode::make(Op_AndV, res, bcast_mod, vt)); Identity Style: please remove one space before `res = ` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247502942 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247475844 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247476595 From thartmann at openjdk.org Fri Jun 30 07:15:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 30 Jun 2023 07:15:16 GMT Subject: [jdk21] RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 Message-ID: Backport of [JDK-8309902](https://bugs.openjdk.java.net/browse/JDK-8309902). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 Changes: https://git.openjdk.org/jdk21/pull/83/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=83&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309902 Stats: 64 lines in 2 files changed: 58 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk21/pull/83.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/83/head:pull/83 PR: https://git.openjdk.org/jdk21/pull/83 From epeter at openjdk.org Fri Jun 30 07:35:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 07:35:55 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Fri, 30 Jun 2023 02:30:20 GMT, Xiaohong Gong wrote: >> This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. 
So the right IR rules should be: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> >> This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. >> >> Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. > > Hi there, I'v filed the vm option and cpu feature sync issue here for AArch64: https://bugs.openjdk.org/browse/JDK-8311130, and will address the comment with it. Thanks again for the advice! > > Hi @eme64 , besides the sync issue, does the change to IR framework make sense to you? Currently, if we use an architecture specific vm options with `applyIf` for an IR check, and run the test on another different architecture, the whole test will fail by throwing exceptions, even if we add the `applyIfCPUFeature` to do the cpu check. The changes in the IR framework can fix this issue. > > If that part seems fine to you, maybe we can let this PR in first? Since the test failure will noise our internal ci testing. WDYT? Thanks! @XiaohongGong I totally agree with the changes to the IR framework (having `applyIfCPUFeature` before `applyIf`). Otherwise, using both `UseSVE=0` and `sve, false` is a temporary fix that should be reverted after [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). I'm accepting it as a temporary fix only. Who will do the real fix? I was a bit afraid not keeping the CPU feature and the VM flag in sync could also lead to issues in the backend of aarch64. But it does indeed seem that we only use `UseSVE`, and never `VM_Version::supports_sve()`. 
Still, someone might use them synonymous in the future and expect that they are in sync. Actually, since there are only so few uses of `VM_Version::supports_sve()`, is the risk not very low to just mask off the feature now directly with this fix? That fix does not look so complicated as I feared. What do you think? Anyway, I just launched testing for commit 1: tier1-6 plus stress testing. Will report back on Monday probably. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1614251920 From chagedorn at openjdk.org Fri Jun 30 07:47:56 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 Jun 2023 07:47:56 GMT Subject: [jdk21] RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 07:07:30 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309902](https://bugs.openjdk.java.net/browse/JDK-8309902). Applies cleanly. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/83#pullrequestreview-1506684746 From thartmann at openjdk.org Fri Jun 30 07:53:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 30 Jun 2023 07:53:04 GMT Subject: [jdk21] RFR: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: <1-u7mfPx8jjdDtok-WBL6ac35T3vGMqpat66OgdB_QE=.7ec91e4e-568e-4e1b-b447-0b6ef9c2017a@github.com> On Fri, 30 Jun 2023 07:07:30 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309902](https://bugs.openjdk.java.net/browse/JDK-8309902). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk21/pull/83#issuecomment-1614269743 From xgong at openjdk.org Fri Jun 30 08:04:13 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 30 Jun 2023 08:04:13 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: <3PFvaToBrRaOrXxhyZpR3G7fKQ0OzeYJeUzqJiCxvw0=.88fd1545-f1e2-4f3d-9e1a-cfb66ce3de27@github.com> On Fri, 30 Jun 2023 02:30:20 GMT, Xiaohong Gong wrote: >> This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> >> This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. 
>> >> Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. > > Hi there, I'v filed the vm option and cpu feature sync issue here for AArch64: https://bugs.openjdk.org/browse/JDK-8311130, and will address the comment with it. Thanks again for the advice! > > Hi @eme64 , besides the sync issue, does the change to IR framework make sense to you? Currently, if we use an architecture specific vm options with `applyIf` for an IR check, and run the test on another different architecture, the whole test will fail by throwing exceptions, even if we add the `applyIfCPUFeature` to do the cpu check. The changes in the IR framework can fix this issue. > > If that part seems fine to you, maybe we can let this PR in first? Since the test failure will noise our internal ci testing. WDYT? Thanks! > @XiaohongGong I totally agree with the changes to the IR framework (having `applyIfCPUFeature` before `applyIf`). Thanks a lot! > Otherwise, using both `UseSVE=0` and `sve, false` is a temporary fix that should be reverted after [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). I'm accepting it as a temporary fix only. Who will do the real fix? We (Arm) will do the real fix. `UseSVE=0` is needed when `sve, true`, which only affects this test now. And yes, I can revert these IR checks once the real fix is in. > I was a bit afraid not keeping the CPU feature and the VM flag in sync could also lead to issues in the backend of aarch64. But it does indeed seem that we only use `UseSVE`, and never `VM_Version::supports_sve()`. Still, someone might use them synonymous in the future and expect that they are in sync. Agree, although we only use `UseSVE` in backend now. > Actually, since there are only so few uses of `VM_Version::supports_sve()`, is the risk not very low to just mask off the feature now directly with this fix? That fix does not look so complicated as I feared. What do you think? 
I prefer fixing that in a separate patch. One reason is that, as I see it, syncing the VM options and CPU features is a refactoring of the AArch64 backend. There are other related CPU features specific to different SVE systems besides `sve`, for example `svebitperm`, which exists from sve2 onward. We have to take them into consideration as well. Besides, although the change is not so big, we have to do more testing to make sure no regressions are introduced. And besides `UseSVE`, do you think it's necessary to sync other options as well? > Anyway, I just launched testing for commit 1: tier1-6 plus stress testing. Will report back on Monday probably. Thanks for doing this! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1614282767 From lucy at openjdk.org Fri Jun 30 08:20:53 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 30 Jun 2023 08:20:53 GMT Subject: RFR: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch. In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 06:05:12 GMT, sid8606 wrote: > A return statement is missing after calling the jump_to_native_invoker method in generate_method_handle_dispatch, which leads to assert(is_valid()) failed: invalid register. > > Ran tier1 test cases; they pass with release, fastdebug and slowdebug. Looks good to me. Thanks for fixing. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14647#pullrequestreview-1506736239 From epeter at openjdk.org Fri Jun 30 08:28:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 08:28:11 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v11] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. 
> > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. 
`@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Move examples to IRExample.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/8958d71e..6c8d7979 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=09-10 Stats: 293 lines in 4 files changed: 144 ins; 140 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Fri Jun 30 08:36:12 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 08:36:12 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 11:23:47 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to ANY for TestAutoVectorization2DArray.java > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 249: > >> 247: public static final String ADD_REDUCTION_VD = PREFIX + "ADD_REDUCTION_VD" + POSTFIX; >> 248: static { >> 249: beforeMatchingNameRegex(ADD_REDUCTION_VD, "AddReductionVD"); > > Shouldn't add reduction nodes only be created in Superword? 
Actually, the Vector API can also generate these nodes, and then they would already exist at parsing. I wanted to make these nodes vectorNodes as well, but they do not have a vector output, only a vector input. Hence I cannot match their size. We currently do not have any reduction IR tests for the Vector API. I tracked it in [JDK-8310523](https://bugs.openjdk.org/browse/JDK-8310523). Actually, the reduction nodes are the only `superWordNodes`. I think we can eventually remove `superWordNodes` completely. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1247594897 From shade at openjdk.org Fri Jun 30 08:41:57 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 30 Jun 2023 08:41:57 GMT Subject: RFR: 8309976: Add microbenchmark for stressing code cache [v4] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 17:59:13 GMT, Eric Caspole wrote: >> Most benchmarks have a relatively small code footprint compared to enterprise applications. While trying to model an application with a very large code footprint, we developed this JMH with its own classloader generating the desired number of classes from the string literal in the file, using the existing InMemoryJavaCompiler. Then these classes are are instantiated to the desired count, and methods are called in those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off. >> This allows to create a simulation of a large application with arbitrary java heap and code cache footprint, and take advantage of the benefits of JMH at the same time. >> The defaults are set very low by default and the intent is that they would be customized for any given study. > > Eric Caspole has updated the pull request incrementally with one additional commit since the last revision: > > Cleanups from Alekseys suggestions Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14521#pullrequestreview-1506770612 From epeter at openjdk.org Fri Jun 30 08:47:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 08:47:18 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v12] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VI, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). 
This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VL, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VD, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (u... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use superWordNodes for reductions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/6c8d7979..b8e0f491 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=10-11 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Fri Jun 30 08:47:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 08:47:18 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 08:32:42 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 249: >> >>> 247: public static final String ADD_REDUCTION_VD = PREFIX + 
"ADD_REDUCTION_VD" + POSTFIX; >>> 248: static { >>> 249: beforeMatchingNameRegex(ADD_REDUCTION_VD, "AddReductionVD"); >> >> Shouldn't add reduction nodes only be created in Superword? > > Actually, the Vector API can also generate these nodes, and then they would already exist at parsing. I wanted to make these nodes vectorNodes as well, but they do not have a vector output, only a vector input. Hence I cannot match their size. > We currently do not have any reduction IR tests for the Vector API. I tracked it in [JDK-8310523](https://bugs.openjdk.org/browse/JDK-8310523). > > Actually, the reduction nodes are the only `superWordNodes`. I think we can eventually remove `superWordNodes` completely. Still, I reverted them to `superWordNodes` for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1247602749 From duke at openjdk.org Fri Jun 30 09:23:04 2023 From: duke at openjdk.org (sid8606) Date: Fri, 30 Jun 2023 09:23:04 GMT Subject: RFR: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch. In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 08:18:22 GMT, Lutz Schmidt wrote: >> Missing return statement after calling jump_to_native_invoker mrthod in generate_method_handle_dispatch, it leads to assert(is_valid()) failed: invalid register. >> >> Ran tier1 test cases passing with release, fastdebug and slowdebug. > > Looks good to me. > Thanks for fixing. Thank you fro review @RealLucy. Do I need a one more review to integrate or it's trivial change? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14647#issuecomment-1614378724 From epeter at openjdk.org Fri Jun 30 09:29:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 09:29:05 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 12:35:16 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to ANY for TestAutoVectorization2DArray.java > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2250: > >> 2248: int s = 0; >> 2249: try { >> 2250: s = Integer.parseInt(sizes[i]); > > We should also check if the size is a reasonable number (i.e. a positive multiple of 2 and maybe an upper limit(?)) and report a format violation if that is not the case as for example in: > > @IR(counts = {IRNode.ADD_VI, IRNode.VECTOR_SIZE + "3,-2", "1"}) I'm reporting an error on negative numbers now. But I'm not sure I want to restrict sizes to powers of 2 or impose an upper limit. Especially when flags/tags are used, such a restriction might not always hold. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1247656570 From mdoerr at openjdk.org Fri Jun 30 09:44:58 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 30 Jun 2023 09:44:58 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v6] In-Reply-To: References: Message-ID: On Thu, 29 Jun 2023 01:56:20 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms.
I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > David Holmes has updated the pull request incrementally with two additional commits since the last revision: > > - Change guarantee to assert > - Dean's comments Thanks for adding the assertions. I think they would be good to have for all platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1614406114 From epeter at openjdk.org Fri Jun 30 09:52:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 09:52:24 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v13] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. 
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 56 commits: - Merge branch 'master' into JDK-8310308 - whitespace fix - more refactoring for review - use superWordNodes for reductions - Move examples to IRExample.java - part 6 review update - part 5 review updates - part 4 review updates - part 3 review updates - part 2 of review updates - ... and 46 more: https://git.openjdk.org/jdk/compare/c08c9831...08ab854e ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=12 Stats: 3317 lines in 65 files changed: 1254 ins; 21 del; 2042 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Fri Jun 30 09:55:01 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 09:55:01 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 13:05:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> revert to ANY for TestAutoVectorization2DArray.java > > This is a great enhancement! Thanks for working on that. > > I have left some comments in the IR framework code (will also have a look at the test updates later) but here are some more general comments: > > - We should provide some better description when misusing the new features. Example: > > @IR(counts = {IRNode.LOAD_VL, IRNode.VECTOR_SIZE + "min(4)", ">0"}) > @IR(counts = {IRNode.LOAD_VL, IRNode.VECTOR_SIZE + "min()", ">0"}) > > Output: > > - Provided invalid value "_ at min(4)" after comparator "=", node IRNode.LOAD_VL, in count string "_ at min(4)" for IR rule 2 at private static long compiler.loopopts.superword.TestGeneralizedReductions.testReductionOnPartiallyUnrolledLoopWithSwappedInputs(long[]). 
> - Provided invalid value "_ at min()" after comparator "=", node IRNode.LOAD_VL, in count string "_ at min()" for IR rule 3 at private static long compiler.loopopts.superword.TestGeneralizedReductions.testReductionOnPartiallyUnrolledLoopWithSwappedInputs(long[]). > > We could give the user some more information about what's wrong here. You might want to play around with other wrong usages of the new features and check if the format violation is precise enough. You could also add these wrong usages to `TestBadFormat.java`. > - We should have a (sanity) test that explicitely uses `IRNode.VECTOR_SIZE_ANY` and `IRNode.VECTOR_SIZE_MAX`. > - We should also make sure to have some sanity tests for all the different variations that are now possible with the new features (if not already covered by your updated tests). @chhagedorn I went through all your review suggestions. The changes are pushed. Rerunning testing... ------------- PR Comment: https://git.openjdk.org/jdk/pull/14539#issuecomment-1614418498 From mbaesken at openjdk.org Fri Jun 30 10:02:01 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 30 Jun 2023 10:02:01 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v2] In-Reply-To: References: Message-ID: > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust comment in src/java.sql/share/classes/java/sql/DriverManager.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14593/files - new: https://git.openjdk.org/jdk/pull/14593/files/5d52b4cb..6665f60b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=00-01 Stats: 6 lines in 1 file changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14593/head:pull/14593 PR: https://git.openjdk.org/jdk/pull/14593 From mbaesken at openjdk.org Fri Jun 30 10:02:03 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 30 Jun 2023 10:02:03 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 13:22:20 GMT, Matthias Baesken wrote: >>> Hi Alan, regarding usage of class VM I get 'package jdk.internal.misc is declared in module java.base, which does not export it to module java.sql'. Is there any concern about exporting it to module java.sql as well? And by the way, did you mean to use it like this, in the if? >>> >>> `if (callerCL == null || VM.isSystemDomainLoader(callerCL)) { callerCL = Thread.currentThread().getContextClassLoader(); }` >> >> It was just a passing comment, I didn't mean to suggest changing it as part of this PR. We always have to think twice before adding qualified exports from java.base, and this is a case where java.sql is very "non-core"; we don't want to give it any access to java.base internals. > > Hi Alan, thanks for clarifying. So I should only adjust the comment, correct? Hi Alan, I adjusted the comment in DriverManager.java.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1247686721 From epeter at openjdk.org Fri Jun 30 10:11:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 10:11:06 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v15] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 01:25:55 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > May I have a second review for this patch, please? @merykitty I'd love to review it sometime soon, I hope I get the time - though I'm soon off on vacation. One question I have already: We only have tests for very few constant divisors. That makes me a bit nervous. It is hard to get these sorts of mathematical things implemented correctly. And reviewing is equally difficult. It would be nice to have more divisor constants. The best would be if we could randomly generate them. Can we maybe generate methods randomly with the jasm api, and throw in random divisor constants? What do you think? And what do you suggest I read to understand the mathematical background, ie for determining the correct constants? ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1614436754 From epeter at openjdk.org Fri Jun 30 10:13:57 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 10:13:57 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: <4E2woMGK2WlZK_1ytghzjghD97D021FVFW-tqAM1LRo=.00bedd25-e13b-4368-b985-957377bcd669@github.com> On Thu, 22 Jun 2023 19:36:48 GMT, Vladimir Ivanov wrote: >> @eme64 Yes that was my mistake, that node requires AVX512VL so `vlRegF` and `regF` are the same. >> >>> Is there a way to stress-test the registers? >> >> Can we randomise the allocated register during register allocation? >> >> Thanks. 
> >> Is there a way to stress-test the registers? > > As an idea for such a stress test mode, is it possible to make `regF`/`vlRegF`, `regD`/`vlRegD` (and `vec`/`legVec` family of register classes) disjoint sets (`xmm0-xmm15` and `xmm16-xmm31`)? It should be enough to trigger relevant asserts whenever an AD instruction is used. @iwanowww @merykitty @sviswa7 I think the best would really be randomization. But I have no expertise in the register allocation. Maybe we can just disable a random subset of registers? That would also stress spilling, which sometimes has lead to issues before. Anyway. I think this fix is ok regardless. Can I get a second review, please? ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1614440755 From epeter at openjdk.org Fri Jun 30 10:21:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 30 Jun 2023 10:21:22 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v14] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. 
The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix merged tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/08ab854e..af21a9f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=12-13 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From thartmann at openjdk.org Fri Jun 30 10:29:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 30 Jun 2023 10:29:08 GMT Subject: [jdk21] Integrated: 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 In-Reply-To: References: Message-ID: <_F5wZOnVqVJLRkkLdREyzf_yvUT_hHwBQjT009bp1Ao=.9c768d9f-0c6d-4b35-9071-851a796723d3@github.com> On Fri, 30 Jun 2023 07:07:30 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309902](https://bugs.openjdk.java.net/browse/JDK-8309902). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 3210d320 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/3210d32088abf0e1e27d9fcfdd3d6beebf309136 Stats: 64 lines in 2 files changed: 58 ins; 6 del; 0 mod 8309902: C2: assert(false) failed: Bad graph detected in build_loop_late after JDK-8305189 Reviewed-by: chagedorn Backport-of: 26efff758684b9c5615fb3b087538d713e6eca10 ------------- PR: https://git.openjdk.org/jdk21/pull/83 From jbhateja at openjdk.org Fri Jun 30 10:32:59 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 Jun 2023 10:32:59 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
[v4] In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: > Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if effective index do not lie within byte value range. > 2) Use GT predicate while computing comparison mask for all the indices above vector length. > > No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions and test modifications. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14700/files - new: https://git.openjdk.org/jdk/pull/14700/files/cde4bf53..f8a5189a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=02-03 Stats: 121 lines in 2 files changed: 54 ins; 22 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From jbhateja at openjdk.org Fri Jun 30 10:50:01 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 Jun 2023 10:50:01 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
[v5] In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com> > Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if effective index do not lie within byte value range. > 2) Use GT predicate while computing comparison mask for all the indices above vector length. > > No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14700/files - new: https://git.openjdk.org/jdk/pull/14700/files/f8a5189a..1276f73d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=03-04 Stats: 4 lines in 1 file changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From jbhateja at openjdk.org Fri Jun 30 10:50:03 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 Jun 2023 10:50:03 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
[v5] In-Reply-To: References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: On Fri, 30 Jun 2023 06:54:27 GMT, Xiaohong Gong wrote: > An alternative is moving this effective-index check for constant values to the Java level. The C2 compiler may optimize it out in most cases? I just wanted to fix the incorrectness issue with this PR, given that we already have another JBS issue (JDK-8310691) for the shuffle-related overhaul in progress. > May I ask why we need to fall back to the Java implementation if the indices are ineffective for constant values? Whereas if the `start_val` and `step_val` are not all constants, for ineffective indices, it subs to the lanecount? Is there any difference between constant and variable inputs? The inline expander currently operates over byte vectors; this check ensures that we do not overflow the byte value range while computing the effective index for an unwrapped argument, otherwise the subsequent comparison mask may be incorrect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247727090 From jbhateja at openjdk.org Fri Jun 30 10:50:06 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 Jun 2023 10:50:06 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v3] In-Reply-To: <1CPsc0Ak2X2vE10y0GK1V37-FEVdSotQ7Eb-RK-Bwq0=.5fa1a134-73c5-432c-a47a-f5c37d662989@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <1CPsc0Ak2X2vE10y0GK1V37-FEVdSotQ7Eb-RK-Bwq0=.5fa1a134-73c5-432c-a47a-f5c37d662989@github.com> Message-ID: On Thu, 29 Jun 2023 22:48:55 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding string overflow range check, NULL to nullptr replacements.
>
> src/hotspot/share/opto/vectorIntrinsics.cpp line 688:
>
>> 686:
>> 687: // Make the indices greater than lane count as -ve values to match the java side implementation.
>> 688: res = gvn().transform(VectorNode::make(Op_AndV, res, bcast_mod, vt));
>
> Is it correct that here we are setting the mask to be true for in-range, good lane indices?
> What happens if the index is -ve? BoolTest::gt would not catch that, as it would still be true.
> We could instead check for equality of the indices before and after the AndV at line 688 below. If they are not equal, then the value was out of range.
> Maybe I am missing something here.

Correct, the comparison predicate should be UGT.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1247727033

From mbaesken at openjdk.org Fri Jun 30 11:37:10 2023
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 30 Jun 2023 11:37:10 GMT
Subject: RFR: JDK-8310550: Adjust references to rt.jar [v3]
In-Reply-To:
References:
Message-ID:

> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted.

Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:

  remove import

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14593/files
  - new: https://git.openjdk.org/jdk/pull/14593/files/6665f60b..9b2232a7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=01-02

  Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/14593.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14593/head:pull/14593

PR: https://git.openjdk.org/jdk/pull/14593

From jbhateja at openjdk.org Fri Jun 30 12:19:58 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 30 Jun 2023 12:19:58 GMT
Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF)
In-Reply-To:
References:
Message-ID:

On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote:

> Context: `Float.floatToFloat16` -> `vcvtps2ph`.
>
> **Problem**
>
> vcvtps2ph
> pre=Assembler::VEX_SIMD_66
> opc=Assembler::VEX_OPCODE_0F_3A
> VEX.128.66.0F3A
> requires F16C
>
> https://www.felixcloutier.com/x86/vcvtps2ph
>
> So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`.
>
> There is also an AVX512 version, but it requires `AVX512VL and AVX512F`.
>
> So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`.
>
> **Suggested Solution**
> As @sviswa7 suggested, we should just use `vlRegF` instead of `regF`; see the discussion in the comments.
>
> **Testing**
> I simulated the patch on Intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31).
>
> Running: tier1-6 + stress testing.

The fix looks good to me. The dynamic register class "float_reg_vl" should be able to pick the correct register allocation set for VEX-encoded instructions on KNL.

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14379#pullrequestreview-1507086696

From roland at openjdk.org Fri Jun 30 13:32:13 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 30 Jun 2023 13:32:13 GMT
Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17
Message-ID:

A long chain of nodes is sunk out of a loop. Every time a node is
moved out of the loop, a cast is created to pin the node out of the
loop. When its input is next sunk, the cast is removed (the cast is
replaced by its input) and a new cast is created. Some nodes on the
chain have 2 other nodes in the chain as uses. When such a node is
sunk, 2 cast nodes are created, one for each use. So as the compiler
moves forward in the chain, the number of casts to remove grows. From
some profiling, removing those casts is what takes a lot of time.

The fix I propose is, when a node is processed, to check whether a
cast at the out of loop control was already created for that node and
to reuse it.

The test case takes 6 minutes when I run it without the fix and 3
seconds with it.

-------------

Commit messages:
 - fix & test

Changes: https://git.openjdk.org/jdk/pull/14732/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14732&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8308103
  Stats: 69 lines in 2 files changed: 68 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14732.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14732/head:pull/14732

PR: https://git.openjdk.org/jdk/pull/14732

From ecaspole at openjdk.org Fri Jun 30 15:06:04 2023
From: ecaspole at openjdk.org (Eric Caspole)
Date: Fri, 30 Jun 2023 15:06:04 GMT
Subject: Integrated: 8309976: Add microbenchmark for stressing code cache
In-Reply-To:
References:
Message-ID:

On Fri, 16 Jun 2023 14:54:46 GMT, Eric Caspole wrote:

> Most benchmarks have a relatively small code footprint compared to enterprise applications.
> While trying to model an application with a very large code footprint, we developed this JMH benchmark with its own classloader, which generates the desired number of classes from the string literal in the file using the existing InMemoryJavaCompiler. These classes are then instantiated to the desired count, and methods are called on those objects, which can fill up the code cache, possibly causing code cache sweeping or compiler shut-off.
> This allows us to simulate a large application with an arbitrary Java heap and code cache footprint, and to take advantage of the benefits of JMH at the same time.
> The defaults are set very low, and the intent is that they would be customized for any given study.

This pull request has now been integrated.

Changeset: 430d6b61
Author:    Eric Caspole
URL:       https://git.openjdk.org/jdk/commit/430d6b61c5d2d85be2c62af0c927c18531ff7cc3
Stats:     444 lines in 1 file changed: 444 ins; 0 del; 0 mod

8309976: Add microbenchmark for stressing code cache

Reviewed-by: redestad, shade

-------------

PR: https://git.openjdk.org/jdk/pull/14521

From kvn at openjdk.org Fri Jun 30 17:30:57 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 30 Jun 2023 17:30:57 GMT
Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jun 2023 13:23:38 GMT, Roland Westrelin wrote:

> A long chain of nodes is sunk out of a loop. Every time a node is
> moved out of the loop, a cast is created to pin the node out of the
> loop. When its input is next sunk, the cast is removed (the cast is
> replaced by its input) and a new cast is created. Some nodes on the
> chain have 2 other nodes in the chain as uses. When such a node is
> sunk, 2 cast nodes are created, one for each use. So as the compiler
> moves forward in the chain, the number of casts to remove grows. From
> some profiling, removing those casts is what takes a lot of time.
>
> The fix I propose is, when a node is processed, to check whether a
> cast at the out of loop control was already created for that node and
> to reuse it.
>
> The test case takes 6 minutes when I run it without the fix and 3
> seconds with it.

src/hotspot/share/opto/loopopts.cpp line 1704:

> 1702:           cast = prev;
> 1703:         } else {
> 1704:           register_new_node(cast, x_ctrl);

Can you move the creation of `cast` here so you don't need to destroy it when a previous cast already exists? Or is it possible that `ConstraintCastNode::make_cast_for_type()` can return `null`?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14732#discussion_r1248114592

From sviswanathan at openjdk.org Fri Jun 30 17:53:53 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 30 Jun 2023 17:53:53 GMT
Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v5]
In-Reply-To: <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com>
References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com>
 <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com>
Message-ID:

On Fri, 30 Jun 2023 10:50:01 GMT, Jatin Bhateja wrote:

>> This patch fixes the following two issues in the iotaShuffle inline expander with unwrapped indices.
>> 1) Disable intrinsification if the effective index does not lie within the byte value range.
>> 2) Use the GT predicate while computing the comparison mask for all the indices above the vector length.
>>
>> No performance degradation was seen with the existing slice/unslice operations, which internally call wrapped iotaShuffle.
>>
>> This interim patch addresses incorrectness around iotaShuffle until we introduce modified shuffle implementations
>> with JDK-8310691.
>>
>> Kindly review and share feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review comments resolutions.

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14700#pullrequestreview-1507629394

From xliu at openjdk.org Fri Jun 30 19:08:54 2023
From: xliu at openjdk.org (Xin Liu)
Date: Fri, 30 Jun 2023 19:08:54 GMT
Subject: RFR: 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jun 2023 06:49:13 GMT, Christian Hagedorn wrote:

>> There are 2 overloaded AllocateNode::Ideal_allocation() in graphkit.cpp.
>> One of them never uses 'phase' in the pattern-matching effort.
>>
>> The C++ compiler may emit a warning for the unused parameter. We will need to take care of it if we treat
>> warnings as errors. It also unnecessarily couples CheckCastPP with PhaseValue. In some places, we have to
>> gain the instance for it.
>>
>> I would like to remove 'phase' as a parameter. This is a pure clean-up. The other Ideal_allocation() does
>> use PhaseValue* phase to get constant nodes, so leave it alone.
>
> src/hotspot/share/opto/parse1.cpp line 1014:
>
>> 1012:     // then barrier introduced by allocation node can be removed.
>> 1013:     if (DoEscapeAnalysis && alloc_with_final()) {
>> 1014:       AllocateNode *alloc = AllocateNode::Ideal_allocation(alloc_with_final());
>
> Suggestion:
>
>       AllocateNode* alloc = AllocateNode::Ideal_allocation(alloc_with_final());

Thanks. I will clean up those places.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14719#discussion_r1248202834

From xliu at openjdk.org Fri Jun 30 19:43:06 2023
From: xliu at openjdk.org (Xin Liu)
Date: Fri, 30 Jun 2023 19:43:06 GMT
Subject: RFR: 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation [v2]
In-Reply-To:
References:
Message-ID:

> There are 2 overloaded AllocateNode::Ideal_allocation() in graphkit.cpp.
> One of them never uses 'phase' in the pattern-matching effort.
>
> The C++ compiler may emit a warning for the unused parameter. We will need to take care of it if we treat
> warnings as errors. It also unnecessarily couples CheckCastPP with PhaseValue. In some places, we have to
> gain the instance for it.
>
> I would like to remove 'phase' as a parameter. This is a pure clean-up. The other Ideal_allocation() does
> use PhaseValue* phase to get constant nodes, so leave it alone.

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

  Clean up useless 'phase' parameter more.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14719/files
  - new: https://git.openjdk.org/jdk/pull/14719/files/1b9dd182..8afbde7c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14719&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14719&range=00-01

  Stats: 8 lines in 3 files changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/14719.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14719/head:pull/14719

PR: https://git.openjdk.org/jdk/pull/14719

From kvn at openjdk.org Fri Jun 30 22:32:53 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 30 Jun 2023 22:32:53 GMT
Subject: RFR: 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 30 Jun 2023 19:43:06 GMT, Xin Liu wrote:

>> There are 2 overloaded AllocateNode::Ideal_allocation() in graphkit.cpp.
>> One of them never uses 'phase' in the pattern-matching effort.
>>
>> The C++ compiler may emit a warning for the unused parameter. We will need to take care of it if we treat
>> warnings as errors. It also unnecessarily couples CheckCastPP with PhaseValue. In some places, we have to
>> gain the instance for it.
>>
>> I would like to remove 'phase' as a parameter. This is a pure clean-up. The other Ideal_allocation() does
>> use PhaseValue* phase to get constant nodes, so leave it alone.
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> Clean up useless 'phase' parameter more.

Looks good.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14719#pullrequestreview-1507948930

From vlivanov at openjdk.org Fri Jun 30 23:08:58 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 30 Jun 2023 23:08:58 GMT
Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7]
In-Reply-To:
References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com>
Message-ID:

On Thu, 29 Jun 2023 14:54:30 GMT, Roland Westrelin wrote:

>> In this simple micro benchmark:
>>
>> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70
>>
>> Performance drops sharply with polluted profile:
>>
>>
>> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1            false  thrpt   10  1453.372 ± 24.919  ops/us
>>
>>
>> to:
>>
>>
>> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10    28.579 ±  2.280  ops/us
>>
>>
>> The test has 2 type checks to 2 different interfaces so caching with
>> `secondary_super_cache` doesn't help.
>>
>> The micro-benchmark only uses 2 different concrete classes
>> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded
>> in profile data at the type checks. But C2 only takes advantage of
>> profile data at type checks if they report a single class.
>>
>> What I propose is that the full blown type check expanded in
>> `Phase::gen_subtype_check()` takes advantage of profile data. So in
>> the case of the micro benchmark, before checking the
>> `secondary_super_cache`, generated code checks whether the object
>> being type checked is a `DuplicatedContext` or a
>> `NonDuplicatedContext`.
>>
>> This works fairly well on this micro benchmark:
>>
>>
>> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10   871.224 ± 20.750  ops/us
>>
>>
>> It also scales much better if there are multiple threads running the
>> same test (`secondary_super_cache` doesn't scale well: see
>> JDK-8180450).
>>
>> Now if the micro-benchmark is changed according to the comment:
>>
>> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62
>>
>> so the type check hits in the `secondary_super_cache`, the current
>> code performs much better:
>>
>>
>> Benchmark                                         (typePollution)   Mode  Cnt     Score    Error   Units
>> RequireNonNullCheckcastScalability.isDuplicated1             true  thrpt   10   871.224 ± 20.750  ops/us
>>
>>
>> but leveraging profiling as explained above performs even better:
>>
>>
>> Benchmark ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains 11 additional commits since the last revision:
>
> - whitespace
> - reworked change
> - Merge branch 'master' into JDK-8308869
> - more test failures
> - Merge branch 'master' into JDK-8308869
> - whitespaces
> - test failures
> - review
> - 32 bit fix
> - white spaces
> - ... and 1 more: https://git.openjdk.org/jdk/compare/03b7aff0...101399eb

Thanks, Roland. The IR shape looks much better now.

> I also still think that preventing commoning is important so some path doesn't end up with profile data from some other path.

I took a closer look at the relevant code (in particular, I forgot that `PhaseMacroExpand::expand_subtypecheck_node` creates a dedicated copy for each user) and now agree with you that commoning between unrelated paths is undesirable. Moreover, I'm in favor of completely disabling sharing for `SubTypeCheck` nodes. Considering that `IfNode::search_identical()` handles the `SubTypeCheck` case now, I don't see much value in special handling for nodes without associated bytecode location info.

> I still think it's important to change profile collection to have data that is as accurate as possible. If the change needs to be split, I think profile collection changes should go in first.

What's the plan if we agree on adjusting profile collection? Should all the platforms be updated at once? If not, how is it intended to work during the transition period?

test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 44:

> 42:         flags.add("-XX:TypeProfileSubTypeCheckCommonThreshold=90");
> 43:         if (!Platform.is32bit()) {
> 44:             flags.add("-XX:-UseCompressedClassPointers");

What's the purpose of `-XX:-UseCompressedClassPointers` on 64-bit platforms? To make it easier to match the IR?

-------------

PR Review: https://git.openjdk.org/jdk/pull/14375#pullrequestreview-1507603807
PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1248116482

From vlivanov at openjdk.org Fri Jun 30 23:14:53 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Fri, 30 Jun 2023 23:14:53 GMT
Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3]
In-Reply-To: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com>
References: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com>
Message-ID:

On Thu, 29 Jun 2023 07:38:15 GMT, Roland Westrelin wrote:

>> The crash occurs because at split if during IGVN, a `SubTypeCheck` is
>> created with null as input. That happens because the control path the
>> `SubTypeCheck` is cloned for is dead. To fix that I propose delaying
>> split if until dead paths are collapsed.
>>
>> I added an assert to check for a nullable first input to `SubTypeCheck`
>> nodes (which should be impossible because it should be null
>> checked). When I ran testing, a number of cases showed up with known
>> non-null values not properly marked as non-null. I fixed them.
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
> review

(The bug summary looks much clearer now, thanks.)

The fix looks good.

Test results are clean. (Tobias, thanks for submitting the patch for testing.)

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14678#pullrequestreview-1507965690