From duke at openjdk.org Mon Apr 1 09:21:35 2024
From: duke at openjdk.org (SUN Guoyun)
Date: Mon, 1 Apr 2024 09:21:35 GMT
Subject: RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 27 Mar 2024 09:32:29 GMT, Emanuel Peter wrote:

>> SUN Guoyun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>> - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
>> - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode
>
> A possible counter-example:
>
>     x = something
>     y = someCall
>
>     for (int i = 0; i < a.length; i++) {
>         a[i] = ((x + 1) + y) + ((x + 2) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y)
>     }
>
> The call is outside the loop, so folding would not be costly at all. And I fear that the 4 terms would not common up, and so be slower after your change. And I think there are probably other examples. But I have not benchmarked anything, so I could be quite wrong.
>
> What exactly is it that gives you the speedup in your benchmark? Spilling? Fewer add instructions? Would be nice to understand that better, and see what are potential examples where we would have regressions with your patch.

> Why only (x+1)+y and not also x+(y+1)? I agree with @eme64 about breaking other optimizations if we don't move constants to the right. (x+1)+(y+1) no longer becomes (x+y)+2.

Previously, I didn't have a deep understanding of constant folding, but now it seems that this PR will cause a performance regression.

> To reduce spills, I would think we would want to move Calls to the left. (x+1)+y and x+(y+1) both become y+x+1.

This is a good idea. But my idea is, can we delete add1 after `add1->clone()`?
https://github.com/openjdk/jdk/pull/18482/files#diff-c59303cb42c3e35f20bc530628dc611003e21819e84cedb2279e69cde0345410L183 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2029463266 From vkempik at openjdk.org Mon Apr 1 10:10:33 2024 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 1 Apr 2024 10:10:33 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v10] In-Reply-To: <8FMdThon_sV2dJ6P5AnRzvQ8eXqfEumA2sX2TY4Z_GA=.eb31062e-8bcf-45cf-9769-8080df213ff6@github.com> References: <8FMdThon_sV2dJ6P5AnRzvQ8eXqfEumA2sX2TY4Z_GA=.eb31062e-8bcf-45cf-9769-8080df213ff6@github.com> Message-ID: On Sun, 31 Mar 2024 15:47:51 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. 
>>
>> ### Performance results on T-Head board
>>
>> #### Results for enabled intrinsic:
>>
>> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms |
>>
>> #### Results for disabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use srliw to clear upper bits for 'lower' cases

Marked as reviewed by vkempik (Committer).

The results look good now on VF2. Considering Zba is mandatory for RVA22U64, this can be accepted now. Could a reviewer take another look at this, please?

-------------

PR Review: https://git.openjdk.org/jdk/pull/17046#pullrequestreview-1971020346
PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2029521954

From duke at openjdk.org Mon Apr 1 17:13:31 2024
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 1 Apr 2024 17:13:31 GMT
Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4]
In-Reply-To: <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com>
References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com>
Message-ID: <_AtIZaBX5Ucg63kHMkaAzeQt8jIxp6VXPIfw4U8GJCE=.9dc3e07e-844f-4803-a50b-ad7745d85174@github.com>

On Thu, 28 Mar 2024 00:45:33 GMT, Srinivas Vamsi Parasa wrote:

>> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>>
>> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
>> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
>> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
>> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
>> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
>> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
>> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
>> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
>> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
>> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
>> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix L2F cvtsi2ssq

Could I get one more review please?

Hello Tobias (@TobiHartmann) and Vladimir (@vnkozlov), could you please look into this small PR?

Thanks,
Vamsi

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2030175883

From dcubed at openjdk.org Mon Apr 1 22:19:06 2024
From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Mon, 1 Apr 2024 22:19:06 GMT Subject: RFR: 8329425: ProblemList containers/docker/TestJFREvents.java on linux-x64 Message-ID: <4IV494kBFp8o93ikOFJEWQMsXtWXfh4EsThZ8fSIsKE=.c6d109e4-d1a4-4140-9ad7-96e6cbaa6eb1@github.com> Trivial fixes to ProblemList noisy tests: [JDK-8329425](https://bugs.openjdk.org/browse/JDK-8329425) ProblemList containers/docker/TestJFREvents.java on linux-x64 [JDK-8329426](https://bugs.openjdk.org/browse/JDK-8329426) ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java with Xcomp on windows-x64 [JDK-8329427](https://bugs.openjdk.org/browse/JDK-8329427) ProblemList javax/sound/sampled/Clip/ClipFlushCrash.java on linux-x64 [JDK-8329428](https://bugs.openjdk.org/browse/JDK-8329428) ProblemList vmTestbase/nsk/stress/thread/thread006.java on linux-all in Xcomp ------------- Commit messages: - 8329428: ProblemList vmTestbase/nsk/stress/thread/thread006.java on linux-all in Xcomp - 8329427: ProblemList javax/sound/sampled/Clip/ClipFlushCrash.java on linux-x64 - 8329426: ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java with Xcomp on windows-x64 - 8329425: ProblemList containers/docker/TestJFREvents.java on linux-x64 Changes: https://git.openjdk.org/jdk/pull/18568/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18568&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329425 Stats: 5 lines in 3 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18568/head:pull/18568 PR: https://git.openjdk.org/jdk/pull/18568 From dholmes at openjdk.org Mon Apr 1 22:19:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 1 Apr 2024 22:19:07 GMT Subject: RFR: 8329425: ProblemList containers/docker/TestJFREvents.java on linux-x64 In-Reply-To: <4IV494kBFp8o93ikOFJEWQMsXtWXfh4EsThZ8fSIsKE=.c6d109e4-d1a4-4140-9ad7-96e6cbaa6eb1@github.com> References: 
<4IV494kBFp8o93ikOFJEWQMsXtWXfh4EsThZ8fSIsKE=.c6d109e4-d1a4-4140-9ad7-96e6cbaa6eb1@github.com> Message-ID: On Mon, 1 Apr 2024 21:07:34 GMT, Daniel D. Daugherty wrote: > Trivial fixes to ProblemList noisy tests: > > [JDK-8329425](https://bugs.openjdk.org/browse/JDK-8329425) ProblemList containers/docker/TestJFREvents.java on linux-x64 > [JDK-8329426](https://bugs.openjdk.org/browse/JDK-8329426) ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java with Xcomp on windows-x64 > [JDK-8329427](https://bugs.openjdk.org/browse/JDK-8329427) ProblemList javax/sound/sampled/Clip/ClipFlushCrash.java on linux-x64 > [JDK-8329428](https://bugs.openjdk.org/browse/JDK-8329428) ProblemList vmTestbase/nsk/stress/thread/thread006.java on linux-all in Xcomp Seems okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18568#pullrequestreview-1972236395 From dcubed at openjdk.org Mon Apr 1 22:25:07 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 1 Apr 2024 22:25:07 GMT Subject: Integrated: 8329425: ProblemList containers/docker/TestJFREvents.java on linux-x64 In-Reply-To: <4IV494kBFp8o93ikOFJEWQMsXtWXfh4EsThZ8fSIsKE=.c6d109e4-d1a4-4140-9ad7-96e6cbaa6eb1@github.com> References: <4IV494kBFp8o93ikOFJEWQMsXtWXfh4EsThZ8fSIsKE=.c6d109e4-d1a4-4140-9ad7-96e6cbaa6eb1@github.com> Message-ID: <9d0JDigfqBMaHdGS17bfqOTt0bCI4k474KSe4GcfXdU=.cc7c5d99-e1b7-417c-9bfd-a6142a571800@github.com> On Mon, 1 Apr 2024 21:07:34 GMT, Daniel D. 
Daugherty wrote: > Trivial fixes to ProblemList noisy tests: > > [JDK-8329425](https://bugs.openjdk.org/browse/JDK-8329425) ProblemList containers/docker/TestJFREvents.java on linux-x64 > [JDK-8329426](https://bugs.openjdk.org/browse/JDK-8329426) ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java with Xcomp on windows-x64 > [JDK-8329427](https://bugs.openjdk.org/browse/JDK-8329427) ProblemList javax/sound/sampled/Clip/ClipFlushCrash.java on linux-x64 > [JDK-8329428](https://bugs.openjdk.org/browse/JDK-8329428) ProblemList vmTestbase/nsk/stress/thread/thread006.java on linux-all in Xcomp This pull request has now been integrated. Changeset: c2979c15 Author: Daniel D. Daugherty URL: https://git.openjdk.org/jdk/commit/c2979c150bdbcc2a9e6026347dc590e6a7e86595 Stats: 5 lines in 3 files changed: 4 ins; 0 del; 1 mod 8329425: ProblemList containers/docker/TestJFREvents.java on linux-x64 8329426: ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java with Xcomp on windows-x64 8329427: ProblemList javax/sound/sampled/Clip/ClipFlushCrash.java on linux-x64 8329428: ProblemList vmTestbase/nsk/stress/thread/thread006.java on linux-all in Xcomp Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/18568 From dcubed at openjdk.org Mon Apr 1 22:25:06 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Mon, 1 Apr 2024 22:25:06 GMT Subject: RFR: 8329425: ProblemList containers/docker/TestJFREvents.java on linux-x64 In-Reply-To: References: <4IV494kBFp8o93ikOFJEWQMsXtWXfh4EsThZ8fSIsKE=.c6d109e4-d1a4-4140-9ad7-96e6cbaa6eb1@github.com> Message-ID: On Mon, 1 Apr 2024 22:15:06 GMT, David Holmes wrote: >> Trivial fixes to ProblemList noisy tests: >> >> [JDK-8329425](https://bugs.openjdk.org/browse/JDK-8329425) ProblemList containers/docker/TestJFREvents.java on linux-x64 >> [JDK-8329426](https://bugs.openjdk.org/browse/JDK-8329426) ProblemList vmTestbase/nsk/jvmti/scenarios/capability/CM03/cm03t001/TestDescription.java with Xcomp on windows-x64 >> [JDK-8329427](https://bugs.openjdk.org/browse/JDK-8329427) ProblemList javax/sound/sampled/Clip/ClipFlushCrash.java on linux-x64 >> [JDK-8329428](https://bugs.openjdk.org/browse/JDK-8329428) ProblemList vmTestbase/nsk/stress/thread/thread006.java on linux-all in Xcomp > > Seems okay. Thanks @dholmes-ora - Thanks for the lightning fast review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18568#issuecomment-2030672631 From duke at openjdk.org Mon Apr 1 23:24:12 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 1 Apr 2024 23:24:12 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v4] In-Reply-To: References: Message-ID: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). 
> This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR.
>
> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStores in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly as `MemBarRelease` is handled.
>
> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>
> Passes hotspot tier1 locally on a Linux machine.
>
> ### Benchmarks
>
> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>
> Baseline:
>
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>   N = 25
>   mean = 3309.611 ±(99.9%) 86.699 ms/op
>
>   Histogram, ms/op:
>     [3000.000, 3050.000) = 0
>     [3050.000, 3100.000) = 4
>     [3100.000, 3150.000) = 1
>     [3150.000, 3200.000) = 0
>     [3200.000, 3250.000) = 0
>     [3250.000, 3300.000) = 0
>     [3300.000, 3350.000) = 9
>     [3350.000, 3400.000) = 6
>     [3400.000, 3450.000) = 5
>
>   Percentiles, ms/op:
>     p(0.0000) = 3069.910 ms/op
>     p(50.0000) = 3348.140 ms/op
> ...
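
To make the two cases in the quoted description concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from the PR or its tests; the comments only restate what the description says about which barrier is involved, and the code itself cannot observe the barriers.

```java
// Hypothetical illustration classes; not part of the PR under review.

// Case 1: constructor writes final fields. Per the description, this is the
// case where the barrier at constructor exit could be a StoreStore barrier
// instead of a Release barrier.
final class PointWithFinals {
    final int x;
    final int y;

    PointWithFinals(int x, int y) {
        this.x = x;
        this.y = y;
        // Constructor exit: the final-field stores above must not be
        // reordered with a later store that publishes the reference.
    }
}

// Case 2: constructor writes a volatile field. Per the description, a
// MemBarRelease is emitted for the volatile store itself, so the patch is
// not expected to change the IR for this shape (cf. `classWithVolatile`).
final class HolderWithVolatile {
    volatile int v;

    HolderWithVolatile(int v) {
        this.v = v;
    }
}

class CtorBarrierSketch {
    // Publishing through a plain static field is the situation in which
    // final-field semantics matter for other threads.
    static PointWithFinals shared;

    static int publishAndRead(int x, int y) {
        shared = new PointWithFinals(x, y);
        return shared.x + shared.y;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(1, 2));
        System.out.println(new HolderWithVolatile(7).v);
    }
}
```

The sketch is single-threaded on purpose: it only shows the code shapes being discussed, not a reproducer for any reordering.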
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Use MemBarStoreStore in replace_string_concat ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18505/files - new: https://git.openjdk.org/jdk/pull/18505/files/3f03f31e..c4702953 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=02-03 Stats: 19 lines in 2 files changed: 14 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505 PR: https://git.openjdk.org/jdk/pull/18505 From duke at openjdk.org Tue Apr 2 00:43:15 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 2 Apr 2024 00:43:15 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v5] In-Reply-To: References: Message-ID: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
> The [current handling of StoreStores in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly as `MemBarRelease` is handled.
>
> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>
> Passes hotspot tier1 locally on a Linux machine.
>
> ### Benchmarks
>
> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>
> Baseline:
>
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>   N = 25
>   mean = 3309.611 ±(99.9%) 86.699 ms/op
>
>   Histogram, ms/op:
>     [3000.000, 3050.000) = 0
>     [3050.000, 3100.000) = 4
>     [3100.000, 3150.000) = 1
>     [3150.000, 3200.000) = 0
>     [3200.000, 3250.000) = 0
>     [3250.000, 3300.000) = 0
>     [3300.000, 3350.000) = 9
>     [3350.000, 3400.000) = 6
>     [3400.000, 3450.000) = 5
>
>   Percentiles, ms/op:
>     p(0.0000) = 3069.910 ms/op
>     p(50.0000) = 3348.140 ms/op
> ...
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Add requires tag for IRIW* architectures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18505/files - new: https://git.openjdk.org/jdk/pull/18505/files/c4702953..e836a1a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505 PR: https://git.openjdk.org/jdk/pull/18505 From duke at openjdk.org Tue Apr 2 00:43:15 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 2 Apr 2024 00:43:15 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v3] In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 17:34:08 GMT, Vladimir Kozlov wrote: > What about `MemBarRelease` in `PhaseStringOpts::replace_string_concat()`? Missed this. Added a commit for this ------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2030858517 From duke at openjdk.org Tue Apr 2 01:07:04 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 2 Apr 2024 01:07:04 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers Message-ID: Add instruction encoding support for Intel APX extended general-purpose registers: Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. ------------- Commit messages: - fix 32-bit build prefix functions - - add UseAPX x86 global - add signature for 32-bit build - - inlcude previous WREX2 bug fix - Merge branch 'master' into apx-encoding-pr - Instruction encoding support for APX extended GPRs -- initial commit Changes: https://git.openjdk.org/jdk/pull/18476/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328998 Stats: 859 lines in 4 files changed: 424 ins; 36 del; 399 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From jbhateja at openjdk.org Tue Apr 2 01:07:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 2 Apr 2024 01:07:04 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:01:17 GMT, Steve Dohrmann wrote: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
> These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used.

Hi @steveatgh, please use the newly created JBS entry [JDK-8328998](https://bugs.openjdk.org/browse/JDK-8328998) for this draft PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2019556517

From duke at openjdk.org Tue Apr 2 05:24:09 2024
From: duke at openjdk.org (Joshua Cao)
Date: Tue, 2 Apr 2024 05:24:09 GMT
Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v6]
In-Reply-To:
References:
Message-ID: <6K5Fpo1ev00bK62rlfQIjp91AmZYX3AwYk8J8R07HN0=.a401de7a-f515-40f2-b9c3-003d9eaffddc@github.com>

> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186.
>
> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR.
>
> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStores in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly as `MemBarRelease` is handled.
>
> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>
> Passes hotspot tier1 locally on a Linux machine.
>
> ### Benchmarks
>
> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>
> Baseline:
>
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>   N = 25
>   mean = 3309.611 ±(99.9%) 86.699 ms/op
>
>   Histogram, ms/op:
>     [3000.000, 3050.000) = 0
>     [3050.000, 3100.000) = 4
>     [3100.000, 3150.000) = 1
>     [3150.000, 3200.000) = 0
>     [3200.000, 3250.000) = 0
>     [3250.000, 3300.000) = 0
>     [3300.000, 3350.000) = 9
>     [3350.000, 3400.000) = 6
>     [3400.000, 3450.000) = 5
>
>   Percentiles, ms/op:
>     p(0.0000) = 3069.910 ms/op
>     p(50.0000) = 3348.140 ms/op
> ...
Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:

 - Add micro benchmark courtesy of @shipilev
 - More comprehensive IR tests based on @shipilev's suggestions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18505/files
  - new: https://git.openjdk.org/jdk/pull/18505/files/e836a1a7..656f52c8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=04-05

  Stats: 494 lines in 2 files changed: 461 ins; 1 del; 32 mod
  Patch: https://git.openjdk.org/jdk/pull/18505.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505

PR: https://git.openjdk.org/jdk/pull/18505

From epeter at openjdk.org Tue Apr 2 06:04:59 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Apr 2024 06:04:59 GMT
Subject: RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 1 Apr 2024 09:19:28 GMT, SUN Guoyun wrote:

>> A possible counter-example:
>>
>>     x = something
>>     y = someCall
>>
>>     for (int i = 0; i < a.length; i++) {
>>         a[i] = ((x + 1) + y) + ((x + 2) + y) + ((x + 2) + y) + ((x + 3) + y) + ((x + 4) + y)
>>     }
>>
>> The call is outside the loop, so folding would not be costly at all. And I fear that the 4 terms would not common up, and so be slower after your change. And I think there are probably other examples. But I have not benchmarked anything, so I could be quite wrong.
>>
>> What exactly is it that gives you the speedup in your benchmark? Spilling? Fewer add instructions? Would be nice to understand that better, and see what are potential examples where we would have regressions with your patch.
>
>> Why only (x+1)+y and not also x+(y+1)? I agree with @eme64 about breaking other optimizations if we don't move constants to the right. (x+1)+(y+1) no longer becomes (x+y)+2.
>
> Previously, I didn't have a deep understanding of constant folding, but now it seems that this PR will cause a performance regression.
>
>> To reduce spills, I would think we would want to move Calls to the left. (x+1)+y and x+(y+1) both become y+x+1.
>
> This is a good idea. But my idea is, can we delete add1 after `add1->clone()`? https://github.com/openjdk/jdk/pull/18482/files#diff-c59303cb42c3e35f20bc530628dc611003e21819e84cedb2279e69cde0345410L183

@sunny868 @dean-long I also like the idea of moving calls to the left. Though even that may not always be an improvement.

It would be nice to have a series of benchmarks with all sorts of combinations of calls inside the loop, or outside the loop, adding together with variables inside the loop, or invariants from outside the loop.

What we would like is that:
1. Everything loop invariant is added up outside the loop, and only loop variant things are added inside the loop.
2. Constants should always go to the right.
3. Calls should go to the left.

These 3 ideas seem to contradict each other at times, so I wonder if there is a more general heuristic here?

Examples for contradiction (`invar`: loop invariant, `var`: loop variant, changes in each loop iteration):
1. `invar + var + con`: `(invar + con) + var` would take everything loop invariant out of the loop, but the constant would not be on the very right.
2. `invar1 + invar2 + call`: assume the call is inside the loop, but `invar1` and `invar2` are before/outside the loop. Moving the `call` to the very left, `(call + invar1) + invar2`, would mean we have to do more addition work inside the loop (2 additions per iteration), but `(invar1 + invar2) + call` would have the invariants added up before the loop, and only the call value added to that in every loop iteration.

I'm not sure how significant all of this is. You really would need some benchmarks to get more insight. And writing a good benchmark is not easy.

@sunny868 You will have to change the title anyway.
Can you write `C2` instead of `[c2]`, please? ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2031139128
PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2031140331

From epeter at openjdk.org Tue Apr 2 06:13:14 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Apr 2024 06:13:14 GMT
Subject: RFR: 8325252: C2 SuperWord: refactor the packset [v6]
In-Reply-To: <2s2FgHKlsbPidpGkoCG8nqSjih3zfk_JGLNOBlqc_zo=.e4f405ce-f34c-4990-aaaa-d9ec4c1210a0@github.com>
References: <2s2FgHKlsbPidpGkoCG8nqSjih3zfk_JGLNOBlqc_zo=.e4f405ce-f34c-4990-aaaa-d9ec4c1210a0@github.com>
Message-ID:

On Wed, 27 Mar 2024 16:08:36 GMT, Christian Hagedorn wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   more updates for Christian, batch 2
>
> Thanks for doing all the updates. It looks good to me now!

Thanks @chhagedorn for the detailed review, thanks also @vnkozlov!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18276#issuecomment-2031148405

From epeter at openjdk.org Tue Apr 2 06:13:14 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Apr 2024 06:13:14 GMT
Subject: Integrated: 8325252: C2 SuperWord: refactor the packset
In-Reply-To:
References:
Message-ID:

On Wed, 13 Mar 2024 14:25:57 GMT, Emanuel Peter wrote:

> I'm refactoring the packset, separating the details of packset-manipulation from the SuperWord algorithm.
>
> Most importantly: I split it into two classes: `PairSet` and `PackSet`.
> `combine_pairs_to_longer_packs` converts the first into the second.
>
> I was able to simplify the combining, and remove the pack-sorting.
> I now walk "pair-chains" directly with `PairSetIterator`. One such pair-chain is equivalent to a pack.
>
> I moved all the `filter / split` functionality to the `PackSet`, which allows hiding a lot of packset-manipulation from the SuperWord algorithm.
>
> I ran into some issues when I was extending the pairset in `extend_pairset_with_more_pairs_by_following_use_and_def`:
> Using the PairSetIterator changed the order of extension, and that messed with the packing heuristic, and quite a few examples did not vectorize, because we would pack up the wrong 2 nodes out of a choice of 4 (e.g. we would pack `ac bd` instead of `ab cd`). Hence, I now still have to keep the insertion order for the pairs, and this basically means we are extending with a BFS order. Maybe this issue can be removed, if I improve the packing heuristic with some look-ahead expansion approach (but that is for another day [JDK-8309908](https://bugs.openjdk.org/browse/JDK-8309908)).
>
> But since I already spent some time on some of the packing heuristic (reordering and cost estimate), I did a light refactoring, and added extra tests for MulAddS2I.
>
> More details are described in the annotations in the code.

This pull request has now been integrated.

Changeset: 5cddc2de
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/5cddc2de493d9d8712e4bee3aed4f1a0d4c228c3
Stats:     1197 lines in 6 files changed: 549 ins; 338 del; 310 mod

8325252: C2 SuperWord: refactor the packset

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/18276

From epeter at openjdk.org Tue Apr 2 06:28:18 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Apr 2024 06:28:18 GMT
Subject: RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v4]
In-Reply-To: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com>
References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com>
Message-ID:

> **Problem**
> In [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) / https://git.openjdk.org/jdk/pull/14785 I fixed the alignment with `AlignVector`. For that, I had to compute `abs(scale)` and `abs(stride)`, as well as `scale * stride`.
The issue is that all of these values can overflow the int range (e.g. `abs(min_int) = min_int`). > > We hit asserts like: > > `# assert(is_power_of_2(value)) failed: value must be a power of 2: 0xffffffff80000000` > This happens because we take `abs(min_int)`, which is `min_int = 0x80000000`. Interpreted as a positive (unsigned) number, this is the power of 2 `2^31`. We then widen it to `long` and get `0xffffffff80000000`, which is no longer a power of 2. This violates the implicit assumptions, and we hit the assert. > > `# assert(q >= 1) failed: modulo value must be large enough` > We have `scale = 2^30` and `stride = 4 = 2^2`. For the alignment calculation we compute `scale * stride = 2^32`, which overflows the int range and becomes zero. > > Before [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) we could get similar issues with the (old) code in `SuperWord::ref_is_alignable`, if `AlignVector` is enabled: > > > int span = preloop_stride * p.scale_in_bytes(); > ... > if (vw % span == 0) { > > > If `span == 0` because of overflow, then the `idiv` from the modulo gets a division by zero -> `SIGFPE`. > > But it seems the bug is possibly a regression from JDK20 b2 [JDK-8286197](https://bugs.openjdk.org/browse/JDK-8286197). Here we enabled certain Unsafe memory access address patterns, and it is such patterns that the reproducer requires. > > **Solution** > I could patch up all the code that works with `scale` and `stride`, and make sure no overflows ever happen. But that is quite involved and error prone. > > Instead, I now just disable vectorization for large `scale` and `stride`. This should not have any performance impact, because such large `scale` and `stride` would lead to highly inefficient memory accesses, since they are spaced very far apart. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
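For readers who want to reproduce the two overflow cases from the problem description above outside of HotSpot, here is a minimal Java sketch. It is not code from the PR: the class `IntOverflowDemo` and its helpers are hypothetical, and `isPowerOf2` is only an assumed stand-in for a signed power-of-2 check (positive values only). It relies on Java `int` arithmetic wrapping in the same way as the 32-bit values discussed above.

```java
public class IntOverflowDemo {
    // Assumed stand-in for a signed power-of-2 check: only positive values qualify.
    static boolean isPowerOf2(long value) {
        return value > 0 && (value & (value - 1)) == 0;
    }

    // abs() of the most negative int overflows and yields the same negative value.
    static int absOfMinInt() {
        return Math.abs(Integer.MIN_VALUE);
    }

    // int multiplication wraps silently, e.g. 2^30 * 4 = 2^32 wraps to 0.
    static int span(int scale, int stride) {
        return scale * stride;
    }

    public static void main(String[] args) {
        int abs = absOfMinInt();
        // Bit pattern of the int result, and of the same value widened to long.
        System.out.println(Integer.toHexString(abs));
        System.out.println(Long.toHexString((long) abs));
        // Read as unsigned 32-bit, the value is 2^31; sign-extended, it is negative.
        System.out.println(isPowerOf2(Integer.toUnsignedLong(abs)));
        System.out.println(isPowerOf2((long) abs));
        // A zero span is what would make "vw % span" divide by zero.
        System.out.println(span(1 << 30, 4));
    }
}
```

The sketch only shows why the values misbehave. It does not model the fix, which is to bail out of vectorization for large `scale` and `stride`.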
The pull request contains five additional commits since the last revision: - Merge branch 'JDK-8328938-abs-min-int-assert' of https://github.com/eme64/jdk into JDK-8328938-abs-min-int-assert - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Merge branch 'master' into JDK-8328938-abs-min-int-assert - improve comments - 8328938 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18485/files - new: https://git.openjdk.org/jdk/pull/18485/files/9f8ed495..6f60fdf5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18485&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18485&range=02-03 Stats: 369085 lines in 3154 files changed: 23903 ins; 19414 del; 325768 mod Patch: https://git.openjdk.org/jdk/pull/18485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18485/head:pull/18485 PR: https://git.openjdk.org/jdk/pull/18485 From duke at openjdk.org Tue Apr 2 06:40:59 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 2 Apr 2024 06:40:59 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 15:52:37 GMT, Aleksey Shipilev wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. 
>> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. >> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > I propose we also add this benchmark that verifies barrier costs and coalescing: [ConstructorBarriers.txt](https://github.com/openjdk/jdk/files/14775850/ConstructorBarriers.txt). Maybe these also should be the IR tests. 
The benchmarks show that most combinations with `final`-s improve, and scalar replaced objects also still work (and probably eliminate all the barriers). > > On my Graviton 3 instance: > > > Benchmark Mode Cnt Score Error Units > > # Before > ConstructorBarriers.escaping_finalFinal avgt 9 9.097 ± 0.032 ns/op > ConstructorBarriers.escaping_finalPlain avgt 9 9.120 ± 0.101 ns/op > ConstructorBarriers.escaping_finalVolatile avgt 9 11.590 ± 0.088 ns/op > ConstructorBarriers.escaping_plainFinal avgt 9 9.113 ± 0.037 ns/op > ConstructorBarriers.escaping_plainPlain avgt 9 7.627 ± 0.155 ns/op > ConstructorBarriers.escaping_plainVolatile avgt 9 13.055 ± 0.180 ns/op > ConstructorBarriers.escaping_volatileFinal avgt 9 10.650 ± 0.112 ns/op > ConstructorBarriers.escaping_volatilePlain avgt 9 13.074 ± 0.156 ns/op > ConstructorBarriers.escaping_volatileVolatile avgt 9 13.546 ± 0.100 ns/op > > ConstructorBarriers.non_escaping_finalFinal avgt 9 2.220 ± 0.006 ns/op > ConstructorBarriers.non_escaping_finalPlain avgt 9 2.214 ± 0.014 ns/op > ConstructorBarriers.non_escaping_finalVolatile avgt 9 2.232 ± 0.035 ns/op > ConstructorBarriers.non_escaping_plainFinal avgt 9 2.222 ± 0.004 ns/op > ConstructorBarriers.non_escaping_plainPlain avgt 9 2.234 ± 0.036 ns/op > ConstructorBarriers.non_escaping_plainVolatile avgt 9 2.230 ± 0.019 ns/op > ConstructorBarriers.non_escaping_volatileFinal avgt 9 2.232 ± 0.018 ns/op > ConstructorBarriers.non_escaping_volatilePlain avgt 9 2.220 ± 0.033 ns/op > ConstructorBarriers.non_escaping_volatileVolatile avgt 9 2.232 ± 0.019 ns/op > > # After > ConstructorBarriers.escaping_finalFinal avgt 9 5.939 ± 0.035 ns/op ; improves > ConstructorBarriers.escaping_finalPlain avgt 9 5.945 ± 0.033 ns/op ; improves > ConstructorBarriers.escaping_finalVolatile avgt 9 10.997 ± 0.050 ns/op ; improves > ConstructorBarriers.escaping_plainFinal avgt 9 5.923 ± 0.061 ns/op ; improves > ConstructorBarriers.escaping_plainPlain avgt 9 7.687 ± 0.101 ns/op > ConstructorB... 
Benchmark results on my graviton instances see similar improvements to @shipilev's.

Before:

Benchmark                                          Mode  Cnt   Score   Error  Units
ConstructorBarriers.escaping_finalFinal            avgt    3   9.229 ± 1.101  ns/op
ConstructorBarriers.escaping_finalPlain            avgt    3   9.150 ± 0.191  ns/op
ConstructorBarriers.escaping_finalVolatile         avgt    3  11.542 ± 1.259  ns/op
ConstructorBarriers.escaping_plainFinal            avgt    3   9.132 ± 0.261  ns/op
ConstructorBarriers.escaping_plainPlain            avgt    3   7.610 ± 0.575  ns/op
ConstructorBarriers.escaping_plainVolatile         avgt    3  13.024 ± 0.460  ns/op
ConstructorBarriers.escaping_volatileFinal         avgt    3  10.697 ± 1.567  ns/op
ConstructorBarriers.escaping_volatilePlain         avgt    3  13.156 ± 0.593  ns/op
ConstructorBarriers.escaping_volatileVolatile      avgt    3  13.707 ± 0.742  ns/op
ConstructorBarriers.non_escaping_finalFinal        avgt    3   2.218 ± 0.299  ns/op
ConstructorBarriers.non_escaping_finalPlain        avgt    3   2.243 ± 0.124  ns/op
ConstructorBarriers.non_escaping_finalVolatile     avgt    3   2.227 ± 0.032  ns/op
ConstructorBarriers.non_escaping_plainFinal        avgt    3   2.226 ± 0.208  ns/op
ConstructorBarriers.non_escaping_plainPlain        avgt    3   2.229 ± 0.112  ns/op
ConstructorBarriers.non_escaping_plainVolatile     avgt    3   2.239 ± 0.400  ns/op
ConstructorBarriers.non_escaping_volatileFinal     avgt    3   2.255 ± 0.259  ns/op
ConstructorBarriers.non_escaping_volatilePlain     avgt    3   2.206 ± 0.098  ns/op
ConstructorBarriers.non_escaping_volatileVolatile  avgt    3   2.203 ± 0.099  ns/op

After:

Benchmark                                          Mode  Cnt   Score   Error  Units
ConstructorBarriers.escaping_finalFinal            avgt    3   5.919 ± 0.787  ns/op
ConstructorBarriers.escaping_finalPlain            avgt    3   5.949 ± 0.117  ns/op
ConstructorBarriers.escaping_finalVolatile         avgt    3  10.947 ± 1.353  ns/op
ConstructorBarriers.escaping_plainFinal            avgt    3   5.897 ± 0.039  ns/op
ConstructorBarriers.escaping_plainPlain            avgt    3   7.737 ± 3.529  ns/op
ConstructorBarriers.escaping_plainVolatile         avgt    3  13.182 ± 0.289  ns/op
ConstructorBarriers.escaping_volatileFinal         avgt    3  10.951 ± 0.535  ns/op
ConstructorBarriers.escaping_volatilePlain         avgt    3  13.086 ± 0.258  ns/op
ConstructorBarriers.escaping_volatileVolatile      avgt    3  13.765 ± 2.114  ns/op
ConstructorBarriers.non_escaping_finalFinal        avgt    3   2.234 ± 0.064  ns/op
ConstructorBarriers.non_escaping_finalPlain        avgt    3   2.226 ± 0.298  ns/op
ConstructorBarriers.non_escaping_finalVolatile     avgt    3   2.212 ± 0.085  ns/op
ConstructorBarriers.non_escaping_plainFinal        avgt    3   2.214 ± 0.033  ns/op
ConstructorBarriers.non_escaping_plainPlain        avgt    3   2.226 ± 0.114  ns/op
ConstructorBarriers.non_escaping_plainVolatile     avgt    3   2.220 ± 0.042  ns/op
ConstructorBarriers.non_escaping_volatileFinal     avgt    3   2.244 ± 0.146  ns/op
ConstructorBarriers.non_escaping_volatilePlain     avgt    3   2.235 ± 0.083  ns/op
ConstructorBarriers.non_escaping_volatileVolatile  avgt    3   2.230 ± 0.056  ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2031186738 From jzhu at openjdk.org Tue Apr 2 08:03:00 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Tue, 2 Apr 2024 08:03:00 GMT Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub considers the length of live registers when spilling registers In-Reply-To: References: Message-ID: On Tue, 5 Mar 2024 16:52:02 GMT, Stuart Monteith wrote: >> Currently ZGC C2 load barrier stub saves the whole live register regardless of what size of register is live on aarch64. >> Considering the size of SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits, >> even the use of a floating point may cause the maximum 2048 bits stack occupied. >> Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in ZGC C2 load barrier stub. >> >> In a floating point case on 2048 bits SVE machine, the following ZLoadBarrierStubC2 >> >> >> ...... 
>> 0x0000ffff684cfad8: stp x15, x18, [sp, #80] >> 0x0000ffff684cfadc: sub sp, sp, #0x100 >> 0x0000ffff684cfae0: str z16, [sp] >> 0x0000ffff684cfae4: add x1, x13, #0x10 >> 0x0000ffff684cfae8: mov x0, x16 >> ;; 0xFFFF803F5414 >> 0x0000ffff684cfaec: mov x8, #0x5414 // #21524 >> 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16 >> 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfaf8: blr x8 >> 0x0000ffff684cfafc: mov x16, x0 >> 0x0000ffff684cfb00: ldr z16, [sp] >> 0x0000ffff684cfb04: add sp, sp, #0x100 >> 0x0000ffff684cfb08: ptrue p7.b >> 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16] >> ...... >> >> >> could be optimized into: >> >> >> ...... >> 0x0000ffff684cfa50: stp x15, x18, [sp, #80] >> 0x0000ffff684cfa54: str d16, [sp, #-16]! // extra 8 bytes to align 16 bytes in push_fp() >> 0x0000ffff684cfa58: add x1, x13, #0x10 >> 0x0000ffff684cfa5c: mov x0, x16 >> ;; 0xFFFF7FA942A8 >> 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064 >> 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16 >> 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfa6c: blr x8 >> 0x0000ffff684cfa70: mov x16, x0 >> 0x0000ffff684cfa74: ldr d16, [sp], #16 >> 0x0000ffff684cfa78: ptrue p7.b >> 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16] >> ...... >> >> >> Besides the above benefit, when we know what size of register is live, >> we could remove the unnecessary caller save in ZGC C2 load barrier stub when we meet C-ABI SOE fp registers. >> >> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced. > > Thanks, that helps - I can see you're saving/restoring the correct register lengths. Would it be possible to generate a testcase to test that registers are being saved/restored correctly? 
> > The following is a testcase that is an example of where this testing is done, although in this PR's case it isn't subroutines, but load/store barriers: > > https://github.com/openjdk/jdk/commit/4cd318756d4a8de64d25fb6512ecba9a008edfa1#diff-949a4a2f889be36be47e9b02b6d6cd1247768953b95a024f649878bac721fa04 @stooart-mon Thanks for your review. Please let me know if you have any other comments. @fisk I would appreciate it if you could share your comments on this change since it follows your previous work done for x86. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17977#issuecomment-2031330044 From dlong at openjdk.org Tue Apr 2 08:36:08 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 Apr 2024 08:36:08 GMT Subject: RFR: 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode [v2] In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 08:45:55 GMT, SUN Guoyun wrote: >> This patch prohibits the conversion from "(x+1)+y" into "(x+y)+1" when y is a CallNode to reduce unnecessary spillcode and ADDNode. >> >> Testing: tier1-3 in x86_64 and LoongArch64 >> >> JMH in x86_64: >>
>> before:
>> Benchmark           Mode  Cnt      Score   Error  Units
>> CallNode.test      thrpt    2  26397.733          ops/s
>> 
>> after:
>> Benchmark           Mode  Cnt      Score   Error  Units
>> CallNode.test      thrpt    2  27839.337          ops/s
>> 
> > SUN Guoyun has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode > - 8328865: [c2] No need to convert "(x+1)+y" into "(x+y)+1" when y is a CallNode In general, we can't move a call left or right past something with side-effects or memory effects, right? This seems like the wrong place to be moving a Call, except relative to easy values like constants. For loop invariants, it seems like they could be grouped together on either side, but don't loop optimizations take care of that already? And it seems like a comprehensive cost analysis would need to take into account the cost of a spill vs an add. If we can rematerialize a value using (reg + reg) instead of a load from memory, that seems like a win, but for more complicated expressions that can be done outside the loop, rematerializing from memory could be best. So does this change really belong in AddNode::Ideal()? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18482#issuecomment-2031398326 From shade at openjdk.org Tue Apr 2 08:52:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 2 Apr 2024 08:52:03 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v6] In-Reply-To: <6K5Fpo1ev00bK62rlfQIjp91AmZYX3AwYk8J8R07HN0=.a401de7a-f515-40f2-b9c3-003d9eaffddc@github.com> References: <6K5Fpo1ev00bK62rlfQIjp91AmZYX3AwYk8J8R07HN0=.a401de7a-f515-40f2-b9c3-003d9eaffddc@github.com> Message-ID: On Tue, 2 Apr 2024 05:24:09 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. 
>> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. 
>> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Add micro benchmark courtesy of @shipilev > - More comprehensive IR tests based on @shipilev's suggestions This looks nearly complete. I think for sanity reasons we still want the diagnostic flag that could be used to restore old behavior for in-field diagnostics. This is especially important since we are touching the optimizer code. Turning the flag off should restore all behavior, including the optimizer paths. Something like `UseStoreStoreForInit`? test/hotspot/jtreg/compiler/c2/irTests/ConstructorBarriers.java line 244: > 242: @IR(failOn = IRNode.MEMBAR_RELEASE) > 243: @IR(failOn = IRNode.MEMBAR_STORESTORE) > 244: @IR(failOn = IRNode.MEMBAR_RELEASE) Here and later: Looks weird to test for `MEMBAR_RELEASE` twice. Also, should it be just `failOn = IRNode.MEMBAR`? ------------- PR Review: https://git.openjdk.org/jdk/pull/18505#pullrequestreview-1973049174 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1547414178 From duke at openjdk.org Tue Apr 2 12:40:00 2024 From: duke at openjdk.org (Swati Sharma) Date: Tue, 2 Apr 2024 12:40:00 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case. 
In-Reply-To: <_0V4aLv23eyNBgwgzFThGCfXPQw6jTZa2me6ZnF6I_g=.83cd138c-fc3c-4b01-9ccd-10ff7f4bf5d7@github.com> References: <_0V4aLv23eyNBgwgzFThGCfXPQw6jTZa2me6ZnF6I_g=.83cd138c-fc3c-4b01-9ccd-10ff7f4bf5d7@github.com> Message-ID: <-8n85gXA1M50-rWMOn9dl7aD3pvV7p8CqPkyzx1hVIg=.da123f14-3701-4f8e-9e16-8d5ebf561e86@github.com> On Wed, 27 Mar 2024 17:02:06 GMT, Emanuel Peter wrote: >> Hi All, >> >> Added a new jtreg test case for large arrayCopy disjoint case. >> This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. >> >> Please review and provide your feedback. >> >> Thanks, >> Swati >> Intel > > test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyDisjointLarge.java line 29: > >> 27: /** >> 28: * @test >> 29: * @bug 8310159 > > Suggestion: > > * @bug 8326421 > > Was there a reason for the other bug number? I think usually we use the bug number of the issue where the test is added. I might be wrong. @eme64 This test is to cover the functionality correctness of the issue JDK-8310159, If you suggest this should cover a general scenario I can add the bug number of the issue itself i.e 8326421. Please let me know. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17962#discussion_r1547798819 From epeter at openjdk.org Tue Apr 2 13:11:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 13:11:09 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer Message-ID: This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. There are now only a few cases where we cannot use the cached `VPointer`: - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. 
We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. **Benchmarking SuperWord Compile Time** I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. On master: C2 Compile Time: 56.816 s IdealLoop: 56.604 s AutoVectorize: 56.192 s With this patch: C2 Compile Time: 49.719 s IdealLoop: 49.509 s AutoVectorize: 49.106 s This saves us about `7 sec`, which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. 
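The caching idea described above can be illustrated with a small Java sketch. This is not the HotSpot code: `VPointerCacheSketch`, `ParsedPointer`, `parse`, and the `{base, offset}` encoding of a memop are all hypothetical, and the real patch stores `VPointer` objects in arena-allocated memory in C++. The sketch only shows the general technique of parsing once per memop and then doing a constant-time lookup inside an all-to-all loop.

```java
public class VPointerCacheSketch {
    // Hypothetical stand-in for a parsed address of the form base + offset.
    static final class ParsedPointer {
        final int base;
        final int offset;

        ParsedPointer(int base, int offset) {
            this.base = base;
            this.offset = offset;
        }

        // True if "other" starts exactly one element after this pointer.
        boolean isAdjacentTo(ParsedPointer other, int elementSize) {
            return base == other.base && offset + elementSize == other.offset;
        }
    }

    // Counts how often the (pretend-expensive) parsing step runs.
    static int parseCount = 0;

    // Hypothetical parser: a memop is encoded as {base, offset}.
    static ParsedPointer parse(int[] memop) {
        parseCount++;
        return new ParsedPointer(memop[0], memop[1]);
    }

    // All-to-all comparison that re-parses the second pointer in the inner loop.
    static int countAdjacentUncached(int[][] memops, int elementSize) {
        int pairs = 0;
        for (int i = 0; i < memops.length; i++) {
            ParsedPointer p1 = parse(memops[i]);
            for (int j = 0; j < memops.length; j++) {
                if (i == j) { continue; }
                ParsedPointer p2 = parse(memops[j]);
                if (p1.isAdjacentTo(p2, elementSize)) { pairs++; }
            }
        }
        return pairs;
    }

    // Same comparison, but every memop is parsed once up front and then looked up.
    static int countAdjacentCached(int[][] memops, int elementSize) {
        ParsedPointer[] cache = new ParsedPointer[memops.length];
        for (int i = 0; i < memops.length; i++) {
            cache[i] = parse(memops[i]);
        }
        int pairs = 0;
        for (int i = 0; i < cache.length; i++) {
            for (int j = 0; j < cache.length; j++) {
                if (i == j) { continue; }
                if (cache[i].isAdjacentTo(cache[j], elementSize)) { pairs++; }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[][] memops = { {0, 0}, {0, 4}, {0, 8}, {1, 0} };
        parseCount = 0;
        int uncached = countAdjacentUncached(memops, 4);
        System.out.println("uncached: pairs=" + uncached + " parses=" + parseCount);
        parseCount = 0;
        int cached = countAdjacentCached(memops, 4);
        System.out.println("cached: pairs=" + cached + " parses=" + parseCount);
    }
}
```

With `n` memops the uncached loop parses on the order of `n^2` times, while the cached variant parses `n` times. The comparison loop itself stays quadratic in both variants.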
------------- Commit messages: - use caching - 8326962 Changes: https://git.openjdk.org/jdk/pull/18577/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326962 Stats: 168 lines in 5 files changed: 122 ins; 13 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/18577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577 PR: https://git.openjdk.org/jdk/pull/18577 From epeter at openjdk.org Tue Apr 2 13:11:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 13:11:11 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 09:04:45 GMT, Emanuel Peter wrote: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). 
I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. > > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. src/hotspot/share/opto/superword.cpp line 600: > 598: MemNode* s2 = memops.at(j)->as_Mem(); > 599: if (isomorphic(s1, s2)) { > 600: const VPointer& p2 = get_pointer(s2); Note: a classic example of a quadratic loop, where we compare "all-to-all" memops, thus parse the pointer subgraph repeatedly. src/hotspot/share/opto/vectorization.cpp line 194: > 192: > 193: uint bytes = number_of_pointers * sizeof(VPointer); > 194: _pointers = (VPointer*)_arena->Amalloc(bytes); Note: I wish I could use `GrowableArray` here. But I have a `StackObj` that is `NONCOPYABLE`. I thus have to directly construct the `VPointer` into the array, and cannot construct it outside and pass it in. Someday, I hope that `GrowableArray` allows appending with the move-constructor, or something similar. For now: I simply allocate my own memory, and use the placement-new to construct the `VPointer`s directly into that memory. src/hotspot/share/opto/vectorization.cpp line 268: > 266: if (n1->is_Load() && n2->is_Load()) { continue; } > 267: > 268: const VPointer& p2 = _pointers.get(n2); Note: another quadratic loop where we repeatedly parse the pointers. src/hotspot/share/opto/vectorization.cpp line 788: > 786: tty->print_cr(" + scale(%4d) * iv]", _scale); > 787: } > 788: #endif Note: improve printing a bit for `POINTERS` tag of `TraceAutoVectorization`. 
src/hotspot/share/opto/vectorization.cpp line 1496: > 1494: } > 1495: } > 1496: Note: moved it up so we can use it anywhere in `vectorization.cpp`. src/hotspot/share/opto/vectorization.hpp line 726: > 724: > 725: // Comparable? > 726: bool invar_equals(const VPointer& q) const { Note: had to make some things `const` here, so that I can pass around `const VPointer&`, which I get from `_pointers.get(n)` / `get_pointer(n)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547524530 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547529998 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547530691 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547531986 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547532679 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547534538 From jsjolen at openjdk.org Tue Apr 2 14:00:11 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 2 Apr 2024 14:00:11 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 09:04:45 GMT, Emanuel Peter wrote: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. 
The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. > > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. Hi Emanuel, I've some general questions regarding naming and Arena usage, I hope you don't mind some runtime team input. src/hotspot/share/opto/superword.hpp line 499: > 497: > 498: // VLoopDependencyGraph Accessors > 499: const VPointer& get_pointer(const MemNode* mem) const { This can't just be called `vpointer` or `vpointer_of`? src/hotspot/share/opto/vectorization.hpp line 458: > 456: class VLoopPointers : public StackObj { > 457: private: > 458: Arena* _arena; Will the pointer ever change? Could potentially change this to a reference. src/hotspot/share/opto/vectorization.hpp line 483: > 481: > 482: void compute_and_cache(); > 483: const VPointer& get(const MemNode* mem) const; Questioning naming of this also :-). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18577#pullrequestreview-1973893348 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547931080 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547937229 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547934364 From jsjolen at openjdk.org Tue Apr 2 14:00:11 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 2 Apr 2024 14:00:11 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 13:52:51 GMT, Johan Sj?len wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. 
>> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. >> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > src/hotspot/share/opto/vectorization.hpp line 458: > >> 456: class VLoopPointers : public StackObj { >> 457: private: >> 458: Arena* _arena; > > Will the pointer ever change? Could potentially change this to a reference. Is it important for this to be Arena-allocated? Seems to me like `compute_and_cache` will only be computed once per `VLoopPointers` instance, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547944838 From chagedorn at openjdk.org Tue Apr 2 14:11:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Apr 2024 14:11:12 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: <6Z1BHUXNN5Tl2C6UZTz3VcHSw1sbgVVO6BF4PXSGkws=.a66e1faf-dc2f-4917-87cd-836eaee7d769@github.com> On Tue, 2 Apr 2024 13:50:07 GMT, Johan Sj?len wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. 
We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. >> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > src/hotspot/share/opto/superword.hpp line 499: > >> 497: >> 498: // VLoopDependencyGraph Accessors >> 499: const VPointer& get_pointer(const MemNode* mem) const { > > This can't just be called `vpointer` or `vpointer_of`? 
+1 for `vpointer()` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547968034 From chagedorn at openjdk.org Tue Apr 2 14:11:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 2 Apr 2024 14:11:11 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 09:04:45 GMT, Emanuel Peter wrote: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. 
> > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. That's a nice improvement and it makes sense to just compute them once and re-use them. I only have a few comments but generally looks good! src/hotspot/share/opto/superword.hpp line 498: > 496: } > 497: > 498: // VLoopDependencyGraph Accessors Suggestion: // VLoopDependencyGraph accessors src/hotspot/share/opto/vectorization.cpp line 184: > 182: } > 183: > 184: void VLoopPointers::compute_and_cache() { Could be split into something like: allocate_pointer_memory(); initialize_pointers(); trace_pointers(); where allocate_pointer_memory(): number_of_pointers = compute_number_of_pointers(); uint bytes = number_of_pointers * sizeof(VPointer); _pointers = (VPointer*)_arena->Amalloc(bytes); src/hotspot/share/opto/vectorization.cpp line 214: > 212: int bb_idx = _body.bb_idx(mem); > 213: int pointers_idx = _bb_idx_to_pointer.at(bb_idx); > 214: assert(pointers_idx >= 0, "mem node must have a cached pointer"); Should we also assert here that `pointers_idx` is within the array range? You could cache the length of the `_pointers` array when you allocate/initialize it above in `compute_and_cache()`. src/hotspot/share/opto/vectorization.cpp line 224: > 222: for (int i = 0; i < _body.body().length(); i++) { > 223: MemNode* mem = _body.body().at(i)->isa_Mem(); > 224: if (mem != nullptr && _vloop.in_bb(mem)) { I see that you use this pattern twice. Maybe we could provide a "for_each_mem(lambda)` in `VLoopBody`? But could also be done separately. src/hotspot/share/opto/vectorization.hpp line 456: > 454: // Submodule of VLoopAnalyzer. 
> 455: // We compute and cache the VPointer for every load and store. > 456: class VLoopPointers : public StackObj { Nit: Should we call this `VLoopVPointers` to make the link to `VPointers` and not just some pointers? src/hotspot/share/opto/vectorization.hpp line 462: > 460: const VLoopBody& _body; > 461: > 462: // Array of cached pointers Maybe make a note that we allocate/cache them lazily upon request. ------------- PR Review: https://git.openjdk.org/jdk/pull/18577#pullrequestreview-1973894052 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547931473 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547939073 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547946451 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547952871 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547958314 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1547960511 From rehn at openjdk.org Tue Apr 2 14:34:10 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 2 Apr 2024 14:34:10 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: Message-ID: On Thu, 21 Mar 2024 06:58:43 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > remove swap file How would you add jni.h ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2032202423 From fyang at openjdk.org Tue Apr 2 14:37:12 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 2 Apr 2024 14:37:12 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v10] In-Reply-To: <8FMdThon_sV2dJ6P5AnRzvQ8eXqfEumA2sX2TY4Z_GA=.eb31062e-8bcf-45cf-9769-8080df213ff6@github.com> References: <8FMdThon_sV2dJ6P5AnRzvQ8eXqfEumA2sX2TY4Z_GA=.eb31062e-8bcf-45cf-9769-8080df213ff6@github.com> Message-ID: <-SrC51R6lQEUtmcf4Z0OhAf24-WfhN4w968PBAtg7iE=.d0dda166-61e3-44ab-b9f6-129eae227dc5@github.com> On Sun, 31 Mar 2024 15:47:51 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. 
>> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: > > Use srliw to clear upper bits for 'lower' cases I am having another look. 
src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1365:

> 1363: shadd(tmp1, tmp1, table0, tmp1, 2);
> 1364: lwu(tmp2, Address(tmp1));
> 1365: xorr(crc, crc, tmp2);

I witnessed slightly better JMH numbers on Lichee-PI-4A with the following sequence:

    if (upper)
      srli(v, v, 32);
    xorr(v, v, crc);
    andi(tmp1, v, right_8_bits);
    shadd(tmp1, tmp1, table3, tmp2, 2);
    lwu(crc, Address(tmp1));
    srli(tmp1, v, 6);
    andi(tmp1, tmp1, (right_8_bits << 2));
    add(tmp1, tmp1, table2);
    lwu(tmp2, Address(tmp1));
    srli(tmp1, v, 14);
    andi(tmp1, tmp1, (right_8_bits << 2));
    add(tmp1, tmp1, table1);
    xorr(crc, crc, tmp2);
    lwu(tmp2, Address(tmp1));
    srliw(tmp1, v, 24);
    shadd(tmp1, tmp1, table0, tmp1, 2);
    xorr(crc, crc, tmp2);
    lwu(tmp2, Address(tmp1));
    xorr(crc, crc, tmp2);

src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 60:

> 58:
> 59: /**
> 60: * crc_table[] from jdk/src/share/native/java/util/zip/zlib-1.2.5/crc32.h

I think the correct path should be: `jdk/src/java.base/share/native/libzip/zlib/crc32.h`

-------------

PR Review: https://git.openjdk.org/jdk/pull/17046#pullrequestreview-1974015466
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548013663
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548007465

From duke at openjdk.org Tue Apr 2 15:14:08 2024
From: duke at openjdk.org (Joshua Cao)
Date: Tue, 2 Apr 2024 15:14:08 GMT
Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14]
In-Reply-To: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com>
References: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com>
Message-ID:

On Mon, 25 Mar 2024 06:19:42 GMT, Emanuel Peter wrote:

>> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into licm >> - @run driver -> @run main >> - Add tests for add/sub reassociation >> - Merge branch 'master' into licm >> - Make inputs deterministic. Make size an arg. Fix comments. Formatting. >> - Update test to utilize @setup method for arguments >> - Merge branch 'master' into licm >> - Add correctness test for some random tests with random inputs >> - Add some correctness tests where we do reassociate >> - Remove unused TestInfo parameter. Have some tests exit mid-loop. >> - ... and 7 more: https://git.openjdk.org/jdk/compare/2330e980...32cb9c0d > > Code looks good, running testing now... Ping me again in 2 days if I don't report back by then ;) @eme64 how does testing look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17375#issuecomment-2032320856 From epeter at openjdk.org Tue Apr 2 15:29:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 15:29:03 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: References: Message-ID: On Fri, 22 Mar 2024 18:48:56 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Merge branch 'master' into licm > - @run driver -> @run main > - Add tests for add/sub reassociation > - Merge branch 'master' into licm > - Make inputs deterministic. Make size an arg. Fix comments. Formatting. > - Update test to utilize @setup method for arguments > - Merge branch 'master' into licm > - Add correctness test for some random tests with random inputs > - Add some correctness tests where we do reassociate > - Remove unused TestInfo parameter. Have some tests exit mid-loop. > - ... and 7 more: https://git.openjdk.org/jdk/compare/eb586168...32cb9c0d Testing is clean. Thanks for the work, it now looks good to me :) (Still, I'd like another Reviewer to approve it.) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1974189351 From epeter at openjdk.org Tue Apr 2 15:45:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 15:45:11 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: <6Z1BHUXNN5Tl2C6UZTz3VcHSw1sbgVVO6BF4PXSGkws=.a66e1faf-dc2f-4917-87cd-836eaee7d769@github.com> References: <6Z1BHUXNN5Tl2C6UZTz3VcHSw1sbgVVO6BF4PXSGkws=.a66e1faf-dc2f-4917-87cd-836eaee7d769@github.com> Message-ID: On Tue, 2 Apr 2024 14:07:45 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/superword.hpp line 499: >> >>> 497: >>> 498: // VLoopDependencyGraph Accessors >>> 499: const VPointer& get_pointer(const MemNode* mem) const { >> >> This can't just be called `vpointer` or `vpointer_of`? > > +1 for `vpointer()` Good idea! 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548130482

From epeter at openjdk.org Tue Apr 2 15:45:11 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Apr 2024 15:45:11 GMT
Subject: RFR: 8326962: C2 SuperWord: cache VPointer
In-Reply-To:
References:
Message-ID:

On Tue, 2 Apr 2024 15:38:49 GMT, Emanuel Peter wrote:

>> Is it important for this to be Arena-allocated? Seems to me like `compute_and_cache` will only be computed once per `VLoopPointers` instance, right?
>
> We can discuss if Arena-allocated is the right thing to do. But for now it is what I did with all other submodules of `VLoopAnalyzer`, so if we were to change this, then I can do that in a separate RFE.
>
> What alternative would you prefer, and why?
>
> I like Arena-allocation, because I have a clear location and life-time for my allocations. I can close the arena after all AutoVectorization, and I know that the data is valid up to that point, and then it gets deallocated.
>
> CHeap allocation would require me to be smarter and more careful about deallocation.
>
> Resource allocation, in my experience, is often problematic if you have different life-times for things. I like Resource-allocation only for temporary data structures, not data that is used across a large algorithm with dozens of subalgorithms.
>
> Let me know what you think ;)

> Will the pointer ever change? Could potentially change this to a reference.

I could make it a reference. But data structures like `GrowableArray` take an `Arena*`. So then I have to use `*` and `&` all the time. I don't like that; it makes the code much more "noisy".
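To illustrate the trade-off, here is a small sketch with hypothetical types (`ToyArena`, `ToyArray`, `WithPointer` and `WithReference` are made up for this example and are not HotSpot classes). It only assumes, as stated above, that the container takes its arena by pointer; with a reference member, every hand-off to such a container then needs an explicit `&`.

```cpp
#include <cassert>

// Hypothetical arena stand-in; it only counts how many containers registered.
struct ToyArena {
  int allocations = 0;
};

// Hypothetical container whose constructor takes the arena by pointer,
// mirroring the pointer-taking style described in the text.
class ToyArray {
public:
  explicit ToyArray(ToyArena* arena) : _arena(arena) { _arena->allocations++; }
  ToyArena* arena() const { return _arena; }
private:
  ToyArena* _arena;
};

// Variant 1: pointer member, passed through unchanged.
class WithPointer {
public:
  explicit WithPointer(ToyArena* arena) : _arena(arena), _data(_arena) {}
  const ToyArray& data() const { return _data; }
private:
  ToyArena* _arena;  // declared before _data, so it is initialized first
  ToyArray _data;
};

// Variant 2: reference member, needs '&' at every pointer-taking call site.
class WithReference {
public:
  explicit WithReference(ToyArena& arena) : _arena(arena), _data(&_arena) {}
  const ToyArray& data() const { return _data; }
private:
  ToyArena& _arena;  // cannot be reseated or null
  ToyArray _data;
};
```

Both variants end up handing the same arena to the container; the difference is purely in how the call sites read.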
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548130142 From epeter at openjdk.org Tue Apr 2 15:45:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 15:45:11 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 13:59:46 GMT, Christian Hagedorn wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. 
>> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > src/hotspot/share/opto/vectorization.cpp line 224: > >> 222: for (int i = 0; i < _body.body().length(); i++) { >> 223: MemNode* mem = _body.body().at(i)->isa_Mem(); >> 224: if (mem != nullptr && _vloop.in_bb(mem)) { > > I see that you use this pattern twice. Maybe we could provide a "for_each_mem(lambda)" in `VLoopBody`? But could also be done separately. I was considering it. I can do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548132492 From epeter at openjdk.org Tue Apr 2 15:45:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 15:45:12 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: <5A9Cv5PMOdPBlrISZ_yBj7WCo9NWnDmJvgHT_fc-V9o=.6a202ac5-4774-4cd0-8d38-9f8c4b689755@github.com> On Tue, 2 Apr 2024 13:51:47 GMT, Johan Sj?len wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. 
>> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. >> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > src/hotspot/share/opto/vectorization.hpp line 483: > >> 481: >> 482: void compute_and_cache(); >> 483: const VPointer& get(const MemNode* mem) const; > > Questioning naming of this also :-). Sounds good, will rename it to `vpointer`. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548130792

From epeter at openjdk.org Tue Apr 2 15:45:11 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 2 Apr 2024 15:45:11 GMT
Subject: RFR: 8326962: C2 SuperWord: cache VPointer
In-Reply-To:
References:
Message-ID:

On Tue, 2 Apr 2024 13:56:31 GMT, Johan Sjölen wrote:

>> src/hotspot/share/opto/vectorization.hpp line 458:
>>
>>> 456: class VLoopPointers : public StackObj {
>>> 457: private:
>>> 458: Arena* _arena;
>>
>> Will the pointer ever change? Could potentially change this to a reference.
>
> Is it important for this to be Arena-allocated? Seems to me like `compute_and_cache` will only be computed once per `VLoopPointers` instance, right?

We can discuss if Arena-allocated is the right thing to do. But for now it is what I did with all other submodules of `VLoopAnalyzer`, so if we were to change this, then I can do that in a separate RFE.

What alternative would you prefer, and why?

I like Arena-allocation, because I have a clear location and life-time for my allocations. I can close the arena after all AutoVectorization, and I know that the data is valid up to that point, and then it gets deallocated.

CHeap allocation would require me to be smarter and more careful about deallocation.

Resource allocation, in my experience, is often problematic if you have different life-times for things. I like Resource-allocation only for temporary data structures, not data that is used across a large algorithm with dozens of subalgorithms.
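The lifetime argument can be sketched with a toy bump allocator. `BumpArena` and `make_int_array` are hypothetical names for this illustration; this is not HotSpot's `Arena` class and does not mirror its interface. Everything allocated from the arena stays valid until the arena object itself is destroyed, and nothing is freed individually.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Toy bump allocator (assumption: only trivially destructible data is stored,
// so no destructors need to run when the arena goes away).
class BumpArena {
public:
  explicit BumpArena(std::size_t chunk_size = 4096) : _chunk_size(chunk_size) {}
  BumpArena(const BumpArena&) = delete;
  BumpArena& operator=(const BumpArena&) = delete;

  void* allocate(std::size_t bytes) {
    // Round up so that every returned address stays maximally aligned.
    const std::size_t align = alignof(std::max_align_t);
    bytes = (bytes + align - 1) / align * align;
    if (_chunks.empty() || _used + bytes > _capacity) {
      // Start a new chunk; older chunks stay alive, so old pointers stay valid.
      _capacity = bytes > _chunk_size ? bytes : _chunk_size;
      _chunks.push_back(std::make_unique<unsigned char[]>(_capacity));
      _used = 0;
    }
    void* p = _chunks.back().get() + _used;
    _used += bytes;
    _total += bytes;
    return p;
  }

  std::size_t total_allocated() const { return _total; }

private:
  std::size_t _chunk_size;
  std::size_t _capacity = 0;  // size of the current chunk
  std::size_t _used = 0;      // bytes used in the current chunk
  std::size_t _total = 0;     // bytes handed out over the arena's lifetime
  std::vector<std::unique_ptr<unsigned char[]>> _chunks;
};

// Allocate n ints in the arena and fill them with 0..n-1.
int* make_int_array(BumpArena& arena, std::size_t n) {
  int* a = static_cast<int*>(arena.allocate(n * sizeof(int)));
  for (std::size_t i = 0; i < n; i++) {
    ::new (static_cast<void*>(a + i)) int(static_cast<int>(i));
  }
  return a;
}
```

All chunks are released together when the `BumpArena` goes out of scope, which corresponds to the "close the arena after all AutoVectorization" point above.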
Let me know what you think ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548127900 From epeter at openjdk.org Tue Apr 2 15:50:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 15:50:10 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: <9wk16cDm6EkVVVS6SZHj-n7z5GOJ4UlI-MxSW8Kdeak=.a6d31bb3-dcf4-429c-8a6b-91e3cdf4f5b4@github.com> On Tue, 2 Apr 2024 14:03:09 GMT, Christian Hagedorn wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. 
>> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > src/hotspot/share/opto/vectorization.hpp line 456: > >> 454: // Submodule of VLoopAnalyzer. >> 455: // We compute and cache the VPointer for every load and store. >> 456: class VLoopPointers : public StackObj { > > Nit: Should we call this `VLoopVPointers` to make the link to `VPointers` and not just some pointers? Hmm. This looks like a general naming question now. My idea is that the `V` at the beginning of the types just is kind of a "namespace", to say that all types are used for `Vectorization`. But I guess here we can do it just so everybody knows we are dealing with `VPointers`. I'll make an exception, but don't want to see `V`'s littered everywhere ;) > src/hotspot/share/opto/vectorization.hpp line 462: > >> 460: const VLoopBody& _body; >> 461: >> 462: // Array of cached pointers > > Maybe make a note that we allocate/cache them lazily upon request. It is not lazy, they are allocated and cached in `compute_and_cache`. Like all other `VLoopAnalyzer` submodules. Maybe I missed your point ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548137659 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548139526 From epeter at openjdk.org Tue Apr 2 16:01:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:01:25 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v2] In-Reply-To: References: Message-ID: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. 
And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. > > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. 
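A rough illustration of why caching pays off in the all-with-all comparisons described above. This is a minimal sketch, not HotSpot code: `MemOp`, `Parsed`, `parse`, `adjacent` and the counter `g_parse_calls` are hypothetical stand-ins for a memop, a `VPointer`, the pointer-subgraph parsing, and an adjacency check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a load/store in the loop body.
struct MemOp { int base; int offset; };

// Hypothetical stand-in for the parsed pointer (a VPointer in the PR).
struct Parsed { int base; int offset; };

// Counts how often the (expensive) parsing step runs.
static std::size_t g_parse_calls = 0;

Parsed parse(const MemOp& m) {
  ++g_parse_calls;
  return Parsed{m.base, m.offset};
}

// Assumed 4-byte elements: b directly follows a in memory.
bool adjacent(const Parsed& a, const Parsed& b) {
  return a.base == b.base && a.offset + 4 == b.offset;
}

// Re-parses both operands for every ordered pair: O(n^2) parses.
std::size_t count_adjacent_uncached(const std::vector<MemOp>& ops) {
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < ops.size(); i++) {
    for (std::size_t j = 0; j < ops.size(); j++) {
      if (i != j && adjacent(parse(ops[i]), parse(ops[j]))) { ++pairs; }
    }
  }
  return pairs;
}

// Parses each memop once up front, then only does array lookups.
std::size_t count_adjacent_cached(const std::vector<MemOp>& ops) {
  std::vector<Parsed> cache;
  cache.reserve(ops.size());
  for (const MemOp& m : ops) { cache.push_back(parse(m)); }
  assert(cache.size() == ops.size());
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < cache.size(); i++) {
    for (std::size_t j = 0; j < cache.size(); j++) {
      if (i != j && adjacent(cache[i], cache[j])) { ++pairs; }
    }
  }
  return pairs;
}

// Usage: four consecutive 4-byte accesses on the same base.
inline void check_cache_example() {
  const std::vector<MemOp> ops = {{0, 0}, {0, 4}, {0, 8}, {0, 12}};
  g_parse_calls = 0;
  assert(count_adjacent_uncached(ops) == 3);
  assert(g_parse_calls == 24);  // 12 ordered pairs, 2 parses each
  g_parse_calls = 0;
  assert(count_adjacent_cached(ops) == 3);
  assert(g_parse_calls == 4);   // one parse per memop
}
```

The comparison loop is still quadratic in both variants; only the number of parses drops from quadratic to linear, which is the effect the PR description is after.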
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: pointer -> vpointer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18577/files - new: https://git.openjdk.org/jdk/pull/18577/files/d5ef5e45..386f2ca1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=00-01 Stats: 54 lines in 4 files changed: 0 ins; 0 del; 54 mod Patch: https://git.openjdk.org/jdk/pull/18577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577 PR: https://git.openjdk.org/jdk/pull/18577 From epeter at openjdk.org Tue Apr 2 16:01:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:01:25 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v2] In-Reply-To: <8_jxqANi6RTAiFatiMqSAJmuPoichU-7hHgwGFTVPm8=.9b284c20-6a4c-44d7-84d7-39c47e00002c@github.com> References: <8_jxqANi6RTAiFatiMqSAJmuPoichU-7hHgwGFTVPm8=.9b284c20-6a4c-44d7-84d7-39c47e00002c@github.com> Message-ID: On Tue, 2 Apr 2024 15:56:47 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.hpp line 498: >> >>> 496: } >>> 497: >>> 498: // VLoopDependencyGraph Accessors >> >> Suggestion: >> >> // VLoopDependencyGraph accessors > > Well, I have it upper-case in all other cases... but the real mistake is that it should be a `VLoopVPointer` accessor. Would you like me to change all `Accessors` -> `accessors`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548157167 From epeter at openjdk.org Tue Apr 2 16:01:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:01:25 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v2] In-Reply-To: References: Message-ID: <8_jxqANi6RTAiFatiMqSAJmuPoichU-7hHgwGFTVPm8=.9b284c20-6a4c-44d7-84d7-39c47e00002c@github.com> On Tue, 2 Apr 2024 13:50:23 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> pointer -> vpointer > > src/hotspot/share/opto/superword.hpp line 498: > >> 496: } >> 497: >> 498: // VLoopDependencyGraph Accessors > > Suggestion: > > // VLoopDependencyGraph accessors Well, I have it upper-case in all other cases... but the real mistake is that it should be a `VLoopVPointer` accessor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548154170 From duke at openjdk.org Tue Apr 2 16:07:27 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 2 Apr 2024 16:07:27 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11] In-Reply-To: References: Message-ID: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> > Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. > > ### Correctness checks > > Tier 1/2 tests are ok. 
> > ### Performance results on T-Head board > > #### Results for enabled intrinsic: > > Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --- | ---- | ----- | --- | ---- | --- | ---- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | > > #### Results for disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: - Schedule instructions better - Fix crc32.h path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17046/files - new: 
https://git.openjdk.org/jdk/pull/17046/files/64abc7b4..36b96465 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=09-10 Stats: 7 lines in 2 files changed: 2 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17046/head:pull/17046 PR: https://git.openjdk.org/jdk/pull/17046 From duke at openjdk.org Tue Apr 2 16:07:28 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 2 Apr 2024 16:07:28 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v10] In-Reply-To: <-SrC51R6lQEUtmcf4Z0OhAf24-WfhN4w968PBAtg7iE=.d0dda166-61e3-44ab-b9f6-129eae227dc5@github.com> References: <8FMdThon_sV2dJ6P5AnRzvQ8eXqfEumA2sX2TY4Z_GA=.eb31062e-8bcf-45cf-9769-8080df213ff6@github.com> <-SrC51R6lQEUtmcf4Z0OhAf24-WfhN4w968PBAtg7iE=.d0dda166-61e3-44ab-b9f6-129eae227dc5@github.com> Message-ID: On Tue, 2 Apr 2024 14:33:30 GMT, Fei Yang wrote: >> ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: >> >> Use srliw to clear upper bits for 'lower' cases > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1365: > >> 1363: shadd(tmp1, tmp1, table0, tmp1, 2); >> 1364: lwu(tmp2, Address(tmp1)); >> 1365: xorr(crc, crc, tmp2); > > I witnessed slightly better JMH numbers on Lichee-PI-4A with the following sequence: > > if (upper) > srli(v, v, 32); > xorr(v, v, crc); > > andi(tmp1, v, right_8_bits); > shadd(tmp1, tmp1, table3, tmp2, 2); > lwu(crc, Address(tmp1)); > > srli(tmp1, v, 6); > andi(tmp1, tmp1, (right_8_bits << 2)); > add(tmp1, tmp1, table2); > lwu(tmp2, Address(tmp1)); > > srli(tmp1, v, 14); > andi(tmp1, tmp1, (right_8_bits << 2)); > add(tmp1, tmp1, table1); > xorr(crc, crc, tmp2); > > lwu(tmp2, Address(tmp1)); > srliw(tmp1, v, 24); > shadd(tmp1, tmp1, table0, tmp1, 2); > xorr(crc, crc, tmp2); > > lwu(tmp2, Address(tmp1)); > xorr(crc, crc, tmp2); Thanks for pointing it 
out! Some additional speedup can also be achieved by using `srli` for the '`upper`' cases. On Lichee-Pi:

`srliw` only

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 6512.348 | 146.138 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 3408.306 | 279.986 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1971.538 | 100.804 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 1040.091 | 3.426 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 272.233 | 3.844 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 33.781 | 1.961 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 8.399 | 0.042 | ops/ms |

`srli` and `srliw`

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 6561.674 | 104.461 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 3586.810 | 109.934 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 2024.515 | 16.118 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 1047.475 | 39.745 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 274.006 | 0.809 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 34.746 | 0.203 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 8.405 | 0.064 | ops/ms |

> src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 60:
>
>> 58:
>> 59: /**
>> 60: * crc_table[] from jdk/src/share/native/java/util/zip/zlib-1.2.5/crc32.h
>
> I think the correct path should be: `jdk/src/java.base/share/native/libzip/zlib/crc32.h`

Fixed, thanks!
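For readers who do not have the table layout in their head: the four lookups in the quoted sequence (`table3` with the low byte, down to `table0` with the top byte) follow the usual four-bytes-at-a-time table scheme. Below is a plain C++ sketch of that scheme under my own assumptions. It is not the stub code: the tables are generated at startup here rather than taken from `crc32.h`, and `make_tables` and `crc32_update` are names made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tables = std::array<std::array<uint32_t, 256>, 4>;

// Build table 0 from the reflected CRC-32 polynomial, then derive
// tables 1..3 so that four input bytes can be folded in one step.
Tables make_tables() {
  Tables t{};
  for (uint32_t n = 0; n < 256; n++) {
    uint32_t c = n;
    for (int k = 0; k < 8; k++) {
      c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
    }
    t[0][n] = c;
  }
  for (uint32_t n = 0; n < 256; n++) {
    uint32_t c = t[0][n];
    for (int k = 1; k < 4; k++) {
      c = t[0][c & 0xffu] ^ (c >> 8);
      t[k][n] = c;
    }
  }
  return t;
}

uint32_t crc32_update(uint32_t crc, const unsigned char* buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  static const Tables t = make_tables();
  uint32_t c = ~crc;
  // Main loop: one little-endian 32-bit word per iteration, four lookups.
  while (len >= 4) {
    const uint32_t word = static_cast<uint32_t>(buf[0]) |
                          (static_cast<uint32_t>(buf[1]) << 8) |
                          (static_cast<uint32_t>(buf[2]) << 16) |
                          (static_cast<uint32_t>(buf[3]) << 24);
    const uint32_t v = c ^ word;
    c = t[3][v & 0xffu] ^ t[2][(v >> 8) & 0xffu] ^
        t[1][(v >> 16) & 0xffu] ^ t[0][v >> 24];
    buf += 4;
    len -= 4;
  }
  // Tail: remaining bytes, one lookup each.
  while (len-- > 0) {
    c = t[0][(c ^ *buf++) & 0xffu] ^ (c >> 8);
  }
  return ~c;
}

// Usage: the standard CRC-32 check value for the ASCII string "123456789".
inline void check_crc32_example() {
  const unsigned char msg[] = "123456789";
  assert(crc32_update(0u, msg, 9) == 0xCBF43926u);
}
```

In this form the four lookups per word only depend on `v`, so they can be issued independently and combined at the end, which is presumably why the instruction order matters on an in-order core.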
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548163980 PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1548164055 From epeter at openjdk.org Tue Apr 2 16:15:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:15:24 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v3] In-Reply-To: References: Message-ID: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. 
> > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: for_each_mem ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18577/files - new: https://git.openjdk.org/jdk/pull/18577/files/386f2ca1..2dbb02e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=01-02 Stats: 28 lines in 2 files changed: 13 ins; 6 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/18577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577 PR: https://git.openjdk.org/jdk/pull/18577 From epeter at openjdk.org Tue Apr 2 16:15:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:15:24 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v3] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 13:53:57 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> for_each_mem > > src/hotspot/share/opto/vectorization.cpp line 184: > >> 182: } >> 183: >> 184: void VLoopPointers::compute_and_cache() { > > Could be split into something like: > > allocate_pointer_memory(); > initialize_pointers(); > trace_pointers(); > > where allocate_pointer_memory(): > number_of_pointers = compute_number_of_pointers(); > uint bytes = number_of_pointers * sizeof(VPointer); > _pointers = (VPointer*)_arena->Amalloc(bytes); With the new `for_each_mem`, the code is already much easier to read. 
I don't know if splitting it further would really help now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548176186 From epeter at openjdk.org Tue Apr 2 16:19:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:19:21 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. 
> > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: vpointers length ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18577/files - new: https://git.openjdk.org/jdk/pull/18577/files/2dbb02e5..12f96209 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=02-03 Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577 PR: https://git.openjdk.org/jdk/pull/18577 From epeter at openjdk.org Tue Apr 2 16:19:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 2 Apr 2024 16:19:21 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 13:57:11 GMT, Johan Sjölen wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> vpointers length > > Hi Emanuel, > > I've some general questions regarding naming and Arena usage, I hope you don't mind some runtime team input. @jdksjolen @chhagedorn Thanks for your suggestions! I think I addressed / commented on all your review comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18577#issuecomment-2032491947 From mli at openjdk.org Tue Apr 2 16:33:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 2 Apr 2024 16:33:11 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11] In-Reply-To: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> References: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> Message-ID: On Tue, 2 Apr 2024 16:07:27 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. >> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- | --------- | ---- | 
----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: > > - Schedule instructions better > - Fix crc32.h path Thanks for the update and the continuous refinement. ![image](https://github.com/openjdk/jdk/assets/10797965/4ed5ccc7-cc4b-431d-8c7d-ae829bd2c43b) It seems the performance gain (last column in the picture) introduced by the intrinsic shrinks as the data size increases. So IMHO, when the data size is big enough, it brings a performance regression rather than a performance gain. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2032522714 From kvn at openjdk.org Tue Apr 2 18:04:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Apr 2024 18:04:09 GMT Subject: RFR: 8327110: Refactor create_bool_from_template_assertion_predicate() to separate class and fix identical cloning cases used for Loop Unswitching and Split If [v3] In-Reply-To: <8UdoeQB0Qz7Lzb-SZeOpf8V9IyXcmeKKyOHzQz0E5GE=.9550067d-93fa-4915-a06c-cbba220f2893@github.com> References: <8UdoeQB0Qz7Lzb-SZeOpf8V9IyXcmeKKyOHzQz0E5GE=.9550067d-93fa-4915-a06c-cbba220f2893@github.com> Message-ID: On Thu, 28 Mar 2024 12:32:59 GMT, Christian Hagedorn wrote: >> This is a follow-up to the previous refactoring done in https://github.com/openjdk/jdk/pull/18080. 
The patch starts to replace the usages of `create_bool_from_template_assertion_predicate()` by providing a refactored and fixed cloning algorithm. >> >> #### How `create_bool_from_template_assertion_predicate()` Works >> Currently, the algorithm in `create_bool_from_template_assertion_predicate()` uses an iterative DFS walk to find all nodes of a Template Assertion Predicate Expression in order to clone them. We do the following: >> 1. Follow all inputs if they could be a node that's part of a Template Assertion Predicate (compares opcodes): >> https://github.com/openjdk/jdk/blob/326c91e1a28ec70822ef927ee9ab17f79aa6d35c/src/hotspot/share/opto/loopTransform.cpp#L1513 >> >> 2. Once we find an `OpaqueLoopInit` or `OpaqueLoopStride` node, we start backtracking in the DFS. While doing so, we start to clone all nodes on the path from the `OpaqueLoop*Nodes` node to the start node and already update the graph. This logic is quite complex and difficult to understand since we do everything simultaneously. This was one of the reasons, I've originally tried to refactor this method in https://github.com/openjdk/jdk/pull/16877 because I needed to extend it for the full fix of Assertion Predicates in JDK-8288981. >> >> #### Missing Visited Set >> The current implementation of `create_bool_from_template_assertion_predicate()` does not use a visited set. This means that whenever we find a diamond shape, we could visit a node twice and re-discover all paths above this diamond again: >> >> >> ... >> | >> E >> | >> D >> / \ >> B C >> \ / >> A >> >> DFS walk: A -> B -> D -> E -> ... -> C -> D -> E -> ... >> >> With each diamond, the number of revisits of each node above doubles. >> >> #### Endless DFS in Edge-Cases >> In most cases, we would normally just stop quite quickly once we follow a data node that is not part of a Template Assertion Predicate Expression because the node opcode is different. 
However, in the test cases, we create a long chain of data nodes with many diamonds that could all be part of a Template Assertion Predicate Expression (i.e. `is_part_of_template_assertion_predicate_bool()` would return true to follow the inputs in a DFS walk). As a result, the DFS revisits a lot of nodes, especially higher up in the graph, exponentially many times and compilation is stuck for a long time (running the test cases result in a test timeout because... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Moved comment + better assert "Stamp" of approval. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18293#pullrequestreview-1974555601 From kvn at openjdk.org Tue Apr 2 18:46:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Apr 2024 18:46:01 GMT Subject: RFR: 8328702: C2: Crash during parsing because sub type check is not folded In-Reply-To: References: Message-ID: <9Re3prp8UgXZZCvQSDgna01X8JuTM3nar_Renmeezdo=.ac2ed92a-6216-4347-9b1d-6e10dc07ddb6@github.com> On Wed, 27 Mar 2024 13:48:09 GMT, Christian Hagedorn wrote: > The test case shows a problem where data is folded during parsing while control is not. This leaves the graph in a broken state and we fail with an assertion. > > We have the following (pseudo) code for some class `X`: > > o = flag ? new Object[] : new byte[]; > if (o instanceof X) { > X x = (X)o; // checkcast > } > > For the `checkcast`, C2 knows that the type of `o` is some kind of array, i.e. type `[bottom`. But this cannot be a sub type of `X`. Therefore, the `CheckCastPP` node created for the `checkcast` result is replaced by top by the type system. However, the `SubTypeCheckNode` for the `checkcast` is not folded and the graph is broken. 
> > The problem of not folding the `SubTypeCheckNode` can be traced back to `SubTypeCheckNode::sub` calling `static_subtype_check()` when transforming the node after its creation. `static_subtype_check()` should detect that the sub type check is always wrong here: > https://github.com/openjdk/jdk/blob/d0a265039a36292d87b249af0e8977982e5acc7b/src/hotspot/share/opto/compile.cpp#L4454-L4460 > > But it does not because these two checks return the following: > 1. Check: is `o` a sub type of `X`? -> returns no, so far so good. > 2. Check: _could_ `o` be a sub type of `X`? -> returns yes, which is wrong! `[bottom` is only a sub type of `Object` and can never be a subtype of `X` > > In `maybe_java_subtype_of_helper_for_arr()`, we wrongly conclude that any array with a base element type `bottom` _could_ be a sub type of anything: > https://github.com/openjdk/jdk/blob/d0a265039a36292d87b249af0e8977982e5acc7b/src/hotspot/share/opto/type.cpp#L6462-L6465 > But this is only true if the super class is also an array class - but not if `other` (super klass) is an instance klass as in this case. > > The fix for this is to first perform the immediately following check, which handles the case of comparing an array klass to an instance klass: An array klass can only ever be a sub class of an instance klass if it's the `Object` class. But in our case, we have `X` and this would return false: > > https://github.com/openjdk/jdk/blob/d0a265039a36292d87b249af0e8977982e5acc7b/src/hotspot/share/opto/type.cpp#L6466-L6468 > > The very same problem can also be triggered with `X` being an interface instead. There are tests for both these cases. > > #### Additionally Required Fix > When running with `-XX:+ExpandSubTypeCheckAtParseTime`, we eagerly expand the sub type check during parsing and therefore do not emit a `SubTypeCheckNode`. When additionally running with `-XX:+StressReflectiveCode`, th... Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18512#pullrequestreview-1974643156 From duke at openjdk.org Tue Apr 2 19:01:10 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 2 Apr 2024 19:01:10 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11] In-Reply-To: References: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> Message-ID: On Tue, 2 Apr 2024 16:30:27 GMT, Hamlin Li wrote: >> ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: >> >> - Schedule instructions better >> - Fix crc32.h path > > Thanks for the update and the continuous refinement. > > ![image](https://github.com/openjdk/jdk/assets/10797965/4ed5ccc7-cc4b-431d-8c7d-ae829bd2c43b) > > It seems the performance gain (last column in the picture) introduced by the intrinsic shrinks as the data size increases. > So IMHO, when the data size is big enough, it brings a performance regression rather than a performance gain. 
@Hamlin-Li I modified `test/micro/org/openjdk/bench/java/util/TestCRC32C.java` a bit to see if the regression happens on larger data, and ran it on VisionFive2 with Zba enabled:

Enabled intrinsic

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 131072 | thrpt | 40 | 2.841 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 262144 | thrpt | 40 | 1.420 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 524288 | thrpt | 40 | 0.709 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2097152 | thrpt | 40 | 0.176 | 0.001 | ops/ms |

Disabled intrinsic

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 131072 | thrpt | 40 | 2.729 | 0.003 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 262144 | thrpt | 40 | 1.367 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 524288 | thrpt | 40 | 0.684 | 0.001 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2097152 | thrpt | 40 | 0.170 | 0.001 | ops/ms |

| (count) | enabled/disabled |
| --- | --- |
| 131072 | 1.041040674 |
| 262144 | 1.038771031 |
| 524288 | 1.036549708 |
| 2097152 | 1.035294118 |

So since there are no regressions compared to C2-generated code with `-XX:+UseZba`, how about making the intrinsic Zba-exclusive?
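For anyone re-deriving the last column: it is just the enabled score divided by the disabled score for the same `(count)`. A tiny sketch, with `throughput_ratio` being a name made up for this example:

```cpp
#include <cassert>

// Ratio of throughputs (ops/ms); a value above 1.0 means the run with the
// intrinsic enabled was faster than the run with it disabled.
double throughput_ratio(double enabled_ops_per_ms, double disabled_ops_per_ms) {
  assert(disabled_ops_per_ms > 0.0);
  return enabled_ops_per_ms / disabled_ops_per_ms;
}

// Usage: first and last rows of the VisionFive2 tables above.
inline void check_ratio_example() {
  const double r_small = throughput_ratio(2.841, 2.729);  // count = 131072
  const double r_large = throughput_ratio(0.176, 0.170);  // count = 2097152
  assert(r_small > 1.04104 && r_small < 1.04105);
  assert(r_large > 1.03529 && r_large < 1.03530);
}
```

Applied to the earlier T-Head numbers for count 2048 (151.173 enabled vs 166.738 disabled), the same ratio comes out at roughly 0.91, which is the regression Hamlin was pointing at.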
------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2032833530 From kvn at openjdk.org Tue Apr 2 19:49:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Apr 2024 19:49:11 GMT Subject: RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v4] In-Reply-To: References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> Message-ID: On Tue, 2 Apr 2024 06:28:18 GMT, Emanuel Peter wrote: >> **Problem** >> In [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) / https://git.openjdk.org/jdk/pull/14785 I fixed the alignment with `AlignVector`. For that, I had to compute `abs(scale)` and `abs(stride)`, as well as `scale * stride`. The issue is that all of these values can overflow the int range (e.g. `abs(min_int) = min_int`). >> >> We hit asserts like: >> >> `# assert(is_power_of_2(value)) failed: value must be a power of 2: 0xffffffff80000000` >> Happens because we take `abs(min_int)`, which is `min_int = 0x80000000`, and assuming this was a positive (unsigned) number is a power of 2 `2^31`. We then expand it to `long`, get `0xffffffff80000000`, which is not a power of 2 anymore. This violates the implicit assumptions, and we hit the assert. >> >> `# assert(q >= 1) failed: modulo value must be large enough` >> We have `scale = 2^30` and `stride = 4 = 2^2`. For the alignment calculation we compute `scale * stride = 2^32`, which overflows the int range and becomes zero. >> >> Before [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) we could get similar issues with the (old) code in `SuperWord::ref_is_alignable`, if `AlignVector` is enabled: >> >> >> int span = preloop_stride * p.scale_in_bytes(); >> ... >> if (vw % span == 0) { >> >> >> if `span == 0` because of overflow, then the `idiv` from the modulo gets a division by zero -> `SIGFPE`. 
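The two overflow cases described above can be reproduced in a few lines. This is a hedged sketch, not the patch itself: the wraparound is computed in unsigned arithmetic so that the sketch has no undefined behavior, the guard widens to 64 bits before taking `abs` or multiplying, and the threshold `max_val = 2^30` as well as all function names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Bit pattern of a 32-bit product with wraparound.
uint32_t wrapped_product_bits(int32_t a, int32_t b) {
  return static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
}

// Bit pattern of a 32-bit abs() with wraparound.
uint32_t wrapped_abs_bits(int32_t x) {
  const uint32_t u = static_cast<uint32_t>(x);
  return x < 0 ? 0u - u : u;
}

// Guard in the spirit of the quoted description: widen first, then reject
// any scale, stride, or scale * stride that is too large.
bool small_enough_to_vectorize(int32_t scale, int32_t stride) {
  const int64_t max_val = int64_t{1} << 30;  // assumed threshold
  const int64_t long_scale = scale;
  const int64_t long_stride = stride;
  return std::llabs(long_scale) < max_val &&
         std::llabs(long_stride) < max_val &&
         std::llabs(long_scale * long_stride) < max_val;
}

// Usage: the two failing cases from the description, plus a harmless one.
inline void check_overflow_examples() {
  // scale = 2^30, stride = 4: the 32-bit product 2^32 wraps to zero.
  assert(wrapped_product_bits(1 << 30, 4) == 0u);
  // abs(min_int) keeps the bit pattern of min_int.
  assert(wrapped_abs_bits(INT32_MIN) == 0x80000000u);
  // With 64-bit widening both cases are detected and rejected.
  assert(!small_enough_to_vectorize(1 << 30, 4));
  assert(!small_enough_to_vectorize(INT32_MIN, 1));
  assert(small_enough_to_vectorize(4, 8));
}
```

The product of two 32-bit values always fits in 64 bits, so once the operands are widened the comparison against the threshold cannot itself overflow.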
>> >> But it seems the bug is possibly a regression from JDK20 b2 [JDK-8286197](https://bugs.openjdk.org/browse/JDK-8286197). Here we enabled certaint Unsafe memory access address patterns, and it is such patterns that the reproducer requires. >> >> **Solution** >> I could either patch up all the code that works with `scale` and `stride`, and make sure no overflows ever happen. But that is quite involved and error prone. >> >> I now just disable vectorization for large `scale` and `stride`. This should not have any performance impact, because such large `scale` and `stride` would lead to highly inefficient memory accesses, since they are spaced very far apart. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'JDK-8328938-abs-min-int-assert' of https://github.com/eme64/jdk into JDK-8328938-abs-min-int-assert > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - Merge branch 'master' into JDK-8328938-abs-min-int-assert > - improve comments > - 8328938 src/hotspot/share/opto/vectorization.cpp line 411: > 409: abs(long_stride) >= max_val || > 410: abs(long_scale * long_stride) >= max_val) { > 411: assert(!valid(), "adr stride*scale is too large"); Why you need assert? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18485#discussion_r1548527073 From kvn at openjdk.org Tue Apr 2 20:11:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Apr 2024 20:11:10 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 16:19:21 GMT, Emanuel Peter wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. 
And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. >> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > vpointers length One question: will VLoopAnalyzer default destructor clean up all memory used? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18577#pullrequestreview-1974930345 From kvn at openjdk.org Wed Apr 3 00:35:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 00:35:11 GMT Subject: RFR: 8329174: update CodeBuffer layout in comment after constants section moved In-Reply-To: References: Message-ID: <8Pp9tmiY-2sl1H-VXv2r8DbPre4JJCzqKZh_hccF0pA=.abf28793-3f56-464b-bc87-c79df803bb45@github.com> On Thu, 28 Mar 2024 02:50:40 GMT, lusou-zhangquan wrote: > Enhancement [JDK-6961697](https://bugs.openjdk.org/browse/JDK-6961697) moved nmethod constants section before instruction section, but the layout scheme in codeBuffer.cpp was not changed correspondingly. The mismatch between layout scheme in source code and actual layout is misleading, so we'd better fix it. My PR takes longer to get reviews. Let push this and I will merge it into my PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18529#issuecomment-2033336581 From duke at openjdk.org Wed Apr 3 00:35:11 2024 From: duke at openjdk.org (lusou-zhangquan) Date: Wed, 3 Apr 2024 00:35:11 GMT Subject: Integrated: 8329174: update CodeBuffer layout in comment after constants section moved In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 02:50:40 GMT, lusou-zhangquan wrote: > Enhancement [JDK-6961697](https://bugs.openjdk.org/browse/JDK-6961697) moved nmethod constants section before instruction section, but the layout scheme in codeBuffer.cpp was not changed correspondingly. The mismatch between layout scheme in source code and actual layout is misleading, so we'd better fix it. This pull request has now been integrated. 
Changeset: 866e7b6b Author: Quan Zhang Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/866e7b6b7745928e559da8cdf622bf6a097ec995 Stats: 9 lines in 1 file changed: 4 ins; 4 del; 1 mod 8329174: update CodeBuffer layout in comment after constants section moved Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/18529 From duke at openjdk.org Wed Apr 3 01:13:39 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Apr 2024 01:13:39 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v3] In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 17:46:14 GMT, Vladimir Kozlov wrote: > Can we also add statistic about how many different barriers C2 generates and eliminates? It will help to know if we missing some optimization with these changes. Added these statistics. Example output: `Barriers generated = 4, Barriers eliminated = 1` I think there are missing cases for barriers eliminated. There could be cases of aside from `MemBarNode::remove()` ------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2033363198 From duke at openjdk.org Wed Apr 3 01:13:39 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Apr 2024 01:13:39 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v7] In-Reply-To: References: Message-ID: <7HWfk5Q4A6yM9yzfiKxWhZ3cuswzWEvqzChKLtSdHT8=.a6cd68c4-fa17-47cd-bd91-a6668ba84d00@github.com> > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. 
> > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. > > I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). > > Passes hotspot tier1 locally on a Linux machine. > > ### Benchmarks > > Running Renaissance ParNnemonics on an Amazon Graviton (arm) instance. 
>
> Baseline:
>
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
> N = 25
> mean = 3309.611 ±(99.9%) 86.699 ms/op
>
> Histogram, ms/op:
> [3000.000, 3050.000) = 0
> [3050.000, 3100.000) = 4
> [3100.000, 3150.000) = 1
> [3150.000, 3200.000) = 0
> [3200.000, 3250.000) = 0
> [3250.000, 3300.000) = 0
> [3300.000, 3350.000) = 9
> [3350.000, 3400.000) = 6
> [3400.000, 3450.000) = 5
>
> Percentiles, ms/op:
> p(0.0000) = 3069.910 ms/op
> p(50.0000) = 3348.140 ms/op
> ...

Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:

- Statistics for barriers generated/eliminated
- global flag to turn on storestore barrier emission and membar acquires IR tests

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/18505/files
- new: https://git.openjdk.org/jdk/pull/18505/files/656f52c8..33d23635

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=06
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=05-06

Stats: 42 lines in 7 files changed: 33 ins; 2 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/18505.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505

PR: https://git.openjdk.org/jdk/pull/18505

From duke at openjdk.org Wed Apr 3 01:13:39 2024
From: duke at openjdk.org (Joshua Cao)
Date: Wed, 3 Apr 2024 01:13:39 GMT
Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v6]
In-Reply-To:
References: <6K5Fpo1ev00bK62rlfQIjp91AmZYX3AwYk8J8R07HN0=.a401de7a-f515-40f2-b9c3-003d9eaffddc@github.com>
Message-ID:

On Tue, 2 Apr 2024 08:40:52 GMT, Aleksey Shipilev wrote:

>> Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Add micro benchmark courtesy of @shipilev
>> - More comprehensive IR tests based on @shipilev's suggestions
>
> test/hotspot/jtreg/compiler/c2/irTests/ConstructorBarriers.java line 244:
>
>> 242: @IR(failOn = IRNode.MEMBAR_RELEASE)
>>
243: @IR(failOn = IRNode.MEMBAR_STORESTORE) >> 244: @IR(failOn = IRNode.MEMBAR_RELEASE) > > Here and later: Looks weird to test for `MEMBAR_RELEASE` twice. Also, should it be just `failOn = IRNode.MEMBAR`? Meant to test for `MEMBAR_VOLATILE`. We can't fail on all `MEMBAR` because there are some cases that have `MEMBAR_ACQUIRE`. Added those checks in latest commits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1548795888 From fyang at openjdk.org Wed Apr 3 05:42:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Apr 2024 05:42:11 GMT Subject: RFR: 8329355: Test compiler/c2/irTests/TestIfMinMax.java fails on RISC-V In-Reply-To: References: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> Message-ID: On Sat, 30 Mar 2024 14:52:31 GMT, Jasmine Karthikeyan wrote: >> Please review this small change fixing an IR matching failure on linux-riscv platform. >> >> JDK-8324655 tries to identify min/max patterns in CMoves and transform them into Min and Max nodes. >> But architectures like RISC-V doesn't have support of conditional moves at the ISA level for now. >> So we set ConditionalMoveLimit parameter to 0 for this platform and conditionals moves are emulated >> with normal compare and branch instructions instead [1]. This is why the IR matching test added by >> JDK-8324655 fails on this platform. A simple way to fix this would be skip this test for this case. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9775 > > Thank you for fixing this! It looks good to me. @jaskarth : Thanks! Could we have a Reviewer please? 
Maybe @TobiHartmann or @chhagedorn : -) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18558#issuecomment-2033572797 From thartmann at openjdk.org Wed Apr 3 05:51:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Apr 2024 05:51:08 GMT Subject: RFR: 8329355: Test compiler/c2/irTests/TestIfMinMax.java fails on RISC-V In-Reply-To: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> References: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> Message-ID: On Sat, 30 Mar 2024 08:49:00 GMT, Fei Yang wrote: > Please review this small change fixing an IR matching failure on linux-riscv platform. > > JDK-8324655 tries to identify min/max patterns in CMoves and transform them into Min and Max nodes. > But architectures like RISC-V doesn't have support of conditional moves at the ISA level for now. > So we set ConditionalMoveLimit parameter to 0 for this platform and conditionals moves are emulated > with normal compare and branch instructions instead [1]. This is why the IR matching test added by > JDK-8324655 fails on this platform. A simple way to fix this would be skip this test for this case. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9775 Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18558#pullrequestreview-1975564856 From dlunden at openjdk.org Wed Apr 3 05:53:15 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 3 Apr 2024 05:53:15 GMT Subject: Integrated: 8323682: C2: guard check is not generated in Arrays.copyOfRange intrinsic when allocation is eliminated by EA In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 13:28:32 GMT, Daniel Lund?n wrote: > The library intrinsic `_copyOfRange` does not add a guard for start indices that are larger than the length of the source arrays. 
Macro expansion of `ArrayCopy` nodes later adds such a guard, but in certain situations escape analysis may result in removing the `ArrayCopy` node before it is expanded. The result is incorrect behavior of the compiled program (as the missing guard may have relevant side effects, such as throwing an exception).
>
> ### Changeset
>
> - Add the missing guard (start index <= source array length).
> - Add a regression test.
>
> ### Testing
>
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/8437807452)
> - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.

This pull request has now been integrated.

Changeset: 92f5c0be
Author: Daniel Lundén
Committer: Tobias Hartmann
URL: https://git.openjdk.org/jdk/commit/92f5c0be8e3b47343b54a26940df691faaf49b23
Stats: 71 lines in 3 files changed: 65 ins; 1 del; 5 mod

8323682: C2: guard check is not generated in Arrays.copyOfRange intrinsic when allocation is eliminated by EA

Reviewed-by: thartmann, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/18472

From chagedorn at openjdk.org Wed Apr 3 06:37:10 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 3 Apr 2024 06:37:10 GMT
Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4]
In-Reply-To:
References: <8_jxqANi6RTAiFatiMqSAJmuPoichU-7hHgwGFTVPm8=.9b284c20-6a4c-44d7-84d7-39c47e00002c@github.com>
Message-ID:

On Tue, 2 Apr 2024 15:58:50 GMT, Emanuel Peter wrote:

>> Well, I have it upper-case in all other cases... but the real mistake is that it should be a `VLoopVPointer` accessor.
>
> Would you like me to change all `Accessors` -> `accessors`?

You could do that if there are not too many places.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548991580 From chagedorn at openjdk.org Wed Apr 3 06:41:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 06:41:00 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 16:12:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 184: >> >>> 182: } >>> 183: >>> 184: void VLoopPointers::compute_and_cache() { >> >> Could be split into something like: >> >> allocate_pointer_memory(); >> initialize_pointers(); >> trace_pointers(); >> >> where allocate_pointer_memory(): >> number_of_pointers = compute_number_of_pointers(); >> uint bytes = number_of_pointers * sizeof(VPointer); >> _pointers = (VPointer*)_arena->Amalloc(bytes); > > With the new `for_each_mem`, the code is already much easier to read. I don't know if splitting it further would really help now? That's already better! My general take on that is when I see: // Do x // Do y // Do z it suggests that it should actually be x(); y(); z(); But that's just my personal preference :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548995281 From chagedorn at openjdk.org Wed Apr 3 06:46:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 06:46:10 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 15:42:04 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 224: >> >>> 222: for (int i = 0; i < _body.body().length(); i++) { >>> 223: MemNode* mem = _body.body().at(i)->isa_Mem(); >>> 224: if (mem != nullptr && _vloop.in_bb(mem)) { >> >> I see that you use this pattern twice. Maybe we could provide a "for_each_mem(lambda)" in `VLoopBody`? But could also be done separately. > > I was considering it. I can do that. Can now also be replaced by `for_each_mem()`. 
>> src/hotspot/share/opto/vectorization.hpp line 462: >> >>> 460: const VLoopBody& _body; >>> 461: >>> 462: // Array of cached pointers >> >> Maybe make a note that we allocate/cache them lazily upon request. > > It is not lazy, they are allocated and cached in `compute_and_cache`. Like all other `VLoopAnalyzer` submodules. Maybe I missed your point ? I've meant that it's not allocated in the constructor as you initialize it with `nullptr`. It's only initialized once you call `compute_and_cache()` which may not happen if we bail out earlier. That's what I've meant with "lazy" but that was probably not clear enough :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548999383 PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1548997542 From epeter at openjdk.org Wed Apr 3 06:47:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 06:47:01 GMT Subject: RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v4] In-Reply-To: References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> Message-ID: On Tue, 2 Apr 2024 19:46:03 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'JDK-8328938-abs-min-int-assert' of https://github.com/eme64/jdk into JDK-8328938-abs-min-int-assert >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8328938-abs-min-int-assert >> - improve comments >> - 8328938 > > src/hotspot/share/opto/vectorization.cpp line 411: > >> 409: abs(long_stride) >= max_val || >> 410: abs(long_scale * long_stride) >= max_val) { >> 411: assert(!valid(), "adr stride*scale is too large"); > > Why you need assert? If you look a few lines up, you can see that all other "bailouts" also check that the VPointer is invalid. I am simply matching the surrounding code. And it also makes it explicit, that the VPointer will be invalid, which is what I want. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18485#discussion_r1549000771 From epeter at openjdk.org Wed Apr 3 06:53:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 06:53:10 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 20:08:34 GMT, Vladimir Kozlov wrote: > One question: will VLoopAnalyzer default destructor clean up all memory used? @vnkozlov there is no need, since it is all allocated over the `Arena` in `VLoopAnalyzer`: // Arena for all submodules Arena _arena; It is that arena that I pass into all submodules, such as `VLoopVPointer`. `VLoopAnalyzer` is stack allocated, so once the destructor removes its `_arena`, all submodules are also automatically deallocated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18577#issuecomment-2033687519 From epeter at openjdk.org Wed Apr 3 06:57:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 06:57:09 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: <-8eSAd68tJf5BvtduzNA2Xwwr6w5Z8UrpKT0_fJA6uM=.77cbb013-e91e-4e54-b39c-9619fb2a0d04@github.com> On Wed, 3 Apr 2024 06:43:05 GMT, Christian Hagedorn wrote: >> I was considering it. I can do that. > > Can now also be replaced by `for_each_mem()`. Thanks for spotting that! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1549010333 From epeter at openjdk.org Wed Apr 3 07:08:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 07:08:26 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v5] In-Reply-To: References: Message-ID: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. 
I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. > > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18577/files - new: https://git.openjdk.org/jdk/pull/18577/files/12f96209..bd7f76ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=03-04 Stats: 15 lines in 2 files changed: 0 ins; 3 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577 PR: https://git.openjdk.org/jdk/pull/18577 From chagedorn at openjdk.org Wed Apr 3 07:08:26 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 07:08:26 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v5] In-Reply-To: <9wk16cDm6EkVVVS6SZHj-n7z5GOJ4UlI-MxSW8Kdeak=.a6d31bb3-dcf4-429c-8a6b-91e3cdf4f5b4@github.com> References: 
<9wk16cDm6EkVVVS6SZHj-n7z5GOJ4UlI-MxSW8Kdeak=.a6d31bb3-dcf4-429c-8a6b-91e3cdf4f5b4@github.com> Message-ID: On Tue, 2 Apr 2024 15:45:47 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.hpp line 456: >> >>> 454: // Submodule of VLoopAnalyzer. >>> 455: // We compute and cache the VPointer for every load and store. >>> 456: class VLoopPointers : public StackObj { >> >> Nit: Should we call this `VLoopVPointers` to make the link to `VPointers` and not just some pointers? > > Hmm. This looks like a general naming question now. My idea is that the `V` at the beginning of the types just is kind of a "namespace", to say that all types are used for `Vectorization`. > But I guess here we can do it just so everybody knows we are dealing with `VPointers`. I'll make an exception, but don't want to see `V`'s littered everywhere ;) I agree that we should not start adding `V`'s in between. But for `VPointer`, I think it makes sense since `VPointer` is quite a known name and `LoopPointers` might be misleading. Thanks for making this change! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1549025476 From epeter at openjdk.org Wed Apr 3 07:08:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 07:08:27 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v5] In-Reply-To: References: Message-ID: <5zx6Qq1J_0iqhCMcTSbdvtho4IDlJjl36g8l47PMTlQ=.3640c11b-2609-435e-b66a-bb9917af4657@github.com> On Wed, 3 Apr 2024 06:41:07 GMT, Christian Hagedorn wrote: >> It is not lazy, they are allocated and cached in `compute_and_cache`. Like all other `VLoopAnalyzer` submodules. Maybe I missed your point ? > > I've meant that it's not allocated in the constructor as you initialize it with `nullptr`. It's only initialized once you call `compute_and_cache()` which may not happen if we bail out earlier. That's what I've meant with "lazy" but that was probably not clear enough :-) Aha, I see. 
I mean all other submodules are handled the same. They also cannot really be used until `VLoopAnalyzer::setup_submodules` returns with success. I guess this here is the first instance where the data structure itself is only allocated after the constructor. But I feel like if anybody has a question about where it is allocated, they can just search the reference. If I start putting down such detailed comments, then I need to put them everywhere. And that will clutter the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1549022826 From chagedorn at openjdk.org Wed Apr 3 07:08:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 07:08:27 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v5] In-Reply-To: <5zx6Qq1J_0iqhCMcTSbdvtho4IDlJjl36g8l47PMTlQ=.3640c11b-2609-435e-b66a-bb9917af4657@github.com> References: <5zx6Qq1J_0iqhCMcTSbdvtho4IDlJjl36g8l47PMTlQ=.3640c11b-2609-435e-b66a-bb9917af4657@github.com> Message-ID: On Wed, 3 Apr 2024 07:02:14 GMT, Emanuel Peter wrote: >> I've meant that it's not allocated in the constructor as you initialize it with `nullptr`. It's only initialized once you call `compute_and_cache()` which may not happen if we bail out earlier. That's what I've meant with "lazy" but that was probably not clear enough :-) > > Aha, I see. I mean all other submodules are handled the same. They also cannot really be used until `VLoopAnalyzer::setup_submodules` returns with success. I guess this here is the first instance where the data structure itself is only allocated after the constructor. But I feel like if anybody has a question about where it is allocated, they can just search the reference. If I start putting down such detailed comments, then I need to put them everywhere. And that will clutter the code. That's true. Here I think I've only commented it since it's allocated specially for the first time in the sub modules. But it does not really add much information per se. 
It's fine to leave it like that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1549039187 From chagedorn at openjdk.org Wed Apr 3 07:10:58 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 07:10:58 GMT Subject: RFR: 8329355: Test compiler/c2/irTests/TestIfMinMax.java fails on RISC-V In-Reply-To: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> References: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> Message-ID: On Sat, 30 Mar 2024 08:49:00 GMT, Fei Yang wrote: > Please review this small change fixing an IR matching failure on linux-riscv platform. > > JDK-8324655 tries to identify min/max patterns in CMoves and transform them into Min and Max nodes. > But architectures like RISC-V doesn't have support of conditional moves at the ISA level for now. > So we set ConditionalMoveLimit parameter to 0 for this platform and conditionals moves are emulated > with normal compare and branch instructions instead [1]. This is why the IR matching test added by > JDK-8324655 fails on this platform. A simple way to fix this would be skip this test for this case. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9775 Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18558#pullrequestreview-1975736029 From epeter at openjdk.org Wed Apr 3 07:17:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 07:17:34 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v6] In-Reply-To: References: Message-ID: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. 
> > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. > > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what it effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. 
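The caching idea in the description above can be sketched in isolation: parse each memory operation once up front, then let the all-with-all comparison do array lookups instead of re-parsing. Everything in this sketch (`PointerCacheSketch`, `ParsedPointer`, `parse`, the adjacency rule) is a hypothetical stand-in written for illustration; it is not HotSpot's `VPointer` or `VLoopVPointers` code.

```java
public class PointerCacheSketch {
    // Toy stand-in for a parsed pointer: a base and an offset.
    static final class ParsedPointer {
        final int base;
        final int offset;

        ParsedPointer(int base, int offset) {
            this.base = base;
            this.offset = offset;
        }

        boolean adjacentTo(ParsedPointer other) {
            return base == other.base && offset + 1 == other.offset;
        }
    }

    static int parseCount;    // number of parse() calls in the last run
    static int lastAdjacent;  // adjacent pairs found in the last run

    // Stand-in for the expensive pointer-subgraph parse.
    static ParsedPointer parse(int memop) {
        parseCount++;
        return new ParsedPointer(memop / 4, memop % 4);
    }

    // Parses both operands for every pair: 2 * n * n parses.
    static int uncachedParses(int n) {
        parseCount = 0;
        lastAdjacent = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (parse(i).adjacentTo(parse(j))) {
                    lastAdjacent++;
                }
            }
        }
        return parseCount;
    }

    // Parses each memop once, then compares cached results: n parses.
    static int cachedParses(int n) {
        parseCount = 0;
        lastAdjacent = 0;
        ParsedPointer[] cache = new ParsedPointer[n];
        for (int i = 0; i < n; i++) {
            cache[i] = parse(i);
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (cache[i].adjacentTo(cache[j])) {
                    lastAdjacent++;
                }
            }
        }
        return parseCount;
    }

    public static void main(String[] args) {
        System.out.println(uncachedParses(100)); // 20000
        System.out.println(cachedParses(100));   // 100
    }
}
```

The comparison loop is still quadratic in both variants; only the per-pair parsing cost goes away, which fits the reported drop in compile time rather than a change in asymptotic behavior.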
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: split up _vpointers.compute_vpointers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18577/files - new: https://git.openjdk.org/jdk/pull/18577/files/bd7f76ba..644c5a3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18577&range=04-05 Stats: 21 lines in 2 files changed: 13 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18577.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18577/head:pull/18577 PR: https://git.openjdk.org/jdk/pull/18577 From epeter at openjdk.org Wed Apr 3 07:17:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 07:17:34 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v6] In-Reply-To: References: Message-ID: <849undlgKp_0mXNcVh15o93EWiisSYniXAklau-SgEM=.c935b845-1dd1-450d-bf0b-072cf9980d2a@github.com> On Wed, 3 Apr 2024 06:38:45 GMT, Christian Hagedorn wrote: >> With the new `for_each_mem`, the code is already much easier to read. I don't know if splitting it further would really help now? > > That's already better! > > My general take on that is when I see: > > // Do x > // Do y > // Do z > > it suggests that it should actually be > > x(); > y(); > z(); > > But that's just my personal preference :-) Ok, I split it up now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1549054681 From epeter at openjdk.org Wed Apr 3 07:24:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 07:24:10 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v6] In-Reply-To: References: <5zx6Qq1J_0iqhCMcTSbdvtho4IDlJjl36g8l47PMTlQ=.3640c11b-2609-435e-b66a-bb9917af4657@github.com> Message-ID: On Wed, 3 Apr 2024 07:05:29 GMT, Christian Hagedorn wrote: >> Aha, I see. I mean all other submodules are handled the same. 
They also cannot really be used until `VLoopAnalyzer::setup_submodules` returns with success. I guess this here is the first instance where the data structure itself is only allocated after the constructor. But I feel like if anybody has a question about where it is allocated, they can just search the reference. If I start putting down such detailed comments, then I need to put them everywhere. And that will clutter the code. > > That's true. Here I think I've only commented it since it's allocated specially for the first time in the sub modules. But it does not really add much information per se. It's fine to leave it like that. Ok, thanks for the suggestion anyway. I will leave it without a comment then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18577#discussion_r1549069015 From chagedorn at openjdk.org Wed Apr 3 07:29:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 07:29:02 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v6] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 07:17:34 GMT, Emanuel Peter wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them.
This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. >> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > split up _vpointers.compute_vpointers Thanks for making the suggested changes. Looks good to me! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18577#pullrequestreview-1975779931 From fyang at openjdk.org Wed Apr 3 07:42:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Apr 2024 07:42:03 GMT Subject: RFR: 8329355: Test compiler/c2/irTests/TestIfMinMax.java fails on RISC-V In-Reply-To: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> References: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> Message-ID: On Sat, 30 Mar 2024 08:49:00 GMT, Fei Yang wrote: > Please review this small change fixing an IR matching failure on the linux-riscv platform.
> > JDK-8324655 tries to identify min/max patterns in CMoves and transform them into Min and Max nodes. > But architectures like RISC-V don't support conditional moves at the ISA level for now. > So we set the ConditionalMoveLimit parameter to 0 for this platform and conditional moves are emulated > with normal compare and branch instructions instead [1]. This is why the IR matching test added by > JDK-8324655 fails on this platform. A simple way to fix this would be to skip this test for this case. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9775 Thanks all for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18558#issuecomment-2033779415 From fyang at openjdk.org Wed Apr 3 07:42:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Apr 2024 07:42:04 GMT Subject: Integrated: 8329355: Test compiler/c2/irTests/TestIfMinMax.java fails on RISC-V In-Reply-To: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> References: <6ray-riEC6nAkcgIDes3YrwdJOhNVxm4NY5RXzYzwaE=.a57c4f75-3ac9-456e-a39e-58f7adcd4cb3@github.com> Message-ID: On Sat, 30 Mar 2024 08:49:00 GMT, Fei Yang wrote: > Please review this small change fixing an IR matching failure on the linux-riscv platform. > > JDK-8324655 tries to identify min/max patterns in CMoves and transform them into Min and Max nodes. > But architectures like RISC-V don't support conditional moves at the ISA level for now. > So we set the ConditionalMoveLimit parameter to 0 for this platform and conditional moves are emulated > with normal compare and branch instructions instead [1]. This is why the IR matching test added by > JDK-8324655 fails on this platform. A simple way to fix this would be to skip this test for this case. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv.ad#L9775 This pull request has now been integrated.
Changeset: 16b842af Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/16b842af8edd10c4071eec98caf838a2f6c49746 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8329355: Test compiler/c2/irTests/TestIfMinMax.java fails on RISC-V Reviewed-by: jkarthikeyan, thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18558 From epeter at openjdk.org Wed Apr 3 08:36:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 08:36:26 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v18] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
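For readers without the linked test file at hand, the "variable that was split (using shifts)" case from the PR description above can be sketched roughly as follows. This is only an illustration of the source-level pattern, with hypothetical names (`MergeStoresSketch`, `storeLongLE`, `storeLongViaBuffer`); it does not reproduce the code in `TestMergeStores.java`, and it makes no claim about what C2 actually emits for it. The `ByteBuffer` variant is included only as a reference to check that the eight byte stores are equivalent to one little-endian 8-byte store.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not taken from the PR.
class MergeStoresSketch {
    // Eight consecutive byte stores of a long that was split with shifts
    // (little-endian order: lowest byte at the lowest index).
    static void storeLongLE(byte[] a, int off, long v) {
        a[off + 0] = (byte) (v >> 0);
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
        a[off + 4] = (byte) (v >> 32);
        a[off + 5] = (byte) (v >> 40);
        a[off + 6] = (byte) (v >> 48);
        a[off + 7] = (byte) (v >> 56);
    }

    // Reference: a single 8-byte little-endian store through ByteBuffer.
    static void storeLongViaBuffer(byte[] a, int off, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(off, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[16];
        byte[] y = new byte[16];
        long value = 0x0102030405060708L;
        storeLongLE(x, 3, value);
        storeLongViaBuffer(y, 3, value);
        System.out.println("same bytes: " + Arrays.equals(x, y));
    }
}
```

The eight stores share one array, use adjacent indices, and store adjacent 8-bit segments of the same value, which matches the conditions the description lists for a chain of mergeable stores.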
The pull request now contains 48 commits: - Merge branch 'master' into JDK-8318446 - a little bit of casting for debug printing code - Merge branch 'master' into JDK-8318446 - fix test for trapping examples - WIP test with out of bounds exception - allow only array stores of same type as container - mismatched access test - add test300 - make it happen in post_loop_opts - fix invalid case - ... and 38 more: https://git.openjdk.org/jdk/compare/e3e6c2a8...d97fa2b4 ------------- Changes: https://git.openjdk.org/jdk/pull/16245/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=17 Stats: 2391 lines in 13 files changed: 2387 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From galder at openjdk.org Wed Apr 3 08:43:10 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 3 Apr 2024 08:43:10 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 02:43:18 GMT, Dean Long wrote: >> You're right about holder_known, but why do you need to check for _clone specifically at line 2137? If there is logic missing that prevents an inlining attempt then I think it should be fixed first, rather than in a followup. >> >> And I see that you need to do a receiver type check to allow only primitive arrays. Can you do that in append_alloc_array_copy, and bailout if not successful? The logic in build_graph_for_intrinsic would need to change slightly to support this. > > I was able to remove the clone-specific logic in invoke() in two parts: > > 1. fix the type_is_exact logic to allow array receiver > 2. move primitive array receiver check into append_alloc_array_copy > You're right about holder_known, but why do you need to check for _clone specifically at line 2137? 
If there is logic missing that prevents an inlining attempt then I think it should be fixed first, rather than in a followup. I added that check because none of the conditions in that `if` statement satisfied the situations in which `clone` calls are optimized. For the example I gave above, `code == Bytecodes::_invokevirtual` is true and `target->is_final_method()` is false. So that's why I added `clone` specifically. > I was able to remove the clone-specific logic in invoke() in two parts: > > 1. fix the type_is_exact logic to allow array receiver > 2. move primitive array receiver check into append_alloc_array_copy Great! I assume you also solved the clone check in line 2137? How do we add your work on top of mine? Do I cherry pick the commit(s) from a branch of yours? Or some other way? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1549258389 From shade at openjdk.org Wed Apr 3 08:50:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Apr 2024 08:50:10 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v7] In-Reply-To: <7HWfk5Q4A6yM9yzfiKxWhZ3cuswzWEvqzChKLtSdHT8=.a6cd68c4-fa17-47cd-bd91-a6668ba84d00@github.com> References: <7HWfk5Q4A6yM9yzfiKxWhZ3cuswzWEvqzChKLtSdHT8=.a6cd68c4-fa17-47cd-bd91-a6668ba84d00@github.com> Message-ID: On Wed, 3 Apr 2024 01:13:39 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. 
>> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly as `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Statistics for barriers generated/eliminated > - global flag to turn on storestore barrier emission and membar acquires > IR tests More comments: src/hotspot/share/opto/c2_globals.hpp line 795: > 793: develop(bool, UseStoreStoreForCtor, true, \ > 794: "Use storestore barrier instead of release barrier for" \ > 795: "on constructor exit") \ For in-field use / debugging, we really need to make a diagnostic product flag. src/hotspot/share/opto/escape.cpp line 200: > 198: // escape status of the associated Allocate node some of them > 199: // may be eliminated. > 200: if (n->req() > MemBarNode::Precedent) { Should be protected by a feature flag? Like: if (!UseStoreStoreForCtor || n->req() > MemBarNode::Precedent) { src/hotspot/share/opto/macro.cpp line 639: > 637: (use->is_Phi() || use->is_EncodeP() || > 638: use->Opcode() == Op_MemBarRelease || > 639: use->Opcode() == Op_MemBarStoreStore)) { Should be protected by a feature flag? src/hotspot/share/opto/memnode.cpp line 3438: > 3436: } > 3437: } > 3438: } else if (opc == Op_MemBarRelease || opc == Op_MemBarStoreStore) { Should be protected by a feature flag? src/hotspot/share/opto/stringopts.cpp line 2013: > 2011: // a reference to the newly constructed object (see Parse::do_exits()).
> 2012: assert(AllocateNode::Ideal_allocation(result) != nullptr, "should be newly allocated"); > 2013: kit.insert_mem_bar(Op_MemBarStoreStore, result); Should be protected by a feature flag? ------------- PR Review: https://git.openjdk.org/jdk/pull/18505#pullrequestreview-1976033867 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1549263678 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1549268541 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1549268950 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1549265535 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1549265267 From mli at openjdk.org Wed Apr 3 11:03:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Apr 2024 11:03:10 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v11] In-Reply-To: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> References: <-TES7hSQnM_OOxP1-Uxt1k6e3lCRaDsj9V0mSYfnz3Y=.8caa70c2-7831-468c-a047-d03f08a93623@github.com> Message-ID: On Tue, 2 Apr 2024 16:07:27 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. 
>> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision: > > - Schedule instructions better > - Fix crc32.h path Thanks for updating. Seems fine, but I'm not sure. Maybe see how others think about it. 
Just FYI: the performance gain of this implementation shrinks as the data size grows, so I wonder whether the CRC algorithm used here is optimal enough. There seem to be more advanced algorithms that are supposed to bring larger performance gains, and some of them are already implemented for other platforms in the JDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-2034256054 From bkilambi at openjdk.org Wed Apr 3 11:12:24 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 3 Apr 2024 11:12:24 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: > Floating-point addition is non-associative, that is, adding floating-point elements in a different order may give a different value. Specifically, the Vector API intentionally does not define the order of reduction, which allows platforms to generate more efficient code [1]. So C2 needs a node to represent non strictly-ordered add-reduction for floating-point types. > > To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. > > No effects on other platforms.
> > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). > > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > Final code is as below: > > Before: > ` fadda z17.s, p7/m, z17.s, z16.s > ` > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > > > > [Test] > Full jtreg passed on AArch64 and x86. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Add comments, revert to requires_strict_order and other minor changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18034/files - new: https://git.openjdk.org/jdk/pull/18034/files/4aed4b50..1156ef39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=02-03 Stats: 150 lines in 7 files changed: 62 ins; 5 del; 83 mod Patch: https://git.openjdk.org/jdk/pull/18034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034 PR: https://git.openjdk.org/jdk/pull/18034 From bkilambi at openjdk.org Wed Apr 3 11:12:25 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 3 Apr 2024 11:12:25 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3] In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 17:24:03 GMT, Andrew Haley wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Naming changes: replace 
strict/non-strict with more technical terms > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2858: > >> 2856: // reduction addF >> 2857: instruct reduce_add2F_neon(vRegF dst, vRegF fsrc, vReg vsrc) %{ >> 2858: predicate(Matcher::vector_length(n->in(2)) == 2 && n->as_Reduction()->is_associative()); > > This `vector_length(n->in(2)) == 2` is very obscure. I suspect that anyone coming across this code would not understand it. > > What exactly is the reason that this pattern is only applied for the 16b case? You need to give a justification in a comment right here. This is for vector length of 8B (64 bits). It adds two floats. I have added my comments in the new PS. > src/hotspot/share/opto/vectornode.hpp line 235: > >> 233: // Floating-point addition and multiplication are non-associative, so >> 234: // AddReductionVF/D and MulReductionVF/D require strict-ordering >> 235: // in auto-vectorization. Currently, Vector API allows > > Don't say "currently". Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1549499929 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1549498697 From bkilambi at openjdk.org Wed Apr 3 11:12:25 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 3 Apr 2024 11:12:25 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3] In-Reply-To: <5x-T1UNFkXz3aN0Zef-2h5NK2BrSuqO5NzT02-Hv3Vg=.3d6357d4-f525-4171-9641-40231f5431be@github.com> References: <6KLRg7UgrEMNOU71aVTF1Pka972NReqA_wOynzEipHE=.f3e38aea-d0c0-4f09-8849-96398152ef6a@github.com> <5x-T1UNFkXz3aN0Zef-2h5NK2BrSuqO5NzT02-Hv3Vg=.3d6357d4-f525-4171-9641-40231f5431be@github.com> Message-ID: On Thu, 21 Mar 2024 11:34:18 GMT, Emanuel Peter wrote: >> Hi, thank you for your comments on this. 
I personally also feel "is_associative" is a bit non-intuitive as in the reader might have to make the connection between "associativity" and "ordering" compared to the case where we directly use what we intend the variable to do, something like "is_ordered". Would it be okay if I revert this to "requires_strict_order" for the variable and method names that I used in my first commit? > > Yes, I think that would be ok. Thank you. I have reverted to "requires_strict_order" in the new PS. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1549498466 From bkilambi at openjdk.org Wed Apr 3 11:12:25 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 3 Apr 2024 11:12:25 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3] In-Reply-To: References: Message-ID: On Fri, 22 Mar 2024 05:41:01 GMT, Emanuel Peter wrote: >> Hi, what is the reason to declare it as a `const`? It is declared as a `private` member variable with no "setter" function either. It is not easy to modify this value from any other part of the code anyway. > > Generally, it is better to declare things `const` if they can be. It tells the reader of the code that the field will never be changed. Even if things are fine now, a future contributor might misunderstand how the field is to be used, and start modifying it in a `Ideal` method for example. > But if it is not simple to make it `const` for some reason, then don't do it. Thanks for the explanation. I agree with you and have made the changes in the new PS. 
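The link between "associativity" and "ordering" discussed in this thread can be made concrete with a small scalar sketch. It compares a strictly in-order float sum with a pairwise (tree-shaped) sum of the same four values. The class and method names are hypothetical, the pairwise helper assumes a power-of-two array length, and it only mimics the shape of an unordered reduction; it is not what C2 or the Vector API generates.

```java
// Illustrative sketch only: names are hypothetical, not from the PR.
class FloatReductionOrderSketch {
    // Strictly ordered: ((((0 + a0) + a1) + a2) + a3)
    static float sumInOrder(float[] a) {
        float s = 0.0f;
        for (float x : a) {
            s += x;
        }
        return s;
    }

    // Pairwise: (a0 + a1) + (a2 + a3); assumes a.length is a power of two.
    static float sumPairwise(float[] a) {
        float[] t = a.clone();
        for (int n = t.length; n > 1; n /= 2) {
            for (int i = 0; i < n / 2; i++) {
                t[i] = t[2 * i] + t[2 * i + 1];
            }
        }
        return t[0];
    }

    public static void main(String[] args) {
        // 1.0f is below half a float ulp at 1e8, so it is lost when added to +/-1e8f.
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println("in order: " + sumInOrder(a));
        System.out.println("pairwise: " + sumPairwise(a));
    }
}
```

With these inputs the in-order sum keeps the final `1.0f` while the pairwise sum loses both small terms, so the two orders disagree; this is why a reduction node has to record whether strict ordering is required.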
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1549500514 From bkilambi at openjdk.org Wed Apr 3 11:15:11 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 3 Apr 2024 11:15:11 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v3] In-Reply-To: References: Message-ID: On Thu, 21 Mar 2024 10:26:16 GMT, Emanuel Peter wrote: >> Yes, MUL is non-associative in VectorAPI just like ADD operation (according to the description here - https://docs.oracle.com/en/java/javase/19/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#fp_assoc). >> >> We found a significant perf difference between the SVE "fadda" instruction which is a strictly ordered instruction vs Neon instructions on a 128-bit SVE machine especially after this optimization - https://bugs.openjdk.org/browse/JDK-8298244 but there's no such performance difference for the MUL operation. MulReductionVF/VD do not have direct instructions for multiply reduction nor do they have separate ISA for strictly ordered or non-strictly ordered. So, currently we do not have any data that shows any benefit to add similar code for MUL and thus it's currently considered to be a non-associative operation (strictly ordered). I am not sure about other platforms. > > Right. Ok, since your benchmarks are restricted to NEON/SVE, I can understand these results. But I would think that probably on x86 machines this would look different, it is just that we currently have no unordered float/double add/mul reductions. > > I think it would be nice if you made both Add and Mul capable of being unordered already, that would make future work in this area simpler. Or do you see a regression for unordered mul reductions on your benchmark machines? Ok, I have added support in the mid-end for Mul operation as well. I don't see any regression on aarch64 as I have not modified the rules for mul reduction in any way.
I have not added any aarch64 backend rules for mul reduction as we do not have separate instructions for strictly/non-strictly ordered mul reduction and it makes no sense to add the strict ordering condition for mul reduction on aarch64. However, as you suggested if other platforms do have such instructions, it might benefit them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1549505490 From sjayagond at openjdk.org Wed Apr 3 11:45:21 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Wed, 3 Apr 2024 11:45:21 GMT Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register Message-ID: Fix sign extension on 4 byte load from argument stack slot to GPR. ------------- Commit messages: - Fix signed extension load from argument stack slot to GPR. Changes: https://git.openjdk.org/jdk/pull/18601/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18601&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329545 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18601.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18601/head:pull/18601 PR: https://git.openjdk.org/jdk/pull/18601 From chagedorn at openjdk.org Wed Apr 3 11:49:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 11:49:10 GMT Subject: RFR: 8328702: C2: Crash during parsing because sub type check is not folded In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 16:05:12 GMT, Roland Westrelin wrote: >> Thanks Roland for your review! >> >>> > When running with -XX:+ExpandSubTypeCheckAtParseTime >>> >>> Do we want to retire `ExpandSubTypeCheckAtParseTime`? Is there any reason to keep it? >> >> I'm not sure about how much benefit it gives us. A quick JBS search for "ExpandSubTypeCheckAtParseTime" revealed a few issues - but would need to double check how many of them really only triggered with that flag and were real bugs. 
So, apart from having it as a stress option, I don't see a real benefit for it - but that might be a good enough reason to keep it for now. >> >> What do you think? > >> I'm not sure about how much benefit it gives us. A quick JBS search for "ExpandSubTypeCheckAtParseTime" revealed a few issues - but would need to double check how many of them really only triggered with that flag and were real bugs. So, apart from having it as a stress option, I don't see a real benefit for it - but that might be a good enough reason to keep it for now. >> >> What do you think? > > It also has a maintenance cost (you had to make a code change for it in this PR and I also remember having to take `ExpandSubTypeCheckAtParseTime` into consideration at some point). I would vote for removing it unless it's known to have some value. Thanks @rwestrel and @vnkozlov for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18512#issuecomment-2034362752 From chagedorn at openjdk.org Wed Apr 3 11:49:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 11:49:12 GMT Subject: RFR: 8327110: Refactor create_bool_from_template_assertion_predicate() to separate class and fix identical cloning cases used for Loop Unswitching and Split If [v3] In-Reply-To: <8UdoeQB0Qz7Lzb-SZeOpf8V9IyXcmeKKyOHzQz0E5GE=.9550067d-93fa-4915-a06c-cbba220f2893@github.com> References: <8UdoeQB0Qz7Lzb-SZeOpf8V9IyXcmeKKyOHzQz0E5GE=.9550067d-93fa-4915-a06c-cbba220f2893@github.com> Message-ID: On Thu, 28 Mar 2024 12:32:59 GMT, Christian Hagedorn wrote: >> This is a follow-up to the previous refactoring done in https://github.com/openjdk/jdk/pull/18080. The patch starts to replace the usages of `create_bool_from_template_assertion_predicate()` by providing a refactored and fixed cloning algorithm. 
>> >> #### How `create_bool_from_template_assertion_predicate()` Works >> Currently, the algorithm in `create_bool_from_template_assertion_predicate()` uses an iterative DFS walk to find all nodes of a Template Assertion Predicate Expression in order to clone them. We do the following: >> 1. Follow all inputs if they could be a node that's part of a Template Assertion Predicate (compares opcodes): >> https://github.com/openjdk/jdk/blob/326c91e1a28ec70822ef927ee9ab17f79aa6d35c/src/hotspot/share/opto/loopTransform.cpp#L1513 >> >> 2. Once we find an `OpaqueLoopInit` or `OpaqueLoopStride` node, we start backtracking in the DFS. While doing so, we start to clone all nodes on the path from the `OpaqueLoop*Nodes` node to the start node and already update the graph. This logic is quite complex and difficult to understand since we do everything simultaneously. This was one of the reasons, I've originally tried to refactor this method in https://github.com/openjdk/jdk/pull/16877 because I needed to extend it for the full fix of Assertion Predicates in JDK-8288981. >> >> #### Missing Visited Set >> The current implementation of `create_bool_from_template_assertion_predicate()` does not use a visited set. This means that whenever we find a diamond shape, we could visit a node twice and re-discover all paths above this diamond again: >> >> >> ... >> | >> E >> | >> D >> / \ >> B C >> \ / >> A >> >> DFS walk: A -> B -> D -> E -> ... -> C -> D -> E -> ... >> >> With each diamond, the number of revisits of each node above doubles. >> >> #### Endless DFS in Edge-Cases >> In most cases, we would normally just stop quite quickly once we follow a data node that is not part of a Template Assertion Predicate Expression because the node opcode is different. However, in the test cases, we create a long chain of data nodes with many diamonds that could all be part of a Template Assertion Predicate Expression (i.e. 
`is_part_of_template_assertion_predicate_bool()` would return true to follow the inputs in a DFS walk). As a result, the DFS revisits a lot of nodes, especially higher up in the graph, exponentially many times and compilation is stuck for a long time (running the test cases result in a test timeout because... > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Moved comment + better assert Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18293#issuecomment-2034364105 From amitkumar at openjdk.org Wed Apr 3 11:59:08 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 3 Apr 2024 11:59:08 GMT Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 11:40:44 GMT, Sidraya Jayagond wrote: > Fix sign extension on 4 byte load from argument stack slot to GPR. Please update copyright headers as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18601#issuecomment-2034381458 From epeter at openjdk.org Wed Apr 3 13:56:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 13:56:17 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 Message-ID: In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. **Details** Well, it is a bit more complicated. 
I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. **More Background / Details** This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks during `filter_packs_for_mutual_independence`, which seem to stress the dependency graph traversals much more. 
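As an illustration of the single-pass depth computation described above, here is a minimal sketch in Java. It is not the HotSpot code: `DepthSketch`, `Node` and `computeDepth` are hypothetical names, and the sketch assumes that every non-Phi input of a node is itself part of the `body` list. It only shows why one pass is enough when `def` comes before `use`, and how a wrongly ordered `body` can be detected.

```java
import java.util.ArrayList;
import java.util.List;

public class DepthSketch {
    /** Hypothetical stand-in for a node of the loop body; not a HotSpot class. */
    static final class Node {
        final String name;
        final boolean isPhi;
        final List<Node> inputs = new ArrayList<>();
        int depth = -1; // -1 means "depth not computed yet"

        Node(String name, boolean isPhi) {
            this.name = name;
            this.isPhi = isPhi;
        }
    }

    /**
     * Single pass over a body that is ordered so that every definition
     * comes before its uses. Phi nodes get depth 0; every other node gets
     * a depth one higher than the maximum depth of its inputs.
     */
    static void computeDepth(List<Node> body) {
        for (Node n : body) {
            if (n.isPhi) {
                n.depth = 0; // Phis sit at the beginning of the DAG
                continue;
            }
            int maxInputDepth = 0;
            for (Node in : n.inputs) {
                if (in.depth < 0) {
                    // The ordering assumption is violated: an input was
                    // not visited before its use.
                    throw new IllegalStateException(
                        "body not ordered: " + in.name + " used by " + n.name);
                }
                maxInputDepth = Math.max(maxInputDepth, in.depth);
            }
            n.depth = maxInputDepth + 1;
        }
    }

    public static void main(String[] args) {
        Node phi = new Node("phi", true);
        Node load = new Node("load", false);
        Node add = new Node("add", false);
        Node store = new Node("store", false);
        load.inputs.add(phi);
        add.inputs.add(load);
        store.inputs.add(add);

        List<Node> body = List.of(phi, load, add, store);
        computeDepth(body);
        for (Node n : body) {
            System.out.println(n.name + " depth=" + n.depth);
        }
    }
}
```

If the list were ordered with `store` before `add`, the sketch would throw instead of silently computing a too-small depth, which is roughly the role the added assertion code plays in the description above.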
If such large dense dependency graphs turn out to be very common, we could take more drastic steps in the future: - Bail out of SuperWord if the graph gets too large. - Implement a data structure that is better for dense graphs, such as a matrix, where we mark the cell for `(n1, n2)` corresponding to the `independence(n1, n2)` query. This would make independence checks a constant time lookup, rather than a graph traversal. -------------------- I extracted a simple compile-time benchmark from `TestAlignVectorFuzzer.java`: `/oracle-work/jdk-fork2/build/linux-x64/jdk/bin/java -XX:CompileCommand=printcompilation,TestGraph2::* -XX:CompileCommand=compileonly,TestGraph2::test* -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=0 -XX:LoopUnrollLimit=1000 -Xbatch TestGraph2.java` With patch: C2 Compile Time: 8.234 s IdealLoop: 8.170 s AutoVectorize: 7.789 s master: C2 Compile Time: 56.223 s IdealLoop: 56.017 s AutoVectorize: 55.576 s import java.util.Random; class TestGraph2 { private static final Random random = new Random(); static final int RANGE_CON = 1024 * 8; static int init = 593436; static int limit = 599554; static int offset1 = -592394; static int offset2 = -592386; static final int offset3 = -592394; static final int stride = 4; static final int scale = 1; static final int hand_unrolling1 = 2; static final int hand_unrolling2 = 8; static final int hand_unrolling3 = 15; public static void main(String[] args) { byte[] aB = generateB(); byte[] bB = generateB(); byte[] cB = generateB(); for (int i = 1; i < 100; i++) { testUUBBBH(aB, bB, cB); } } static byte[] generateB() { byte[] a = new byte[RANGE_CON]; for (int i = 0; i < a.length; i++) { a[i] = (byte)random.nextInt(); } return a; } static Object[] testUUBBBH(byte[] a, byte[] b, byte[] c) { int h1 = hand_unrolling1; int h2 = hand_unrolling2; int h3 = hand_unrolling3; for (int i = init; i < limit; i += stride) { if (h1 >= 1) { a[offset1 + i * scale + 0]++; } if (h1 >= 2) { a[offset1 + i * scale + 1]++; 
} if (h1 >= 3) { a[offset1 + i * scale + 2]++; } if (h1 >= 4) { a[offset1 + i * scale + 3]++; } if (h1 >= 5) { a[offset1 + i * scale + 4]++; } if (h1 >= 6) { a[offset1 + i * scale + 5]++; } if (h1 >= 7) { a[offset1 + i * scale + 6]++; } if (h1 >= 8) { a[offset1 + i * scale + 7]++; } if (h1 >= 9) { a[offset1 + i * scale + 8]++; } if (h1 >= 10) { a[offset1 + i * scale + 9]++; } if (h1 >= 11) { a[offset1 + i * scale + 10]++; } if (h1 >= 12) { a[offset1 + i * scale + 11]++; } if (h1 >= 13) { a[offset1 + i * scale + 12]++; } if (h1 >= 14) { a[offset1 + i * scale + 13]++; } if (h1 >= 15) { a[offset1 + i * scale + 14]++; } if (h1 >= 16) { a[offset1 + i * scale + 15]++; } if (h2 >= 1) { b[offset2 + i * scale + 0]++; } if (h2 >= 2) { b[offset2 + i * scale + 1]++; } if (h2 >= 3) { b[offset2 + i * scale + 2]++; } if (h2 >= 4) { b[offset2 + i * scale + 3]++; } if (h2 >= 5) { b[offset2 + i * scale + 4]++; } if (h2 >= 6) { b[offset2 + i * scale + 5]++; } if (h2 >= 7) { b[offset2 + i * scale + 6]++; } if (h2 >= 8) { b[offset2 + i * scale + 7]++; } if (h2 >= 9) { b[offset2 + i * scale + 8]++; } if (h2 >= 10) { b[offset2 + i * scale + 9]++; } if (h2 >= 11) { b[offset2 + i * scale + 10]++; } if (h2 >= 12) { b[offset2 + i * scale + 11]++; } if (h2 >= 13) { b[offset2 + i * scale + 12]++; } if (h2 >= 14) { b[offset2 + i * scale + 13]++; } if (h2 >= 15) { b[offset2 + i * scale + 14]++; } if (h2 >= 16) { b[offset2 + i * scale + 15]++; } if (h3 >= 1) { c[offset3 + i * scale + 0]++; } if (h3 >= 2) { c[offset3 + i * scale + 1]++; } if (h3 >= 3) { c[offset3 + i * scale + 2]++; } if (h3 >= 4) { c[offset3 + i * scale + 3]++; } if (h3 >= 5) { c[offset3 + i * scale + 4]++; } if (h3 >= 6) { c[offset3 + i * scale + 5]++; } if (h3 >= 7) { c[offset3 + i * scale + 6]++; } if (h3 >= 8) { c[offset3 + i * scale + 7]++; } if (h3 >= 9) { c[offset3 + i * scale + 8]++; } if (h3 >= 10) { c[offset3 + i * scale + 9]++; } if (h3 >= 11) { c[offset3 + i * scale + 10]++; } if (h3 >= 12) { c[offset3 + i * scale + 
11]++; } if (h3 >= 13) { c[offset3 + i * scale + 12]++; } if (h3 >= 14) { c[offset3 + i * scale + 13]++; } if (h3 >= 15) { c[offset3 + i * scale + 14]++; } if (h3 >= 16) { c[offset3 + i * scale + 15]++; } } return new Object[]{ a, b, c }; } } ------------- Commit messages: - Load / Store precedence - add some comments and asserts - ensure Load/Store order - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression - 8327978 Changes: https://git.openjdk.org/jdk/pull/18532/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327978 Stats: 46 lines in 2 files changed: 44 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532 From chagedorn at openjdk.org Wed Apr 3 14:18:08 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 3 Apr 2024 14:18:08 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2] In-Reply-To: References: Message-ID: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> On Thu, 28 Mar 2024 16:43:49 GMT, Roland Westrelin wrote: >> Ok. Makes sense. Thanks for the explanation. > > Then isn't there a risk that after some transformation the `CastPP` ends up with an input that's a constant superklass which would cause the `CastPP` to transform to top? Hm, you're right. That's the very same problem. So, a `CastPP` does not really work here. Should we go back to the previously suggested version without `CastPP`? if (improved_klass_ptr_type != klass_ptr_type) { if (improved_klass_ptr_type->singleton()) { improved_superklass = makecon(improved_klass_ptr_type); } else { superklass->raise_bottom_type(improved_klass_ptr_type); _gvn.set_type(superklass, improved_klass_ptr_type); } } It's a best effort solution. 
We might still miss opportunities to remove sub type checks later where we could call `try_improve()` again. But at least the `SubTypeCheck` is now in sync with the `CheckCastPP` that is also only improved at parse time with `try_improve()` and not later anymore. We could think about using `try_improve()` during IGVN as well to get better type information later. But I suggest to do that separately to this fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1549842229 From roland at openjdk.org Wed Apr 3 14:28:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Apr 2024 14:28:00 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2] In-Reply-To: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> References: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> Message-ID: On Wed, 3 Apr 2024 14:15:20 GMT, Christian Hagedorn wrote: > Should we go back to the previously suggested version without CastPP? Isn't there a risk with that one too? if `superklass` is a `TypeNode` and its input changes to a constant for instance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1549862737 From kxu at openjdk.org Wed Apr 3 14:34:09 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 3 Apr 2024 14:34:09 GMT Subject: RFR: 8327381 Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v3] In-Reply-To: <89nYlsd2Lj9mt0Dy6ws1mP2sXEoyj5kGC6KlSvw-m9k=.cf67d0a4-3528-48e7-b4cf-864bf39b9711@github.com> References: <6mb_BOei2bIRzPvulo4SkaWGa9EXjiBIFfKTIAAWdCU=.86b2b6f0-7e06-4b4d-9881-593577b43184@github.com> <89nYlsd2Lj9mt0Dy6ws1mP2sXEoyj5kGC6KlSvw-m9k=.cf67d0a4-3528-48e7-b4cf-864bf39b9711@github.com> Message-ID: On Tue, 19 Mar 2024 16:16:49 GMT, Emanuel Peter wrote: >> Oops. Package name updated. 
Sorry for such a rookie mistake! > > @tabjy I am re-running testing, then will re-review. Hi @eme64, I'm wondering if you could kindly take a look at the updated commits. Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18198#issuecomment-2034792214 From roland at openjdk.org Wed Apr 3 14:36:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Apr 2024 14:36:10 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2] In-Reply-To: References: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> Message-ID: On Wed, 3 Apr 2024 14:24:54 GMT, Roland Westrelin wrote: >> Hm, you're right. That's the very same problem. So, a `CastPP` does not really work here. Should we go back to the previously suggested version without `CastPP`? >> >> if (improved_klass_ptr_type != klass_ptr_type) { >> if (improved_klass_ptr_type->singleton()) { >> improved_superklass = makecon(improved_klass_ptr_type); >> } else { >> superklass->raise_bottom_type(improved_klass_ptr_type); >> _gvn.set_type(superklass, improved_klass_ptr_type); >> } >> } >> >> It's a best effort solution. We might still miss opportunities to remove sub type checks later where we could call `try_improve()` again. But at least the `SubTypeCheck` is now in sync with the `CheckCastPP` that is also only improved at parse time with `try_improve()` and not later anymore. >> >> We could think about using `try_improve()` during IGVN as well to get better type information later. But I suggest to do that separately to this fix. > >> Should we go back to the previously suggested version without CastPP? > > Isn't there a risk with that one too? if `superklass` is a `TypeNode` and its input changes to a constant for instance. If doing this in `GraphKit` doesn't work well, maybe it should be done in `SubTypeCheckNode::Value`? This way it would apply at all stages of compilation? 
(That assumes we don't care about what happens if `ExpandSubTypeCheckAtParseTime` is true). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1549875327 From epeter at openjdk.org Wed Apr 3 15:00:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Apr 2024 15:00:36 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v19] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. 
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
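To make the "variable that was split (using shifts)" use-case from the description above concrete, here is a minimal Java sketch of the source pattern. The class and method names (`MergeStoresSketch`, `storeLongLE`, `storeLongReference`) are hypothetical and are not taken from `TestMergeStores.java`. The sketch only demonstrates that the eight byte stores write the same bytes as one little-endian 8-byte store; whether C2 actually merges them depends on the JVM build and on the conditions listed in the description.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    /**
     * Eight consecutive byte stores of a long that was split with shifts
     * (little-endian layout). This is the kind of pattern the description
     * says could be replaced by a single long store.
     */
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    /** Reference: one 8-byte little-endian store through a ByteBuffer view. */
    static void storeLongReference(byte[] a, int offset, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, v);
    }

    public static void main(String[] args) {
        long v = 0x0102030405060708L;
        byte[] split = new byte[16];
        byte[] merged = new byte[16];
        storeLongLE(split, 4, v);
        storeLongReference(merged, 4, v);
        // Both variants must leave the arrays in the same state.
        System.out.println("same bytes: " + Arrays.equals(split, merged));
    }
}
```

The point of the optimization is that the first method needs no `Unsafe` or helper-class dependency in user code, while the compiler may still emit a single wide store for it.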
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: WIP refactoring, in broken state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/d97fa2b4..118c3666 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=17-18 Stats: 756 lines in 4 files changed: 445 ins; 309 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From sjayagond at openjdk.org Wed Apr 3 15:10:25 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Wed, 3 Apr 2024 15:10:25 GMT Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register [v2] In-Reply-To: References: Message-ID: > Fix sign extension on 4 byte load from argument stack slot to GPR. Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: copyright header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18601/files - new: https://git.openjdk.org/jdk/pull/18601/files/8708acbb..e4e9729d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18601&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18601&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18601.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18601/head:pull/18601 PR: https://git.openjdk.org/jdk/pull/18601 From duke at openjdk.org Wed Apr 3 16:02:22 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Wed, 3 Apr 2024 16:02:22 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2] In-Reply-To: References: Message-ID: > Hello everyone! Please review this non-vectorized implementation of `_updateBytesAdler32` intrinsic. 
Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). > > ### Correctness checks > > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. > > ### Performance results on T-Head board > > Enabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | > > Disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| > |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| > |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| > 
|Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| > |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| > |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| > |Adler32.TestAdler32.testAdler32Update|5012|thrpt|25|100.246|1.119|ops/ms| > |Adler32.TestAdler32.testAdler32Update|8192|thr... ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: - Dispose of some unneeded instructions - Move buf_end up - Add missing instructions for accum function split - Prettify labels and accum function - Split accum function - Eliminate L_nmax loop counter - Move repeating code under function - Add `enter` and `leave` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18382/files - new: https://git.openjdk.org/jdk/pull/18382/files/cb1036cd..b9512458 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=00-01 Stats: 187 lines in 1 file changed: 46 ins; 118 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/18382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382 PR: https://git.openjdk.org/jdk/pull/18382 From duke at openjdk.org Wed Apr 3 16:02:22 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Wed, 3 Apr 2024 16:02:22 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2] In-Reply-To: <6VkrGNhwJb0yXsm9qgOicZ8aiHkHnX7dynR1TrXOp5A=.f2c9d34b-1e1c-4670-b0a0-33d46c2781fd@github.com> References: <6VkrGNhwJb0yXsm9qgOicZ8aiHkHnX7dynR1TrXOp5A=.f2c9d34b-1e1c-4670-b0a0-33d46c2781fd@github.com> Message-ID: On Fri, 29 Mar 2024 02:15:14 GMT, Fei Yang wrote: >> ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision: >> >> - Dispose of some unneeded instructions >> - Move buf_end up >> - Add missing instructions for accum function split >> - Prettify labels and accum 
function >> - Split accum function >> - Eliminate L_nmax loop counter >> - Move repeating code under function >> - Add `enter` and `leave` > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5086: > >> 5084: const uint64_t BASE = 0xfff1; >> 5085: const uint64_t NMAX = 0x15B0; >> 5086: > > I think it's better to start a new frame on stub enter and exit with `__ enter()` and `__ leave()` respectively for proper stackwalking of RuntimeStub frame. Fixed, thanks! > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5160: > >> 5158: __ srli(temp2, temp0, 56); >> 5159: __ add(s1, s1, temp2); >> 5160: __ add(s2, s2, s1); > > I see a lot of duplicate logic in this function. Can we factor out some common logic as separate functions? Like generate_updateBytesAdler32_accum_16, generate_updateBytesAdler32_accum_8, etc. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1550032999 PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1550032956 From duke at openjdk.org Wed Apr 3 19:22:10 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Apr 2024 19:22:10 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v7] In-Reply-To: References: <7HWfk5Q4A6yM9yzfiKxWhZ3cuswzWEvqzChKLtSdHT8=.a6cd68c4-fa17-47cd-bd91-a6668ba84d00@github.com> Message-ID: On Wed, 3 Apr 2024 08:44:07 GMT, Aleksey Shipilev wrote: >> Joshua Cao has updated the pull request incrementally with two additional commits since the last revision: >> >> - Statistics for barriers generated/eliminated >> - global flag to turn on storestore barrier emission and membar acquires >> IR tests > > src/hotspot/share/opto/memnode.cpp line 3438: > >> 3436: } >> 3437: } >> 3438: } else if (opc == Op_MemBarRelease || opc == Op_MemBarStoreStore) { > > Should be protected by a feature flag? I don't think the feature flag is needed for these cases. 
These are optimizations to support the changes in code shape introduced by the feature, not the feature itself. If the feature flag is off, there should be no change in behavior. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1550322871 From duke at openjdk.org Wed Apr 3 19:31:38 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Apr 2024 19:31:38 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v8] In-Reply-To: References: Message-ID: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. 
I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. > > I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). > > Passes hotspot tier1 locally on a Linux machine. > > ### Benchmarks > > Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. > > Baseline: > > Result "org.renaissance.jdk.streams.JmhParMnemonics.run": > N = 25 > mean = 3309.611 ±(99.9%) 86.699 ms/op > > Histogram, ms/op: > [3000.000, 3050.000) = 0 > [3050.000, 3100.000) = 4 > [3100.000, 3150.000) = 1 > [3150.000, 3200.000) = 0 > [3200.000, 3250.000) = 0 > [3250.000, 3300.000) = 0 > [3300.000, 3350.000) = 9 > [3350.000, 3400.000) = 6 > [3400.000, 3450.000) = 5 > > Percentiles, ms/op: > p(0.0000) = 3069.910 ms/op > p(50.0000) = 3348.140 ms/op > ... 
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Make flag product diagnostic and guard string concat storestore by flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18505/files - new: https://git.openjdk.org/jdk/pull/18505/files/33d23635..582848c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=06-07 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505 PR: https://git.openjdk.org/jdk/pull/18505 From kvn at openjdk.org Wed Apr 3 21:37:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 21:37:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Thu, 28 Mar 2024 00:45:33 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used. 
>> >> The performance data using the ComputePI.java benchmark (part of this PR) is as follows: >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0 >> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9 >> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5 >> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8 >> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8 >> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4 >> >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0 >> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0 >> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0 >> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0 >> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0 >> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix L2F cvtsi2ssq I have one question for changes in assembler code. I see you avoided `xor` for instruction with memory by executing them only without AVX. I will run our performance testing to see if this change affects performance. Eric did run it but I don't know which version. And I will run regular testing too. src/hotspot/cpu/x86/assembler_x86.cpp line 2034: > 2032: InstructionAttr attributes(AVX_128bit, /* rex_w */ VM_Version::supports_evex(), /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 2033: attributes.set_rex_vex_w_reverted(); > 2034: int encode = simd_prefix_and_encode(dst, src, src, VEX_SIMD_F2, VEX_OPCODE_0F, &attributes); Can you explain this change? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18503#pullrequestreview-1978069711 PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2035636844 PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1550510499 From kvn at openjdk.org Wed Apr 3 21:50:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 21:50:11 GMT Subject: RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v4] In-Reply-To: References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> Message-ID: On Tue, 2 Apr 2024 06:28:18 GMT, Emanuel Peter wrote: >> **Problem** >> In [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) / https://git.openjdk.org/jdk/pull/14785 I fixed the alignment with `AlignVector`. For that, I had to compute `abs(scale)` and `abs(stride)`, as well as `scale * stride`. The issue is that all of these values can overflow the int range (e.g. `abs(min_int) = min_int`). >> >> We hit asserts like: >> >> `# assert(is_power_of_2(value)) failed: value must be a power of 2: 0xffffffff80000000` >> Happens because we take `abs(min_int)`, which is `min_int = 0x80000000`, and assuming this was a positive (unsigned) number is a power of 2 `2^31`. We then expand it to `long`, get `0xffffffff80000000`, which is not a power of 2 anymore. This violates the implicit assumptions, and we hit the assert. >> >> `# assert(q >= 1) failed: modulo value must be large enough` >> We have `scale = 2^30` and `stride = 4 = 2^2`. For the alignment calculation we compute `scale * stride = 2^32`, which overflows the int range and becomes zero. >> >> Before [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) we could get similar issues with the (old) code in `SuperWord::ref_is_alignable`, if `AlignVector` is enabled: >> >> >> int span = preloop_stride * p.scale_in_bytes(); >> ... 
>> if (vw % span == 0) { >> >> >> If `span == 0` because of overflow, then the `idiv` from the modulo gets a division by zero -> `SIGFPE`. >> >> But it seems the bug is possibly a regression from JDK20 b2 [JDK-8286197](https://bugs.openjdk.org/browse/JDK-8286197). Here we enabled certain Unsafe memory access address patterns, and it is such patterns that the reproducer requires. >> >> **Solution** >> I could either patch up all the code that works with `scale` and `stride`, and make sure no overflows ever happen. But that is quite involved and error-prone. >> >> I now just disable vectorization for large `scale` and `stride`. This should not have any performance impact, because such large `scale` and `stride` would lead to highly inefficient memory accesses, since they are spaced very far apart. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'JDK-8328938-abs-min-int-assert' of https://github.com/eme64/jdk into JDK-8328938-abs-min-int-assert > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - Merge branch 'master' into JDK-8328938-abs-min-int-assert > - improve comments > - 8328938 Marked as reviewed by kvn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18485#pullrequestreview-1978125749 From kvn at openjdk.org Wed Apr 3 21:50:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 21:50:11 GMT Subject: RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v4] In-Reply-To: References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> Message-ID: On Wed, 3 Apr 2024 06:44:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectorization.cpp line 411: >> >>> 409: abs(long_stride) >= max_val || >>> 410: abs(long_scale * long_stride) >= max_val) { >>> 411: assert(!valid(), "adr stride*scale is too large"); >> >> Why you need assert? > > If you look a few lines up, you can see that all other "bailouts" also check that the VPointer is invalid. I am simply matching the surrounding code. And it also makes it explicit, that the VPointer will be invalid, which is what I want. okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18485#discussion_r1550551563 From kvn at openjdk.org Wed Apr 3 21:53:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 21:53:09 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v6] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 07:17:34 GMT, Emanuel Peter wrote: >> This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). >> >> Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. >> >> I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. >> >> There are now only a few cases where we cannot use the cached `VPointer`: >> - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. 
We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. >> - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). >> >> This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. >> >> **Benchmarking SuperWord Compile Time** >> >> I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. >> >> On master: >> >> C2 Compile Time: 56.816 s >> IdealLoop: 56.604 s >> AutoVectorize: 56.192 s >> >> >> With this patch: >> >> C2 Compile Time: 49.719 s >> IdealLoop: 49.509 s >> AutoVectorize: 49.106 s >> >> >> This saves us about `7 sec`, which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > split up _vpointers.compute_vpointers Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18577#pullrequestreview-1978129713 From kvn at openjdk.org Wed Apr 3 21:53:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 21:53:09 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 06:50:04 GMT, Emanuel Peter wrote: > It is that arena that I pass into all submodules, such as `VLoopVPointer`. 
`VLoopAnalyzer` is stack allocated, so once the destructor removes its `_arena`, all submodules are also automatically deallocated. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18577#issuecomment-2035661524 From kvn at openjdk.org Wed Apr 3 22:39:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 22:39:11 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:05 GMT, Emanuel Peter wrote: > In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). > > The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. > > **Details** > > Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. > > But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. 
> > Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. > > **More Background / Details** > > This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. > > Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d... src/hotspot/share/opto/superword.cpp line 3125: > 3123: for (DUIterator_Fast imax, i = mem->fast_outs(imax); i < imax; i++) { > 3124: Node* mem_use = mem->fast_out(i); > 3125: if (_vloop.in_bb(mem_use) && !visited.test(bb_idx(mem_use)) && mem_use->is_Store()) { `mem_use->is_Store()` check is cheap and should be first. It will also help to skip other checks for Load node. 
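The ordering point in the review comment above can be illustrated with a small Java sketch. It is not HotSpot code: `expensiveCheck` is a hypothetical stand-in for the costlier conditions, and the boolean array only mimics a mix of `Load` and `Store` nodes. It shows that with `&&`, putting the cheap predicate first skips the expensive one whenever the cheap one is false.

```java
final class CheapCheckFirst {
    // Counts how often the stand-in "expensive" check runs.
    static int expensiveCalls = 0;

    // Hypothetical stand-in for the costlier conditions (not HotSpot code).
    static boolean expensiveCheck() {
        expensiveCalls++;
        return true;
    }

    // Cheap predicate first: && short-circuits, so expensiveCheck() is
    // skipped whenever isStore is false (for example, for a Load).
    static boolean cheapFirst(boolean isStore) {
        return isStore && expensiveCheck();
    }

    // Expensive predicate first: expensiveCheck() runs for every node.
    static boolean expensiveFirst(boolean isStore) {
        return expensiveCheck() && isStore;
    }

    public static void main(String[] args) {
        boolean[] nodes = {false, false, true, false}; // true marks a "Store"

        for (boolean isStore : nodes) cheapFirst(isStore);
        System.out.println("cheap first:     " + expensiveCalls + " expensive calls");

        expensiveCalls = 0;
        for (boolean isStore : nodes) expensiveFirst(isStore);
        System.out.println("expensive first: " + expensiveCalls + " expensive calls");
    }
}
```

Both orderings return the same boolean for every input here; only the number of expensive evaluations differs (one versus four for the array above).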
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1550603988 From kvn at openjdk.org Wed Apr 3 23:31:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Apr 2024 23:31:11 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Thu, 28 Mar 2024 00:45:33 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this small PR is improve the performance of convert instructions and address the slowdown when AVX>0 is used. >> >> The performance data using the ComputePI.java benchmark (part of this PR) is as follows: >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0 >> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9 >> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5 >> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8 >> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8 >> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4 >> >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0 >> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0 >> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0 >> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0 >> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0 >> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix L2F cvtsi2ssq Next tests failed when running with `-XX:UseAVX=3 
-XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` flags compiler/intrinsics/zip/TestFpRegsABI.java compiler/loopopts/superword/TestCmpInvar.java # Internal Error (/workspace/open/src/hotspot/cpu/x86/assembler_x86.cpp:11719), pid=955891, tid=955918 # assert(((!attributes->uses_vl()) || (attributes->get_vector_len() == AVX_512bit) || (!_legacy_mode_vl) || (attributes->is_legacy_mode()))) failed: XMM register should be 0-15 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-04-03-2139260.vladimir.kozlov.jdkgit2, mixed mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x633784] Assembler::vex_prefix_and_encode(int, int, int, Assembler::VexSimdPrefix, Assembler::VexOpcode, InstructionAttr*) [clone .constprop.1]+0x284 # Current CompileTask: C2:237 45 % b compiler.intrinsics.zip.TestFpRegsABI$TestIntrinsic::calcValue @ 6 (661 bytes) Stack: [0x00007f03e044b000,0x00007f03e054b000], sp=0x00007f03e0546830, free space=1006k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x633784] Assembler::vex_prefix_and_encode(int, int, int, Assembler::VexSimdPrefix, Assembler::VexOpcode, InstructionAttr*) [clone .constprop.1]+0x284 (assembler_x86.cpp:11719) V [libjvm.so+0x65e21e] Assembler::pxor(XMMRegister, XMMRegister)+0x5e (assembler_x86.cpp:8258) V [libjvm.so+0x3a5885] convI2D_reg_regNode::emit(CodeBuffer&, PhaseRegAlloc*) const+0x135 (x86_64.ad:10097) V [libjvm.so+0x14d4386] PhaseOutput::scratch_emit_size(Node const*)+0x376 (output.cpp:3366) V [libjvm.so+0x14ccaca] PhaseOutput::shorten_branches(unsigned int*)+0x34a (output.cpp:544) V [libjvm.so+0x14de41a] PhaseOutput::Output()+0xa1a (output.cpp:345) V [libjvm.so+0x9ec52c] Compile::Code_Gen()+0x4ac (compile.cpp:3031) V [libjvm.so+0x9ef0a6] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1c36 (compile.cpp:894) ------------- PR Comment: 
https://git.openjdk.org/jdk/pull/18503#issuecomment-2035803789 From duke at openjdk.org Thu Apr 4 02:38:02 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 4 Apr 2024 02:38:02 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Wed, 3 Apr 2024 23:28:48 GMT, Vladimir Kozlov wrote: > Next tests failed when running with `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` flags compiler/intrinsics/zip/TestFpRegsABI.java compiler/loopopts/superword/TestCmpInvar.java Thank you, Vladimir (@vnkozlov). Will look into the test and fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2036043991 From epeter at openjdk.org Thu Apr 4 05:04:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:04:21 GMT Subject: RFR: 8328938: C2 SuperWord: disable vectorization for large stride and scale [v4] In-Reply-To: References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> Message-ID: <46rL4cMaqglDePH3QaO-RGkPAzDwCpoa5NPKO3ewvk0=.bafadcd6-368c-4335-bbfc-33738f3c9bec@github.com> On Thu, 28 Mar 2024 16:14:41 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'JDK-8328938-abs-min-int-assert' of https://github.com/eme64/jdk into JDK-8328938-abs-min-int-assert >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8328938-abs-min-int-assert >> - improve comments >> - 8328938 > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn @vnkozlov thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18485#issuecomment-2036194509 From epeter at openjdk.org Thu Apr 4 05:04:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:04:22 GMT Subject: Integrated: 8328938: C2 SuperWord: disable vectorization for large stride and scale In-Reply-To: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> References: <9oXO4yuvZbpAxofIUBGVwJ2WyBLPWcP2IHxqZg5nQNQ=.f8f9365c-56c5-4fa9-8075-880f432ac214@github.com> Message-ID: On Tue, 26 Mar 2024 10:03:29 GMT, Emanuel Peter wrote: > **Problem** > In [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) / https://git.openjdk.org/jdk/pull/14785 I fixed the alignment with `AlignVector`. For that, I had to compute `abs(scale)` and `abs(stride)`, as well as `scale * stride`. The issue is that all of these values can overflow the int range (e.g. `abs(min_int) = min_int`). > > We hit asserts like: > > `# assert(is_power_of_2(value)) failed: value must be a power of 2: 0xffffffff80000000` > Happens because we take `abs(min_int)`, which is `min_int = 0x80000000`, and assuming this was a positive (unsigned) number is a power of 2 `2^31`. We then expand it to `long`, get `0xffffffff80000000`, which is not a power of 2 anymore. This violates the implicit assumptions, and we hit the assert. > > `# assert(q >= 1) failed: modulo value must be large enough` > We have `scale = 2^30` and `stride = 4 = 2^2`. 
For the alignment calculation we compute `scale * stride = 2^32`, which overflows the int range and becomes zero. > > Before [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) we could get similar issues with the (old) code in `SuperWord::ref_is_alignable`, if `AlignVector` is enabled: > > > int span = preloop_stride * p.scale_in_bytes(); > ... > if (vw % span == 0) { > > > If `span == 0` because of overflow, then the `idiv` from the modulo gets a division by zero -> `SIGFPE`. > > But it seems the bug is possibly a regression from JDK20 b2 [JDK-8286197](https://bugs.openjdk.org/browse/JDK-8286197). Here we enabled certain Unsafe memory access address patterns, and it is such patterns that the reproducer requires. > > **Solution** > I could either patch up all the code that works with `scale` and `stride`, and make sure no overflows ever happen. But that is quite involved and error-prone. > > I now just disable vectorization for large `scale` and `stride`. This should not have any performance impact, because such large `scale` and `stride` would lead to highly inefficient memory accesses, since they are spaced very far apart. This pull request has now been integrated. 
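The two overflow cases described in the problem statement can be reproduced with plain Java int arithmetic, which wraps around in two's complement. The sketch below is only an illustration, not the HotSpot fix: `isTooLarge` is a hypothetical helper, and the `1L << 30` threshold is an assumption chosen for the example, since the actual `max_val` used in `vectorization.cpp` is not shown in this thread.

```java
final class StrideScaleOverflow {
    // Hypothetical guard in the spirit of the fix: widen to long first,
    // then reject large magnitudes. The threshold is an assumed value.
    static boolean isTooLarge(int scale, int stride) {
        long maxVal = 1L << 30;
        long longScale = scale;
        long longStride = stride;
        return Math.abs(longScale) >= maxVal
            || Math.abs(longStride) >= maxVal
            || Math.abs(longScale * longStride) >= maxVal;
    }

    public static void main(String[] args) {
        // abs(min_int) == min_int: the negation overflows the int range.
        int absMin = Math.abs(Integer.MIN_VALUE);
        System.out.println(absMin == Integer.MIN_VALUE);          // true

        // Sign-extending that value to long gives 0xffffffff80000000,
        // which is not a power of 2.
        System.out.println(Long.toHexString((long) absMin));      // ffffffff80000000

        // scale = 2^30, stride = 4: the int product 2^32 wraps to zero.
        int scale = 1 << 30;
        int stride = 4;
        System.out.println(scale * stride);                       // 0

        // Computed in long, the same product is exact.
        System.out.println((long) scale * stride);                // 4294967296

        System.out.println(isTooLarge(scale, stride));            // true
        System.out.println(isTooLarge(4, 8));                     // false
    }
}
```

Doing the comparison in `long` is safe for any pair of `int` inputs, because the largest possible magnitude of the product is 2^62.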
Changeset: 29314587 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/2931458711244e20eb7845a1aefcf6ed4206bce1 Stats: 272 lines in 2 files changed: 272 ins; 0 del; 0 mod 8328938: C2 SuperWord: disable vectorization for large stride and scale Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18485 From epeter at openjdk.org Thu Apr 4 05:07:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:07:25 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v2] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 22:36:16 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Vladimir: check is_Store first > > src/hotspot/share/opto/superword.cpp line 3125: > >> 3123: for (DUIterator_Fast imax, i = mem->fast_outs(imax); i < imax; i++) { >> 3124: Node* mem_use = mem->fast_out(i); >> 3125: if (_vloop.in_bb(mem_use) && !visited.test(bb_idx(mem_use)) && mem_use->is_Store()) { > > `mem_use->is_Store()` check is cheap and should be first. It will also help to skip other checks for Load node. Sure, I can do that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1550926982 From epeter at openjdk.org Thu Apr 4 05:07:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:07:24 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v2] In-Reply-To: References: Message-ID: > In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). 
> > The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. > > **Details** > > Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. > > But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. > > Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. > > **More Background / Details** > > This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. > > Why did I not catch this earlier? 
I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Vladimir: check is_Store first ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18532/files - new: https://git.openjdk.org/jdk/pull/18532/files/cd6c401c..87892b7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532 From epeter at openjdk.org Thu Apr 4 05:15:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:15:03 GMT Subject: RFR: 8326962: C2 SuperWord: cache VPointer [v4] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 21:50:10 GMT, Vladimir Kozlov wrote: >>> One question: will VLoopAnalyzer default destructor clean up all memory used? >> >> @vnkozlov there is no need, since it is all allocated over the `Arena` in `VLoopAnalyzer`: >> >> >> // Arena for all submodules >> Arena _arena; >> >> >> It is that arena that I pass into all submodules, such as `VLoopVPointer`. `VLoopAnalyzer` is stack allocated, so once the destructor removes its `_arena`, all submodules are also automatically deallocated. > >> It is that arena that I pass into all submodules, such as `VLoopVPointer`. `VLoopAnalyzer` is stack allocated, so once the destructor removes its `_arena`, all submodules are also automatically deallocated. > > Good. Thanks @vnkozlov @chhagedorn @jdksjolen for the reviews and suggestions! 
@jdksjolen feel free to give me your ideas about Arena-allocation, I can still improve in a follow-up RFE ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18577#issuecomment-2036203197 From epeter at openjdk.org Thu Apr 4 05:15:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:15:03 GMT Subject: Integrated: 8326962: C2 SuperWord: cache VPointer In-Reply-To: References: Message-ID: On Tue, 2 Apr 2024 09:04:45 GMT, Emanuel Peter wrote: > This is a subtask of [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361). > > Parsing `VPointer` currently happens all over SuperWord. And often in quadratic loops, where we compare all-with-all loads/stores. > > I propose to cache the `VPointer`s, then we can do a constant-time cache lookup rather than parsing the pointer subgraph every time. > > There are now only a few cases where we cannot use the cached `VPointer`: > - `SuperWord::unrolling_analysis`: we have no `VLoopAnalyzer`, and so no submodules like `VLoopPointers`. We don't need to cache, since we only iterate over the loop body once, and create only a single `VPointer` per memop. > - `SuperWord::output`: when we have a `Load`, and try to bypass `StoreVector` nodes. The `StoreVector` nodes are new, and so we have no cached `VPointer` for them. This could be fixed somehow, but I don't want to deal with it now. I intend to refactor `SuperWord::output` soon, and can look into options at that point (either I bypass before we insert the vector nodes, or I remember what scalar memop the vector was created from, and then get the cached pointer this way). > > This changeset is also a preparation step for [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155). I will have a list of pointers, and sort them such that creating adjacent refs is much more efficient. > > **Benchmarking SuperWord Compile Time** > > I use the same benchmark from https://github.com/openjdk/jdk/pull/18532. 
> > On master: > > C2 Compile Time: 56.816 s > IdealLoop: 56.604 s > AutoVectorize: 56.192 s > > > With this patch: > > C2 Compile Time: 49.719 s > IdealLoop: 49.509 s > AutoVectorize: 49.106 s > > > This saves us about `7 sec`, which is significant. I will have to see what effect it has once we also apply https://github.com/openjdk/jdk/pull/18532, but I think the combined effect will be very significant. This pull request has now been integrated. Changeset: f762637b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f762637be2568f898db25aa6a57c180f1feac3a3 Stats: 190 lines in 5 files changed: 138 ins; 13 del; 39 mod 8326962: C2 SuperWord: cache VPointer Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18577 From epeter at openjdk.org Thu Apr 4 05:21:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 05:21:24 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v3] In-Reply-To: References: Message-ID: > In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). > > The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. > > **Details** > > Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. 
So I reduced it to a single pass over the `body`. > > But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. > > Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. > > **More Background / Details** > > This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. > > Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression - For Vladimir: check is_Store first - Load / Store precedence - add some comments and asserts - ensure Load/Store order - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression - 8327978 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18532/files - new: https://git.openjdk.org/jdk/pull/18532/files/87892b7c..161cf7d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=01-02 Stats: 7155 lines in 144 files changed: 2745 ins; 3308 del; 1102 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532 From chagedorn at openjdk.org Thu Apr 4 06:05:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 06:05:15 GMT Subject: Integrated: 8328702: C2: Crash during parsing because sub type check is not folded In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 13:48:09 GMT, Christian Hagedorn wrote: > The test case shows a problem where data is folded during parsing while control is not. This leaves the graph in a broken state and we fail with an assertion. > > We have the following (pseudo) code for some class `X`: > > o = flag ? new Object[] : new byte[]; > if (o instanceof X) { > X x = (X)o; // checkcast > } > > For the `checkcast`, C2 knows that the type of `o` is some kind of array, i.e. type `[bottom`. But this cannot be a sub type of `X`. Therefore, the `CheckCastPP` node created for the `checkcast` result is replaced by top by the type system. However, the `SubTypeCheckNode` for the `checkcast` is not folded and the graph is broken. 
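The pseudo code in the description above can be written as a small runnable Java sketch. This is only an illustration of the shape of the problem, not the regression test from the PR: the names `SubTypeCheckSketch` and `X` are placeholders, the warm-up count is an assumption, and whether C2 compiles `test` at all depends on the VM flags in use.

```java
class SubTypeCheckSketch {
    // Hypothetical stand-in for the class `X` from the description.
    static class X {}

    // `o` is always some kind of array, so it can never be an instance of X.
    static boolean test(boolean flag) {
        Object o = flag ? new Object[1] : new byte[1];
        if (o instanceof X) {
            X x = (X) o; // checkcast that can never succeed
            return x != null;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean anyHit = false;
        // Assumed warm-up count, only meant to give the JIT a chance to compile test().
        for (int i = 0; i < 20_000; i++) {
            anyHit |= test((i & 1) == 0);
        }
        System.out.println(anyHit);
    }
}
```

At the Java level the branch is simply dead for both values of `flag`; the PR is about C2 reaching the same conclusion consistently for data and control.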
> > The problem of not folding the `SubTypeCheckNode` can be traced back to `SubTypeCheckNode::sub` calling `static_subtype_check()` when transforming the node after its creation. `static_subtype_check()` should detect that the sub type check is always wrong here: > https://github.com/openjdk/jdk/blob/d0a265039a36292d87b249af0e8977982e5acc7b/src/hotspot/share/opto/compile.cpp#L4454-L4460 > > But it does not because these two checks return the following: > 1. Check: is `o` a sub type of `X`? -> returns no, so far so good. > 2. Check: _could_ `o` be a sub type of `X`? -> returns yes, which is wrong! `[bottom` is only a sub type of `Object` and can never be a subtype of `X` > > In `maybe_java_subtype_of_helper_for_arr()`, we wrongly conclude that any array with a base element type `bottom` _could_ be a sub type of anything: > https://github.com/openjdk/jdk/blob/d0a265039a36292d87b249af0e8977982e5acc7b/src/hotspot/share/opto/type.cpp#L6462-L6465 > But this is only true if the super class is also an array class - but not if `other` (super klass) is an instance klass as in this case. > > The fix for this is to first check the immediately following check which handles the case of comparing an array klass to an instance klass: An array klass can only ever be a sub class of an instance klass if it's the `Object` class. But in our case, we have `X` and this would return false: > > https://github.com/openjdk/jdk/blob/d0a265039a36292d87b249af0e8977982e5acc7b/src/hotspot/share/opto/type.cpp#L6466-L6468 > > The very same problem can also be triggered with `X` being an interface instead. There are tests for both these cases. > > #### Additionally Required Fix > When running with `-XX:+ExpandSubTypeCheckAtParseTime`, we eagerly expand the sub type check during parsing and therefore do not emit a `SubTypeCheckNode`. When additionally running with `-XX:+StressReflectiveCode`, th... This pull request has now been integrated.
Changeset: e5e21a8a Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e5e21a8a6e64466f9cda2064aa2723a15d4ae86a Stats: 143 lines in 3 files changed: 139 ins; 2 del; 2 mod 8328702: C2: Crash during parsing because sub type check is not folded Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18512 From chagedorn at openjdk.org Thu Apr 4 06:08:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 06:08:13 GMT Subject: Integrated: 8327110: Refactor create_bool_from_template_assertion_predicate() to separate class and fix identical cloning cases used for Loop Unswitching and Split If In-Reply-To: References: Message-ID: On Thu, 14 Mar 2024 07:10:30 GMT, Christian Hagedorn wrote: > This is a follow-up to the previous refactoring done in https://github.com/openjdk/jdk/pull/18080. The patch starts to replace the usages of `create_bool_from_template_assertion_predicate()` by providing a refactored and fixed cloning algorithm. > > #### How `create_bool_from_template_assertion_predicate()` Works > Currently, the algorithm in `create_bool_from_template_assertion_predicate()` uses an iterative DFS walk to find all nodes of a Template Assertion Predicate Expression in order to clone them. We do the following: > 1. Follow all inputs if they could be a node that's part of a Template Assertion Predicate (compares opcodes): > https://github.com/openjdk/jdk/blob/326c91e1a28ec70822ef927ee9ab17f79aa6d35c/src/hotspot/share/opto/loopTransform.cpp#L1513 > > 2. Once we find an `OpaqueLoopInit` or `OpaqueLoopStride` node, we start backtracking in the DFS. While doing so, we start to clone all nodes on the path from the `OpaqueLoop*Nodes` node to the start node and already update the graph. This logic is quite complex and difficult to understand since we do everything simultaneously. 
This was one of the reasons why I originally tried to refactor this method in https://github.com/openjdk/jdk/pull/16877 because I needed to extend it for the full fix of Assertion Predicates in JDK-8288981.
>
> #### Missing Visited Set
> The current implementation of `create_bool_from_template_assertion_predicate()` does not use a visited set. This means that whenever we find a diamond shape, we could visit a node twice and re-discover all paths above this diamond again:
>
>
>     ...
>      |
>      E
>      |
>      D
>     / \
>    B   C
>     \ /
>      A
>
> DFS walk: A -> B -> D -> E -> ... -> C -> D -> E -> ...
>
> With each diamond, the number of revisits of each node above doubles.
>
> #### Endless DFS in Edge-Cases
> In most cases, we would normally just stop quite quickly once we follow a data node that is not part of a Template Assertion Predicate Expression because the node opcode is different. However, in the test cases, we create a long chain of data nodes with many diamonds that could all be part of a Template Assertion Predicate Expression (i.e. `is_part_of_template_assertion_predicate_bool()` would return true to follow the inputs in a DFS walk). As a result, the DFS revisits a lot of nodes, especially higher up in the graph, exponentially many times and compilation is stuck for a long time (running the test cases results in a test timeout because background compilation is disabled).
>
> #### New DFS Implem...

This pull request has now been integrated.
Changeset: f26e4308 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/f26e4308992d989d71e7fbfaa3feb95f0ea17c06 Stats: 378 lines in 9 files changed: 367 ins; 0 del; 11 mod 8327110: Refactor create_bool_from_template_assertion_predicate() to separate class and fix identical cloning cases used for Loop Unswitching and Split If Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18293 From epeter at openjdk.org Thu Apr 4 07:22:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 07:22:32 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v20] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergeable stores must have the same Opcode (which implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ...
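The two source-level patterns named in the description (consecutive constant stores, and consecutive stores of a value split by shifts) might look like the following minimal sketch. It is not taken from `TestMergeStores.java`: the class and method names are hypothetical, and whether C2 actually merges these stores depends on the JDK build, platform and flags. The sketch only checks that the byte-wise code computes what a single wide little-endian store would.

```java
class MergeStoresSketch {
    // Four consecutive byte stores of constants; a merged form would be one int store.
    static void storeConstants(byte[] a, int offset) {
        a[offset + 0] = (byte) 0xEF;
        a[offset + 1] = (byte) 0xBE;
        a[offset + 2] = (byte) 0xAD;
        a[offset + 3] = (byte) 0xDE;
    }

    // Eight consecutive byte stores of a long that was split with shifts
    // (little-endian order); a merged form would be one long store.
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Reference reader, used only to verify the stored bytes.
    static long readLongLE(byte[] a, int offset) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (a[offset + i] & 0xFFL);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        storeLongLE(a, 4, 0x0102030405060708L);
        System.out.println(Long.toHexString(readLongLE(a, 4)));

        byte[] b = new byte[8];
        storeConstants(b, 0);
        System.out.println(Long.toHexString(readLongLE(b, 0)));
    }
}
```

The point of writing the stores this way is that no `Unsafe` or `ByteArrayLittleEndian` helper is needed in user code; any merging is left to the compiler.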
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: is_adjacent_input_pair ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/118c3666..de4d90ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=18-19 Stats: 187 lines in 4 files changed: 66 ins; 119 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Thu Apr 4 08:00:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 08:00:18 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v21] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactor combining the stores ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/de4d90ac..2ea69739 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=19-20 Stats: 206 lines in 1 file changed: 100 ins; 96 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Thu Apr 4 08:10:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 08:10:26 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v22] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - fixed small issue v2 - fixed small issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/2ea69739..b282ea35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=20-21 Stats: 11 lines in 1 file changed: 4 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From mli at openjdk.org Thu Apr 4 08:46:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Apr 2024 08:46:11 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics In-Reply-To: References: Message-ID: On Thu, 22 Feb 2024 13:51:01 GMT, Emanuel Peter wrote: >> Thanks all for your suggestions. >> >> @chhagedorn @eme64 >> I just tried to modify the test with the IR test framework, but it seems to require running in driver mode, i.e. `@run driver`, and it seems `driver` mode cannot be combined with `manual`. Any suggestion? Or should I just follow @vnkozlov's suggestion to make sure the golden value is retrieved via an interpreted method without compilation? >> >> @chhagedorn >>> It's good to add some tests for that. Have you considered using IR tests instead? This could simplify the test and result verification and also add the benefit of sanity checking whether we actually used the intrinsic with matching RoundD in the IR, for example. >> >> For the IR verification, there is already a test covering this case, so maybe we can skip it in this test? Although it does no harm to verify IR in this test, as mentioned above `driver` cannot work together with `manual`, so are we good to just skip this case in this test?
>> >> @eme64 >>> It would be nice to also have different kinds of inputs: randomized, and for floats also inf, nan, etc. >> >> The test is verifying the whole range of 32/64 bits, so I think it includes all special values, like nan, inf, etc, and random input is not necessary anymore in this sense. Does this make sense? >> >>> I think your new tests should not go into an old "cr" directory. >> >> I will move the test to another directory rather than "cr". >> >>> Until that is all in place, you should do it like in this test: >>> test/hotspot/jtreg/compiler/vectorization/TestOptionVectorizeIR.java >>> (you can see the gold values, and the @IR rules) >> >> As we cannot use the IR test framework, is this out of the discussion? >> But, we can make sure to use the golden value from an interpreted method as @vnkozlov suggested. > > @Hamlin-Li Thanks for the work and your response! > Ah, I see now. This is a `manual` test. I did not even know that this mode existed! > > A few comments: > - We don't have any other manual tests. These tests are only run manually, and not automatically. Probably nobody will ever run this test again, hardly anybody knows that this option even exists. > - You iterate over a range of `2^64`. How long does this take on your machine? Did this even ever complete? This will run in the order of days, if not weeks or months on a single machine. > - You construct the float/double values manually, by adding mantissa and exponent. But what about all the values that are outside this range? Or are you sure you covered all `2^64` bit values? > - You still compare both in potentially compiled code versions. That will not do, we may mis-compile both in the same way. > - You don't have any IR verification about what was done by the compiler. > > Suggestions: > - Your test must be executable in automatic mode, and not just in manual. This ensures we run the test in CI, and here on GitHub actions.
> - Instead of iterating over **all possible values**, you should do it **randomly**. This way the test can run for a limited number of inputs, and terminate in reasonable time. > - You should do it using the IR framework, so you can verify the IR nodes. The IR framework will soon support automatic random input generation and result verification. But for now you will have to do the input generation and verification against a "golden" value from the interpreter yourself. > - The random values should mix in `nan, infty, +-0.0, ...` with a higher frequency than just random values taken via `random.nextLong()` converted to double via `Double.longBitsToDouble`. > > Other thoughts: > - If you really want the manual mode with exhaustive iteration over int and long range, then you can put that in, but only in addition to an IR framework test with random values. > - Rather than focusing on doing just `Math.round`, we should probably do this with other (and maybe all) operations. I quickly checked for things like `Math.sqrt` and a few others. For many there is no good random input test with result verification. Hey @eme64, Can you have another look? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17753#issuecomment-2036559409 From aph at openjdk.org Thu Apr 4 09:21:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Apr 2024 09:21:11 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> Message-ID: On Wed, 20 Mar 2024 19:11:34 GMT, Hamlin Li wrote: >> Hi, >> Can you have a look at this patch adding some tests for Math.round intrinsics? >> Thanks!
>> >> ### FYI: >> During the development of RoundVF/RoundF, we faced issues which were only spotted by running tests exhaustively against the 32/64 bit range of int/long. >> It's helpful to add these exhaustive tests to the jdk for possible future usage, rather than building them every time they are needed. >> Of course, we need to put it in `manual` mode, so it's not run when the `-automatic` jtreg option is specified, which I guess is the mode CI uses; please correct me if I assume incorrectly. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add more tests; add more IR filter for Double tests test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 118: > 116: > 117: // generate input arrays for testing, then run tests & verify results > 118: There's no need for randomness or arrays or special values in the 32-bit case. You can, and should, test the entire 32-bit range in a few lines of code by using floatBitsToInt. You can make the test much faster by copy-and-pasting the library code for Math.round(float) and letting the JIT compile it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1551305494 From mli at openjdk.org Thu Apr 4 09:59:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Apr 2024 09:59:02 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> Message-ID: On Thu, 4 Apr 2024 09:18:50 GMT, Andrew Haley wrote: > There's no need for randomness or arrays or special values in the 32-bit case. You can, and should, test the entire 32-bit range in a few lines of code by using floatBitsToInt. In the previous discussion, there were several reasons why it's implemented in this way: 1. testing the whole 32-bit range is slow, and even slower for the 64-bit range of double. 2.
if it's too slow, then it's not feasible to make it an automatic test. These are expected by @eme64. > You can make the test much faster by copy-and-pasting the library code for Math.round(float) and letting the JIT compile it. Previously, I had [this question](https://github.com/openjdk/jdk/pull/17753#issuecomment-1992519401), but from the point of view of correctness of the golden value. I think you make another point to change from @DontCompile to copying library java code. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1551367964 From epeter at openjdk.org Thu Apr 4 11:02:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 11:02:18 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v23] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: cfg_status_for_pair refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/b282ea35..318608c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=21-22 Stats: 111 lines in 1 file changed: 48 ins; 51 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Thu Apr 4 11:09:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 11:09:27 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v24] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - rm dead code - add IR rule to two tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/318608c1..6677ecad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=22-23 Stats: 12 lines in 2 files changed: 10 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From chagedorn at openjdk.org Thu Apr 4 11:14:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 11:14:36 GMT Subject: RFR: 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() Message-ID: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> This patch replaces all `TypeInterfaces::intersection_with()` + `eq()` usages with a simpler `contains()` call which does the same. Thanks, Christian ------------- Commit messages: - 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() Changes: https://git.openjdk.org/jdk/pull/18620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18620&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329201 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18620/head:pull/18620 PR: https://git.openjdk.org/jdk/pull/18620 From epeter at openjdk.org Thu Apr 4 11:14:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 11:14:44 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v25] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: some cosmetics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/6677ecad..8b344e3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=23-24 Stats: 9 lines in 1 file changed: 1 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From chagedorn at openjdk.org Thu Apr 4 11:29:10 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 11:29:10 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v3] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 05:21:24 GMT, Emanuel Peter wrote: >> In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). >> >> The consequence was that all nodes in the dependency graph had the same depth `1`.
A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. >> >> **Details** >> >> Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation was stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. >> >> But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. >> >> Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. >> >> **More Background / Details** >> >> This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there are a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. >> >> Why did I not catch this earlier?
I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates ... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression > - For Vladimir: check is_Store first > - Load / Store precedence > - add some comments and asserts > - ensure Load/Store order > - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression > - 8327978 Good catch! Could such a long-running/compiling test also be added as jtreg test which fails due to a timeout without this patch and passes with the patch? src/hotspot/share/opto/vectorization.cpp line 324: > 322: } > 323: } > 324: } This is identical to the loop above. Could this code be shared (e.g. `find_max_pred_depth()`)? ------------- PR Review: https://git.openjdk.org/jdk/pull/18532#pullrequestreview-1979544235 PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1551475445
From epeter at openjdk.org Thu Apr 4 11:45:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 11:45:35 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v26] In-Reply-To: References: Message-ID: Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: merged_input_value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/8b344e3a..763a2f67 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=24-25 Stats: 101 lines in 1 file changed: 58 ins; 39 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245
From epeter at openjdk.org Thu Apr 4 12:16:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 12:16:32 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v27] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: a bit more stuff ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/763a2f67..6d3ab7e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=25-26 Stats: 42 lines in 2 files changed: 26 ins; 1 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245
From epeter at openjdk.org Thu Apr 4 12:22:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 12:22:29 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v28] In-Reply-To: References: Message-ID: <9JnVMa5mZZIRoWPI8yhVD89VypDaocu3oSQUVvLlNWA=.6a2bee87-a0fd-440c-b1f2-70da39b8f22b@github.com>
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/6d3ab7e6..330d6745 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=26-27 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Thu Apr 4 12:49:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 12:49:16 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v4] In-Reply-To: References: Message-ID: > In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). > > The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. > > **Details** > > Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. 
> > But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. > > Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. > > **More Background / Details** > > This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there are a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. > > Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d...
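As an illustration of the single-pass depth computation discussed in the description above, here is a small Java model. It is a sketch under stated assumptions, not HotSpot code: the `Node` class, `computeDepth` and `findMaxPredDepth` are hypothetical stand-ins for the C2 data structures, the `body` list is assumed to be ordered so that every `def` comes before its `use`, and Phi nodes are given depth 0 as the description states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DepthSketch {
    // Hypothetical stand-in for a node of the dependency graph.
    public static final class Node {
        public final String name;
        public final boolean isPhi;
        public final List<Node> preds = new ArrayList<>();

        public Node(String name, boolean isPhi, Node... preds) {
            this.name = name;
            this.isPhi = isPhi;
            this.preds.addAll(List.of(preds));
        }
    }

    // Largest depth among the predecessors of n. Every predecessor must
    // already have a depth, which only holds if body is def-before-use.
    public static int findMaxPredDepth(Node n, Map<Node, Integer> depth) {
        int max = 0;
        for (Node pred : n.preds) {
            Integer d = depth.get(pred);
            if (d == null) {
                throw new IllegalStateException(
                    "body not in def-before-use order: " + pred.name + " -> " + n.name);
            }
            max = Math.max(max, d);
        }
        return max;
    }

    // One pass over the body: Phi nodes get depth 0, every other node
    // gets one more than the deepest of its predecessors.
    public static Map<Node, Integer> computeDepth(List<Node> body) {
        Map<Node, Integer> depth = new HashMap<>();
        for (Node n : body) {
            depth.put(n, n.isPhi ? 0 : findMaxPredDepth(n, depth) + 1);
        }
        return depth;
    }

    public static void main(String[] args) {
        Node phi = new Node("phi", true);
        Node load = new Node("load", false, phi);
        Node add = new Node("add", false, load, phi);
        Node store = new Node("store", false, add);
        Map<Node, Integer> depth = computeDepth(List.of(phi, load, add, store));
        for (Node n : List.of(phi, load, add, store)) {
            System.out.println(n.name + " depth=" + depth.get(n));
        }
    }
}
```

The exception in `findMaxPredDepth` plays the role of the assertion mentioned in the description: if the body order is wrong, a single pass is not enough, and the sketch fails loudly rather than silently computing wrong depths.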
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Add TestLargeCompilation.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18532/files - new: https://git.openjdk.org/jdk/pull/18532/files/161cf7d0..6b02fce2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=02-03 Stats: 130 lines in 1 file changed: 130 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532
From epeter at openjdk.org Thu Apr 4 12:55:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 12:55:24 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v5] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: extract find_max_pred_depth ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18532/files - new: https://git.openjdk.org/jdk/pull/18532/files/6b02fce2..9cb0afa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=03-04 Stats: 17 lines in 1 file changed: 5 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532 From epeter at openjdk.org Thu Apr 4 12:55:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 12:55:25 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v3] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 11:25:37 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression >> - For Vladimir: check is_Store first >> - Load / Store precedence >> - add some comments and asserts >> - ensure Load/Store order >> - Merge branch 'master' into JDK-8327978-dependency-graph-comp-time-regression >> - 8327978 > > Good catch! Could such a long-running/compiling test also be added as jtreg test which fails due to a timeout without this patch and passes with the patch? @chhagedorn I added a regression test as you requested. It results in timeout before the patch, and passes with plenty of time to spare with the patch. I also did the code change you requested. 
> src/hotspot/share/opto/vectorization.cpp line 324: > >> 322: } >> 323: } >> 324: } > > This is identical to the loop above. Could this code be shared (e.g. `find_max_pred_depth()`)? Good idea! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18532#issuecomment-2037130738 PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1551620700 From chagedorn at openjdk.org Thu Apr 4 13:14:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 13:14:01 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v5] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 12:55:24 GMT, Emanuel Peter wrote: >> In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). >> >> The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. >> >> **Details** >> >> Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. >> >> But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. 
Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. >> >> Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. >> >> **More Background / Details** >> >> This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there are a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. >> >> Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > extract find_max_pred_depth You have a whitespace error in the test file. src/hotspot/share/opto/vectorization.cpp line 299: > 297: // assume that the depth of all preds is already computed when we compute the depth of use. > 298: void VLoopDependencyGraph::compute_depth() { > 299: auto find_max_pred_depth = [&] (const Node* n) { I would move this code out to a separate method. Having a lambda here makes it hard to read `compute_depth()` and you don't really need to capture anything.
test/hotspot/jtreg/compiler/loopopts/superword/TestLargeCompilation.java line 31: > 29: * @summary Test compile time for large compilation, where SuperWord takes especially much time. > 30: * @requires vm.compiler2.enabled > 31: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:RepeatCompilation=5 -XX:LoopUnrollLimit=1000 -Xbatch You can also use `main/othervm/timeout=30` or an even lower timeout. Then you might be able to get rid of `RepeatCompilation`. ------------- PR Review: https://git.openjdk.org/jdk/pull/18532#pullrequestreview-1979852854 PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1551658809 PR Review Comment: https://git.openjdk.org/jdk/pull/18532#discussion_r1551656544
From epeter at openjdk.org Thu Apr 4 13:25:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 13:25:39 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v6] In-Reply-To: References: Message-ID:
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more review updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18532/files - new: https://git.openjdk.org/jdk/pull/18532/files/9cb0afa0..022a9e78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=04-05 Stats: 30 lines in 3 files changed: 14 ins; 13 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532 From epeter at openjdk.org Thu Apr 4 13:42:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 13:42:18 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 14:19:57 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. 
As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > test fix Generally looks much better again :) I left a few more comments. I only did a quick pass this time. Now I'm tired. Maybe I can look at some critical parts again tomorrow. Then I'll approve it, but I want at least 2 other Reviewers to look at it, just for the sheer complexity. 
src/hotspot/share/opto/callGenerator.cpp line 1018: > 1016: // result = slowGet(); result = slowGet(); > 1017: // goto continue; goto continue; > 1018: // This is really nice, definitely keep this! src/hotspot/share/opto/callGenerator.cpp line 1218: > 1216: // slow_call: > 1217: // result = slowGet(); > 1218: // goto continue; Now you have duplication of these comments, see above `remove_first_probe_if_when_it_never_hits`. Would it make sense to put this somewhere more "central"? src/hotspot/share/opto/graphKit.cpp line 4276: > 4274: // carrier thread's cache. > 4275: // return _gvn.transform(LoadNode::make(_gvn, nullptr, immutable_memory(), p, p->bottom_type()->is_ptr(), > 4276: // TypeRawPtr::NOTNULL, T_ADDRESS, MemNode::unordered)); This is not dead code, but for the purpose of showing that `immutable_memory` does not work? src/hotspot/share/opto/loopnode.cpp line 5181: > 5179: } > 5180: > 5181: bool PhaseIdealLoop::optimize_scoped_value_get_nodes() { This is a bit of a monster method, with deep nesting. Hard to read. Can you break it up somehow into smaller methods? src/hotspot/share/opto/loopnode.cpp line 5193: > 5191: } > 5192: IfNode* iff = hits_in_cache->success_proj()->in(0)->as_If(); > 5193: for (uint j = 0; j < _scoped_value_get_nodes.size(); j++) { Do you need the whole range? Now you have all i's and all j's. That is intended? 
src/hotspot/share/opto/type.cpp line 617: > 615: TypeInstKlassPtr::OBJECT_OR_NULL = TypeInstKlassPtr::make(TypePtr::BotPTR, current->env()->Object_klass(), 0); > 616: > 617: const Type** fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*)); Suggestion: const Type** fgetfromcache = (const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*)); src/hotspot/share/opto/type.cpp line 622: > 620: fgetfromcache[2] = TypeAryPtr::OOPS; > 621: TypeTuple::make(3, fgetfromcache); > 622: const Type** fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*)); Suggestion: const Type** fsvgetresult = (const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*)); test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 68: > 66: "testSlowPath1,testSlowPath2,testSlowPath3,testSlowPath4,testSlowPath5,testSlowPath6,testSlowPath7,testSlowPath8,testSlowPath9,testSlowPath10"); > 67: for (String test : tests) { > 68: TestFramework.runWithFlags("-XX:+TieredCompilation", "--enable-preview", "-XX:CompileCommand=dontinline,java.lang.ScopedValue::slowGet", "-DTest=" + test); What is the reason for running each test individually? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1979811468 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551634190 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551630624 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551643903 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551686872 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551693575 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551700779 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551701152 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551703782 From epeter at openjdk.org Thu Apr 4 13:42:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 13:42:18 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 12:56:21 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > src/hotspot/share/opto/callGenerator.cpp line 1218: > >> 1216: // slow_call: >> 1217: // result = slowGet(); >> 1218: // goto continue; > > Now you have duplication of these comments, see above `remove_first_probe_if_when_it_never_hits`. Would it make sense to put this somewhere more "central"? And you further repeat the comments below. I fear that if someone would eventually make changes, they would not update all comments, and then the comments diverge. > src/hotspot/share/opto/graphKit.cpp line 4276: > >> 4274: // carrier thread's cache. >> 4275: // return _gvn.transform(LoadNode::make(_gvn, nullptr, immutable_memory(), p, p->bottom_type()->is_ptr(), >> 4276: // TypeRawPtr::NOTNULL, T_ADDRESS, MemNode::unordered)); > > This is not dead code, but for the purpose of showing that `immutable_memory` does not work? Ah, you just moved it. ok. 
> src/hotspot/share/opto/loopnode.cpp line 5181: > >> 5179: } >> 5180: >> 5181: bool PhaseIdealLoop::optimize_scoped_value_get_nodes() { > > This is a bit of a monster method, with deep nesting. Hard to read. Can you break it up somehow into smaller methods? You seem to do an all-vs-all optimization here, right? Could you do that in a nested loop, and then just dispatch for all combinations: hits-hits, hits-get, get-hits, get-get. Also: is there a reason for the reverse order? > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 68: > >> 66: "testSlowPath1,testSlowPath2,testSlowPath3,testSlowPath4,testSlowPath5,testSlowPath6,testSlowPath7,testSlowPath8,testSlowPath9,testSlowPath10"); >> 67: for (String test : tests) { >> 68: TestFramework.runWithFlags("-XX:+TieredCompilation", "--enable-preview", "-XX:CompileCommand=dontinline,java.lang.ScopedValue::slowGet", "-DTest=" + test); > > What is the reason for running each test individually? Hmm. Profile pollution. But if it is so bad, then won't that be an issue "in the real world"? Is this test not very artificial?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551632692 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551646332 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551692752 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551704827 From epeter at openjdk.org Thu Apr 4 13:42:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 13:42:18 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> Message-ID: On Wed, 6 Mar 2024 13:45:31 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopTransform.cpp line 3790: >> >>> 3788: phase->do_peeling(this, old_new); >>> 3789: return false; >>> 3790: } >> >> Just because I'm curious: why do the other places not already peel these loops? I.e. why do we need this here? > > Peeling looks for a loop invariant condition with one branch that exits the loop because then peeling makes the test in the loop body redundant with the one in the peeled iteration. Here, if there's a `ScopedValue.get()` on a loop invariant `ScopedValue` object, peeling one iteration will make `ScopedValue.get()` in the loop body redundant with the one in the peeled iteration. So it's not quite the same, at least, because for `ScopedValue.get()` the optimization applies whether `ScopedValue.get()` causes an exit of the loop or not. Ah, great, thanks for the explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1551659782 From mli at openjdk.org Thu Apr 4 13:44:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Apr 2024 13:44:34 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v10] In-Reply-To: References: Message-ID: > HI, > Can you have a look at this patch adding some tests for Math.round instrinsics? > Thanks! 
> > ### FYI: > During the development of RoundVF/RoundF, we faced the issues which were only spotted by running test exhaustively against 32/64 bits range of int/long. > It's helpful to add these exhaustive tests in jdk for future possible usage, rather than build it everytime when needed. > Of course, we need to put it in `manual` mode, so it's not run when `-automatic` jtreg option is specified which I guess is the mode CI used, please correct me if I'm assume incorrectly. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: use java library code of Math.round as golden value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17753/files - new: https://git.openjdk.org/jdk/pull/17753/files/962b40e8..a8c4172d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=08-09 Stats: 64 lines in 2 files changed: 56 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17753/head:pull/17753 PR: https://git.openjdk.org/jdk/pull/17753 From epeter at openjdk.org Thu Apr 4 13:58:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Apr 2024 13:58:35 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v7] In-Reply-To: References: Message-ID: > In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). > > The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. 
they are at the beginning of the DAG. > > **Details** > > Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. > > But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. > > Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. > > **More Backgroud / Details** > > This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large amount of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. > > Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d... 
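A minimal sketch of the single-pass depth computation described above, in plain Java rather than HotSpot C++. The `Node` class, its fields, and `computeDepth` are hypothetical stand-ins, not the actual `VLoopDependencyGraph` code. The sketch only assumes what the description states: the `body` is ordered so that every `def` precedes its `use`, Phi nodes have depth 0, and every other node is one deeper than its deepest input.

```java
import java.util.List;

public class DepthSketch {
    // Hypothetical stand-in for a loop-body node; this is not HotSpot's Node class.
    public static final class Node {
        final String name;
        final boolean isPhi;
        final int[] inputs; // indices of input nodes within the body list

        public Node(String name, boolean isPhi, int... inputs) {
            this.name = name;
            this.isPhi = isPhi;
            this.inputs = inputs;
        }
    }

    // Single pass over the body. This is only valid if every input (def)
    // appears before its use; otherwise the input's depth is not yet known.
    public static int[] computeDepth(List<Node> body) {
        int[] depth = new int[body.size()];
        for (int i = 0; i < body.size(); i++) {
            Node n = body.get(i);
            if (n.isPhi) {
                depth[i] = 0; // Phi nodes start the DAG; their inputs are ignored here
                continue;
            }
            int maxInput = 0;
            for (int in : n.inputs) {
                if (in >= i) {
                    // The ordering assumption is violated: one pass would be wrong.
                    throw new IllegalStateException("body not ordered at " + n.name);
                }
                maxInput = Math.max(maxInput, depth[in]);
            }
            depth[i] = maxInput + 1; // strictly deeper than all inputs
        }
        return depth;
    }

    public static void main(String[] args) {
        List<Node> body = List.of(
            new Node("Phi", true),
            new Node("Load", false, 0),
            new Node("Add", false, 1, 0),
            new Node("Store", false, 2));
        int[] depth = computeDepth(body);
        for (int i = 0; i < body.size(); i++) {
            System.out.println(body.get(i).name + " depth=" + depth[i]);
        }
    }
}
```

The ordering check plays the role of the assertion code mentioned in the description: if a use is scheduled before one of its inputs, a single pass cannot produce correct depths, which is the situation the `Load`/`Store` traversal change is meant to rule out.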
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18532/files - new: https://git.openjdk.org/jdk/pull/18532/files/022a9e78..47e25f27 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18532&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18532.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18532/head:pull/18532 PR: https://git.openjdk.org/jdk/pull/18532 From chagedorn at openjdk.org Thu Apr 4 14:04:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 14:04:11 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v7] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 13:58:35 GMT, Emanuel Peter wrote: >> In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). >> >> The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. >> >> **Details** >> >> Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. 
>> >> But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. >> >> Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. >> >> **More Backgroud / Details** >> >> This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large amount of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. >> >> Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace Looks good, thanks for the update! And nice that you've been able to extract and add a test for it. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18532#pullrequestreview-1980016138 From chagedorn at openjdk.org Thu Apr 4 15:15:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Apr 2024 15:15:11 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2] In-Reply-To: References: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> Message-ID: On Wed, 3 Apr 2024 14:32:38 GMT, Roland Westrelin wrote: >>> Should we go back to the previously suggested version without CastPP? >> >> Isn't there a risk with that one too? If `superklass` is a `TypeNode` and its input changes to a constant for instance. > > If doing this in `GraphKit` doesn't work well, maybe it should be done in `SubTypeCheckNode::Value`? This way it would apply at all stages of compilation? (That assumes we don't care about what happens if `ExpandSubTypeCheckAtParseTime` is true). > > Should we go back to the previously suggested version without CastPP? > > Isn't there a risk with that one too? If `superklass` is a `TypeNode` and its input changes to a constant for instance. I'm afraid you're right. This could probably happen, too. > If doing this in `GraphKit` doesn't work well, maybe it should be done in `SubTypeCheckNode::Value`? This way it would apply at all stages of compilation? (That assumes we don't care about what happens if `ExpandSubTypeCheckAtParseTime` is true). I first thought of doing it in `SubTypeCheckNode::Value()` but assumed we could get away with handling it in `GraphKit`. But as we have now figured out, this comes with new problems and does not seem to be safe. I will try to undo my current fix idea in `GraphKit` and do it in `SubTypeCheckNode::Value()` instead. This should work (not yet sure though what to do with `ExpandSubTypeCheckAtParseTime` and if it's easy to fix - otherwise, we could move forward with the proposal to remove it for good).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1551873024 From kvn at openjdk.org Thu Apr 4 15:29:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Apr 2024 15:29:05 GMT Subject: RFR: 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() In-Reply-To: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> References: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> Message-ID: <2OGZLIsZVeNInlzKx90h02PlYcahs_IA7CLe4ZikmT8=.7ea361cd-4ecc-4688-a1b7-1c111022dac0@github.com> On Thu, 4 Apr 2024 11:09:07 GMT, Christian Hagedorn wrote: > This patch replaces all `TypeInterfaces::intersection_with()` + `eq()` usages with a simpler `contains()` call which does the same. > > Thanks, > Christian Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18620#pullrequestreview-1980284483 From kvn at openjdk.org Thu Apr 4 15:33:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Apr 2024 15:33:12 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v7] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 13:58:35 GMT, Emanuel Peter wrote: >> In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). >> >> The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. >> >> **Details** >> >> Well, it is a bit more complicated. 
I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. >> >> But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. >> >> Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. >> >> **More Backgroud / Details** >> >> This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large amount of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. >> >> Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates ... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace Update is good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18532#pullrequestreview-1980301870 From roland at openjdk.org Thu Apr 4 16:04:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Apr 2024 16:04:12 GMT Subject: RFR: 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() In-Reply-To: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> References: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> Message-ID: On Thu, 4 Apr 2024 11:09:07 GMT, Christian Hagedorn wrote: > This patch replaces all `TypeInterfaces::intersection_with()` + `eq()` usages with a simpler `contains()` call which does the same. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18620#pullrequestreview-1980425769 From aph at openjdk.org Thu Apr 4 16:33:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Apr 2024 16:33:08 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: Message-ID: <87t3reTHf8QtR506vIL1EY_FewcjebGG7JhLDdfBtkY=.b3f68cdf-5c34-467d-853f-f1cbc432b40d@github.com> On Mon, 25 Mar 2024 09:11:40 GMT, Magnus Ihse Bursie wrote: > > And neither should we compile or link it with "-fvisibility=hidden". That is the root of this problem. > > If you suggest that we should not compile hsdis with hidden visibility, I disagree. Yes, that's what I would do. > I have been working hard on unifying the build of native libraries across the entire product, to fix holes where we have not used a consistent way of compiling and/or linking. There is no reason to treat hsdis differently.
If I restore using hidden visibility as an option that all native libraries, except hsdis, must opt in to, then we are just back to square one, and suddenly someone will forget about it. Instead, now we set -fvisibility=hidden in configure so nobody can forget about it. OK, OK! So please can we get this fix in? > Robbin proposes to change this to > > ``` > #if defined(_WIN32) > __declspec(dllexport) > #elif defined(_GNU_SOURCE) > __attribute__ ((visibility ("default"))) > #endif > ``` > > My counter-proposal was to replace it with just `JNIEXPORT`. Surely you can't say that is a worse solution? JNIEXPORT is better, I guess, but it does mean that hsdis is no longer standalone, and IMO it should have a pathological dependency on some JVM header file. That's the problem here. But really, whatever works is good with me. The last thing I want to do is delay this fix any further. Robbin has asked "How would you add jni.h ?" Is anyone going to answer? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2037672904 From ihse at openjdk.org Thu Apr 4 16:46:00 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 4 Apr 2024 16:46:00 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: Message-ID: On Thu, 21 Mar 2024 06:58:43 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > remove swap file Ideally, it should just be enough to add `#import "jni.h"`. This will definitely work for capstone and llvm. However, for binutils we disable our standard C include paths, so this will fail. 
(I'm not sure if this is really necessary, but that is how we currently do it) To solve it you can either set `HSDIS_TOOLCHAIN_DEFAULT_CFLAGS` to `true` for binutils, or you can add the include path to jni.h directly. The former would be the best, but in case that does not work, try adding `java.base:include` to `EXTRA_HEADER_DIRS`. I apologize for the late reply. I've been just working spotty hours due to spring break. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2037695449 PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2037696471 From shade at openjdk.org Thu Apr 4 17:13:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Apr 2024 17:13:10 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v8] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 19:31:38 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. >> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Make flag product diagnostic and guard string concat storestore by flag More comments. Consider splitting out the Membar counting diagnostic into a separate PR.
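To make the difference between the two barriers in the quoted description more concrete, here is a rough Java-level analogy using the `VarHandle` fence methods. This is an illustration under stated assumptions, not what C2 emits: the compiler inserts its barrier nodes internally at constructor exit, and the `StoreStoreBox` and `ReleaseBox` classes below are hypothetical. `VarHandle.storeStoreFence()` only keeps earlier stores from being reordered with later stores, while `VarHandle.releaseFence()` keeps earlier loads and stores from being reordered with later stores.

```java
import java.lang.invoke.VarHandle;

public class CtorFenceSketch {
    // Hypothetical class: the explicit fence stands in for the implicit
    // constructor-exit barrier discussed in the PR.
    public static final class StoreStoreBox {
        int value; // non-final, so the explicit fence below is the point of interest

        public StoreStoreBox(int value) {
            this.value = value;
            // Orders the field store above against stores that follow,
            // such as a later store publishing the reference.
            VarHandle.storeStoreFence();
        }

        public int value() {
            return value;
        }
    }

    // Same shape, but with the stronger fence that also orders earlier loads.
    public static final class ReleaseBox {
        int value;

        public ReleaseBox(int value) {
            this.value = value;
            VarHandle.releaseFence();
        }

        public int value() {
            return value;
        }
    }

    public static void main(String[] args) {
        StoreStoreBox a = new StoreStoreBox(1);
        ReleaseBox b = new ReleaseBox(2);
        System.out.println(a.value() + b.value());
    }
}
```

Both classes behave identically in single-threaded code; the difference is only in which reorderings the fence forbids. The explicit fences do not by themselves give these classes the final-field guarantees of the Java Memory Model, so this should be read as a picture of the ordering involved rather than a publication idiom.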
src/hotspot/share/opto/c2_globals.hpp line 795: > 793: product(bool, UseStoreStoreForCtor, true, DIAGNOSTIC, \ > 794: "Use storestore barrier instead of release barrier for" \ > 795: "on constructor exit") \ "Use StoreStore barrier instead of Release barrier at the end of constructors" ------------- PR Review: https://git.openjdk.org/jdk/pull/18505#pullrequestreview-1980621452 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1552095084 From shade at openjdk.org Thu Apr 4 17:13:11 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Apr 2024 17:13:11 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v7] In-Reply-To: References: <7HWfk5Q4A6yM9yzfiKxWhZ3cuswzWEvqzChKLtSdHT8=.a6cd68c4-fa17-47cd-bd91-a6668ba84d00@github.com> Message-ID: On Wed, 3 Apr 2024 19:19:39 GMT, Joshua Cao wrote: >> src/hotspot/share/opto/memnode.cpp line 3438: >> >>> 3436: } >>> 3437: } >>> 3438: } else if (opc == Op_MemBarRelease || opc == Op_MemBarStoreStore) { >> >> Should be protected by a feature flag? > > I don't think the feature flag is needed for these cases. These are optimizations to support the changes in code shape introduced by the feature, not the feature itself. If the feature flag is off, there should be no change in behavior. Extra safety, though. We are fiddling with common path in the aggressively optimizing compiler. I think we want to have the "chicken flag" that rolls back any changes from this PR, and it is provably the same code as before. For this hunk, it would be something like: } else if (opc == (UseStoreStoreForCtor ? 
Op_MemBarStoreStore : Op_MemBarRelease)) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1552100593 From dlong at openjdk.org Thu Apr 4 20:00:12 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Apr 2024 20:00:12 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 08:40:25 GMT, Galder Zamarreño wrote: >> I was able to remove the clone-specific logic in invoke() in two parts: >> >> 1. fix the type_is_exact logic to allow array receiver >> 2. move primitive array receiver check into append_alloc_array_copy > >> You're right about holder_known, but why do you need to check for _clone specifically at line 2137? If there is logic missing that prevents an inlining attempt then I think it should be fixed first, rather than in a followup. > > I added that check because none of the conditions in that `if` statement satisfied the situations in which `clone` calls are optimized. For the example I gave above, `code == Bytecodes::_invokevirtual` is true and `target->is_final_method()` is false. So that's why I added `clone` specifically. > >> I was able to remove the clone-specific logic in invoke() in two parts: >> >> 1. fix the type_is_exact logic to allow array receiver >> 2. move primitive array receiver check into append_alloc_array_copy > > Great! I assume you also solved the clone check in line 2137? > > How do we add your work on top of mine? Do I cherry pick the commit(s) from a branch of yours? Or some other way? I'll prepare a branch for you to try.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1552348280 From duke at openjdk.org Thu Apr 4 21:56:40 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 4 Apr 2024 21:56:40 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: References: Message-ID: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled.
>
> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>
> Passes hotspot tier1 locally on a Linux machine.
>
> ### Benchmarks
>
> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>
> Baseline:
>
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
> N = 25
> mean = 3309.611 ±(99.9%) 86.699 ms/op
>
> Histogram, ms/op:
> [3000.000, 3050.000) = 0
> [3050.000, 3100.000) = 4
> [3100.000, 3150.000) = 1
> [3150.000, 3200.000) = 0
> [3200.000, 3250.000) = 0
> [3250.000, 3300.000) = 0
> [3300.000, 3350.000) = 9
> [3350.000, 3400.000) = 6
> [3400.000, 3450.000) = 5
>
> Percentiles, ms/op:
> p(0.0000) = 3069.910 ms/op
> p(50.0000) = 3348.140 ms/op
> ...

Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:

- Guard everything by feature flag
- Revert "Statistics for barriers generated/eliminated"

  This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18505/files - new: https://git.openjdk.org/jdk/pull/18505/files/582848c7..5ff6bef5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=07-08 Stats: 24 lines in 6 files changed: 2 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505 PR: https://git.openjdk.org/jdk/pull/18505 From duke at openjdk.org Thu Apr 4 21:59:11 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 4 Apr 2024 21:59:11 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v3] In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 17:46:14 GMT, Vladimir Kozlov wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> EA tests, static test classes, add @requires, fix comment > > Can we also add statistics about how many different barriers C2 generates and eliminates? It will help to know if we are missing some optimization with these changes. @vnkozlov What do you think about excluding barrier statistics from this PR? I'd prefer to keep the PR as small as possible, and I don't think the statistics are key here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2038301359 From dlong at openjdk.org Thu Apr 4 22:43:02 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 Apr 2024 22:43:02 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: On Wed, 20 Mar 2024 09:04:36 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>>
>> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I ran hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                        TOTAL  PASS  FAIL  ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler    1234  1234     0      0
>>
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>
> - Merge branch 'master' into topic.0131.c1-array-clone
> - Merge branch 'master' into topic.0131.c1-array-clone
> - Reserve necessary frame map space for clone use cases
> - 8302850: C1 primitive array clone intrinsic in graph
>
>   * Combine array length, new type array and arraycopy for clone in c1 graph.
>   * Add OmitCheckFlags to skip arraycopy checks.
>   * Instantiate ArrayCopyStub only if necessary.
>   * Avoid zeroing newly created arrays for clone.
>   * Add array null after c1 clone compilation test.
>   * Pass force reexecute to intrinsic via value stack.
>     This is needed to be able to deoptimize correctly this intrinsic.
>   * When new type array or array copy are used for the clone intrinsic,
>     their state needs to be based on the state before for deoptimization
>     to work as expected.
> - Revert "8302850: Primitive array copy C1 intrinsic for aarch64 and x86"
>
>   This reverts commit fe5d916724614391a685bbef58ea939c84197d07.
> - 8302850: Link code emit infos for null check and alloc array
> - 8302850: Null check array before getting its length
>
>   * Added a jtreg test to verify the null check works.
>     Without the fix this test fails with a SEGV crash.
> - 8302850: Force reexecuting clone in case of a deoptimization
>
>   * Copy state including locals for clone
>     so that reexecution works as expected.
> - 8302850: Avoid instantiating array copy stub for clone use cases > - 8302850: Primitive array copy C1 intrinsic for aarch64 and x86 > > * Clone calls that involve Phi nodes are not supported. > * Add unimplemented stubs for other platforms. My suggested cleanup is here: https://github.com/dean-long/jdk/tree/pr/17667 Also, you'll need a 2nd review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2038391526 From kvn at openjdk.org Thu Apr 4 22:46:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Apr 2024 22:46:09 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v3] In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 17:46:14 GMT, Vladimir Kozlov wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> EA tests, static test classes, add @requires, fix comment > > Can we also add statistics about how many different barriers C2 generates and eliminates? It will help to know if we are missing some optimization with these changes. > @vnkozlov What do you think about excluding barrier statistics from this PR? I'd prefer to keep the PR as small as possible, and I don't think the statistics are key here. Yes, it could be done separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2038395048 From duke at openjdk.org Thu Apr 4 23:10:38 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 4 Apr 2024 23:10:38 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: > The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>
> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>
>
> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>
>
>
> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

  fix failure for KNL

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18503/files
  - new: https://git.openjdk.org/jdk/pull/18503/files/970716f4..b4a11ba8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18503&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18503&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18503.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18503/head:pull/18503

PR: https://git.openjdk.org/jdk/pull/18503

From duke at openjdk.org Thu Apr 4 23:45:09 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 4 Apr 2024 23:45:09 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
<-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Wed, 3 Apr 2024 21:17:22 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix L2F cvtsi2ssq > > src/hotspot/cpu/x86/assembler_x86.cpp line 2034: > >> 2032: InstructionAttr attributes(AVX_128bit, /* rex_w */ VM_Version::supports_evex(), /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 2033: attributes.set_rex_vex_w_reverted(); >> 2034: int encode = simd_prefix_and_encode(dst, src, src, VEX_SIMD_F2, VEX_OPCODE_0F, &attributes); > > Can you explain this change? Similar to #18089, the purpose of this change is to remove the slowdown due to a false dependency. For example, using the current `(dst, dst, src)` encoding in the case of `VCVTSD2SS xmm1, xmm2, xmm3/m64`, the instruction converts one double-precision floating-point value in xmm3/m64 to one single-precision floating-point value and **merges it with the high bits of xmm2**. This merge with the high bits of xmm2 causes a false dependency, as xmm1 and xmm2 are the same in the `(dst, dst, src)` encoding. We are removing the false dependency by (1) removing the m64 source from the VCVTSD2SS instruction encoding in the .ad file and (2) loading the `m64` source into `src` before calling `VCVTSD2SS`, explicitly zeroing out the high bits of `src` using `vmovsd src, m64`, and then calling `VCVTSD2SS dst, src, src`. Thus `dst[0:63]` now gets the result of the convert operation from `src[0:63]`, and since `src[64:127]` is already zeroed out, it is put in `dst[64:127]` without a false dependency.
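To make the lane arithmetic in this explanation concrete, here is a minimal C++ sketch of the VEX-encoded `VCVTSD2SS dst, src1, src2` data flow. It follows the operation description from the instruction reference that is quoted later in this thread (`DEST[31:0]` comes from `SRC2[63:0]`, `DEST[127:32]` comes from `SRC1[127:32]`). The `Xmm` type and all helper names are hypothetical and exist only for illustration; this is not HotSpot code, and it models register values only, not the CPU's scheduling or renaming behavior.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 128-bit XMM register as four 32-bit lanes
// (lane 0 = bits 31:0, lane 3 = bits 127:96).
using Xmm = std::array<std::uint32_t, 4>;

// Bit pattern of a float, obtained without undefined behaviour.
std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Models a scalar double load from memory: the value lands in bits 63:0
// and bits 127:64 are zeroed.
Xmm load_double_zero_upper(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return Xmm{static_cast<std::uint32_t>(u),
             static_cast<std::uint32_t>(u >> 32), 0, 0};
}

// Models VEX-encoded VCVTSD2SS dst, src1, src2:
//   dst[31:0]   = convert_to_float(src2[63:0])
//   dst[127:32] = src1[127:32]
Xmm vcvtsd2ss(const Xmm& src1, const Xmm& src2) {
  std::uint64_t u = (static_cast<std::uint64_t>(src2[1]) << 32) | src2[0];
  double d;
  std::memcpy(&d, &u, sizeof d);
  return Xmm{float_bits(static_cast<float>(d)), src1[1], src1[2], src1[3]};
}

// Usage check comparing the two operand choices discussed in the review.
void demo() {
  const Xmm old_dst{0xAAAAAAAAu, 0xBBBBBBBBu, 0xCCCCCCCCu, 0xDDDDDDDDu};
  const Xmm src = load_double_zero_upper(1.5);

  // (dst, dst, src): lanes 1..3 are taken from the previous dst value.
  const Xmm merged = vcvtsd2ss(old_dst, src);
  assert(merged[0] == float_bits(1.5f));
  assert(merged[1] == old_dst[1] && merged[2] == old_dst[2] && merged[3] == old_dst[3]);

  // (dst, src, src): every lane is derived from src alone.
  const Xmm independent = vcvtsd2ss(src, src);
  assert(independent[0] == float_bits(1.5f));
  assert(independent[1] == src[1] && independent[2] == 0 && independent[3] == 0);
}
```

In this model the `(dst, src, src)` form produces a result that is a function of `src` only, which is the property the change relies on. Note that lane 1 of the result then carries the upper half of the double's bit pattern rather than zero.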
Thanks, Vamsi ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552592854 From duke at openjdk.org Thu Apr 4 23:48:08 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 4 Apr 2024 23:48:08 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: <_FZu1eoEuvz1kIayx7ct1Fo5OGqhywif93sVrHBctD4=.72330cb1-59f7-4292-b70d-9d54a90b611e@github.com> On Thu, 4 Apr 2024 02:35:19 GMT, Srinivas Vamsi Parasa wrote: > Next tests failed when running with `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` flags > compiler/intrinsics/zip/TestFpRegsABI.java The KNL-related failure was fixed in the latest commit by adding the check `if (UseAVX > 2 && !attributes->uses_vl())` at line 11713 of `src/hotspot/cpu/x86/assembler_x86.cpp`. Could you please have a look at this change? Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2038456748 From duke at openjdk.org Fri Apr 5 00:15:09 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 5 Apr 2024 00:15:09 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Fri, 5 Apr 2024 00:10:34 GMT, Vladimir Kozlov wrote: > I will submit new testing. Thank you, Vladimir!
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2038481272 From kvn at openjdk.org Fri Apr 5 00:15:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 00:15:09 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Thu, 4 Apr 2024 23:42:15 GMT, Srinivas Vamsi Parasa wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 2034: >> >>> 2032: InstructionAttr attributes(AVX_128bit, /* rex_w */ VM_Version::supports_evex(), /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >>> 2033: attributes.set_rex_vex_w_reverted(); >>> 2034: int encode = simd_prefix_and_encode(dst, src, src, VEX_SIMD_F2, VEX_OPCODE_0F, &attributes); >> >> Can you explain this change? > > Similar to #18089, the purpose of this change is to remove the slowdown due to false dependency. For example, using the current `(dst, dst, src)` encoding in the case of `VCVTSD2SS xmm1, xmm2, xmm3/m64`, the instruction converts one double precision floating-point value in xmm3/m64 to one single precision floating-point value and **merge with high bits in xmm2**. This merge with high bits of xmm2 causes a false dependency as xmm1 and xmm2 are the same in `(dst, dst, src)` encoding. > > We are removing the false dependency by (1) removing the m64 source in VCVTSDSS instruction encoding in the .ad file (2) load `m64` source in `src` before calling `VCVTSD2SS `and explicitly zeroing out the of high bits in `src` using `vmovsd src, m64` and then calling `VCVTSD2SS dst, src, src`. Thus `dst[0:63]` now gets the result of convert operation from `src[0:63]` and since` src[64:127]` is already zeroed out, it's put in `dst[64:127] `without a false dependency. 
> > Thanks, > Vamsi Thank you for explaining. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552615300 From kvn at openjdk.org Fri Apr 5 00:15:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 00:15:09 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Thu, 4 Apr 2024 23:10:38 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this small PR is improve the performance of convert instructions and address the slowdown when AVX>0 is used. >> >> The performance data using the ComputePI.java benchmark (part of this PR) is as follows: >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0 >> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9 >> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5 >> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8 >> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8 >> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4 >> >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0 >> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0 >> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0 >> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0 >> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0 >> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix failure for KNL I will submit new testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2038480264 From dlong at openjdk.org Fri Apr 5 02:14:08 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Apr 2024 02:14:08 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: <0tHoCLaawyt8Gu0QrnhaeqMYxn_iDxUqdj7vvK5CL7s=.be78ae97-c62d-45dd-a01e-1e0ff984ab10@github.com> On Wed, 20 Mar 2024 09:04:36 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. My patch still needs some work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2038633732 From dlong at openjdk.org Fri Apr 5 02:20:03 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Apr 2024 02:20:03 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: On Wed, 20 Mar 2024 09:04:36 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. I think we could eventually relax the requirement that receiver_klass be loaded, at least for object arrays, but for simplicity my patch will follow the existing behavior. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2038637347 From jbhateja at openjdk.org Fri Apr 5 02:34:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Apr 2024 02:34:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 00:09:00 GMT, Vladimir Kozlov wrote: >> Similar to #18089, the purpose of this change is to remove the slowdown due to false dependency.
For example, using the current `(dst, dst, src)` encoding in the case of `VCVTSD2SS xmm1, xmm2, xmm3/m64`, the instruction converts one double precision floating-point value in xmm3/m64 to one single precision floating-point value and **merges with high bits in xmm2**. This merge with high bits of xmm2 causes a false dependency as xmm1 and xmm2 are the same in `(dst, dst, src)` encoding. >> >> We are removing the false dependency by (1) removing the m64 source in the VCVTSD2SS instruction encoding in the .ad file and (2) loading the `m64` source into `src` before calling `VCVTSD2SS` and explicitly zeroing out the high bits in `src` using `vmovsd src, m64`, and then calling `VCVTSD2SS dst, src, src`. Thus `dst[0:63]` now gets the result of the convert operation from `src[0:63]` and since `src[64:127]` is already zeroed out, it's put in `dst[64:127]` without a false dependency. >> >> Thanks, >> Vamsi > > Thank you for explaining. > Similar to #18089, the purpose of this change is to remove the slowdown due to false dependency. For example, using the current `(dst, dst, src)` encoding in the case of `VCVTSD2SS xmm1, xmm2, xmm3/m64`, the instruction converts one double precision floating-point value in xmm3/m64 to one single precision floating-point value and **merges with high bits in xmm2**. This merge with high bits of xmm2 causes a false dependency as xmm1 and xmm2 are the same in `(dst, dst, src)` encoding. > > We are removing the false dependency by (1) removing the m64 source in the VCVTSD2SS instruction encoding in the .ad file and (2) loading the `m64` source into `src` before calling `VCVTSD2SS` and explicitly zeroing out the high bits in `src` using `vmovsd src, m64`, and then calling `VCVTSD2SS dst, src, src`.
Thus `dst[0:63]` now gets the result of the convert operation from `src[0:63]` and since `src[64:127]` is already Hi Vamsi, This is a downcast from a double precision to a single precision value, thus only the lower 32 bits of the destination hold the actual result of the conversion; bits 127:32 are copied from the non-destructive source operand, and for the VEX-encoded instruction the bits from 128 upwards are zeroed out, OR are preserved for the REX-encoded variant. VCVTSD2SS (VEX.128 Encoded Version) DEST[31:0] := Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]); DEST[127:32] := SRC1[127:32] DEST[MAXVL-1:128] := 0 CVTSD2SS (128-bit Legacy SSE Version) DEST[31:0] := Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[63:0]); (* DEST[MAXVL-1:32] Unmodified *) Your change can lead to incorrectness https://github.com/openjdk/jdk/blob/0b01144ecec1283adaaaf1a7f53d075a56f030ae/src/hotspot/cpu/x86/assembler_x86.cpp#L11764 > zeroed out, it's put in `dst[64:127]` without a false dependency. > > Thanks, Vamsi ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552692245 From jbhateja at openjdk.org Fri Apr 5 03:46:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Apr 2024 03:46:12 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 00:09:00 GMT, Vladimir Kozlov wrote: >> Similar to #18089, the purpose of this change is to remove the slowdown due to false dependency. For example, using the current `(dst, dst, src)` encoding in the case of `VCVTSD2SS xmm1, xmm2, xmm3/m64`, the instruction converts one double precision floating-point value in xmm3/m64 to one single precision floating-point value and **merges with high bits in xmm2**.
This merge with high bits of xmm2 causes a false dependency as xmm1 and xmm2 are the same in `(dst, dst, src)` encoding. >> >> We are removing the false dependency by (1) removing the m64 source in the VCVTSD2SS instruction encoding in the .ad file and (2) loading the `m64` source into `src` before calling `VCVTSD2SS` and explicitly zeroing out the high bits in `src` using `vmovsd src, m64`, and then calling `VCVTSD2SS dst, src, src`. Thus `dst[0:63]` now gets the result of the convert operation from `src[0:63]` and since `src[64:127]` is already zeroed out, it's put in `dst[64:127]` without a false dependency. >> >> Thanks, >> Vamsi > > Thank you for explaining. This is a downcast from a double precision to a single precision value, thus only the lower 32 bits of the destination hold the actual result of the conversion; bits 127:32 are copied from the non-destructive source operand for the VEX-encoded instruction. VCVTSD2SS (VEX.128 Encoded Version) DEST[31:0] := Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]); DEST[127:32] := SRC1[127:32] DEST[MAXVL-1:128] := 0 The user is only interested in the lower 32 bits of the destination, and passing the source as NDS prevents the false dependency on AVX targets: instruction dispatch is no longer held up by the false dependency, and the instruction is issued to the OOO backend the moment the source is ready. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552785288 From jbhateja at openjdk.org Fri Apr 5 03:46:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Apr 2024 03:46:12 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Thu, 4 Apr 2024 23:10:38 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>> >> The performance data using the ComputePI.java benchmark (part of this PR) is as follows: >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0 >> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9 >> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5 >> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8 >> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8 >> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4 >> >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0 >> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0 >> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0 >> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0 >> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0 >> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix failure for KNL src/hotspot/cpu/x86/assembler_x86.cpp line 11713: > 11711: } > 11712: > 11713: if (UseAVX > 2 && !attributes->uses_vl()) { This is already covered by the assertion below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552745596 From qamai at openjdk.org Fri Apr 5 05:39:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Apr 2024 05:39:09 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Thu, 4 Apr 2024 23:10:38 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>> >> The performance data using the ComputePI.java benchmark (part of this PR) is as follows: >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0 >> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9 >> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5 >> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8 >> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8 >> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4 >> >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0 >> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0 >> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0 >> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0 >> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0 >> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix failure for KNL While changing `vcvtss2sd dst, dst, src` to `vcvtss2sd dst, src, src` looks fine, adding a `pxor` before every `cvtsi2ss` seems extra as it puts more pressure on the front-end. I propose to have a non-allocatable register such as `xmm15` and use it as the first source register for these nodes. It also helps enable the memory version of these instructions without worrying about unwanted dependencies. Cheers. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2038983984 From qamai at openjdk.org Fri Apr 5 06:00:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Apr 2024 06:00:09 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 03:43:24 GMT, Jatin Bhateja wrote: >> Thank you for explaining. > > This is a downcast from a double precision to a single precision value, thus only the lower 32 bits of the destination hold the actual result of the conversion; bits 127:32 are copied from the non-destructive source operand for the VEX-encoded instruction. > > VCVTSD2SS (VEX.128 Encoded Version) > DEST[31:0] := Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]); > DEST[127:32] := SRC1[127:32] > DEST[MAXVL-1:128] := 0 > > The user is only interested in the lower 32 bits of the destination, and passing the source as NDS prevents the false dependency on AVX targets: instruction dispatch is no longer held up by the false dependency, and the instruction is issued to the OOO backend the moment the source is ready. This change modifies the defined behaviour of `cvtss2sd`. Without AVX, it would retain bits 64-127 of `dst`, while with AVX those bits would be copied from `src`. I would suggest separating the matching rules instead.
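To make the merge semantics under discussion concrete, here is a minimal Java sketch that models only the VEX.128 pseudocode quoted above (`DEST[31:0]` from the conversion, `DEST[127:32]` from `SRC1`). It is not HotSpot code: the class `CvtSd2SsModel`, the four-lane `int[]` register model and the helper `registerHolding` are hypothetical names invented for illustration, and Java's round-to-nearest narrowing cast is assumed to stand in for the default rounding mode.

```java
final class CvtSd2SsModel {
    // Hypothetical model of a 128-bit XMM register: four 32-bit lanes, lane 0 = bits 31:0.
    // Follows the VEX.128 pseudocode quoted in this thread; bits above 127 are not modelled.
    static int[] vcvtsd2ss(int[] src1, int[] src2) {
        // SRC2[63:0] holds the double to convert (lanes 0 and 1).
        long doubleBits = (src2[0] & 0xFFFFFFFFL) | ((long) src2[1] << 32);
        // Assumption: Java's (float) cast (round to nearest) matches the default rounding mode.
        float converted = (float) Double.longBitsToDouble(doubleBits);
        int[] dest = src1.clone();                     // DEST[127:32] := SRC1[127:32]
        dest[0] = Float.floatToRawIntBits(converted);  // DEST[31:0] := converted value
        return dest;
    }

    // Builds a register whose low 64 bits hold the given double.
    static int[] registerHolding(double value, int lane2, int lane3) {
        long bits = Double.doubleToRawLongBits(value);
        return new int[] {(int) bits, (int) (bits >>> 32), lane2, lane3};
    }

    public static void main(String[] args) {
        int[] staleDst = {0x11111111, 0x22222222, 0x33333333, 0x44444444};
        int[] src = registerHolding(1.5, 0, 0);

        int[] merged = vcvtsd2ss(staleDst, src);  // models the (dst, dst, src) encoding
        int[] selfNds = vcvtsd2ss(src, src);      // models the (dst, src, src) encoding

        // The low 32 bits agree in both encodings; only the upper lanes differ.
        System.out.println(Float.intBitsToFloat(merged[0]) + " " + Float.intBitsToFloat(selfNds[0]));
        System.out.println(Integer.toHexString(merged[3]) + " vs " + Integer.toHexString(selfNds[3]));
    }
}
```

In this model the two encodings produce the same scalar result and differ only in lanes 1 to 3, which is the part of the register the disagreement in this thread is about.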
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552967699 From jbhateja at openjdk.org Fri Apr 5 06:18:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Apr 2024 06:18:08 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 05:57:27 GMT, Quan Anh Mai wrote: >> This is a downcast from a double precision to a single precision value, thus only the lower 32 bits of the destination hold the actual result of the conversion; bits 127:32 are copied from the non-destructive source operand for the VEX-encoded instruction. >> >> VCVTSD2SS (VEX.128 Encoded Version) >> DEST[31:0] := Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]); >> DEST[127:32] := SRC1[127:32] >> DEST[MAXVL-1:128] := 0 >> >> The user is only interested in the lower 32 bits of the destination, and passing the source as NDS prevents the false dependency on AVX targets: instruction dispatch is no longer held up by the false dependency, and the instruction is issued to the OOO backend the moment the source is ready. > > This change modifies the defined behaviour of `cvtss2sd`. Without AVX, it would retain bits 64-127 of `dst`, while with AVX those bits would be copied from `src`. I would suggest separating the matching rules instead. It's a clever trick to dodge the false dependency without compromising on correctness.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1552980667 From epeter at openjdk.org Fri Apr 5 06:51:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 06:51:15 GMT Subject: RFR: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 [v7] In-Reply-To: References: Message-ID: <5xjZEIWOBSQrNLVH7masHk7_NtRbeBuHJmU7WOv1WZM=.a98cebca-dfbd-49e7-a8bf-c382c6469ffd@github.com> On Thu, 4 Apr 2024 15:30:45 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix whitespace > > Update is good. @vnkozlov @chhagedorn thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18532#issuecomment-2039064383 From epeter at openjdk.org Fri Apr 5 06:51:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 06:51:16 GMT Subject: Integrated: 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:05 GMT, Emanuel Peter wrote: > In [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812 I refactored the dependency graph. It seems I made a typo, and missed a single `!`, which broke `VLoopDependencyGraph::compute_depth` (formerly `SuperWord::compute_max_depth`). > > The consequence was that all nodes in the dependency graph had the same depth `1`. A node is supposed to have a higher depth than all its inputs, except for Phi nodes, which have depth 0, as they are at the beginning of the loop's basic block, i.e. they are at the beginning of the DAG. > > **Details** > > Well, it is a bit more complicated. I had not just forgotten about the `!`. Before the change, we used to iterate over the body multiple times, until the depth computation is stable. 
When I saw this, I assumed this was not necessary, since the `body` is already ordered, such that `def` is before `use`. So I reduced it to a single pass over the `body`. > > But this assumption was wrong: I added some assertion code, which detected that something was wrong with the ordering in the `body`. In the failing example, I saw that we had a `Load` and a `Store` with the same memory state. Given the edges, our ordering algorithm for the `body` could schedule `Load` before `Store` or `Store` before `Load`. But that is incorrect: our assumption is that in such cases `Loads` always happen before `Stores`. > > Therefore, I had to change the traversal order in `VLoopBody::construct`, so that we visit `Load` before `Store`. With this, I now know that the `body` order is correct for both the data dependency and the memory dependency. Therefore, I only need to iterate over the `body` once in `VLoopDependencyGraph::compute_depth`. > > **More Background / Details** > > This bug was reported because there were timeouts with `TestAlignVectorFuzzer.java`. This fix seems to improve the compile time drastically for the example below. It seems to be an example with a large dependency graph, where we still attempt to create some packs. This means there is a large number of `independence` checks on the dependency graph. If those are not pruned well, then they visit many more nodes than necessary. > > Why did I not catch this earlier? I had a compile time benchmark for [JDK-8325651](https://bugs.openjdk.org/browse/JDK-8325651) / https://github.com/openjdk/jdk/pull/17812, but it seems it was not sensitive enough. It has a dense graph, but never actually created any packs. My new benchmark creates packs, which unlocks more checks d... This pull request has now been integrated.
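The depth rule from the quoted description (a node is deeper than all of its inputs, Phi nodes have depth 0, and a single pass suffices once the `body` is ordered def-before-use) can be sketched roughly as follows. This is an illustrative Java model, not the C++ in `VLoopDependencyGraph::compute_depth`: the class `DepthSketch`, the index-based graph encoding and the choice of depth 1 for a non-Phi node without inputs are all assumptions.

```java
import java.util.Arrays;

final class DepthSketch {
    // inputs[i] lists the body indices that feed node i; isPhi[i] marks Phi nodes.
    // Assumption: the body is ordered so that every non-Phi node comes after its inputs.
    static int[] computeDepth(int[][] inputs, boolean[] isPhi) {
        int[] depth = new int[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            if (isPhi[i]) {
                depth[i] = 0;  // Phis start the DAG; their backedge input is ignored
                continue;
            }
            int maxInputDepth = 0;
            for (int in : inputs[i]) {
                maxInputDepth = Math.max(maxInputDepth, depth[in]);
            }
            depth[i] = maxInputDepth + 1;  // strictly deeper than every input
        }
        return depth;
    }

    public static void main(String[] args) {
        // 0: Phi (fed by 3 over the backedge), 1: Load(0), 2: Add(1, 0), 3: Store(2)
        int[][] inputs = {{3}, {0}, {1, 0}, {2}};
        boolean[] isPhi = {true, false, false, false};
        System.out.println(Arrays.toString(computeDepth(inputs, isPhi)));  // prints [0, 1, 2, 3]
    }
}
```

If the ordering assumption is violated (a use placed before its def), a single pass like this would read a stale depth of 0, which is why the `Load`/`Store` ordering fix described above matters.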
Changeset: 9da5170a Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/9da5170a0eb9f141022f86d749af3b5780b75cb7 Stats: 181 lines in 4 files changed: 171 ins; 0 del; 10 mod 8327978: C2 SuperWord: Fix compilation time regression in dependency graph traversal after JDK-8325651 Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18532 From chagedorn at openjdk.org Fri Apr 5 06:57:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 06:57:30 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies Message-ID: https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`). - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. 
Thanks, Christian ------------- Commit messages: - add more tests - 8327111: Replace remaining uses of create_bool_from_template_assertion_predicate() which involve additional transformations with new code from JDK-8327110 Changes: https://git.openjdk.org/jdk/pull/18628/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18628&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327111 Stats: 186 lines in 5 files changed: 100 ins; 82 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18628.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18628/head:pull/18628 PR: https://git.openjdk.org/jdk/pull/18628 From qamai at openjdk.org Fri Apr 5 06:58:01 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Apr 2024 06:58:01 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 06:14:57 GMT, Jatin Bhateja wrote: >> This change modifies the defined behaviour of `cvtss2sd`. Without AVX, it would retain bits 64-127 of `dst`, while with AVX those bits would be copied from `src`. I would suggest separating the matching rules instead. > > It's a clever trick to dodge the false dependency without compromising on correctness. @jatin-bhateja I get it, but IMO it shouldn't be the responsibility of the assembler to do that; the assembler should emit machine code in a manner that respects what is being written.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1553028424 From chagedorn at openjdk.org Fri Apr 5 06:59:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 06:59:03 GMT Subject: RFR: 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() In-Reply-To: <2OGZLIsZVeNInlzKx90h02PlYcahs_IA7CLe4ZikmT8=.7ea361cd-4ecc-4688-a1b7-1c111022dac0@github.com> References: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> <2OGZLIsZVeNInlzKx90h02PlYcahs_IA7CLe4ZikmT8=.7ea361cd-4ecc-4688-a1b7-1c111022dac0@github.com> Message-ID: <7rCVuw21PSp-3UmU2KrfPf4nurNv2aBGA8LfR3njaog=.61bf2e5c-b928-4227-a61c-fcd2d6ec2738@github.com> On Thu, 4 Apr 2024 15:26:29 GMT, Vladimir Kozlov wrote: >> This patch replaces all `TypeInterfaces::intersection_with()` + `eq()` usages with a simpler `contains()` call which does the same. >> >> Thanks, >> Christian > > Good. Thanks @vnkozlov and @rwestrel for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18620#issuecomment-2039087186 From chagedorn at openjdk.org Fri Apr 5 06:59:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 06:59:03 GMT Subject: Integrated: 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() In-Reply-To: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> References: <0KYj6BYL1-2aFWEPbiqWoTe0OHSUD3NUTGyZMP4rqMA=.4808d844-deaa-4c68-b344-eb1a954d102b@github.com> Message-ID: On Thu, 4 Apr 2024 11:09:07 GMT, Christian Hagedorn wrote: > This patch replaces all `TypeInterfaces::intersection_with()` + `eq()` usages with a simpler `contains()` call which does the same. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 6bc6392d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/6bc6392d2b073434d2cfac4c5f6f2908bd8fe77e Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod 8329201: C2: Replace TypeInterfaces::intersection_with() + eq() with contains() Reviewed-by: kvn, roland ------------- PR: https://git.openjdk.org/jdk/pull/18620 From roland at openjdk.org Fri Apr 5 07:41:13 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 07:41:13 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: <5qko_2OZ1ZZ7f65V0yyuHTg4cBrx6SfCOeIuMjVm8PM=.0832d479-5350-44d1-b0bc-fbaaaa3892f7@github.com> On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) 
> > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes at various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is, there should be only a few > nodes, at only a few controls, for which we can narrow the type down). As > a consequence, the type'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fixed point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea...
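The narrowing idea in the quoted description can be illustrated with a toy interval type, sketched here under simplifying assumptions. The class `IntRange` and its methods are hypothetical and only mimic what the description calls narrowing at a condition, `meet` at a region, and constant folding of a dominated condition; C2's real `TypeInt` and the type' table are considerably richer.

```java
final class IntRange {
    final int lo;  // inclusive lower bound
    final int hi;  // inclusive upper bound

    IntRange(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Type of x on the true branch of `x < bound`; may be empty (lo > hi) if unreachable.
    IntRange narrowLessThan(int bound) {
        if (bound == Integer.MIN_VALUE) {
            throw new IllegalArgumentException("x < MIN_VALUE is never true");
        }
        return new IntRange(lo, Math.min(hi, bound - 1));
    }

    // TRUE/FALSE if `x < bound` is decided for every value in the range, else null.
    Boolean foldLessThan(int bound) {
        if (hi < bound) return Boolean.TRUE;
        if (lo >= bound) return Boolean.FALSE;
        return null;
    }

    // Conservative merge at a region: the smallest range covering both inputs.
    IntRange meet(IntRange other) {
        return new IntRange(Math.min(lo, other.lo), Math.max(hi, other.hi));
    }

    public static void main(String[] args) {
        IntRange i = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);  // type[i]
        IntRange inBranch = i.narrowLessThan(10);                         // type'[i, c] inside `if (i < 10)`
        System.out.println(inBranch.hi);                // prints 9
        System.out.println(inBranch.foldLessThan(42));  // prints true: the inner `if (i < 42)` folds
        System.out.println(i.foldLessThan(42));         // prints null: not decidable outside the branch
    }
}
```

The last two lines show why the per-control table is needed: the same node `i` folds the second condition only under the control where its type has been narrowed.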
comment to keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-2039155048 From epeter at openjdk.org Fri Apr 5 07:48:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 07:48:41 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v29] In-Reply-To: References: Message-ID: <4qaM_jSSLlxcehHsbjqBdKilKBJ_MqCc9PuoiL3kJBc=.aaf61156-b1e5-49d9-bf4a-bf74853e3d6e@github.com> > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
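As a rough illustration of the "variable that was split (using shifts)" pattern described in the quoted text, the sketch below writes a `long` into a `byte[]` with eight adjacent byte stores and compares the result with a single 8-byte little-endian store. The class and method names are hypothetical and this is not the code from `TestMergeStores.java`; `ByteBuffer.putLong` is used only as a reference for the expected bytes, and whether C2 actually merges such stores on a given platform is not something this example demonstrates.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class MergeStoresSketch {
    // Eight adjacent byte stores of one long, split by shifts in little-endian order.
    static void storeLongBytewise(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Reference: one 8-byte store with explicit little-endian byte order.
    static void storeLongMerged(byte[] a, int offset, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, v);
    }

    public static void main(String[] args) {
        byte[] bytewise = new byte[16];
        byte[] merged = new byte[16];
        long value = 0x0102030405060708L;
        storeLongBytewise(bytewise, 3, value);
        storeLongMerged(merged, 3, value);
        System.out.println(Arrays.equals(bytewise, merged));  // prints true
    }
}
```

The byte-wise form needs no `Unsafe` or `ByteArrayLittleEndian` dependency, which is the motivation given in the description for doing the merge in the compiler instead.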
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve comments, and extract collect_merge_list ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/330d6745..637b3686 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=27-28 Stats: 87 lines in 1 file changed: 49 ins; 27 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From thartmann at openjdk.org Fri Apr 5 07:50:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Apr 2024 07:50:18 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table Message-ID: Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. 
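The suspected failure mode (a legitimately computed hash that happens to equal the "not hashed" sentinel) can be modelled in a few lines. This Java sketch is only an analogy for the behaviour described above: `NoHashSentinelSketch`, `hashInsert` and `hashDelete` are hypothetical names, the `HashMap` stands in for the igvn hash table, and the early returns on the sentinel are an assumption about how the real table treats such nodes.

```java
import java.util.HashMap;
import java.util.Map;

final class NoHashSentinelSketch {
    static final int NO_HASH = 0;  // sentinel meaning "this node is not hashed"

    private final Map<Integer, String> table = new HashMap<>();

    // Assumption: nodes whose hash equals the sentinel are never entered into the table.
    boolean hashInsert(int hash, String node) {
        if (hash == NO_HASH) {
            return false;
        }
        table.put(hash, node);
        return true;
    }

    // A computed hash of 0 is indistinguishable from "not hashed", so delete reports false.
    boolean hashDelete(int hash) {
        if (hash == NO_HASH) {
            return false;
        }
        return table.remove(hash) != null;
    }

    public static void main(String[] args) {
        NoHashSentinelSketch igvnTable = new NoHashSentinelSketch();
        igvnTable.hashInsert(42, "AddI");
        System.out.println(igvnTable.hashDelete(42));  // prints true
        igvnTable.hashInsert(0, "CastII");             // hash collides with NO_HASH
        System.out.println(igvnTable.hashDelete(0));   // prints false: an "in table" assert would fire
    }
}
```

A caller that asserts on the result of the delete would fail in the second case even though nothing is actually wrong with the node, matching the "unfortunate but harmless" reading above.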
We might want to add a stress flag to trigger this and similar hash collisions more often. Thanks, Tobias ------------- Commit messages: - 8321204: C2: assert(false) failed: node should be in igvn hash table Changes: https://git.openjdk.org/jdk/pull/18647/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18647&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321204 Stats: 10 lines in 1 file changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18647/head:pull/18647 PR: https://git.openjdk.org/jdk/pull/18647 From epeter at openjdk.org Fri Apr 5 07:58:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 07:58:43 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v30] In-Reply-To: References: Message-ID: <66NohgVLyhHhRTGCAn9Ww4xqJDk_rUhX5L5x-DXjBPE=.4bfa09bf-1e9a-454d-a018-ea27842f8620@github.com> > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: extract make_merged_store ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/637b3686..52d8cf82 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=28-29 Stats: 53 lines in 1 file changed: 28 ins; 21 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From thartmann at openjdk.org Fri Apr 5 08:13:17 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Apr 2024 08:13:17 GMT Subject: RFR: 8329749: UseNeon flag is unused Message-ID: After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. Thanks, Tobias ------------- Commit messages: - 8329749: UseNeon flag is unused Changes: https://git.openjdk.org/jdk/pull/18648/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18648&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329749 Stats: 8 lines in 3 files changed: 0 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18648/head:pull/18648 PR: https://git.openjdk.org/jdk/pull/18648 From chagedorn at openjdk.org Fri Apr 5 08:57:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 08:57:51 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag In-Reply-To: References: Message-ID: <4qqnEWfcXakENAIHAuPNc-xebXi74pdTtwaacpMw1pY=.0f9cf8df-96d6-4332-84a5-29efb74a150e@github.com> On Fri, 5 Apr 2024 08:07:39 GMT, Tobias Hartmann wrote: > After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that 
it can be removed. > > Thanks, > Tobias Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18648#pullrequestreview-1982380148 From thartmann at openjdk.org Fri Apr 5 08:57:53 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Apr 2024 08:57:53 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag In-Reply-To: References: Message-ID: <6rUspZjouhFQy4uolM3Y78S3_F7r2NY3_RZUb83NpvQ=.353b03f1-53c8-4ed6-a063-1385425a4d06@github.com> On Fri, 5 Apr 2024 08:07:39 GMT, Tobias Hartmann wrote: > After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. > > Thanks, > Tobias @chhagedorn reminded me that this is a product flag, so we need to obsolete it which requires a CSR. I'll update the PR shortly. Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18648#issuecomment-2039237475 PR Comment: https://git.openjdk.org/jdk/pull/18648#issuecomment-2039258931 From chagedorn at openjdk.org Fri Apr 5 08:58:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 08:58:05 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. 
The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. > > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). > > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias Good catch! I've already thought we'll never get to the root of this problem. The fix looks good. > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). That's reasonable and an easy tweak to the assert. > I also executed some extensive testing with Node::hash hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. Nice, good idea! Stress testing the zero hash would definitely be beneficial and makes this very rare case much more common. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18647#pullrequestreview-1982352785 From epeter at openjdk.org Fri Apr 5 08:58:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 08:58:25 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v31] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 single-byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - beautify and ASCII art for make_merged_store - renamings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/52d8cf82..acb06ea2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=29-30 Stats: 67 lines in 1 file changed: 31 ins; 5 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Fri Apr 5 09:13:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 09:13:37 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v32] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 single-byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - extract Status::make - extract trace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/acb06ea2..c548fe8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=30-31 Stats: 58 lines in 1 file changed: 27 ins; 21 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From thartmann at openjdk.org Fri Apr 5 09:17:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Apr 2024 09:17:08 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. > > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). 
> > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18647#issuecomment-2039294077 From thartmann at openjdk.org Fri Apr 5 10:04:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Apr 2024 10:04:26 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v2] In-Reply-To: References: Message-ID: > After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Obsoleting the flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18648/files - new: https://git.openjdk.org/jdk/pull/18648/files/9da49928..5d8119ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18648&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18648&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18648/head:pull/18648 PR: https://git.openjdk.org/jdk/pull/18648 From epeter at openjdk.org Fri Apr 5 10:08:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:08:13 GMT Subject: RFR: 8327381 Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v8] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 08:31:48 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8327381](https://bugs.openjdk.org/browse/JDK-8327381) >> >> Currently the transformations for expressions with patterns `((x & m) u<= m)` or `((m & x) u<= m)` to `true` is in `BoolNode::Ideal` function with a new constant 
node of value `1` created. However, this is technically a type-improving (reduction in range) transformation that's better suited in `BoolNode::Value` function. >> >> New unit test `test/hotspot/jtreg/compiler/c2/TestBoolNodeGvn.java` asserting on IR nodes and correctness of this transformation is added and passing. > > Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: > > - update comments > - fix indentation again Looks better, thanks for the updates. I had another idea to make the code a bit more simple. FYI: I'll be out of the office next week. src/hotspot/share/opto/subnode.cpp line 1816: > 1814: // Change ((x & m) u<= m) or ((m & x) u<= m) to always true > 1815: // Same with ((x & m) u< m+1) and ((m & x) u< m+1) > 1816: if (cop == Op_CmpU && cmp1->Opcode() == Op_AndI) { You made this a bit more complicated than the original. Or was there a specific reason for the `is_Sub`? I'd do this: Suggestion: // Change ((x & m) u<= m) or ((m & x) u<= m) to always true // Same with ((x & m) u< m+1) and ((m & x) u< m+1) Node* cmp = in(1); if (cmp != nullptr && cmp->Opcode() == Op_CmpU) { Node* cmp1 = cmp->in(1); Node* cmp2 = cmp->in(2); if (cmp1->Opcode() == Op_AndI) { ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18198#pullrequestreview-1982536313 PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1553305055 From epeter at openjdk.org Fri Apr 5 10:08:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:08:13 GMT Subject: RFR: 8327381 Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v8] In-Reply-To: References: Message-ID: <3zuVDnNd_9nUXHjG1TCWQjVVWuLcyCLAOEgJKeGnDL0=.996e0ab4-58d0-47c9-875b-26bcaae19887@github.com> On Fri, 5 Apr 2024 09:58:12 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with two additional commits since the last revision: >> >> - update comments >> - fix indentation again > > src/hotspot/share/opto/subnode.cpp line 1816: > >> 1814: // Change ((x & m) u<= m) or ((m & x) u<= m) to always true >> 1815: // Same with ((x & m) u< m+1) and ((m & x) u< m+1) >> 1816: if (cop == Op_CmpU && cmp1->Opcode() == Op_AndI) { > > You made this a bit more complicated than the original. Or was there a specific reason for the `is_Sub`? I'd do this: > Suggestion: > > // Change ((x & m) u<= m) or ((m & x) u<= m) to always true > // Same with ((x & m) u< m+1) and ((m & x) u< m+1) > Node* cmp = in(1); > if (cmp != nullptr && cmp->Opcode() == Op_CmpU) { > Node* cmp1 = cmp->in(1); > Node* cmp2 = cmp->in(2); > if (cmp1->Opcode() == Op_AndI) { You could also move the whole code to its own method, and name it something like `BoolNode::Value_cmpu_and_mask`. Maybe you find an even more descriptive name. 
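As a side note on why the condition can be folded to true: masking can only clear bits, so `x & m` never exceeds `m` when both are compared as unsigned values. A minimal Java sketch that checks this on a handful of edge-case values follows; the class and method names are hypothetical and are not part of the patch, and the sketch covers only the `u<=` form, not the `u< m+1` variant.

```java
class MaskCompare {
    // True iff (x & m) u<= m, i.e. the masked value does not exceed the mask
    // when both are interpreted as unsigned 32-bit integers.
    static boolean maskedLeMask(int x, int m) {
        return Integer.compareUnsigned(x & m, m) <= 0;
    }

    // Checks the property for every pair drawn from a small set of edge cases.
    static boolean holdsForSamples() {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE,
                         0x55555555, 0xAAAAAAAA};
        for (int x : samples) {
            for (int m : samples) {
                if (!maskedLeMask(x, m)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

This is a spot check of the arithmetic fact only; it says nothing about where in C2 the transformation should live.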
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18198#discussion_r1553309782 From epeter at openjdk.org Fri Apr 5 10:15:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:15:01 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> Message-ID: On Thu, 4 Apr 2024 09:56:28 GMT, Hamlin Li wrote: > > There's no need for randomness or arrays or special values in the 32-bit case. You can, and should, test the entire 32-bit range in a few lines of code by using floatBitsToInt. > > In previous discussion, there are several reasons why it's implemented in this way: > > 1. test the whole range of 32 bits is slow, and even slow for a 64 ranges double. I guess we could try it for 32 bit floats. But it would take a while. If we can make sure it does not take much more than one minute, we can do that. But of course 64 bit doubles would be infeasible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1553335033 From epeter at openjdk.org Fri Apr 5 10:18:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:18:11 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 11:12:24 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. 
It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, the Vector API would always generate non strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> `fadda z17.s, p7/m, z17.s, z16.s` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86.
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Add comments, revert to requires_strict_order and other minor changes You probably want to change the name of the PR again: `Add "is_associative" flag for floating-point add-reduction` -> `8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add-reduction` ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2039421329 From epeter at openjdk.org Fri Apr 5 10:22:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:22:01 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 11:12:24 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. 
>> >> With this patch, the Vector API would always generate non strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> `fadda z17.s, p7/m, z17.s, z16.s` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86.
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Add comments, revert to requires_strict_order and other minor changes src/hotspot/share/opto/vectorIntrinsics.cpp line 1742: > 1740: if (mask != nullptr && !use_predicate) { > 1741: Node* reduce_identity = gvn().transform(VectorNode::scalar2vector(init, > 1742: num_elem, Type::get_const_basic_type(elem_bt))); Suggestion: num_elem, Type::get_const_basic_type(elem_bt))); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1553343739 From epeter at openjdk.org Fri Apr 5 10:27:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:27:14 GMT Subject: RFR: 8320725: C2: Add "is_associative" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 11:12:24 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. 
>> >> With this patch, the Vector API would always generate non strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> `fadda z17.s, p7/m, z17.s, z16.s` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Add comments, revert to requires_strict_order and other minor changes Looks better, good work! FYI: I'll be out of the office next week, can look at it again afterwards! src/hotspot/share/opto/vectorIntrinsics.cpp line 1742: > 1740: if (mask != nullptr && !use_predicate) { > 1741: Node* reduce_identity = gvn().transform(VectorNode::scalar2vector(init, > 1742: num_elem, Type::get_const_basic_type(elem_bt))); The indentation is not ok here.
Suggestion, keep it as it used to be: Suggestion: Node* reduce_identity = gvn().transform(VectorNode::scalar2vector(init, num_elem, Type::get_const_basic_type(elem_bt))); ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-1982623978 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1553346551 From mli at openjdk.org Fri Apr 5 10:29:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 5 Apr 2024 10:29:01 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> Message-ID: On Fri, 5 Apr 2024 10:12:29 GMT, Emanuel Peter wrote: >>> There's no need for randomness or arrays or special values in the 32-bit case. You can, and should, test the entire 32-bit range in a few lines of code by using floatBitsToInt. >> >> In previous discussion, there are several reasons why it's implemented in this way: >> 1. test the whole range of 32 bits is slow, and even slow for a 64 ranges double. >> 2. if it's too slow, then it's not feasible to make it an automatic test. >> these are expected by @eme64. >> >>> You can make the test much faster by copy-and-pasting the library code for Math.round(float) and letting the JIT compile it. >> >> Previously, I had [this question](https://github.com/openjdk/jdk/pull/17753#issuecomment-1992519401), but from the point view of correctness of the golden value. >> I think you make another point to change from @DontCompile to copying library java code. Thanks! >> I will do it. > >> > There's no need for randomness or arrays or special values in the 32-bit case. You can, and should, test the entire 32-bit range in a few lines of code by using floatBitsToInt. >> >> In previous discussion, there are several reasons why it's implemented in this way: >> >> 1. 
test the whole range of 32 bits is slow, and even slow for a 64 ranges double. > > I guess we could try it for 32 bit floats. But it would take a while. If we can make sure it does not take much more than one minute, we can do that. But of course 64 bit doubles would be infeasible. As I remember, that's not the case in my local environment, i.e. it will take longer time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1553352302 From epeter at openjdk.org Fri Apr 5 10:40:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:40:25 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v33] In-Reply-To: References: <0D9E-3Nj0VvCYUmIXKgMoRI7W3xioc6n5phQ_TGNHRE=.80f0ef3a-243d-4eea-9351-c407ed92b6b8@github.com> Message-ID: On Thu, 28 Mar 2024 17:27:31 GMT, Quan Anh Mai wrote: >> @merykitty I see there are some proofs now, great! I'll have a look soon :) > > @eme64 Gentle ping on this. @merykitty I see that you have more proofs now, that is great! I am out of the office next week, but I plan to look at this PR after that. It is 2K+ lines, so that may take me a while. ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-2039461897 From rehn at openjdk.org Fri Apr 5 10:45:24 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 5 Apr 2024 10:45:24 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: References: Message-ID: > Hi, please consider. > > [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. > Tested with gcc and clang, and llvm and binutils backend. > > I didn't find any use of the "DLL_ENTRY", so I removed it. 
> > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Use JNIEXPORT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18400/files - new: https://git.openjdk.org/jdk/pull/18400/files/28862745..b4003e9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18400&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18400&range=01-02 Stats: 22 lines in 5 files changed: 4 ins; 12 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18400.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18400/head:pull/18400 PR: https://git.openjdk.org/jdk/pull/18400 From rehn at openjdk.org Fri Apr 5 10:45:24 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 5 Apr 2024 10:45:24 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: Message-ID: <2wol6Z3Rd2zNpWUl8GWlXiC_vRWMzcuVFQh_-E_j_w0=.35a4b254-ce8b-4bad-a03e-381d3c2ddae8@github.com> On Thu, 21 Mar 2024 06:58:43 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > remove swap file Thanks, updated. 
(Since the Windows build seems to need HSDIS_TOOLCHAIN_DEFAULT_CFLAGS, just adding java.base:include seemed a good easy way)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2039470504

From epeter at openjdk.org Fri Apr 5 10:49:30 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Apr 2024 10:49:30 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v33]
In-Reply-To:
References:
Message-ID: <9ZegY8QATrqtvekUuSdB2x7LOSbHIUDBvgOJa1biPbo=.d2d3d648-8435-431f-ab85-b776e0fdbb3b@github.com>

> This is a feature requested by @RogerRiggs and @cl4es .
>
> **Idea**
>
> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>
> This patch here supports a few simple use-cases, like these:
>
> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>
> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>
> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>
> **Details**
>
> This draft currently implements the optimization in an additional special IGVN phase:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>
> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores:
>
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>
> Mergeable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ...
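
As a rough illustration of the two use-cases in the description above, the sketch below shows the kind of Java source pattern the optimization is aimed at. It is not taken from `TestMergeStores.java`: the class and method names are hypothetical, and whether C2 actually merges these stores depends on the patch and on the platform. The `ByteBuffer` variant is only a reference used to check that the hand-written byte stores produce a little-endian layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch, not part of the PR.
class MergeStoresSketch {
    // Use-case 1: eight consecutive byte stores of constants.
    // Conceptually equivalent to one store of the long 0x0807060504030201L.
    static void storeConstants(byte[] a, int off) {
        a[off + 0] = (byte) 0x01;
        a[off + 1] = (byte) 0x02;
        a[off + 2] = (byte) 0x03;
        a[off + 3] = (byte) 0x04;
        a[off + 4] = (byte) 0x05;
        a[off + 5] = (byte) 0x06;
        a[off + 6] = (byte) 0x07;
        a[off + 7] = (byte) 0x08;
    }

    // Use-case 2: a long split into bytes with shifts and truncating casts,
    // stored to adjacent array elements in little-endian order.
    static void storeLongLE(byte[] a, int off, long v) {
        a[off + 0] = (byte) (v >> 0);
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
        a[off + 4] = (byte) (v >> 32);
        a[off + 5] = (byte) (v >> 40);
        a[off + 6] = (byte) (v >> 48);
        a[off + 7] = (byte) (v >> 56);
    }

    // Reference: one 8-byte little-endian store through the public ByteBuffer API.
    static void storeLongReference(byte[] a, int off, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(off, v);
    }

    public static void main(String[] args) {
        byte[] split = new byte[16];
        byte[] reference = new byte[16];
        storeLongLE(split, 4, 0x1122334455667788L);
        storeLongReference(reference, 4, 0x1122334455667788L);
        if (!Arrays.equals(split, reference)) {
            throw new AssertionError("split stores differ from the single long store");
        }

        byte[] constants = new byte[16];
        byte[] merged = new byte[16];
        storeConstants(constants, 4);
        storeLongReference(merged, 4, 0x0807060504030201L);
        if (!Arrays.equals(constants, merged)) {
            throw new AssertionError("constant stores differ from the single long store");
        }
        System.out.println("byte-wise stores match the single 8-byte store");
    }
}
```

The check only confirms that the byte-wise and single-store versions write the same memory; it says nothing about what code C2 generates for either method.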
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: handle UseUnalignedAccesses in test, and a few cosmetics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/c548fe8f..c85cce1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=31-32 Stats: 75 lines in 2 files changed: 43 ins; 2 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Fri Apr 5 10:54:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 10:54:05 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Thu, 21 Mar 2024 22:48:01 GMT, Vladimir Kozlov wrote: >>> * No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. We can merge the 4 stores on the normal path, where all RC's pass. But we have to remove all old stores from that path. But the `RC[1], RC[2], RC[3]` false paths need some of those stores. So the only way I see is to duplicate all stores for the branches, so that we are sure that they sink out into the trap-paths. >> >> I also think you need to duplicate stores. My opinion is that we want to stick with the simpler cases (your first and second bullets) unless it's obvious it doesn't cover all use cases. It's always possible to revisit the optimization down the road if it's observed that there are cases that are not covered. > >> > ``` >> > * No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. 
We can merge the 4 stores on the normal path, where all RC's pass. But we have to remove all old stores from that path. But the `RC[1], RC[2], RC[3]` false paths need some of those stores. So the only way I see is to duplicate all stores for the branches, so that we are sure that they sink out into the trap-paths.
>> > ```
>>
>> I also think you need to duplicate stores. My opinion is that we want to stick with the simpler cases (your first and second bullets) unless it's obvious it doesn't cover all use cases. It's always possible to revisit the optimization down the road if it's observed that there are cases that are not covered.
>
> I completely agree with Roland.

@vnkozlov @rwestrel @TobiHartmann I refactored the code significantly, I think it is now much better structured. I also only allow a single RangeCheck now, which makes sure that the "first" store floats towards the uncommon trap. Feel free to re-review.

I'm out of the office next week, and will return to this then, and re-run the benchmarks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2039489405

From chagedorn at openjdk.org Fri Apr 5 11:14:10 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 5 Apr 2024 11:14:10 GMT
Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 5 Apr 2024 10:04:26 GMT, Tobias Hartmann wrote:

>> After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed.
>>
>> Thanks,
>> Tobias
>
> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>
>   Obsoleting the flag

Marked as reviewed by chagedorn (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/18648#pullrequestreview-1982823161 From bkilambi at openjdk.org Fri Apr 5 11:40:12 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 5 Apr 2024 11:40:12 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v4] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:15:02 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comments, revert to requires_strict_order and other minor changes > > You probably want to change the name of the PR again: > `Add "is_associative" flag for floating-point add-reduction` -> `8320725: AArch64: C2: Add "requires_strict_order" flag for floating-point add-reduction` @eme64 Thank you for the notification and your review ! I did change the title in the bug but forgot to do that in the PR as well. Changed that now. I will make the changes you suggested in the next PS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2039582209 From chagedorn at openjdk.org Fri Apr 5 12:02:11 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 12:02:11 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: References: Message-ID: <3CCigBs4mon8accLm7AItQWuBXXruFZLCtjPrjBhVos=.0f01d167-927a-4c9a-81ba-4499aad88674@github.com> On Fri, 22 Mar 2024 18:48:56 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) 
>> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Merge branch 'master' into licm > - @run driver -> @run main > - Add tests for add/sub reassociation > - Merge branch 'master' into licm > - Make inputs deterministic. Make size an arg. Fix comments. Formatting. > - Update test to utilize @setup method for arguments > - Merge branch 'master' into licm > - Add correctness test for some random tests with random inputs > - Add some correctness tests where we do reassociate > - Remove unused TestInfo parameter. Have some tests exit mid-loop. > - ... and 7 more: https://git.openjdk.org/jdk/compare/8daa6942...32cb9c0d Only some minor comments. Otherwise, looks good to me, too! src/hotspot/share/opto/loopTransform.cpp line 269: > 267: } > 268: > 269: //---------------------is_associative_cmp------------------------- I usually remove these legacy headers when touching old code. I don't think they add any benefit nowadays. src/hotspot/share/opto/loopTransform.cpp line 281: > 279: BoolNode* boolOut = n->out(i)->isa_Bool(); > 280: if (boolOut == nullptr || !(boolOut->_test._test == BoolTest::eq || > 281: boolOut->_test._test == BoolTest::ne)) { We should use underscores instead of camelCase. 
Suggestion: BoolNode* bool_out = n->out(i)->isa_Bool(); if (bool_out == nullptr || !(bool_out->_test._test == BoolTest::eq || bool_out->_test._test == BoolTest::ne)) { src/hotspot/share/opto/loopTransform.cpp line 311: > 309: || op == Op_OrI || op == Op_OrL > 310: || op == Op_XorI || op == Op_XorL > 311: || is_associative_cmp(n); You should also update the comment on L304 which does not mention cmps. src/hotspot/share/opto/loopTransform.cpp line 315: > 313: } > 314: > 315: //-------------------reassociate_add_sub_cmp--------------------- Can also be removed. src/hotspot/share/opto/loopTransform.cpp line 335: > 333: // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > 334: // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > 335: // Suggestion: src/hotspot/share/opto/loopTransform.cpp line 353: > 351: (n2_is_sub && !n1_is_cmp && inv2_idx == 2) || (n1_is_cmp && !n2_is_sub); > 352: bool neg_inv1 = > 353: (n1_is_sub && inv1_idx == 2) || (n1_is_cmp && inv2_idx == 1 && n2_is_sub); I suggest to remove the line breaks since the lines won't be longer than the added asserts above. src/hotspot/share/opto/loopnode.hpp line 742: > 740: Node* reassociate(Node* n1, PhaseIdealLoop *phase); > 741: // Reassociate invariant add, subtract, and compare expressions. > 742: Node* reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop *phase); Suggestion: Node* reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop* phase); test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateAddSub.java line 36: > 34: * @summary Test loop invariant code motion of add/sub through reassociation > 35: * @library /test/lib / > 36: * @run main compiler.c2.loopopts.InvariantCodeMotionReassociateAddSub You should use `driver` since we do not want to stress the driver VM with additionally passed VM flags like `-Xcomp` when running IR tests. 
Suggestion: * @run driver compiler.c2.loopopts.InvariantCodeMotionReassociateAddSub test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 36: > 34: * @summary Test loop invariant code motion for cmp nodes through reassociation > 35: * @library /test/lib / > 36: * @run main compiler.c2.loopopts.InvariantCodeMotionReassociateCmp Suggestion: * @run driver compiler.c2.loopopts.InvariantCodeMotionReassociateCmp ------------- PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1982827351 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553439633 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553442818 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553444002 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553444134 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553444294 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553445604 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553447684 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553448796 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553453116 From ihse at openjdk.org Fri Apr 5 12:47:13 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 5 Apr 2024 12:47:13 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:45:24 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use JNIEXPORT Looks good to me now. Thanks for indulging me in doing it this way. 
:-) ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18400#pullrequestreview-1983072333 From ihse at openjdk.org Fri Apr 5 12:47:14 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 5 Apr 2024 12:47:14 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: <2wol6Z3Rd2zNpWUl8GWlXiC_vRWMzcuVFQh_-E_j_w0=.35a4b254-ce8b-4bad-a03e-381d3c2ddae8@github.com> References: <2wol6Z3Rd2zNpWUl8GWlXiC_vRWMzcuVFQh_-E_j_w0=.35a4b254-ce8b-4bad-a03e-381d3c2ddae8@github.com> Message-ID: On Fri, 5 Apr 2024 10:42:31 GMT, Robbin Ehn wrote: > Since windows build seems to need HSDIS_TOOLCHAIN_DEFAULT_CFLAGS Riiiight. That is since we use a completely different compiler, gcc instead of cl. (Which is probably the worst hack in all of the JDK build system...) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2039711663 From roland at openjdk.org Fri Apr 5 12:51:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 12:51:04 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 11:57:59 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8324517 >> - test and fix > > Thanks @rwestrel ! > Generally makes sense, I have a few suggestions and questions. @eme64 did you get a chance to look at the answers to your questions? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2039721782 From epeter at openjdk.org Fri Apr 5 13:14:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:14:11 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 12:48:35 GMT, Roland Westrelin wrote: >> Thanks @rwestrel ! >> Generally makes sense, I have a few suggestions and questions. > > @eme64 did you get a chance to look at the answers to your questions? @rwestrel It seems I only get notifications for new messages, not responses. Looking at the PR now... ------------- PR Comment: https://git.openjdk.org/jdk/pull/18377#issuecomment-2039771054 From thartmann at openjdk.org Fri Apr 5 13:14:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Apr 2024 13:14:10 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: <3duIdpsQcZAlCUts51VydLPjQEM3H44aQZ4oo9jbdvs=.5d1f8069-fe3c-40d1-aaad-75f5ba51f536@github.com> On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. 
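
The failure mode described above (a legitimately computed hash that happens to equal the "no hash" sentinel) can be modelled with a toy table. This is only a sketch in Java under stated assumptions: `ToyNode`, `ToyTable`, and their methods are hypothetical stand-ins and do not reproduce the actual `NodeHash` code in C2. The only thing borrowed from the description is that a hash of 0 is treated like `NO_HASH`, so the delete operation reports `false`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, not HotSpot code.
class NoHashSentinelSketch {
    // Sentinel meaning "this node must not be put in the table".
    static final int NO_HASH = 0;

    static final class ToyNode {
        final int a;
        final int b;

        ToyNode(int a, int b) {
            this.a = a;
            this.b = b;
        }

        // A computed hash; nothing prevents it from being 0 by accident.
        int hash() {
            return a + b;
        }
    }

    static final class ToyTable {
        // Identity-based membership, since ToyNode does not override equals/hashCode.
        private final Set<ToyNode> nodes = new HashSet<>();

        boolean insert(ToyNode n) {
            if (n.hash() == NO_HASH) {
                return false; // treated as unhashable, never stored
            }
            return nodes.add(n);
        }

        boolean delete(ToyNode n) {
            if (n.hash() == NO_HASH) {
                return false; // indistinguishable from "was not in the table"
            }
            return nodes.remove(n);
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();

        ToyNode ordinary = new ToyNode(3, 4);  // hash 7
        ToyNode unlucky = new ToyNode(5, -5);  // hash 0, collides with NO_HASH

        table.insert(ordinary);
        table.insert(unlucky);

        // A caller that asserts "delete must succeed" is fine for the first node
        // but would fail for the second, although nothing is actually wrong.
        System.out.println("delete(ordinary) = " + table.delete(ordinary));
        System.out.println("delete(unlucky)  = " + table.delete(unlucky));
    }
}
```

In this model the `false` result for the unlucky node is harmless, which matches the description's point that the zero hash is unfortunate rather than a real bug.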
> > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). > > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias FTR, I filed [JDK-8329777](https://bugs.openjdk.org/browse/JDK-8329777). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18647#issuecomment-2039769950 From epeter at openjdk.org Fri Apr 5 13:14:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:14:11 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <89wFzkpUcY3PKi_ypzouWAXDEa1iV35rq_nOqsOS62o=.b9af444c-e2b2-4621-b015-ae276c921542@github.com> Message-ID: On Thu, 28 Mar 2024 13:42:35 GMT, Roland Westrelin wrote: >> Not sure if that is important, but it seems most other tests are in a package > > I never put tests in a package. So If there's an issue with that, then there are many more tests to fix. Fine with me. I was just curious, and have mostly seen tests that are in a package. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1553606784 From epeter at openjdk.org Fri Apr 5 13:22:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:22:13 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 14:14:57 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. 
The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either a >> out of bounds array load or a division by zero happen. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by a >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check`CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. 
So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division)nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8324517 > - test and fix Thanks for the updates and answers, @rwestrel ! Looks reasonable, a second reviewer should also have a close look though. test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 38: > 36: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined > 37: * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated > 38: * @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated Suggestion: * @run main TestArrayAccessAboveRCAfterRCCastIIEliminated ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18377#pullrequestreview-1983151249 PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1553616123 From epeter at openjdk.org Fri Apr 5 13:22:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:22:13 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:14:52 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8324517 >> - test and fix > > test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 38: > >> 36: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined >> 37: * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated >> 38: * @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated > > Suggestion: > > * @run main TestArrayAccessAboveRCAfterRCCastIIEliminated Detail, optional. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1553616483 From roland at openjdk.org Fri Apr 5 13:22:13 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:22:13 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:15:07 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 38: >> >>> 36: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined >>> 37: * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated >>> 38: * @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated >> >> Suggestion: >> >> * @run main TestArrayAccessAboveRCAfterRCCastIIEliminated > > Detail, optional. Actually, shouldn't I have kept `-XX:-BackgroundCompilation` for this one? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1553622706 From epeter at openjdk.org Fri Apr 5 13:23:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:23:17 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: <3CCigBs4mon8accLm7AItQWuBXXruFZLCtjPrjBhVos=.0f01d167-927a-4c9a-81ba-4499aad88674@github.com> References: <3CCigBs4mon8accLm7AItQWuBXXruFZLCtjPrjBhVos=.0f01d167-927a-4c9a-81ba-4499aad88674@github.com> Message-ID: On Fri, 5 Apr 2024 11:12:06 GMT, Christian Hagedorn wrote: >> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into licm >> - @run driver -> @run main >> - Add tests for add/sub reassociation >> - Merge branch 'master' into licm >> - Make inputs deterministic. Make size an arg. Fix comments. Formatting. >> - Update test to utilize @setup method for arguments >> - Merge branch 'master' into licm >> - Add correctness test for some random tests with random inputs >> - Add some correctness tests where we do reassociate >> - Remove unused TestInfo parameter. Have some tests exit mid-loop. >> - ... and 7 more: https://git.openjdk.org/jdk/compare/71d3a1f1...32cb9c0d > > src/hotspot/share/opto/loopTransform.cpp line 269: > >> 267: } >> 268: >> 269: //---------------------is_associative_cmp------------------------- > > I usually remove these legacy headers when touching old code. I don't think they add any benefit nowadays. agree > src/hotspot/share/opto/loopTransform.cpp line 281: > >> 279: BoolNode* boolOut = n->out(i)->isa_Bool(); >> 280: if (boolOut == nullptr || !(boolOut->_test._test == BoolTest::eq || >> 281: boolOut->_test._test == BoolTest::ne)) { > > We should use underscores instead of camelCase. > Suggestion: > > BoolNode* bool_out = n->out(i)->isa_Bool(); > if (bool_out == nullptr || !(bool_out->_test._test == BoolTest::eq || > bool_out->_test._test == BoolTest::ne)) { Agree, this is the C++ style. CamelCase is Java style. 
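
For readers following the review of JDK-8323220, the reassociation rules quoted from the PR (`inv1 == (x + inv2)` => `(inv1 - inv2) == x`, and the two subtraction variants) can be checked at the Java level. The sketch below is an illustration only: the class and method names are hypothetical, and it models the source-level effect of hoisting the invariant part out of the loop, not the C2 implementation in `loopTransform.cpp`. It relies on Java `int` arithmetic wrapping around, which is why the rewritten comparisons agree even when the additions overflow.

```java
// Hypothetical sketch, not part of the PR.
class ReassociateCmpSketch {
    // Original shape: the invariant inv2 is added to x on every iteration.
    static int countOriginal(int[] xs, int inv1, int inv2) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Reassociated shape: t = inv1 - inv2 is computed once, outside the loop.
    static int countReassociated(int[] xs, int inv1, int inv2) {
        int t = inv1 - inv2;
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] xs = {Integer.MIN_VALUE, -1, 0, 1, 41, Integer.MAX_VALUE};
        int[] invs = {Integer.MIN_VALUE, -42, -1, 0, 1, 42, Integer.MAX_VALUE};

        for (int inv1 : invs) {
            for (int inv2 : invs) {
                if (countOriginal(xs, inv1, inv2) != countReassociated(xs, inv1, inv2)) {
                    throw new AssertionError("add form differs for " + inv1 + ", " + inv2);
                }
                for (int x : xs) {
                    // inv1 == (x - inv2)  =>  (inv1 + inv2) == x
                    if ((inv1 == x - inv2) != (inv1 + inv2 == x)) {
                        throw new AssertionError("sub form (x - inv2) differs");
                    }
                    // inv1 == (inv2 - x)  =>  (-inv1 + inv2) == x
                    if ((inv1 == inv2 - x) != (-inv1 + inv2 == x)) {
                        throw new AssertionError("sub form (inv2 - x) differs");
                    }
                }
            }
        }
        System.out.println("all reassociated comparisons agree");
    }
}
```

The inputs deliberately include `Integer.MIN_VALUE` and `Integer.MAX_VALUE` so that the overflow cases are exercised as well as the ordinary ones.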
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553623796 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553624395 From roland at openjdk.org Fri Apr 5 13:25:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:25:15 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 12:57:46 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/callGenerator.cpp line 1218: >> >>> 1216: // slow_call: >>> 1217: // result = slowGet(); >>> 1218: // goto continue; >> >> Now you have duplication of these comments, see above `remove_first_probe_if_when_it_never_hits`. Would it make sense to put this somewhere more "central"? > > And you further repeat the comments below. I fear that if someone would eventually make changes, they would not update all comments, and then the comments diverge. That one doesn't duplicate the one above `transform_get_subgraph()`. It's supposed to show what change was just made to the graph by `replace_current_exit_of_get_with_halt()`. Same for the one below, it's expected to show an incremental change. It's hard to show what changes without keeping the entire structure of the code I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1553627734 From epeter at openjdk.org Fri Apr 5 13:26:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:26:07 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: <3CCigBs4mon8accLm7AItQWuBXXruFZLCtjPrjBhVos=.0f01d167-927a-4c9a-81ba-4499aad88674@github.com> References: <3CCigBs4mon8accLm7AItQWuBXXruFZLCtjPrjBhVos=.0f01d167-927a-4c9a-81ba-4499aad88674@github.com> Message-ID: On Fri, 5 Apr 2024 11:19:33 GMT, Christian Hagedorn wrote: >> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into licm >> - @run driver -> @run main >> - Add tests for add/sub reassociation >> - Merge branch 'master' into licm >> - Make inputs deterministic. Make size an arg. Fix comments. Formatting. >> - Update test to utilize @setup method for arguments >> - Merge branch 'master' into licm >> - Add correctness test for some random tests with random inputs >> - Add some correctness tests where we do reassociate >> - Remove unused TestInfo parameter. Have some tests exit mid-loop. >> - ... and 7 more: https://git.openjdk.org/jdk/compare/edcbbb75...32cb9c0d > > test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateAddSub.java line 36: > >> 34: * @summary Test loop invariant code motion of add/sub through reassociation >> 35: * @library /test/lib / >> 36: * @run main compiler.c2.loopopts.InvariantCodeMotionReassociateAddSub > > You should use `driver` since we do not want to stress the driver VM with additionally passed VM flags like `-Xcomp` when running IR tests. > Suggestion: > > * @run driver compiler.c2.loopopts.InvariantCodeMotionReassociateAddSub But if we put `driver` here, do the flags get passed to the IR framework test VM? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553629023 From roland at openjdk.org Fri Apr 5 13:30:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:30:01 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:24:54 GMT, Emanuel Peter wrote: >> Actually, shouldn't I have kept `-XX:-BackgroundCompilation` for this one? > > I think it would be great to have one run with absolutely no flags. But then it's not even guaranteed that the compilation completes? 
I'm not sure I understand the rationale. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1553638327 From epeter at openjdk.org Fri Apr 5 13:30:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 13:30:00 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:19:31 GMT, Roland Westrelin wrote: >> Detail, optional. > > Actually, shouldn't I have kept `-XX:-BackgroundCompilation` for this one? I think it would be great to have one run with absolutely no flags. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1553630815 From roland at openjdk.org Fri Apr 5 13:37:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:37:49 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v14] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimization is performed for the method being compiled). Hoisting > a `get()` call out of a loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in Java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked.
As a consequence, the process > of probing the cache is a multi-step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - review - test fix - test fix - Merge branch 'master' into JDK-8320649 - whitespaces - review - Merge branch 'master' into JDK-8320649 - review - 32 bit build fix - fix & test - ...
and 11 more: https://git.openjdk.org/jdk/compare/18c925cd...1f8931d8 ------------- Changes: https://git.openjdk.org/jdk/pull/16966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=13 Stats: 2682 lines in 39 files changed: 2612 ins; 29 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Fri Apr 5 13:37:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:37:49 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: <01v1o3zC_rRJwGBcvBBU8WPwgYkMlonWLb9KCfSNCC8=.fe9ca49e-3228-47bd-8cf1-29a197fe0308@github.com> On Thu, 4 Apr 2024 13:29:34 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 5181: >> >>> 5179: } >>> 5180: >>> 5181: bool PhaseIdealLoop::optimize_scoped_value_get_nodes() { >> >> This is a bit of a monster method, with deep nesting. Hard to read. Can you break it up somehow into smaller methods? > > You seem to do an all-vs-all optimization here, right? > Could you do that in a nested loop, and then just dispatch for all combinations: > hits-hits > hits-get > get-hits > get-get > > Also: is there a reason for the reverse-order? Regarding the reverse-order: it makes removing the just processed element easier when it's replaced by a dominating node and so is dead. I added a comment for that. >> test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 68: >> >>> 66: "testSlowPath1,testSlowPath2,testSlowPath3,testSlowPath4,testSlowPath5,testSlowPath6,testSlowPath7,testSlowPath8,testSlowPath9,testSlowPath10"); >>> 67: for (String test : tests) { >>> 68: TestFramework.runWithFlags("-XX:+TieredCompilation", "--enable-preview", "-XX:CompileCommand=dontinline,java.lang.ScopedValue::slowGet", "-DTest=" + test); >> >> What is the reason for running each test individually? > > Hmm. Profile pollution. 
But if it is so bad, then won't that be an issue "in the real world"? Is this test not very artificial? What I would expect is that if there are only a few uses of ScopedValue, then there's a good chance there's no profile pollution. If there are many and profile is polluted, this PR implements some optimizations that should do a reasonable job. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1553640775 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1553645825 From chagedorn at openjdk.org Fri Apr 5 13:38:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 13:38:14 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: References: <3CCigBs4mon8accLm7AItQWuBXXruFZLCtjPrjBhVos=.0f01d167-927a-4c9a-81ba-4499aad88674@github.com> Message-ID: On Fri, 5 Apr 2024 13:23:32 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateAddSub.java line 36: >> >>> 34: * @summary Test loop invariant code motion of add/sub through reassociation >>> 35: * @library /test/lib / >>> 36: * @run main compiler.c2.loopopts.InvariantCodeMotionReassociateAddSub >> >> You should use `driver` since we do not want to stress the driver VM with additionally passed VM flags like `-Xcomp` when running IR tests. >> Suggestion: >> >> * @run driver compiler.c2.loopopts.InvariantCodeMotionReassociateAddSub > > But if we put `driver` here, do the flags get passed to the IR framework test VM? Yes, they are passed by the IR framework. I think for IR tests, we should not stress the driver VM. The sole purpose of the driver VM is to start the actual test VM (by passing the `javaoptions` and `vmoptions`) and do IR matching afterward. I don't think there is much benefit if we try to also stress the driver VM each time in a normal IR test.
What we could do (and I'm not sure if we already do that) is to have a dedicated (or a set of) IR test that runs with `main` such that the driver VM is also stressed at some point. But that should be done separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1553650749 From roland at openjdk.org Fri Apr 5 13:37:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:37:49 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 13:30:08 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > src/hotspot/share/opto/loopnode.cpp line 5193: > >> 5191: } >> 5192: IfNode* iff = hits_in_cache->success_proj()->in(0)->as_If(); >> 5193: for (uint j = 0; j < _scoped_value_get_nodes.size(); j++) { > > Do you need the whole range? Now you have all i's and all j's. That is intended? Yes, it's intended. We're looking for a node that dominates `n` and can replace it. So we have to go over all nodes looking for a dominating one and for each `n` we have to check every other node for one that dominates `n` (and can replace it). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1553639483 From rkennke at openjdk.org Fri Apr 5 13:38:19 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 Apr 2024 13:38:19 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking Message-ID: This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? 
Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. Testing: - [x] manual test with dacapo as provided in the bug report - [ ] tier1 ------------- Commit messages: - 8329726: Use non-short forward jumps in lightweight locking Changes: https://git.openjdk.org/jdk/pull/18657/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329726 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18657/head:pull/18657 PR: https://git.openjdk.org/jdk/pull/18657 From roland at openjdk.org Fri Apr 5 13:37:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:37:49 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: <01v1o3zC_rRJwGBcvBBU8WPwgYkMlonWLb9KCfSNCC8=.fe9ca49e-3228-47bd-8cf1-29a197fe0308@github.com> References: <01v1o3zC_rRJwGBcvBBU8WPwgYkMlonWLb9KCfSNCC8=.fe9ca49e-3228-47bd-8cf1-29a197fe0308@github.com> Message-ID: <3REBwfZX67PmkyvFRPjZD-vi5win-tr7QTGWs045j_I=.4ccd9741-2c33-4587-b317-fdc0e90d6650@github.com> On Fri, 5 Apr 2024 13:29:09 GMT, Roland Westrelin wrote: >> You seem to do an all-vs-all optimization here, right? >> Could you do that in a nested loop, and then just dispatch for all combinations: >> hits-hits >> hits-get >> get-hits >> get-get >> >> Also: is there a reason for the reverse-order? > > Regarding the reverse-order: it makes removing the just processed element easier when it's replaced by a dominating node and so is dead. I added a comment for that. > You seem to do an all-vs-all optimization here, right? Could you do that in a nested loop, and then just dispatch for all combinations: hits-hits hits-get get-hits get-get I pushed a new commit that implements what you suggest I think. 
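For readers of the archive, the reverse-order, all-versus-all search described in this reply can be pictured with a small Java sketch. It is a toy model only, not the C2 code from the PR: `Node`, `canReplace` and `prune` are hypothetical names, and dominance is approximated by position in straight-line code (an earlier node with the same key dominates a later one). What it shows is the reason given above for the reverse order: when the element at index `i` is removed, no index that is still to be visited shifts.

```java
import java.util.ArrayList;
import java.util.List;

public class DominatedPruneSketch {
    // Hypothetical stand-in for a get() node: `key` names the ScopedValue,
    // `position` is the node's place in a straight-line program.
    public static final class Node {
        public final String key;
        public final int position;

        public Node(String key, int position) {
            this.key = key;
            this.position = position;
        }
    }

    // Toy dominance test (an assumption of this sketch): an earlier node for
    // the same key dominates, and so can replace, a later one.
    public static boolean canReplace(Node dom, Node n) {
        return dom.key.equals(n.key) && dom.position < n.position;
    }

    // Returns the nodes that are not replaced by a dominating node.
    public static List<Node> prune(List<Node> input) {
        List<Node> nodes = new ArrayList<>(input);
        // Walk backwards: removing index i only shifts indexes above i,
        // and those have already been processed.
        for (int i = nodes.size() - 1; i >= 0; i--) {
            Node n = nodes.get(i);
            // Every other node is a candidate dominator, hence the full inner range.
            for (int j = 0; j < nodes.size(); j++) {
                if (j != i && canReplace(nodes.get(j), n)) {
                    nodes.remove(i); // n is dead once replaced
                    break;
                }
            }
        }
        return nodes;
    }
}
```

In this model the inner loop has to cover the whole list rather than only the indexes below `i`, because nothing guarantees that a dominating node was recorded before the node it dominates.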
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1553651292 From roland at openjdk.org Fri Apr 5 13:53:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Apr 2024 13:53:10 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Tue, 26 Mar 2024 14:43:42 GMT, Kangcheng Xu wrote: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. That looks good to me. A comment that shows what the transformation does with pseudo code before and pseudo code after (before/after subgraphs) would be helpful I think. ------------- PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-1983245569 From jwaters at openjdk.org Fri Apr 5 14:04:12 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 5 Apr 2024 14:04:12 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: <2wol6Z3Rd2zNpWUl8GWlXiC_vRWMzcuVFQh_-E_j_w0=.35a4b254-ce8b-4bad-a03e-381d3c2ddae8@github.com> Message-ID: On Fri, 5 Apr 2024 12:43:30 GMT, Magnus Ihse Bursie wrote: > > Since windows build seems to need HSDIS_TOOLCHAIN_DEFAULT_CFLAGS > > Riiiight. That is since we use a completely different compiler, gcc instead of cl. (Which is probably the worst hack in all of the JDK build system...) 
I plan on adding support for using gcc to compile hsdis on Windows sometime soon (Like proper support for gcc as a toolchain on Windows, at least enough to support hsdis, so people can freely use any gcc instead of only being restricted to Cygwin), what might be an elegant solution to this issue, if you have any in mind? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2039888898 From chagedorn at openjdk.org Fri Apr 5 14:56:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 14:56:20 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v2] In-Reply-To: References: Message-ID: <7J-UV3oSvjHfxkS2e7ds4JfzlkhHwdaVdiISjWOPEM8=.2afbce4e-e94c-4038-8bbb-9d549d01fa7a@github.com> > https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. > > Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: > - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`). > - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). > > Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). 
This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). > > I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: remove othervm ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18628/files - new: https://git.openjdk.org/jdk/pull/18628/files/15bc211d..5960e82b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18628&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18628&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18628.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18628/head:pull/18628 PR: https://git.openjdk.org/jdk/pull/18628 From ihse at openjdk.org Fri Apr 5 15:18:11 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 5 Apr 2024 15:18:11 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: <2wol6Z3Rd2zNpWUl8GWlXiC_vRWMzcuVFQh_-E_j_w0=.35a4b254-ce8b-4bad-a03e-381d3c2ddae8@github.com> Message-ID: On Fri, 5 Apr 2024 14:01:29 GMT, Julian Waters wrote: > I plan on adding support for using gcc to compile hsdis on Windows sometime soon (Like proper support for gcc as a toolchain on Windows, at least enough to support hsdis, so people can freely use any gcc instead of only being restricted to Cygwin), what might be an elegant solution to this issue, if you have any in mind? The proper solution is, I think, to encapsulate toolchain variables/definitions into a package, and update the build system to not assume anything about toolchains. In a proper language, I'd create like a `class Toolchain` with everything the system would need to know to use a toolchain, and then pass a `Toolchain toolchain` to all invocations that need to do anything with a toolchain. 
Now this is not possible with make, but I hope it can be simulated using "fake namespaces" and something like a `$(TOOLCHAIN)_COMPILER` kind of setup. That would require some major surgery of the build system, but I think it will come out better in the end. After that is done, fixing so that hsdis/binutils can be compiled with gcc on Windows will be a trivial side-effect. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2040063933 From epeter at openjdk.org Fri Apr 5 15:44:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 15:44:00 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v2] In-Reply-To: <7J-UV3oSvjHfxkS2e7ds4JfzlkhHwdaVdiISjWOPEM8=.2afbce4e-e94c-4038-8bbb-9d549d01fa7a@github.com> References: <7J-UV3oSvjHfxkS2e7ds4JfzlkhHwdaVdiISjWOPEM8=.2afbce4e-e94c-4038-8bbb-9d549d01fa7a@github.com> Message-ID: On Fri, 5 Apr 2024 14:56:20 GMT, Christian Hagedorn wrote: >> https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. >> >> Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: >> - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`). >> - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). 
>> >> Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). >> >> I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > remove othervm Looks good, a nice continuation of your series of fixes and refactorings! src/hotspot/share/opto/loopTransform.cpp line 1499: > 1497: Opaque4Node* new_opaque4_node; > 1498: if (new_stride == nullptr) { > 1499: new_opaque4_node = template_assertion_predicate_expression.clone_and_replace_init(new_init, control, this); Optional: Maybe you could add a small comment about why the `new_stride` is a `nullptr` in this case? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18628#pullrequestreview-1983565016 PR Review Comment: https://git.openjdk.org/jdk/pull/18628#discussion_r1553901435 From shade at openjdk.org Fri Apr 5 15:47:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 Apr 2024 15:47:00 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:33:33 GMT, Roman Kennke wrote: > This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. > > Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? 
> > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. > > Testing: > - [x] manual test with dacapo as provided in the bug report > - [ ] tier1 Sad to give up a short jump in synchronization code just for asserts. Maybe we give up on code readability a bit? E.g.:
    #ifdef ASSERT
    // Check that locked label is reached with ZF set.
    Label zf_bad_zero, zf_correct;
    jcc(Assembler::zero, zf_correct);
    jmp(zf_bad_zero);
    #endif
    bind(slow_path);
    #ifdef ASSERT
    // Check that slow_path label is reached with ZF not set.
    jccb(Assembler::notZero, zf_correct);
    stop("Fast Lock ZF != 0");
    bind(zf_bad_zero);
    stop("Fast Lock ZF != 1");
    bind(zf_correct);
    #endif
------------- PR Review: https://git.openjdk.org/jdk/pull/18657#pullrequestreview-1983580258 From chagedorn at openjdk.org Fri Apr 5 15:52:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 15:52:29 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v3] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. > > Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: > - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`).
> - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). > > Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). > > I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: add comment and assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18628/files - new: https://git.openjdk.org/jdk/pull/18628/files/5960e82b..3af28fc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18628&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18628&range=01-02 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18628.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18628/head:pull/18628 PR: https://git.openjdk.org/jdk/pull/18628 From epeter at openjdk.org Fri Apr 5 15:52:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 15:52:29 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v3] In-Reply-To: References: Message-ID: <-1pAQTwrGK2tSrSuBIWn27cBqz9PrGQ1m2JFY5EDGmc=.a3d03380-aff7-4945-8a27-2a5cc3a29f29@github.com> On Fri, 5 Apr 2024 15:49:19 GMT, Christian Hagedorn wrote: >> https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. 
This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. >> >> Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: >> - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`). >> - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). >> >> Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). >> >> I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add comment and assert Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18628#pullrequestreview-1983586531 From chagedorn at openjdk.org Fri Apr 5 15:52:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 15:52:29 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v2] In-Reply-To: References: <7J-UV3oSvjHfxkS2e7ds4JfzlkhHwdaVdiISjWOPEM8=.2afbce4e-e94c-4038-8bbb-9d549d01fa7a@github.com> Message-ID: <-eQ6gi56mZt7YHBJSUmIfClX2NSwlt8bd73L0Wa4RrY=.d6d169db-0809-4960-9c7d-e64c0f96113e@github.com> On Fri, 5 Apr 2024 15:39:43 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> remove othervm > > src/hotspot/share/opto/loopTransform.cpp line 1499: > >> 1497: Opaque4Node* new_opaque4_node; >> 1498: if (new_stride == nullptr) { >> 1499: new_opaque4_node = template_assertion_predicate_expression.clone_and_replace_init(new_init, control, this); > > Optional: Maybe you could add a small comment about why the `new_stride` is a `nullptr` in this case? Thanks for your review! Good idea, added a comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18628#discussion_r1553914415 From epeter at openjdk.org Fri Apr 5 15:52:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Apr 2024 15:52:29 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v2] In-Reply-To: <-eQ6gi56mZt7YHBJSUmIfClX2NSwlt8bd73L0Wa4RrY=.d6d169db-0809-4960-9c7d-e64c0f96113e@github.com> References: <7J-UV3oSvjHfxkS2e7ds4JfzlkhHwdaVdiISjWOPEM8=.2afbce4e-e94c-4038-8bbb-9d549d01fa7a@github.com> <-eQ6gi56mZt7YHBJSUmIfClX2NSwlt8bd73L0Wa4RrY=.d6d169db-0809-4960-9c7d-e64c0f96113e@github.com> Message-ID: On Fri, 5 Apr 2024 15:46:38 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1499: >> >>> 1497: Opaque4Node* new_opaque4_node; >>> 1498: if (new_stride == nullptr) { >>> 1499: new_opaque4_node = template_assertion_predicate_expression.clone_and_replace_init(new_init, control, this); >> >> Optional: Maybe you could add a small comment about why the `new_stride` is a `nullptr` in this case? > > Thanks for your review! Good idea, added a comment. Ah very nice, even an assert! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18628#discussion_r1553916167 From chagedorn at openjdk.org Fri Apr 5 15:57:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Apr 2024 15:57:01 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v3] In-Reply-To: <-1pAQTwrGK2tSrSuBIWn27cBqz9PrGQ1m2JFY5EDGmc=.a3d03380-aff7-4945-8a27-2a5cc3a29f29@github.com> References: <-1pAQTwrGK2tSrSuBIWn27cBqz9PrGQ1m2JFY5EDGmc=.a3d03380-aff7-4945-8a27-2a5cc3a29f29@github.com> Message-ID: On Fri, 5 Apr 2024 15:48:01 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment and assert > > Marked as reviewed by epeter (Reviewer). Thanks @eme64 for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18628#issuecomment-2040154057 From duke at openjdk.org Fri Apr 5 16:00:11 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 5 Apr 2024 16:00:11 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 06:55:47 GMT, Quan Anh Mai wrote: >> It's a clever trick to dodge false dependency without compromising on correctness. > > @jatin-bhateja I get it but IMO it shouldn't be the responsibility of the assembler to do that, the assembler should emit machine code in a manner that respects what is being written.
> This is a downcast from double precision to single precision value, thus only lower 32 bits of destination hold the actual results for conversion, upper 127:32 bits are copied from non-destructive source operand for VEX-encoded instruction. Please see the updated description incorporating the correction dst[63:0] -> dst[31:0] for `cvtss2sd` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1553927611 From duke at openjdk.org Fri Apr 5 16:00:12 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 5 Apr 2024 16:00:12 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Fri, 5 Apr 2024 03:27:48 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix failure for KNL > > src/hotspot/cpu/x86/assembler_x86.cpp line 11713: > >> 11711: } >> 11712: >> 11713: if (UseAVX > 2 && !attributes->uses_vl()) { > > This is already covered by the assertion below Without this check, the test is failing for KNL (-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting) as Vladimir mentioned. Is there a better way to handle the KNL case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1553930336 From rkennke at openjdk.org Fri Apr 5 16:05:23 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 Apr 2024 16:05:23 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v2] In-Reply-To: References: Message-ID: > This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. > > Two of the jccb are in ASSERT path anyway.
However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? > > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. > > Testing: > - [x] manual test with dacapo as provided in the bug report > - [ ] tier1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Shuffle code to preserve short-jump on non-assert paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18657/files - new: https://git.openjdk.org/jdk/pull/18657/files/4a8a0e64..025755b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18657/head:pull/18657 PR: https://git.openjdk.org/jdk/pull/18657 From duke at openjdk.org Fri Apr 5 17:36:46 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 5 Apr 2024 17:36:46 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v15] In-Reply-To: References: Message-ID: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. 
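
The reassociation quoted above for 8323220 can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`ReassociateInvariantsSketch`, `countBefore`, `countAfter`) are hypothetical and do not come from the PR, and the hoisting is written by hand here, whereas the PR performs it inside C2. Because `int` arithmetic wraps around, `inv1 == x + inv2` and `inv1 - inv2 == x` should agree for all inputs, so both methods are expected to return the same count.

```java
public class ReassociateInvariantsSketch {
    // Shape before the transformation: every iteration combines the
    // loop-variant value x with the loop invariants inv1 and inv2.
    static int countBefore(int inv1, int inv2, int[] xs) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Shape after the transformation (hoisted by hand, as an assumption of
    // what the optimized loop looks like): t = inv1 - inv2 is computed once
    // outside the loop and the loop body only compares t with x.
    static int countAfter(int inv1, int inv2, int[] xs) {
        int t = inv1 - inv2;
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }
}
```

Calling both methods with the same arguments, including values that overflow such as `inv1 = Integer.MIN_VALUE` and `inv2 = 1`, is a quick way to convince oneself that the two shapes agree.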
Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: - Fix typo - Formatting, use @run driver, remove legacy header comments - Merge remote-tracking branch 'josh/licm' into licm - Merge branch 'master' into licm - @run driver -> @run main - Add tests for add/sub reassociation - Merge branch 'master' into licm - Merge branch 'master' into licm - Merge branch 'master' into licm - Merge branch 'master' into licm - ... and 14 more: https://git.openjdk.org/jdk/compare/5eb4c7ff...1b27aae4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17375/files - new: https://git.openjdk.org/jdk/pull/17375/files/32cb9c0d..1b27aae4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=13-14 Stats: 29378 lines in 689 files changed: 12395 ins; 12512 del; 4471 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From sviswanathan at openjdk.org Fri Apr 5 17:41:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 5 Apr 2024 17:41:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Fri, 5 Apr 2024 15:57:03 GMT, Srinivas Vamsi Parasa wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 11713: >> >>> 11711: } >>> 11712: >>> 11713: if (UseAVX > 2 && !attributes->uses_vl()) { >> >> This is already coved by below assertion > > Without this check, the test is failing for KNL (-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting) as Vladimir mentioned. 
Is there a better way to handle the KNL case?

Yes, there is an easy way. For the instructs where you added the pxor instruction generation, you could change the dst register type from regF to vlRegF. This restricts the xmm register to xmm0-xmm15 for KNL, thereby not needing the evex encoding and in turn not needing the avx512vl support for pxor.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1554045455

From kvn at openjdk.org Fri Apr 5 17:49:12 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 5 Apr 2024 17:49:12 GMT
Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5]
In-Reply-To:
References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
Message-ID:

On Thu, 4 Apr 2024 23:10:38 GMT, Srinivas Vamsi Parasa wrote:

>> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>>
>> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>>
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
>> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
>> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
>> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
>> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
>> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>>
>>
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
>> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
>> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
>> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
>> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
>> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0
>
> Srinivas Vamsi Parasa has
updated the pull request incrementally with one additional commit since the last revision: > > fix failure for KNL My new testing passed. But I want to hear an answer to @merykitty suggestion about using xmm15. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040327299 From kvn at openjdk.org Fri Apr 5 17:59:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 17:59:08 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 16:05:23 GMT, Roman Kennke wrote: >> This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. >> >> Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? >> >> Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. >> >> Testing: >> - [x] manual test with dacapo as provided in the bug report >> - [ ] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Shuffle code to preserve short-jump on non-assert paths Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18657#pullrequestreview-1983812939 From kvn at openjdk.org Fri Apr 5 18:10:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 18:10:00 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. 
I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. > > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). > > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias Thank you for fixing this. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18647#pullrequestreview-1983828904 From kvn at openjdk.org Fri Apr 5 18:12:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 18:12:03 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:04:26 GMT, Tobias Hartmann wrote: >> After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Obsoleting the flag Good ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18648#pullrequestreview-1983832240 From shade at openjdk.org Fri Apr 5 18:16:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 Apr 2024 18:16:03 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 21:56:40 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. 
>>
>> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>>
>> Passes hotspot tier1 locally on a Linux machine.
>>
>> ### Benchmarks
>>
>> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>>
>> Baseline:
>>
>> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>> N = 25
>> mean = 3309.611 ±(99.9%) 86.699 ms/op
>>
>> Histogram, ms/op:
>> [3000.000, 3050.000) = 0
>> [3050.000, 3100.000) = 4
>> [3100.000, 3150.000) = 1
>> [3150.000, 3200.000) = 0
>> [3200.000, 3250.000) = 0
>> [3250.000, 3300.000) = 0
>> [3300.000, 3350.000) = 9
>> [3350.000, 3400.000) = 6
>> [3400.000, 3450.000) = 5
>>
>> Percentiles, ms/op:
>> p(0...
>
> Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:
>
> - Guard everything by feature flag
> - Revert "Statistics for barriers generated/eliminated"
>
> This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f.

I am okay with this, but it needs more eyes.

src/hotspot/share/opto/c2_globals.hpp line 794:

> 792: \
> 793: product(bool, UseStoreStoreForCtor, true, DIAGNOSTIC, \
> 794: "Use StoreStore barrier instead of Release barrier at the end of" \

Should be a space after "of ", these lines are just concatenated.
src/hotspot/share/opto/macro.cpp line 640: > 638: use->Opcode() == Op_MemBarRelease || > 639: (UseStoreStoreForCtor && > 640: use->Opcode() == Op_MemBarStoreStore))) { Suggestion: (UseStoreStoreForCtor && use->Opcode() == Op_MemBarStoreStore))) { src/hotspot/share/opto/memnode.cpp line 3428: > 3426: } > 3427: } else if (opc == Op_MemBarRelease || > 3428: (UseStoreStoreForCtor && opc == Op_MemBarStoreStore)) { Suggestion: } else if (opc == Op_MemBarRelease || (UseStoreStoreForCtor && opc == Op_MemBarStoreStore)) { src/hotspot/share/opto/parse1.cpp line 1020: > 1018: (support_IRIW_for_not_multiple_copy_atomic_cpu && wrote_volatile()))) { > 1019: _exits.insert_mem_bar(UseStoreStoreForCtor ? Op_MemBarStoreStore > 1020: : Op_MemBarRelease, Suggestion: _exits.insert_mem_bar(UseStoreStoreForCtor ? Op_MemBarStoreStore : Op_MemBarRelease, src/hotspot/share/opto/stringopts.cpp line 2014: > 2012: assert(AllocateNode::Ideal_allocation(result) != nullptr, "should be newly allocated"); > 2013: kit.insert_mem_bar( > 2014: UseStoreStoreForCtor ? Op_MemBarStoreStore : Op_MemBarRelease, result); Suggestion: kit.insert_mem_bar(UseStoreStoreForCtor ? Op_MemBarStoreStore : Op_MemBarRelease, result); ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18505#pullrequestreview-1983748538 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1554020869 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1554021686 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1554025465 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1554025863 PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1554026083 From kvn at openjdk.org Fri Apr 5 18:16:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 18:16:09 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v3] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 15:52:29 GMT, Christian Hagedorn wrote: >> https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. >> >> Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: >> - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`). >> - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). >> >> Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). 
This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). >> >> I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add comment and assert Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18628#pullrequestreview-1983839628 From sviswanathan at openjdk.org Fri Apr 5 18:21:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 5 Apr 2024 18:21:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> On Fri, 5 Apr 2024 17:46:17 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> fix failure for KNL > > My new testing passed. > But I want to hear an answer to @merykitty suggestion about using xmm15. @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040372881 From sviswanathan at openjdk.org Fri Apr 5 18:21:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 5 Apr 2024 18:21:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Fri, 5 Apr 2024 17:37:57 GMT, Sandhya Viswanathan wrote: >> Without this check, the test is failing for KNL (-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting) as Vladimir mentioned. Is there a better way to handle the KNL case? > > Yes, there is an easy way. For the instructs where you added the pxor instruction generation, you could change the dst register type from regF to vlRegF. This restricts the xmm register to xmm0-xmm15 for KNL, thereby not needing the evex encoding and in-turn not needing the avx512vl support for pxor. and likewise from regD to vlRegD. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1554083856 From kvn at openjdk.org Fri Apr 5 18:49:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 18:49:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> Message-ID: On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan wrote: >> My new testing passed. >> But I want to hear an answer to @merykitty suggestion about using xmm15. 
> > @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. Okay. I will wait changes @sviswa7 suggested to use vlRegD and vlRegF. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040435432 From duke at openjdk.org Fri Apr 5 18:54:10 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 5 Apr 2024 18:54:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Fri, 5 Apr 2024 17:46:17 GMT, Vladimir Kozlov wrote: > My new testing passed. Thanks Vladimir! > I will wait changes @sviswa7 suggested to use vlRegD and vlRegF. Will make the changes and let you know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040443596 From duke at openjdk.org Fri Apr 5 18:54:10 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 5 Apr 2024 18:54:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: <-V-SMggZXZ0QCl5D5fM9ykpz7mrUVvm74eTYi8bxBYY=.70cb3295-828c-449a-bcb4-83238adc7ce6@github.com> On Fri, 5 Apr 2024 18:17:58 GMT, Sandhya Viswanathan wrote: >> Yes, there is an easy way. For the instructs where you added the pxor instruction generation, you could change the dst register type from regF to vlRegF. 
This restricts the xmm register to xmm0-xmm15 for KNL, thereby not needing the evex encoding and in turn not needing the avx512vl support for pxor.
>
> and likewise from regD to vlRegD.

> Yes, there is an easy way. For the instructs where you added the pxor instruction generation, you could change the dst register type from regF to vlRegF.

Thanks Sandhya, will make the changes and push an update.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1554157717

From kvn at openjdk.org Fri Apr 5 19:02:14 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 5 Apr 2024 19:02:14 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v33]
In-Reply-To: <9ZegY8QATrqtvekUuSdB2x7LOSbHIUDBvgOJa1biPbo=.d2d3d648-8435-431f-ab85-b776e0fdbb3b@github.com>
References: <9ZegY8QATrqtvekUuSdB2x7LOSbHIUDBvgOJa1biPbo=.d2d3d648-8435-431f-ab85-b776e0fdbb3b@github.com>
Message-ID:

On Fri, 5 Apr 2024 10:49:30 GMT, Emanuel Peter wrote:

>> This is a feature requested by @RogerRiggs and @cl4es.
>>
>> **Idea**
>>
>> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>>
>> This patch here supports a few simple use-cases, like these:
>>
>> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
>> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>>
>> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > handle UseUnalignedAccesses in test, and a few cosmetics src/hotspot/share/opto/memnode.cpp line 2880: > 2878: // the optimization, if this RangeCheck[i+1] fails, then we execute only StoreB[i+0], and then trap. 
After > 2879: // the optimization, the new StoreI[i+0] is on the passing path of RangeCheck[i+1], and StoreB[i+0] on the > 2880: // failing path. Can we detect presence of RangeCheck which may cause us to move some stores on fail path and bailout the optimization. I don't think it is frequent case. I assume you will get RC on each store or not at all ("main" part of counted loop). Am I wrong here? I don't remember, does C2 optimize RangeCheck nodes in linear code (it does in loops)? test/micro/org/openjdk/bench/vm/compiler/MergeStores.java line 2: > 1: /* > 2: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. New Year ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1554169054 PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1554156929 From dlong at openjdk.org Fri Apr 5 19:06:59 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Apr 2024 19:06:59 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Tue, 26 Mar 2024 14:43:42 GMT, Kangcheng Xu wrote: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. Do we also handle the reverse, int-typed parallel iv in a long counted loop? On a related topic, I noticed that checks for is_range_check_if seem to require the type to match the loop type, but I wonder if that could be relaxed. 
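
To make the 8328528 description quoted above more concrete, here is a small Java sketch of a long-typed parallel induction variable inside an int counted loop. It rests on assumptions: the names (`ParallelIvSketch`, `withParallelIv`, `closedForm`) are hypothetical, and `closedForm` only imitates by hand the kind of rewrite the description refers to, namely expressing the parallel iv in terms of the primary iv `i`. It is not the C2 implementation and says nothing about how C2 decides when the rewrite is legal.

```java
public class ParallelIvSketch {
    // Int counted loop with a long-typed parallel iv p that advances in
    // lockstep with i (here by 3 per iteration). Requires out.length >= n.
    static long withParallelIv(int n, long[] out) {
        long p = 0;
        for (int i = 0; i < n; i++) {
            out[i] = p;
            p += 3;
        }
        return p;
    }

    // Hand-written equivalent in which the parallel iv has been replaced by
    // an expression of the primary iv, so only i is carried around the loop.
    static long closedForm(int n, long[] out) {
        for (int i = 0; i < n; i++) {
            out[i] = 3L * i;
        }
        return n > 0 ? 3L * n : 0L;
    }
}
```

Both methods fill `out` with the same values and return the same final value for any `n` that fits the array, which is the property such a replacement has to preserve.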
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2040471872

From duke at openjdk.org Fri Apr 5 19:53:31 2024
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Fri, 5 Apr 2024 19:53:31 GMT
Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v6]
In-Reply-To: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
Message-ID:

> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>
> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>
>
> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>
>
>
> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0

Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:

change reg to vlReg to fix KNL failure

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/18503/files
- new: https://git.openjdk.org/jdk/pull/18503/files/b4a11ba8..8c34df80

Webrevs:
- full:
https://webrevs.openjdk.org/?repo=jdk&pr=18503&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18503&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18503.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18503/head:pull/18503 PR: https://git.openjdk.org/jdk/pull/18503 From duke at openjdk.org Fri Apr 5 19:53:31 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 5 Apr 2024 19:53:31 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> Message-ID: On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan wrote: >> My new testing passed. >> But I want to hear an answer to @merykitty suggestion about using xmm15. > > @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. > Okay. I will wait changes @sviswa7 suggested to use vlRegD and vlRegF. Please see the updated commit which uses vlRegD and vlRegF. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040529692 From kvn at openjdk.org Fri Apr 5 20:05:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 20:05:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> Message-ID: On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan wrote: >> My new testing passed. >> But I want to hear an answer to @merykitty suggestion about using xmm15. > > @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. > > Okay. I will wait changes @sviswa7 suggested to use vlRegD and vlRegF. > > Please see the updated commit which uses vlRegD and vlRegF. Okay. I need to run testing again. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040544844 From qamai at openjdk.org Fri Apr 5 20:14:10 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Apr 2024 20:14:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> Message-ID: On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan wrote: >> My new testing passed. >> But I want to hear an answer to @merykitty suggestion about using xmm15. > > @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. @sviswa7 Yes with my proposal we are losing 1 out of 16 registers, which is a cost. But emitting an additional instruction for every conversion from integer to floating point values is also a cost. A more conservative solution is to use the last register in the allocation chunk which will often be unused, and when it is used, the function should be crowded with other instructions such that this particular dependency will not have a profound effect. > From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. You cannot reach that conclusion, we are trading off here, and this benchmark is chosen because it is bottlenecked by that particular dependency. 
The situation may not be the same for the other cases. Cheers, Quan Anh ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040556663 From qamai at openjdk.org Fri Apr 5 20:18:10 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 Apr 2024 20:18:10 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v4] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-niLtlGA1SlJaGnF7lPakY0eXX1njpTFpuTuWslhMMc=.846d48d4-74ee-43cf-9b95-9665f5fe0816@github.com> Message-ID: On Fri, 5 Apr 2024 15:55:01 GMT, Srinivas Vamsi Parasa wrote: >> @jatin-bhateja I get it but IMO it shouldn't be the responsibility of the assembler to do that, the assembler should emit machine code in a manner that respects what is being written. > >> This is a downcast from double precision to single precision value, thus only lower 32 bits of destination hold the actual results for conversion, upper 127:32 bits are copied from non destructive source operand for vex encoded instruction. > > Please see the updated description incorporating the correction dst[63:0] -> dst[31,0] for `cvtss2sd` @vamsi-parasa > This change modifies the defined behaviours of cvtss2sd. Without AVX, it would retains the bits 64-127 of dst while with it the bits would be copied from src. I would suggest separating the matching rules instead. Please address this, fyi in similar cases we created separate methods in the `MacroAssembler` such as `movflt` or `movdbl`. Feel free to disagree but I think the assembler should not behave differently compared to the corresponding assembly instruction. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18503#discussion_r1554255271 From dlong at openjdk.org Fri Apr 5 20:29:01 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Apr 2024 20:29:01 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 21:56:40 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. 
>>
>> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>>
>> Passes hotspot tier1 locally on a Linux machine.
>>
>> ### Benchmarks
>>
>> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>>
>> Baseline:
>>
>> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>>   N = 25
>>   mean = 3309.611 ±(99.9%) 86.699 ms/op
>>
>>   Histogram, ms/op:
>>     [3000.000, 3050.000) = 0
>>     [3050.000, 3100.000) = 4
>>     [3100.000, 3150.000) = 1
>>     [3150.000, 3200.000) = 0
>>     [3200.000, 3250.000) = 0
>>     [3250.000, 3300.000) = 0
>>     [3300.000, 3350.000) = 9
>>     [3350.000, 3400.000) = 6
>>     [3400.000, 3450.000) = 5
>>
>>   Percentiles, ms/op:
>>       p(0...
>
> Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Guard everything by feature flag
>  - Revert "Statistics for barriers generated/eliminated"
>
>    This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f.

src/hotspot/share/opto/escape.cpp line 202:

> 200:           if (!UseStoreStoreForCtor || n->req() > MemBarNode::Precedent) {
> 201:             storestore_worklist.append(n->as_MemBarStoreStore());
> 202:           }

This case and the next case could use a more detailed explanation. We have 4 different possible inputs: {StoreStore, Release} x {w/ Precedent, w/o Precedent} and 2 possible outcomes: worklist or record_for_optimizer. It's not obvious to me that we are doing the right thing for all cases, both in the old code and the new code. Previously, I believe this optimization did not apply to the end-of-ctor-with-final barrier, but now it does.
If it should always apply, then shouldn't it also apply to the Release barrier when !UseStoreStoreForCtor? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1554264291 From dlong at openjdk.org Fri Apr 5 21:25:09 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Apr 2024 21:25:09 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. > > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). > > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias Is it worth fixing the hash function too, so it never returns 0? The hash lookup probably only cares about the low bits anyway, so mapping 0 to 0x80000000 seems safe, or unconditionally OR the value with 0x80000000. 
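
To make the two alternatives in the last paragraph concrete, here is a minimal Java sketch (not HotSpot code). The names `NO_HASH`, `remapZero` and `forceTopBit` are hypothetical and chosen only for illustration; the one thing taken from the discussion is that a computed hash of 0 is indistinguishable from `Node::NO_HASH`. It assumes, as the comment does, that the table lookup mostly depends on the low bits.

```java
final class NonZeroHash {
    // Hypothetical sentinel standing in for HotSpot's Node::NO_HASH (0).
    static final int NO_HASH = 0;

    // Option 1: remap only the colliding value; every other hash is unchanged.
    static int remapZero(int h) {
        return h == NO_HASH ? 0x80000000 : h;
    }

    // Option 2: unconditionally set the top bit. Branch-free, but two hashes
    // that differ only in the top bit now map to the same value.
    static int forceTopBit(int h) {
        return h | 0x80000000;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0x7fffffff, 0x80000000, -1};
        for (int h : samples) {
            System.out.printf("%08x -> remap %08x, or %08x%n",
                              h, remapZero(h), forceTopBit(h));
        }
    }
}
```

In both variants the low 31 bits are preserved, so a lookup that masks with a power-of-two table size smaller than 2^31 would see the same bucket as before. Whether that holds for the real `NodeHash` is an assumption that would need checking against the HotSpot sources.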
------------- PR Comment: https://git.openjdk.org/jdk/pull/18647#issuecomment-2040642019 From sviswanathan at openjdk.org Fri Apr 5 22:33:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 5 Apr 2024 22:33:19 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> Message-ID: <6dC7-zzu-QbRo_aRxEFiHqqOBOyvopCnwfPdFBi9du0=.1f8d1702-d97b-496c-87d8-1614c9ea3d49@github.com> On Fri, 5 Apr 2024 18:17:00 GMT, Sandhya Viswanathan wrote: >> My new testing passed. >> But I want to hear an answer to @merykitty suggestion about using xmm15. > > @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. > @sviswa7 > > Yes with my proposal we are losing 1 out of 16 registers, which is a cost. But emitting an additional instruction for every conversion from integer to floating point values is also a cost. A more conservative solution is to use the last register in the allocation chunk which will often be unused, and when it is used, the function should be crowded with other instructions such that this particular dependency will not have a profound effect. > > > From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. 
> > You cannot reach that conclusion, we are trading off here, and this benchmark is chosen because it is bottlenecked by that particular dependency. The situation may not be the same for the other cases. > > Cheers, Quan Anh @merykitty I would like to disagree, decision to reserve a register for entire duration of program cannot be taken lightly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040711784 From kvn at openjdk.org Fri Apr 5 23:15:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Apr 2024 23:15:02 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v6] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: On Fri, 5 Apr 2024 19:53:31 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this small PR is improve the performance of convert instructions and address the slowdown when AVX>0 is used. >> >> The performance data using the ComputePI.java benchmark (part of this PR) is as follows: >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0 >> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9 >> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5 >> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8 >> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8 >> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4 >> >> >> >> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup >> -- | -- | -- | -- >> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0 >> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0 >> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0 >> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0 >> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0 >> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0 > > Srinivas Vamsi Parasa has updated 
the pull request incrementally with one additional commit since the last revision: > > change reg to vlReg to fix KNL failure Executive (my ;^) decision: we go with current changes: no xmm15 reservation. I am starting (I hope final) testing round. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040752434 From duke at openjdk.org Fri Apr 5 23:21:14 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 5 Apr 2024 23:21:14 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v15] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 17:36:46 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: > > - Fix typo > - Formatting, use @run driver, remove legacy header comments > - Merge remote-tracking branch 'josh/licm' into licm > - Merge branch 'master' into licm > - @run driver -> @run main > - Add tests for add/sub reassociation > - Merge branch 'master' into licm > - Merge branch 'master' into licm > - Merge branch 'master' into licm > - Merge branch 'master' into licm > - ... 
and 14 more: https://git.openjdk.org/jdk/compare/b6cc1e1d...1b27aae4 Can ignore warning and let the bot squash everything together. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17375#issuecomment-2040756113 From liach at openjdk.org Fri Apr 5 23:51:10 2024 From: liach at openjdk.org (Chen Liang) Date: Fri, 5 Apr 2024 23:51:10 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: References: Message-ID: <_QvfSuxwJg70LjurvtuZlujGzpSsptGx5Lfblj-ejLg=.37bd2ae1-4f9d-44f2-b1e8-731f4ec8fc32@github.com> On Thu, 4 Apr 2024 21:56:40 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). 
This is contrary to the barriers in the end of the constructor where the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled.
>>
>> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>>
>> Passes hotspot tier1 locally on a Linux machine.
>>
>> ### Benchmarks
>>
>> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>>
>> Baseline:
>>
>> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>>   N = 25
>>   mean = 3309.611 ±(99.9%) 86.699 ms/op
>>
>>   Histogram, ms/op:
>>     [3000.000, 3050.000) = 0
>>     [3050.000, 3100.000) = 4
>>     [3100.000, 3150.000) = 1
>>     [3150.000, 3200.000) = 0
>>     [3200.000, 3250.000) = 0
>>     [3250.000, 3300.000) = 0
>>     [3300.000, 3350.000) = 9
>>     [3350.000, 3400.000) = 6
>>     [3400.000, 3450.000) = 5
>>
>>   Percentiles, ms/op:
>>       p(0...
>
> Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Guard everything by feature flag
>  - Revert "Statistics for barriers generated/eliminated"
>
>    This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f.

On a side note, would it be safe to replace explicit constructor emulation release barriers (Unsafe.storeFence) elsewhere in the JDK with storeStore, like in ClassValue, MutableCallSite, ClassSpecializer.Factory, ObjectInputStream, and Properties?
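
For readers unfamiliar with the pattern being asked about, the following Java sketch shows the two fence flavours side by side. It is an illustration only: `FencedHolder` and its factory methods are hypothetical, and it uses the public `VarHandle.releaseFence()` / `VarHandle.storeStoreFence()` methods as stand-ins for the internal `Unsafe.storeFence` / `storeStoreFence` calls mentioned above (an assumption about equivalence, not something stated in this thread). It does not answer whether the replacement is safe in the listed JDK classes.

```java
import java.lang.invoke.VarHandle;

final class FencedHolder {
    // Deliberately non-final: classes like this cannot rely on final-field
    // semantics and emulate the end-of-constructor barrier by hand.
    private int value;

    private FencedHolder() {}

    // Existing style: a release fence after initialization orders earlier
    // loads and stores before later stores.
    static FencedHolder createWithReleaseFence(int v) {
        FencedHolder h = new FencedHolder();
        h.value = v;
        VarHandle.releaseFence();
        return h;
    }

    // Candidate style: a StoreStore fence orders only earlier stores before
    // later stores, which is the weaker guarantee under discussion.
    static FencedHolder createWithStoreStoreFence(int v) {
        FencedHolder h = new FencedHolder();
        h.value = v;
        VarHandle.storeStoreFence();
        return h;
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(createWithReleaseFence(1).value());
        System.out.println(createWithStoreStoreFence(2).value());
    }
}
```

The difference only matters if initialization performs loads whose ordering against the publishing store is relied upon; a single-threaded run such as `main` above cannot distinguish the two variants.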
------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2040777070 From qamai at openjdk.org Sat Apr 6 01:47:24 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 6 Apr 2024 01:47:24 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <6dC7-zzu-QbRo_aRxEFiHqqOBOyvopCnwfPdFBi9du0=.1f8d1702-d97b-496c-87d8-1614c9ea3d49@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> <6dC7-zzu-QbRo_aRxEFiHqqOBOyvopCnwfPdFBi9du0=.1f8d1702-d97b-496c-87d8-1614c9ea3d49@github.com> Message-ID: <0fGRbqPsCsvzBBnAszA5TCPex-k6pjyKJK_-Odkt88U=.d019daad-819c-46cf-9d08-7c7660e8e700@github.com> On Fri, 5 Apr 2024 22:30:35 GMT, Sandhya Viswanathan wrote: >> @vnkozlov If I understand the proposal from @merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. > >> @sviswa7 >> >> Yes with my proposal we are losing 1 out of 16 registers, which is a cost. But emitting an additional instruction for every conversion from integer to floating point values is also a cost. A more conservative solution is to use the last register in the allocation chunk which will often be unused, and when it is used, the function should be crowded with other instructions such that this particular dependency will not have a profound effect. >> >> > From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. 
>> >> You cannot reach that conclusion, we are trading off here, and this benchmark is chosen because it is bottlenecked by that particular dependency. The situation may not be the same for the other cases. >> >> Cheers, Quan Anh > > @merykitty I would like to disagree, decision to reserve a register for entire duration of program cannot be taken lightly. @sviswa7 I didn't disagree with you, I just made a more conservative proposal that uses `xmm15` here without reserving it, what do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2040850196 From fyang at openjdk.org Sat Apr 6 02:27:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 6 Apr 2024 02:27:03 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 16:02:22 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this non-vectorized implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
>> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|25... 
>
> ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision:
>
>  - Dispose of some unneeded instructions
>  - Move buf_end up
>  - Add missing instructions for accum function split
>  - Prettify labels and accum function
>  - Split accum function
>  - Eliminate L_nmax loop counter
>  - Move repeating code under function
>  - Add `enter` and `leave`

I witnessed a performance regression on the Unmatched board when count >= 2048. JMH numbers:

Before:

Benchmark                      (count)   Mode  Cnt     Score    Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  1050.761 ± 54.862  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25   953.858 ± 42.102  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25   821.011 ± 21.154  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25   624.207 ± 19.724  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25   436.040 ±  5.875  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   265.020 ±  3.058  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25   124.934 ±  0.799  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25    70.026 ±  0.243  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    35.885 ±  0.055  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    16.883 ±  0.027  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25     7.648 ±  0.006  ops/ms

After:

Benchmark                      (count)   Mode  Cnt     Score    Error   Units
TestAdler32.testAdler32Update       64  thrpt   25  4360.280 ± 39.921  ops/ms
TestAdler32.testAdler32Update      128  thrpt   25  2766.595 ± 16.027  ops/ms
TestAdler32.testAdler32Update      256  thrpt   25  1634.373 ±  5.412  ops/ms
TestAdler32.testAdler32Update      512  thrpt   25   880.028 ±  1.463  ops/ms
TestAdler32.testAdler32Update     1024  thrpt   25   457.724 ±  0.296  ops/ms
TestAdler32.testAdler32Update     2048  thrpt   25   233.605 ±  0.072  ops/ms
TestAdler32.testAdler32Update     5012  thrpt   25    96.610 ±  0.020  ops/ms
TestAdler32.testAdler32Update     8192  thrpt   25    59.275 ±  0.012  ops/ms
TestAdler32.testAdler32Update    16384  thrpt   25    29.726 ±  0.004  ops/ms
TestAdler32.testAdler32Update    32768  thrpt   25    14.736 ±  0.009  ops/ms
TestAdler32.testAdler32Update    65536  thrpt   25     6.658 ±  0.002  ops/ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2040891223

From kvn at openjdk.org  Sat Apr  6 03:44:09 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Sat, 6 Apr 2024 03:44:09 GMT
Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v6]
In-Reply-To:
References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
Message-ID:

On Fri, 5 Apr 2024 19:53:31 GMT, Srinivas Vamsi Parasa wrote:

>> The goal of this small PR is improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>>
>> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>>
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
>> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
>> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
>> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
>> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
>> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>>
>>
>>
>> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
>> -- | -- | -- | --
>> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
>> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
>> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
>> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
>> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
>> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
>   change reg to vlReg to fix KNL failure

My testing of v05 passed - no new failures.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18503#pullrequestreview-1984433915 From amitkumar at openjdk.org Sat Apr 6 08:44:09 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 6 Apr 2024 08:44:09 GMT Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register [v2] In-Reply-To: References: Message-ID: On Wed, 3 Apr 2024 15:10:25 GMT, Sidraya Jayagond wrote: >> Fix sign extension on 4 byte load from argument stack slot to GPR. > > Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: > > copyright header I have done `{tier1} X {fastdebug, slowdebug, release}` test and do not see any new failure appearing. Thanks Sid for fixing it. ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/18601#pullrequestreview-1984465276 From duke at openjdk.org Sat Apr 6 12:59:12 2024 From: duke at openjdk.org (altrisi) Date: Sat, 6 Apr 2024 12:59:12 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:04:26 GMT, Tobias Hartmann wrote: >> After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Obsoleting the flag Should the `UseNeon` enum constant this code referenced be removed too? 
https://github.com/openjdk/jdk/blob/5d8119abefcc0958cce949247e8232c5319aa304/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/aarch64/AArch64.java#L191-L193

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18648#issuecomment-2041075052

From aph at openjdk.org  Sat Apr  6 13:22:08 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 6 Apr 2024 13:22:08 GMT
Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Apr 2024 16:43:18 GMT, Magnus Ihse Bursie wrote:

> I apologize for the late reply. I've been just working spotty hours due to spring break.

I apologize for my bad temper. Thanks to everyone working on this.

I still think that hsdis ought not to have any dependencies on HotSpot, but I'm not going to be fanatical about it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2041080237

From jbhateja at openjdk.org  Mon Apr  8 02:35:33 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 8 Apr 2024 02:35:33 GMT
Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2]
In-Reply-To:
References:
Message-ID:

> This bug fix patch tightens the predication check for small constant length clear array pattern and relaxes associated feature checks. Modified few comments for clarity.
>
> Kindly review and approve.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Cleanup predicates.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18464/files - new: https://git.openjdk.org/jdk/pull/18464/files/05ccc786..9154491a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18464&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18464&range=00-01 Stats: 9 lines in 3 files changed: 0 ins; 7 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18464.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18464/head:pull/18464 PR: https://git.openjdk.org/jdk/pull/18464 From jbhateja at openjdk.org Mon Apr 8 02:38:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Apr 2024 02:38:59 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 16:40:31 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup predicates. > > src/hotspot/cpu/x86/x86.ad line 1755: > >> 1753: case Op_ClearArray: >> 1754: if ((size_in_bits != 512) && !VM_Version::supports_avx512vl()) { >> 1755: return false; > > Please add comment to clarify condition. I am reading it as ClearArray will not be supported for NOT avx512 because we can have vector length 512 bits for not avx512. This is only pertinent to known sized clear arrays which are optimized for AVX-512 targets, we already have such a check as part of matcher predicate, so removing it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18464#discussion_r1555163994 From thartmann at openjdk.org Mon Apr 8 05:09:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 05:09:08 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 16:05:23 GMT, Roman Kennke wrote: >> This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. 
When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. >> >> Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? >> >> Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. >> >> Testing: >> - [x] manual test with dacapo as provided in the bug report >> - [ ] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Shuffle code to preserve short-jump on non-assert paths > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. Although a time out is a bit of a confusing failure mode for the test, I think it would be better than no test at all, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18657#issuecomment-2041871538 From thartmann at openjdk.org Mon Apr 8 05:42:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 05:42:20 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v3] In-Reply-To: References: Message-ID: > After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Removed UseNeon from AArch64.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18648/files - new: https://git.openjdk.org/jdk/pull/18648/files/5d8119ab..370a7b9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18648&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18648&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18648.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18648/head:pull/18648 PR: https://git.openjdk.org/jdk/pull/18648 From thartmann at openjdk.org Mon Apr 8 05:42:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 05:42:20 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v2] In-Reply-To: References: Message-ID: On Sat, 6 Apr 2024 12:56:38 GMT, altrisi wrote: > Should the `UseNeon` enum constant this code referenced be removed too? Good catch. Removed as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18648#issuecomment-2041903521 From thartmann at openjdk.org Mon Apr 8 05:42:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 05:42:20 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v2] In-Reply-To: References: Message-ID: <1IcFsB-2crYE2zvev2GiDPcDSVK1KpM3VEoOudEx2Eo=.aadf7de2-6287-4ac0-af4e-4cd3ed1b04bd@github.com> On Fri, 5 Apr 2024 18:09:27 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Obsoleting the flag > > Good Thanks for the reviews, @vnkozlov, @altrisi! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18648#issuecomment-2041903919 From rehn at openjdk.org Mon Apr 8 06:19:09 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 8 Apr 2024 06:19:09 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v2] In-Reply-To: References: Message-ID: On Sat, 6 Apr 2024 13:18:54 GMT, Andrew Haley wrote: > Thanks to everyone working on this. I still think that hsdis ought not to have any dependencies on HotSpot, but I'm not going to be fanatical about it. I agree, good :) If everyone is 'okay' with this, can someone do the second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2041943627 From chagedorn at openjdk.org Mon Apr 8 06:36:14 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Apr 2024 06:36:14 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v15] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 17:36:46 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 24 additional commits since the last revision: > > - Fix typo > - Formatting, use @run driver, remove legacy header comments > - Merge remote-tracking branch 'josh/licm' into licm > - Merge branch 'master' into licm > - @run driver -> @run main > - Add tests for add/sub reassociation > - Merge branch 'master' into licm > - Merge branch 'master' into licm > - Merge branch 'master' into licm > - Merge branch 'master' into licm > - ... and 14 more: https://git.openjdk.org/jdk/compare/7ab2c08f...1b27aae4 Thanks for making the changes. One more minor thing, otherwise, looks good! src/hotspot/share/opto/loopTransform.cpp line 277: > 275: } > 276: for (DUIterator i = n->outs(); n->has_out(i); i++) { > 277: BoolNode *bool_out = n->out(i)->isa_Bool(); Suggestion: BoolNode* bool_out = n->out(i)->isa_Bool(); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1985540351 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1555286670 From thartmann at openjdk.org Mon Apr 8 07:33:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 07:33:11 GMT Subject: Integrated: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. 
> > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). > > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias This pull request has now been integrated. Changeset: d1aad712 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d1aad71209092013a89b3b85a258dd4d2e31224a Stats: 10 lines in 1 file changed: 0 ins; 9 del; 1 mod 8321204: C2: assert(false) failed: node should be in igvn hash table Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18647 From thartmann at openjdk.org Mon Apr 8 07:33:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 07:33:11 GMT Subject: RFR: 8321204: C2: assert(false) failed: node should be in igvn hash table In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 07:45:20 GMT, Tobias Hartmann wrote: > Over recent years, we spuriously hit the "node should be in igvn hash table" assert but could never reproduce it. I suspect that this is an extremely rare case where `Node::hash` is 0 which is equivalent to `Node::NO_HASH` and we therefore return false from `NodeHash::hash_delete`. The hash depends on "random" factors like `Node` pointers and it being zero is unfortunate but harmless: > https://github.com/openjdk/jdk/blob/f26e4308992d989d71e7fbfaa3feb95f0ea17c06/src/hotspot/share/opto/node.hpp#L1114-L1118 > > I verified this by hardcoding the `ConstraintCastNode::hash()` to 0 which immediately triggers the assert. 
> > Although the value of the assert which was added by [JDK-8024070](https://bugs.openjdk.org/browse/JDK-8024070) in JDK 9 is questionable, I think it's worth keeping it but revert the additional printing added by [JDK-8312218](https://bugs.openjdk.org/browse/JDK-8312218). > > I also executed some extensive testing with `Node::hash` hardcoded to zero and found some additional issues that I will address with separate bugs. We might want to add a stress flag to trigger this and similar hash collisions more often. > > Thanks, > Tobias Thanks for the review, Vladimir and Dean! > Is it worth fixing the hash function too, so it never returns 0? The hash lookup probably only cares about the low bits anyway, so mapping 0 to 0x80000000 seems safe, or unconditionally OR the value with 0x80000000. I think that's worth investigating separately. I added it to [JDK-8329777](https://bugs.openjdk.org/browse/JDK-8329777). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18647#issuecomment-2042048956 From rkennke at openjdk.org Mon Apr 8 08:19:32 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 Apr 2024 08:19:32 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v3] In-Reply-To: References: Message-ID: > This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. > > Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? > > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. 
> > Testing: > - [x] manual test with dacapo as provided in the bug report > - [ ] tier1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add test-case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18657/files - new: https://git.openjdk.org/jdk/pull/18657/files/025755b2..f6231d8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=01-02 Stats: 38 lines in 1 file changed: 38 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18657/head:pull/18657 PR: https://git.openjdk.org/jdk/pull/18657 From rkennke at openjdk.org Mon Apr 8 08:19:33 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 Apr 2024 08:19:33 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v2] In-Reply-To: References: Message-ID: <6Sp_r7ebf0ZVeRovreKoME2yHN4UQteXU6TgF9RRWxo=.177ed3d5-3297-4a75-96c9-6241ce4a3f72@github.com> On Mon, 8 Apr 2024 05:06:28 GMT, Tobias Hartmann wrote: > > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. > > Although a time out is a bit of a confusing failure mode for the test, I think it would be better than no test at all, right? Right. I added a test-case with a short timeout. It fails without the change, and passes with it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18657#issuecomment-2042134453 From aboldtch at openjdk.org Mon Apr 8 08:44:10 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 8 Apr 2024 08:44:10 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v3] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 08:19:32 GMT, Roman Kennke wrote: >> This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. 
When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. >> >> Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? >> >> Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. >> >> Testing: >> - [x] manual test with dacapo as provided in the bug report >> - [ ] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add test-case lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18657#pullrequestreview-1985801663 From aph at openjdk.org Mon Apr 8 09:32:00 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 8 Apr 2024 09:32:00 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v3] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 05:42:20 GMT, Tobias Hartmann wrote: >> After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed UseNeon from AArch64.java Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18648#pullrequestreview-1985908008 From tholenstein at openjdk.org Mon Apr 8 09:35:56 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 09:35:56 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v7] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open one or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the important **states** of IGV to a workspace directory: > - imported graphs (_graphs.xml_) > - opened graph tabs + extracted nodes (_state.igv_) > > > ### Saving the state of IGV > - For IGV to save the state it needs a workspace, which can be passed in 3 different ways: > 1) Using an environment variable : `IGV_WORKSPACE=path/to/igv_workspace ./igv.sh` > 2) Passed as an argument : `./igv.sh path/to/igv_workspace ` > 3) With default location `IdealGraphVisualizer/workspace` : `./igv.sh` > > Opening IGV with the following example workspace (unzipped) [igv_workspace.zip](https://github.com/openjdk/jdk/files/14311092/igv_workspace.zip) should look something like this: > ![workspace](https://github.com/openjdk/jdk/assets/71546117/58da409d-fb02-4b21-8914-1cae9752b17f) > > > ### Workspace > A workspace is a directory where IGV saves imported graphs as _graphs.xml_ and opened graph tabs to _state.igv_. A workspace is loaded when IGV is opened. The current workspace is saved when IGV is closed or when the workspace is changed to a different directory. When changing a directory the state of the new workspace is loaded. 
> - Click here to select a different workspace directory > ![path](https://github.com/openjdk/jdk/assets/71546117/cff50c4d-fbc1-4112-916d-00f9ce14b27d) > > - Imports an XML file (group and graphs) into the current workspace > ![import XML](https://github.com/openjdk/jdk/assets/71546117/a92ae702-e599-4459-960d-849365dbaa1d) > > - saves the state of the current workspace > ![save_workspace](https://github.com/openjdk/jdk/assets/71546117/16759da6-367b-47ce-997b-f176b0cbfc0f) > - imported graphs (_graphs.xml_) > - opened graph tabs + extracted nodes (_state.igv_) > > > - Export the selected groups to a separate XML file > ![save_selected](https://github.com/openjdk/jdk/assets/71546117/20cc347c-7ff5-4181-8539-4ed585d5a0bc) > > - Delete the selected groups and graphs > ![delete_selected](https://github.com/openjdk/jdk/assets/71546117/1cb14b43-9b1a-4073-8964-075fc070a0b0) > > - Clear the workspace and delete all groups and graphs > ![clear_workspace](https://github.com/openjdk/jdk/assets/71546117/b40b1805-3f39-4d49-9bc2-0d538976cf25) > > > > --------- > ### Progress > - [ ] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjd... 
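The three ways of choosing a workspace listed in the description can be sketched roughly as follows. This is only an illustration, not the code in the PR: the class and method names are hypothetical, and the precedence used here (command-line argument first, then `IGV_WORKSPACE`, then the default directory) is an assumption, since the description does not say which source wins when more than one is given.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class WorkspaceResolver {
    // Default location named in the PR description.
    static final String DEFAULT_WORKSPACE = "IdealGraphVisualizer/workspace";

    // Hypothetical helper. ASSUMPTION: argument > environment variable > default.
    static Path resolve(String[] args, Map<String, String> env) {
        if (args.length > 0 && !args[0].isEmpty()) {
            return Paths.get(args[0]);           // ./igv.sh path/to/igv_workspace
        }
        String fromEnv = env.get("IGV_WORKSPACE");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return Paths.get(fromEnv);           // IGV_WORKSPACE=... ./igv.sh
        }
        return Paths.get(DEFAULT_WORKSPACE);     // plain ./igv.sh
    }

    public static void main(String[] args) {
        System.out.println(resolve(args, System.getenv()));
    }
}
```

Passing the environment in as a `Map` rather than calling `System.getenv()` inside `resolve` keeps the lookup easy to exercise without touching the real process environment.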
Tobias Holenstein has updated the pull request incrementally with six additional commits since the last revision: - Save state in XML - remove unused - Printer.exportStates - saveState flag - Printer.java copyright - remove unused InputStream in Printer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/a3fe22a9..4aa3601b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=05-06 Stats: 1072 lines in 22 files changed: 470 ins; 348 del; 254 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Mon Apr 8 09:38:38 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 09:38:38 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v8] In-Reply-To: References: Message-ID:
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: revert igv.sh ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/4aa3601b..2377ead6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=06-07 Stats: 11 lines in 1 file changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From shade at openjdk.org Mon Apr 8 09:46:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Apr 2024 09:46:09 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v3] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 08:19:32 GMT, Roman Kennke wrote: >> This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. >> >> Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? >> >> Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. >> >> Testing: >> - [x] manual test with dacapo as provided in the bug report >> - [ ] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add test-case Marked as reviewed by shade (Reviewer). 
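As background to the `jccb` versus `jcc` discussion quoted above: a short conditional jump on x86 encodes its target as a signed 8-bit displacement, so it can only reach from -128 to +127 bytes relative to the end of the jump instruction. The sketch below is not HotSpot code. `fitsInShortJump` is a hypothetical helper, and the byte counts in `main` are made-up numbers, meant only to show how extra code emitted between a jump and its label (as with the larger MA::stop() sequence under -XX:+ShowMessageBoxOnError) can push the distance out of range.

```java
public class ShortJumpRange {
    // True if the distance fits a signed 8-bit displacement (-128..127).
    static boolean fitsInShortJump(int displacement) {
        return displacement >= Byte.MIN_VALUE && displacement <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        int normalDistance = 90;  // hypothetical bytes between jump and label
        int extraStopCode = 60;   // hypothetical extra bytes from a larger stop sequence

        // 90 bytes is within range, so a short jump would be enough.
        System.out.println(fitsInShortJump(normalDistance));
        // 150 bytes exceeds 127, so a long jump is needed.
        System.out.println(fitsInShortJump(normalDistance + extraStopCode));
    }
}
```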
------------- PR Review: https://git.openjdk.org/jdk/pull/18657#pullrequestreview-1985937130 From rkennke at openjdk.org Mon Apr 8 10:37:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 Apr 2024 10:37:09 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v2] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 05:06:28 GMT, Tobias Hartmann wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Shuffle code to preserve short-jump on non-assert paths > >> Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. > > Although a time out is a bit of a confusing failure mode for the test, I think it would be better than no test at all, right? @TobiHartmann could you check the test? I am not sure if the flags are reasonable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18657#issuecomment-2042406834 From tholenstein at openjdk.org Mon Apr 8 10:52:45 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 10:52:45 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v9] In-Reply-To: References: Message-ID:
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update RemoveAction.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/2377ead6..1c880edf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From chagedorn at openjdk.org Mon Apr 8 10:55:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Apr 2024 10:55:12 GMT Subject: Integrated: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 13:50:37 GMT, Christian Hagedorn wrote: > https://github.com/openjdk/jdk/pull/18293 started to replace `create_bool_from_template_assertion_predicate()` usages to fix an endless DFS traversal problem. This patch is a follow-up to replace the last usage of `create_bool_from_template_assertion_predicate()` in `clone_assertion_predicate_and_initialize()` to completely fix the problem. > > Depending on where `clone_assertion_predicate_and_initialize()` is called from, we need to clone the Template Assertion Predicate Expression differently: > - Create a new Template Assertion Predicate for a main loop: Clone everything except for the `OpaqueLoopInitNode` which needs to be replaced with a new `OpaqueLoopInitNode` (done with `clone_and_replace_init()`). 
> - Create an Initialized Assertion Predicate for all other cases: Clone everything except for the `OpaqueLoop*Nodes` which are replaced with an actual init and stride value (done with `clone_and_replace_init_and_stride()`). > > Note that it's incorrect to _not_ create new Template Assertion Predicates in case we split the loop further (for example after peeling a loop). This will be eventually fixed with [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981). > > I've extended the test added with https://github.com/openjdk/jdk/pull/18293 to also cover the now fixed cases. > > Thanks, > Christian This pull request has now been integrated. Changeset: fc18201b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/fc18201bbdac7ac7d78767c780d3efe5352ee77a Stats: 189 lines in 5 files changed: 103 ins; 82 del; 4 mod 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18628 From chagedorn at openjdk.org Mon Apr 8 10:55:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Apr 2024 10:55:12 GMT Subject: RFR: 8327111: Replace remaining usage of create_bool_from_template_assertion_predicate() which requires additional OpaqueLoop*Nodes transformation strategies [v3] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 18:13:48 GMT, Vladimir Kozlov wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment and assert > > Looks good. Thanks @vnkozlov for your review! I will now merge everything into https://github.com/openjdk/jdk/pull/16877 and update the PR accordingly with the new code of JDK-8271109/8271110/8271111. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18628#issuecomment-2042438194 From tholenstein at openjdk.org Mon Apr 8 11:03:33 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 11:03:33 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v10] In-Reply-To: References: Message-ID:
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/1c880edf..17cd1c5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=08-09 Stats: 981 lines in 28 files changed: 447 ins; 448 del; 86 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Mon Apr 8 11:32:46 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 11:32:46 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v11] In-Reply-To: References: Message-ID: 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: undo some unnecessary changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/17cd1c5e..3199ec77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=09-10 Stats: 234 lines in 25 files changed: 96 ins; 97 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From thartmann at openjdk.org Mon Apr 8 11:38:13 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 11:38:13 GMT Subject: Integrated: 8329749: Obsolete the unused UseNeon flag In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 08:07:39 GMT, Tobias Hartmann wrote: > After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 8648890f Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8648890f86fb3d869950614c97c2df648352168d Stats: 10 lines in 5 files changed: 1 ins; 7 del; 2 mod 8329749: Obsolete the unused UseNeon flag Reviewed-by: chagedorn, kvn, aph ------------- PR: https://git.openjdk.org/jdk/pull/18648 From thartmann at openjdk.org Mon Apr 8 11:38:13 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 11:38:13 GMT Subject: RFR: 8329749: Obsolete the unused UseNeon flag [v3] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 05:42:20 GMT, Tobias Hartmann wrote: >> After [JDK-8328264](https://bugs.openjdk.org/browse/JDK-8328264), the UseNeon flag is only used by JVMCI code and @dougxc confirmed that it can be removed. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed UseNeon from AArch64.java Thanks for the review, Andrew. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18648#issuecomment-2042517148 From thartmann at openjdk.org Mon Apr 8 11:57:11 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Apr 2024 11:57:11 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v3] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 08:19:32 GMT, Roman Kennke wrote: >> This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. >> >> Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? >> >> Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. >> >> Testing: >> - [x] manual test with dacapo as provided in the bug report >> - [ ] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add test-case Looks good but I think it's safer to use the default timeout because otherwise we risk a false-positive timeout on slow machines when running with additional flags like -XX:+DeoptimizeALot. 
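As a rough illustration of the short-jump limitation discussed in this thread: on x86, a short conditional jump (the form HotSpot's assembler emits for `jccb`) encodes its target as a signed 8-bit displacement, so it can only reach -128..127 bytes from the end of the instruction, while the near form (`jcc`) uses a 32-bit displacement. The sketch below is plain Java, not HotSpot code, and the class and method names are hypothetical.

```java
final class ShortJumpRange {
    // Hypothetical helper: true if the displacement fits the signed 8-bit
    // field of a short conditional jump.
    static boolean fitsInRel8(long displacement) {
        return displacement >= Byte.MIN_VALUE && displacement <= Byte.MAX_VALUE;
    }

    public static void main(String[] args) {
        // A small body between the jump and its label fits the short form.
        System.out.println(fitsInRel8(100));
        // If more code is generated between the jump and its label,
        // the displacement no longer fits and the near form is required.
        System.out.println(fitsInRel8(200));
    }
}
```

This is why code that grows between a jump and its target, as described for `-XX:+ShowMessageBoxOnError`, can push a previously valid short jump out of range.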
------------- PR Comment: https://git.openjdk.org/jdk/pull/18657#issuecomment-2042551146 From tholenstein at openjdk.org Mon Apr 8 12:12:33 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 12:12:33 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v12] In-Reply-To: References: Message-ID: 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remove invokeLater ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/3199ec77..8735e07a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=10-11 Stats: 31 lines in 3 files changed: 0 ins; 27 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Mon Apr 8 12:32:52 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 12:32:52 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v13] In-Reply-To: References: Message-ID: 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: improved save() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/8735e07a..90dd36f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=11-12 Stats: 15 lines in 1 file changed: 2 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Mon Apr 8 12:37:40 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 12:37:40 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v14] In-Reply-To: References: Message-ID: 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: requestProcessor instead of default ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/90dd36f6..2829412b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=12-13 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Mon Apr 8 12:48:51 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 12:48:51 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v15] In-Reply-To: References: Message-ID: <4GzvasQVxsayTxS1nURbvY-TkZq5-gIpheRmSo8ZhZ4=.b814018d-9f89-4391-83a2-879ce5cbabc5@github.com> 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update ImportAction.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/2829412b..40897182 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Mon Apr 8 14:03:25 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Apr 2024 14:03:25 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v16] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open one or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### What's new > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graphs can be opened from an XML file > empty > > Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml` > opens the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when opening the `graphs.xml`: > > graph > > A new `` introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > - Open allows the user to open an XML file. 
In IGV there is either no XML opened `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > open > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > save > > > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file > - `Export`: Allows the user to save a subset (selected) ... Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8324950' of github.com:tobiasholenstein/jdk into JDK-8324950 - added comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/40897182..b13a472c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=14-15 Stats: 31 lines in 1 file changed: 31 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From rkennke at openjdk.org Mon Apr 8 15:25:39 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 Apr 2024 15:25:39 GMT Subject: RFR: 8329726: Use non-short forward jumps in lightweight locking [v4] In-Reply-To: References: Message-ID: > This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. > > Two of the jccb are in ASSERT path anyway. However, another is also in a product path. We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? 
> > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. > > Testing: > - [x] manual test with dacapo as provided in the bug report > - [ ] tier1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use default timeout for test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18657/files - new: https://git.openjdk.org/jdk/pull/18657/files/f6231d8f..8f6d6d95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18657&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18657/head:pull/18657 PR: https://git.openjdk.org/jdk/pull/18657 From duke at openjdk.org Mon Apr 8 15:33:11 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 8 Apr 2024 15:33:11 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v6] In-Reply-To: References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> Message-ID: <_cJlCYoljfaPAYXYu09Jd3Md66fkbwaX44cmYPtfOUg=.8ff0d584-3149-4e48-802d-44a2a2548f90@github.com> On Sat, 6 Apr 2024 03:41:21 GMT, Vladimir Kozlov wrote: > My testing of v05 passed - no new failures. Thank you Vladimir! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2043063929 From duke at openjdk.org Mon Apr 8 16:00:36 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 8 Apr 2024 16:00:36 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v10] In-Reply-To: References: Message-ID: <7T3WhLLml-T5hk5lVwlAar0aMd5sZZOnamLPu4BdKXg=.3f5bf2f0-b93b-4353-8cd6-26c2b03b2033@github.com> > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on Arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`s. The [current handling of StoreStores in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This differs from the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly as `MemBarRelease` is handled. 
> > I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). > > Passes hotspot tier1 locally on a Linux machine. > > ### Benchmarks > > Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. > > Baseline: > > Result "org.renaissance.jdk.streams.JmhParMnemonics.run": > N = 25 > mean = 3309.611 ±(99.9%) 86.699 ms/op > > Histogram, ms/op: > [3000.000, 3050.000) = 0 > [3050.000, 3100.000) = 4 > [3100.000, 3150.000) = 1 > [3150.000, 3200.000) = 0 > [3200.000, 3250.000) = 0 > [3250.000, 3300.000) = 0 > [3300.000, 3350.000) = 9 > [3350.000, 3400.000) = 6 > [3400.000, 3450.000) = 5 > > Percentiles, ms/op: > p(0.0000) = 3069.910 ms/op > p(50.0000) = 3348.140 ms/op > ... 
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review some formatting suggestions from @shipilev Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18505/files - new: https://git.openjdk.org/jdk/pull/18505/files/5ff6bef5..558fcaa6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=08-09 Stats: 8 lines in 4 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505 PR: https://git.openjdk.org/jdk/pull/18505 From duke at openjdk.org Mon Apr 8 16:29:14 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 8 Apr 2024 16:29:14 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: References: Message-ID: <1tnvXHkbK3bwbuFMyXb9mUAKIgAFvfcA3v3pOR9PQRw=.0218898e-b611-4168-8b5e-5edf4782b563@github.com> On Fri, 5 Apr 2024 20:26:08 GMT, Dean Long wrote: > This case and the next case could use a more detailed explanation. We have 4 different possible inputs: {StoreStore, Release} x {w/ Precedent, w/o Precedent} and 2 possible outcomes: worklist or record_for_optimizer. We can eliminate a barrier when its precedent is a non-escaping object. If the barrier does not have a precedent, we cannot elide it, which is why we don't include it in the worklist / `record_for_optimizer`. I think it's confusing because `StoreStore` barriers are optimized in `escape.cpp`, while `Release` barriers are optimized in [memnode.cpp](https://github.com/openjdk/jdk/blob/115f4193eb39d8469ac8127e38798a3f041c22e0/src/hotspot/share/opto/memnode.cpp#L3431). I would have preferred if all escape-based optimizations of barriers were just done in one place. 
> Previously, I believe this optimization did not apply to the end-of-ctor-with-final barrier, but now it does. This is correct. End of ctor did not have `StoreStore` barriers. They had `Release` barriers, which escape analysis already handles. We have to check `n->req() > MemBarNode::Precedent`, or else we run into assertion errors [here](https://github.com/openjdk/jdk/blob/9ac3b77d0d69227ded6ef3843ebf5c18ceee37b5/src/hotspot/share/opto/escape.cpp#L2590) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1556115517 From duke at openjdk.org Mon Apr 8 16:36:11 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 8 Apr 2024 16:36:11 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: <_QvfSuxwJg70LjurvtuZlujGzpSsptGx5Lfblj-ejLg=.37bd2ae1-4f9d-44f2-b1e8-731f4ec8fc32@github.com> References: <_QvfSuxwJg70LjurvtuZlujGzpSsptGx5Lfblj-ejLg=.37bd2ae1-4f9d-44f2-b1e8-731f4ec8fc32@github.com> Message-ID: On Fri, 5 Apr 2024 23:48:52 GMT, Chen Liang wrote: > On a side note, would it be safe to replace explicit constructor emulation release barriers (Unsafe.storeFence) elsewhere in the JDK with storeStore, like in ClassValue, MutableCallSite, ClassSpecializer.Factory, ObjectInputStream, and Properties? I don't think it's safe in general. Maybe for some of the use cases it would be desirable. `Unsafe.storeFence` is marked for removal anyway. Consumers can use `VarHandle.storeStoreFence` if it fits. 
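To make the last suggestion concrete, here is a minimal sketch of constructor-style publication using `VarHandle.storeStoreFence()`, a static method on `java.lang.invoke.VarHandle` since Java 9. The `Box` and `Publisher` names are hypothetical and are not taken from any of the JDK classes mentioned above. Whether a StoreStore fence is sufficient depends on the use case: it orders earlier stores before later stores, but unlike a release fence it does not order earlier loads before later stores.

```java
import java.lang.invoke.VarHandle;

// Hypothetical holder; the field is deliberately not final, so the fence
// below stands in for the barrier a constructor with final fields would get.
final class Box {
    int value;
}

final class Publisher {
    // Plain field used to hand the object to other threads (hypothetical).
    static Box shared;

    static Box publish(int v) {
        Box b = new Box();
        b.value = v;                  // initializing store
        VarHandle.storeStoreFence();  // keep the store above ahead of the publishing store below
        shared = b;                   // publishing store
        return b;
    }
}
```

A release fence (`VarHandle.releaseFence()`) would be the conservative choice where the code before the fence also performs loads that must not be reordered with the publishing store.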
------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2043192772 From duke at openjdk.org Mon Apr 8 18:42:43 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 8 Apr 2024 18:42:43 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v16] In-Reply-To: References: Message-ID: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopTransform.cpp formatting Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17375/files - new: https://git.openjdk.org/jdk/pull/17375/files/1b27aae4..3bcab3bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From duke at openjdk.org Mon Apr 8 18:42:46 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 8 Apr 2024 18:42:46 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v15] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 06:32:02 GMT, Christian 
Hagedorn wrote: >> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: >> >> - Fix typo >> - Formatting, use @run driver, remove legacy header comments >> - Merge remote-tracking branch 'josh/licm' into licm >> - Merge branch 'master' into licm >> - @run driver -> @run main >> - Add tests for add/sub reassociation >> - Merge branch 'master' into licm >> - Merge branch 'master' into licm >> - Merge branch 'master' into licm >> - Merge branch 'master' into licm >> - ... and 14 more: https://git.openjdk.org/jdk/compare/6bf3451d...1b27aae4 > > src/hotspot/share/opto/loopTransform.cpp line 277: > >> 275: } >> 276: for (DUIterator i = n->outs(); n->has_out(i); i++) { >> 277: BoolNode *bool_out = n->out(i)->isa_Bool(); > > Suggestion: > > BoolNode* bool_out = n->out(i)->isa_Bool(); Thanks, committed suggestion through GitHub ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1556262849 From sviswanathan at openjdk.org Mon Apr 8 18:44:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 8 Apr 2024 18:44:21 GMT Subject: RFR: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used [v5] In-Reply-To: <6dC7-zzu-QbRo_aRxEFiHqqOBOyvopCnwfPdFBi9du0=.1f8d1702-d97b-496c-87d8-1614c9ea3d49@github.com> References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com> <-q6u19dMyTqqqv9nIbQ4Q956YHuNP5_fYwLzGhqY5UM=.917322fe-493a-4b45-895b-6a625d14ada3@github.com> <6dC7-zzu-QbRo_aRxEFiHqqOBOyvopCnwfPdFBi9du0=.1f8d1702-d97b-496c-87d8-1614c9ea3d49@github.com> Message-ID: <0EkARwrw_gHwluXWo-DXsRYR5n_NLkH0TYewuxgHFzY=.f556f2fb-4ca1-40bc-afe4-88731882df9c@github.com> On Fri, 5 Apr 2024 22:30:35 GMT, Sandhya Viswanathan wrote: >> @vnkozlov If I understand the proposal from 
@merykitty correctly, the suggestion is to reserve xmm15 as non allocatable throughout. This sounds like a big overhead for cases where every xmm register is usable say in a Vector API kernel. From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. > >> @sviswa7 >> >> Yes with my proposal we are losing 1 out of 16 registers, which is a cost. But emitting an additional instruction for every conversion from integer to floating point values is also a cost. A more conservative solution is to use the last register in the allocation chunk which will often be unused, and when it is used, the function should be crowded with other instructions such that this particular dependency will not have a profound effect. >> >> > From Vamsi's microbenchmark runs, he has clearly shown that the gain of his optimization is way more than any overhead of doing pxor just before the converts. >> >> You cannot reach that conclusion, we are trading off here, and this benchmark is chosen because it is bottlenecked by that particular dependency. The situation may not be the same for the other cases. >> >> Cheers, Quan Anh > > @merykitty I would like to disagree, decision to reserve a register for entire duration of program cannot be taken lightly. > @sviswa7 I didn't disagree with you, I just made a more conservative proposal that uses `xmm15` here without reserving it, what do you think? Let us go with Vladimir's executive decision for now and integrate this. Any improvements in subsequent PRs is always welcome. 

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18503#issuecomment-2043428518

From duke at openjdk.org Mon Apr 8 18:44:22 2024
From: duke at openjdk.org (Srinivas Vamsi Parasa)
Date: Mon, 8 Apr 2024 18:44:22 GMT
Subject: Integrated: 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used
In-Reply-To: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
References: <8HUy9c75ZHxW1CyRw6J-xXBPNbtB7wrxoM6rha0ftNU=.d25e5130-84c2-4046-b357-eacfe6caedc0@github.com>
Message-ID:

On Tue, 26 Mar 2024 23:19:16 GMT, Srinivas Vamsi Parasa wrote:

> The goal of this small PR is to improve the performance of convert instructions and address the slowdown when AVX>0 is used.
>
> The performance data using the ComputePI.java benchmark (part of this PR) is as follows:
>
> Benchmark (ns/op) | Stock JDK | This PR (AVX=3) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 511.34 | 510.989 | 1.0
> ComputePI.compute_pi_flt_dbl | 2024.06 | 518.695 | 3.9
> ComputePI.compute_pi_int_dbl | 695.482 | 453.054 | 1.5
> ComputePI.compute_pi_int_flt | 799.268 | 449.83 | 1.8
> ComputePI.compute_pi_long_dbl | 802.992 | 454.891 | 1.8
> ComputePI.compute_pi_long_flt | 628.62 | 463.617 | 1.4
>
> Benchmark (ns/op) | Stock JDK | This PR (AVX=0) | Speedup
> -- | -- | -- | --
> ComputePI.compute_pi_dbl_flt | 473.778 | 472.529 | 1.0
> ComputePI.compute_pi_flt_dbl | 536.004 | 538.418 | 1.0
> ComputePI.compute_pi_int_dbl | 458.08 | 460.245 | 1.0
> ComputePI.compute_pi_int_flt | 477.305 | 476.975 | 1.0
> ComputePI.compute_pi_long_dbl | 455.132 | 455.064 | 1.0
> ComputePI.compute_pi_long_flt | 474.734 | 476.571 | 1.0

This pull request has now been integrated.
Changeset: 7e5ef79f Author: vamsi-parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/7e5ef79f953877cde6389998edcfe3fecb9b900e Stats: 223 lines in 4 files changed: 217 ins; 0 del; 6 mod 8323116: [REDO] Computational test more than 2x slower when AVX instructions are used Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18503 From kvn at openjdk.org Mon Apr 8 19:25:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 Apr 2024 19:25:10 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 02:35:33 GMT, Jatin Bhateja wrote: >> This bug fix patch tightens the predication check for small constant length clear array pattern and relaxes associated feature checks. Modified few comments for clarity. >> >> Kindly review and approve. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup predicates. This looks good. You need second review. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18464#pullrequestreview-1987253711 From vlivanov at openjdk.org Mon Apr 8 21:55:10 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 8 Apr 2024 21:55:10 GMT Subject: RFR: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 02:35:33 GMT, Jatin Bhateja wrote: >> This bug fix patch tightens the predication check for small constant length clear array pattern and relaxes associated feature checks. Modified few comments for clarity. >> >> Kindly review and approve. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup predicates. Looks good. 
------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18464#pullrequestreview-1987463585 From jbhateja at openjdk.org Tue Apr 9 01:40:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Apr 2024 01:40:14 GMT Subject: Integrated: 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 In-Reply-To: References: Message-ID: On Sun, 24 Mar 2024 09:58:59 GMT, Jatin Bhateja wrote: > This bug fix patch tightens the predication check for small constant length clear array pattern and relaxes associated feature checks. Modified few comments for clarity. > > Kindly review and approve. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: fbc1e666 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/fbc1e6661e26c30a9cf7bc57afd70fde1c642bcb Stats: 19 lines in 5 files changed: 2 ins; 3 del; 14 mod 8328181: C2: assert(MaxVectorSize >= 32) failed: vector length should be >= 32 Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/18464 From luhenry at openjdk.org Tue Apr 9 08:09:10 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 9 Apr 2024 08:09:10 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: References: Message-ID: <42T5rhl-oMvCAsHetMt7e44qpi4YmEh-JZasLQcMfRI=.00f60086-d586-46ff-be03-02e6d9aec719@github.com> On Fri, 5 Apr 2024 10:45:24 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use JNIEXPORT Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18400#pullrequestreview-1988392854 From aph at openjdk.org Tue Apr 9 08:12:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 9 Apr 2024 08:12:11 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v10] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 13:44:34 GMT, Hamlin Li wrote: >> HI, >> Can you have a look at this patch adding some tests for Math.round instrinsics? >> Thanks! >> >> ### FYI: >> During the development of RoundVF/RoundF, we faced the issues which were only spotted by running test exhaustively against 32/64 bits range of int/long. >> It's helpful to add these exhaustive tests in jdk for future possible usage, rather than build it everytime when needed. >> Of course, we need to put it in `manual` mode, so it's not run when `-automatic` jtreg option is specified which I guess is the mode CI used, please correct me if I'm assume incorrectly. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > use java library code of Math.round as golden value test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 64: > 62: > 63: int golden_round(float a) { > 64: // below code is copied from java.base/share/classes/java/lang/Math.java Suggestion: static int golden_round(float a) { // below code is copied from java.base/share/classes/java/lang/Math.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557173877 From mli at openjdk.org Tue Apr 9 08:43:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Apr 2024 08:43:30 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v11] In-Reply-To: References: Message-ID: > HI, > Can you have a look at this patch adding some tests for Math.round instrinsics? > Thanks! 
> > ### FYI: > During the development of RoundVF/RoundF, we faced the issues which were only spotted by running test exhaustively against 32/64 bits range of int/long. > It's helpful to add these exhaustive tests in jdk for future possible usage, rather than build it everytime when needed. > Of course, we need to put it in `manual` mode, so it's not run when `-automatic` jtreg option is specified which I guess is the mode CI used, please correct me if I'm assume incorrectly. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: make methods static ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17753/files - new: https://git.openjdk.org/jdk/pull/17753/files/a8c4172d..ec51c774 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=09-10 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/17753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17753/head:pull/17753 PR: https://git.openjdk.org/jdk/pull/17753 From mli at openjdk.org Tue Apr 9 08:43:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Apr 2024 08:43:30 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v10] In-Reply-To: References: Message-ID: <_pXN7m_crQT7FbP9J6NCfWJ6uLjG_gzugAnEs8fBZqY=.f1635113-9519-4485-974f-5f99a64b1044@github.com> On Tue, 9 Apr 2024 08:09:51 GMT, Andrew Haley wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> use java library code of Math.round as golden value > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatRandom.java line 64: > >> 62: >> 63: int golden_round(float a) { >> 64: // below code is copied from java.base/share/classes/java/lang/Math.java > > Suggestion: > > static int golden_round(float a) { > // below code is copied from 
java.base/share/classes/java/lang/Math.java Thanks for the suggestion, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557228307 From aph at openjdk.org Tue Apr 9 08:48:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 9 Apr 2024 08:48:03 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> Message-ID: <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> On Fri, 5 Apr 2024 10:26:06 GMT, Hamlin Li wrote: >>> > There's no need for randomness or arrays or special values in the 32-bit case. You can, and should, test the entire 32-bit range in a few lines of code by using floatBitsToInt. >>> >>> In previous discussion, there are several reasons why it's implemented in this way: >>> >>> 1. test the whole range of 32 bits is slow, and even slow for a 64 ranges double. >> >> I guess we could try it for 32 bit floats. But it would take a while. If we can make sure it does not take much more than one minute, we can do that. But of course 64 bit doubles would be infeasible. > > As I remember, that's not the case in my local environment, i.e. it will take longer time. If it takes longer than a few seconds, there's something wrong with your computer or your compiler. It's only 4G tests. 
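
As an illustration of the "test the entire 32-bit range in a few lines" idea, the following is a minimal sketch, not the test from the PR. It walks `int` bit patterns and converts each one with `Float.intBitsToFloat` (the standard-library method for turning bits into a float). The `referenceRound` helper is an assumed reference that does the addition in double precision; it is not the `golden_round` copied from `Math.java`, and the class and method names are hypothetical.

```java
public class ExhaustiveRoundSketch {
    // Assumed reference: widening to double before adding 0.5 avoids the
    // rounding error of a float addition. The (int) cast saturates at the
    // int range and maps NaN to 0.
    static int referenceRound(float a) {
        return (int) Math.floor((double) a + 0.5);
    }

    // Visits bit patterns first, first + stride, ... up to last (inclusive)
    // and counts the inputs where Math.round disagrees with the reference.
    // A long counter is used so the loop can end after 0xFFFF_FFFF.
    static long countMismatches(long first, long last, long stride) {
        long mismatches = 0;
        for (long bits = first; bits <= last; bits += stride) {
            float f = Float.intBitsToFloat((int) bits);
            if (Math.round(f) != referenceRound(f)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // With stride 1 this covers all 2^32 float bit patterns.
        long stride = args.length > 0 ? Long.parseLong(args[0]) : 1L;
        System.out.println(countMismatches(0L, 0xFFFF_FFFFL, stride));
    }
}
```

Passing a larger stride on the command line gives a quick sampled run; the 64-bit `double` case cannot be enumerated this way, as noted in the quoted discussion.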
I get 6,255.47 msec task-clock:u # 1.057 CPUs utilized 19,681,407,235 cycles:u # 3.146 GHz (95.55%) 110,397,691,854 instructions:u # 5.61 insn per cycle (95.55%) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557235952 From mli at openjdk.org Tue Apr 9 08:57:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Apr 2024 08:57:02 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> Message-ID: On Tue, 9 Apr 2024 08:45:38 GMT, Andrew Haley wrote: >> As I remember, that's not the case in my local environment, i.e. it will take longer time. > > If it takes longer than a few seconds, there's something wrong with your computer or your compiler. It's only 4G tests. I get > > > 6,255.47 msec task-clock:u # 1.057 CPUs utilized > 19,681,407,235 cycles:u # 3.146 GHz (95.55%) > 110,397,691,854 instructions:u # 5.61 insn per cycle (95.55%) Ah, maybe you're right, I was using qemu for arm test. Let me try it later on a real arm machine later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557249800 From chagedorn at openjdk.org Tue Apr 9 09:03:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Apr 2024 09:03:05 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v16] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 18:42:43 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) 
>> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > formatting > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1988505979 From rehn at openjdk.org Tue Apr 9 09:04:10 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 9 Apr 2024 09:04:10 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: <42T5rhl-oMvCAsHetMt7e44qpi4YmEh-JZasLQcMfRI=.00f60086-d586-46ff-be03-02e6d9aec719@github.com> References: <42T5rhl-oMvCAsHetMt7e44qpi4YmEh-JZasLQcMfRI=.00f60086-d586-46ff-be03-02e6d9aec719@github.com> Message-ID: On Tue, 9 Apr 2024 08:06:19 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Use JNIEXPORT > > Marked as reviewed by luhenry (Committer). Thank you @luhenry ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2044496543 From ihse at openjdk.org Tue Apr 9 09:17:09 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 9 Apr 2024 09:17:09 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:45:24 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. 
>> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use JNIEXPORT FYI: The 2-reviewer rule only applies to Hotspot (and some other specific parts of the code base), and not generally for JDK fixes. So you had been fine to push with just one reviewer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2044521203 From rehn at openjdk.org Tue Apr 9 09:42:02 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 9 Apr 2024 09:42:02 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:45:24 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use JNIEXPORT Yea, I know, but if hsdis is not an external library I considered it part of hotspot as it is the only use, hence I want two reviews. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18400#issuecomment-2044575768 From roland at openjdk.org Tue Apr 9 09:53:34 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Apr 2024 09:53:34 GMT Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg Message-ID: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com> The crash occurs when a virtual call is devirtualized late. Inlining is not attempted then. So no new inlining diagnostic message is produced which causes the assert failure. 
There's some valuable information that can be reported though (the call is devirtualized). ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/18685/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18685&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327741 Stats: 85 lines in 2 files changed: 85 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18685/head:pull/18685 PR: https://git.openjdk.org/jdk/pull/18685 From mli at openjdk.org Tue Apr 9 10:09:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Apr 2024 10:09:11 GMT Subject: RFR: 8328614: hsdis: dlsym can't find decode symbol [v3] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 10:45:24 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. >> Tested with gcc and clang, and llvm and binutils backend. >> >> I didn't find any use of the "DLL_ENTRY", so I removed it. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Use JNIEXPORT Looks good. Although I'm not the right person to review this, thanks for explanation and discussion. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18400#pullrequestreview-1988652730 From fyang at openjdk.org Tue Apr 9 11:42:13 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 Apr 2024 11:42:13 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3] In-Reply-To: References: Message-ID: On Thu, 14 Mar 2024 11:41:52 GMT, Hamlin Li wrote: >> Hi, >> Can you have a review on this patch to add RoundVF/RoundDF intrinsics? >> Thanks! 
>> >> ## Tests >> >> test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java >> >> test/jdk/java/lang/Math/RoundTests.java > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - merge master > - fix space > - add tests > - add test cases > - v2: (src + 0.5) + rdn > - Fix corner cases > - Merge branch 'master' into round-F+D-v > - refine code > - RoundVF/D: Initial commit Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/riscv_v.ad line 3670: > 3668: instruct vround_f(vReg dst, vReg src, fRegF tmp, vRegMask_V0 v0) %{ > 3669: match(Set dst (RoundVF src)); > 3670: effect(TEMP_DEF dst, TEMP tmp); You might want to add `TEMP v0` in effect as v0 is clobbered in `java_round_float_v`. Similar for `vround_d`. src/hotspot/cpu/riscv/riscv_v.ad line 3675: > 3673: __ csrwi(CSR_FRM, C2_MacroAssembler::rdn); > 3674: BasicType bt = Matcher::vector_element_basic_type(this); > 3675: __ vsetvli_helper(bt, Matcher::vector_length(this)); I think the code will be more readable if you put `csrwi` and `vsetvli_helper` instructions in `java_round_float_v`. We can add `BasicType bt` as parameter for `java_round_float_v`. Similar for vround_d. src/hotspot/cpu/riscv/riscv_v.ad line 3678: > 3676: __ java_round_float_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg), > 3677: as_FloatRegister($tmp$$reg)); > 3678: __ csrwi(CSR_FRM, C2_MacroAssembler::rne); I don't think it's necessary to restore `CSR_FRM` to `rne` after `java_round_float_v`. As I remembered, we always set the required rounding mode in other places where it makes an effect. Similar for vround_d. 
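
The `rdn` (round-down) mode discussed in these comments goes with the commit titled "v2: (src + 0.5) + rdn". One plausible reason the addition should not use the default round-to-nearest mode can be sketched in Java; this is an illustration, not code from the PR. Java cannot change the floating-point rounding mode, so the sketch stands in for a round-down float addition by doing the addition exactly in double, which floors to the same value for the input used here. The class and method names are hypothetical.

```java
public class RoundDownSketch {
    // floor(x + 0.5f) with the addition rounded to nearest-even in float.
    static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    // Same formula, but the addition is done in double, where it is exact
    // for this input. A float addition in round-down mode would floor to
    // the same result.
    static int exactAddRound(float x) {
        return (int) Math.floor((double) x + 0.5);
    }

    public static void main(String[] args) {
        float x = Math.nextDown(0.5f); // largest float strictly below 0.5
        System.out.println(naiveRound(x));    // float sum rounds up to 1.0
        System.out.println(exactAddRound(x)); // sum stays below 1.0
        System.out.println(Math.round(x));    // library result
    }
}
```

For this input the round-to-nearest float addition lands on 1.0, so the naive formula yields 1, while the exact addition and `Math.round` both yield 0.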
------------- PR Review: https://git.openjdk.org/jdk/pull/17745#pullrequestreview-1988809339 PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1557488393 PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1557492135 PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1557494390 From tholenstein at openjdk.org Tue Apr 9 12:19:31 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 9 Apr 2024 12:19:31 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v17] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### What's new > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and open `graphs.xml` > opens the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when opening the `graphs.xml`: > > graph > > A new `` introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - Open allows the user to open an XML file. In IGV there is either no XML opened `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. 
> > > import:export > > - `Import`: Allows the ... Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 39 additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8324950 - Merge branch 'JDK-8324950' of github.com:tobiasholenstein/jdk into JDK-8324950 - Update ImportAction.java - added comments - requestProcessor instead of default - improved save() - remove invokeLater - undo some unnecessary changes - copyright year - Update RemoveAction.java - ... and 29 more: https://git.openjdk.org/jdk/compare/8008710e...ed4cb76b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/b13a472c..ed4cb76b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=15-16 Stats: 509150 lines in 5545 files changed: 61779 ins; 112229 del; 335142 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From duke at openjdk.org Tue Apr 9 13:13:22 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 9 Apr 2024 13:13:22 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v3] In-Reply-To: References: Message-ID: > Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). > > ### Correctness checks > > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
> > ### Performance results on T-Head board > > Enabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | > > Disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| > |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| > |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| > |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| > |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| > |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| > |Adler32.TestAdler32.testAdler32Update|5012|thrpt|25|100.246|1.119|ops/ms| > 
|Adler32.TestAdler32.testAdler32Update|8192|t...

ArsenyBochkarev has updated the pull request incrementally with two additional commits since the last revision:

 - Vectorize intrinsic
 - Use zext.h instructions when possible

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18382/files
  - new: https://git.openjdk.org/jdk/pull/18382/files/b9512458..a57a9046

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=01-02

  Stats: 162 lines in 3 files changed: 96 ins; 22 del; 44 mod
  Patch: https://git.openjdk.org/jdk/pull/18382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382

PR: https://git.openjdk.org/jdk/pull/18382

From duke at openjdk.org Tue Apr 9 13:13:22 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Tue, 9 Apr 2024 13:13:22 GMT
Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2]
In-Reply-To:
References:
Message-ID:

On Sat, 6 Apr 2024 02:24:04 GMT, Fei Yang wrote:

>> ArsenyBochkarev has updated the pull request incrementally with eight additional commits since the last revision:
>>
>>  - Dispose of some unneeded instructions
>>  - Move buf_end up
>>  - Add missing instructions for accum function split
>>  - Prettify labels and accum function
>>  - Split accum function
>>  - Eliminate L_nmax loop counter
>>  - Move repeating code under function
>>  - Add `enter` and `leave`
>
> I witnessed performance regression on unmatched board when count > 2048.
> JMH numbers:
>
> Before:
> Benchmark                      (count)   Mode  Cnt     Score    Error  Units
> TestAdler32.testAdler32Update       64  thrpt   25  1050.761 ± 54.862  ops/ms
> TestAdler32.testAdler32Update      128  thrpt   25   953.858 ± 42.102  ops/ms
> TestAdler32.testAdler32Update      256  thrpt   25   821.011 ± 21.154  ops/ms
> TestAdler32.testAdler32Update      512  thrpt   25   624.207 ± 19.724  ops/ms
> TestAdler32.testAdler32Update     1024  thrpt   25   436.040 ±  5.875  ops/ms
> TestAdler32.testAdler32Update     2048  thrpt   25   265.020 ±  3.058  ops/ms
> TestAdler32.testAdler32Update     5012  thrpt   25   124.934 ±  0.799  ops/ms
> TestAdler32.testAdler32Update     8192  thrpt   25    70.026 ±  0.243  ops/ms
> TestAdler32.testAdler32Update    16384  thrpt   25    35.885 ±  0.055  ops/ms
> TestAdler32.testAdler32Update    32768  thrpt   25    16.883 ±  0.027  ops/ms
> TestAdler32.testAdler32Update    65536  thrpt   25     7.648 ±  0.006  ops/ms
>
> After:
> Benchmark                      (count)   Mode  Cnt     Score    Error  Units
> TestAdler32.testAdler32Update       64  thrpt   25  4360.280 ± 39.921  ops/ms
> TestAdler32.testAdler32Update      128  thrpt   25  2766.595 ± 16.027  ops/ms
> TestAdler32.testAdler32Update      256  thrpt   25  1634.373 ±  5.412  ops/ms
> TestAdler32.testAdler32Update      512  thrpt   25   880.028 ±  1.463  ops/ms
> TestAdler32.testAdler32Update     1024  thrpt   25   457.724 ±  0.296  ops/ms
> TestAdler32.testAdler32Update     2048  thrpt   25   233.605 ±  0.072  ops/ms
> TestAdler32.testAdler32Update     5012  thrpt   25    96.610 ±  0.020  ops/ms
> TestAdler32.testAdler32Update     8192  thrpt   25    59.275 ±  0.012  ops/ms
> TestAdler32.testAdler32Update    16384  thrpt   25    29.726 ±  0.004  ops/ms
> TestAdler32.testAdler32Update    32768  thrpt   25    14.736 ±  0.009  ops/ms
> TestAdler32.testAdler32Update    65536  thrpt   25     6.658 ±  0.002  ops/ms

@RealFYang Hi, thanks for pointing out!
To achieve additional acceleration, I vectorized the intrinsic and re-measured performance on a Kendryte K230 with RVV 1.0 enabled:

Disabled intrinsic:

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| -------------------------------------- | ---------- | -------- | ------- | ------ | ------- | --------- |
| Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 1867.257 | 10.034 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 1651.408 | 10.354 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1345.505 | 4.847 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 976.550 | 3.889 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 634.572 | 1.256 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 371.763 | 0.588 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 168.774 | 0.147 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 106.578 | 0.135 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 54.216 | 0.097 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 25.744 | 0.025 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 12.992 | 0.064 | ops/ms |

Enabled intrinsic:

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| -------------------------------------- | ---------- | -------- | ------- | ------ | ------- | --------- |
| Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 7177.572 | 13.724 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 4724.756 | 6.231 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 2813.707 | 2.464 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1557.127 | 1.325 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 821.303 | 1.480 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 422.749 | 0.333 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 175.323 | 0.154 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 117.811 | 0.157 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 58.990 | 0.081 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 28.827 | 0.066 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 14.773 | 0.116 | ops/ms |

It seems to me that there is still a lot of room for improvement in the current implementation. BTW, the data from the T-Head board that I used for comparison was recorded a few months ago. Is it the code generation that has improved significantly since then? Or am I making some kind of mistake in the measurements?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2045145255

From duke at openjdk.org Tue Apr 9 13:25:35 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Tue, 9 Apr 2024 13:25:35 GMT
Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v4]
In-Reply-To:
References:
Message-ID:

> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281).
>
> ### Correctness checks
>
> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed.
> > ### Performance results on T-Head board > > Enabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | > > Disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| > |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| > |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| > |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| > |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| > |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| > |Adler32.TestAdler32.testAdler32Update|5012|thrpt|25|100.246|1.119|ops/ms| > 
|Adler32.TestAdler32.testAdler32Update|8192|t... ArsenyBochkarev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Dispose of trailing whitespace - Vectorize intrinsic - Use zext.h instructions when possible - Dispose of some unneeded instructions - Move buf_end up - Add missing instructions for accum function split - Prettify labels and accum function - Split accum function - Eliminate L_nmax loop counter - Move repeating code under function - ... and 2 more: https://git.openjdk.org/jdk/compare/635cb3c9...d06c15ca ------------- Changes: https://git.openjdk.org/jdk/pull/18382/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=03 Stats: 276 lines in 3 files changed: 276 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382 PR: https://git.openjdk.org/jdk/pull/18382 From mli at openjdk.org Tue Apr 9 13:33:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Apr 2024 13:33:01 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> Message-ID: On Tue, 9 Apr 2024 08:53:56 GMT, Hamlin Li wrote: >> If it takes longer than a few seconds, there's something wrong with your computer or your compiler. It's only 4G tests. I get >> >> >> 6,255.47 msec task-clock:u # 1.057 CPUs utilized >> 19,681,407,235 cycles:u # 3.146 GHz (95.55%) >> 110,397,691,854 instructions:u # 5.61 insn per cycle (95.55%) > > Ah, maybe you're right, I was using qemu for arm test. > Let me try it later on a real arm machine later. An update, I ran a modified Float test on a AWS `c7gn.2xlarge` EC2 instance (with 8 vCPU, 16G mem), and it tooks more than 20 minutes to finish. 
So it seems too long for an automatic test? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557635552 From duke at openjdk.org Tue Apr 9 13:51:30 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 9 Apr 2024 13:51:30 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v5] In-Reply-To: References: Message-ID: > Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). > > ### Correctness checks > > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. > > ### Performance results on T-Head board > > Enabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | > | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | 
ops/ms | > > Disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | > |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| > |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| > |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| > |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| > |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| > |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| > |Adler32.TestAdler32.testAdler32Update|5012|thrpt|25|100.246|1.119|ops/ms| > |Adler32.TestAdler32.testAdler32Update|8192|t... ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: Dispose of trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18382/files - new: https://git.openjdk.org/jdk/pull/18382/files/d06c15ca..8a74349c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382 PR: https://git.openjdk.org/jdk/pull/18382 From aph at openjdk.org Tue Apr 9 13:58:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 9 Apr 2024 13:58:12 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> Message-ID: On Tue, 9 Apr 2024 13:30:23 GMT, Hamlin Li wrote: >> Ah, maybe you're right, I was using qemu for arm test. 
>> Let me try it later on a real arm machine later. > > An update, I ran a modified Float test on a AWS `c7gn.2xlarge` EC2 instance (with 8 vCPU, 16G mem), and it tooks more than 20 minutes to finish. So it seems too long for an automatic test? [foo.zip](https://github.com/openjdk/jdk/files/14919424/foo.zip) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557673445 From aph at openjdk.org Tue Apr 9 13:58:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 9 Apr 2024 13:58:13 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> Message-ID: <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> On Tue, 9 Apr 2024 13:55:03 GMT, Andrew Haley wrote: >> An update, I ran a modified Float test on a AWS `c7gn.2xlarge` EC2 instance (with 8 vCPU, 16G mem), and it tooks more than 20 minutes to finish. So it seems too long for an automatic test? > > [foo.zip](https://github.com/openjdk/jdk/files/14919424/foo.zip) Does that work for you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557674353 From rkennke at openjdk.org Tue Apr 9 14:53:15 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 Apr 2024 14:53:15 GMT Subject: Integrated: 8329726: Use non-short forward jumps in lightweight locking In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:33:33 GMT, Roman Kennke wrote: > This turns a few short-jumps to long-jumps in x86 lightweight locking code paths. When running with -XX:+ShowMessageBoxOnError, MA::stop() generates more code and jccb is not sufficient to address this. > > Two of the jccb are in ASSERT path anyway. However, another is also in a product path. 
We *could* generate jccb or jcc conditionally on ShowMessageBoxOnError, however, I don't think it is worth the trouble. WDYT? > > Unfortunately, I could not make a simple test-case, because ShowMessageBoxOnError stops and waits on error, which would make jtreg time-out. > > Testing: > - [x] manual test with dacapo as provided in the bug report > - [ ] tier1 This pull request has now been integrated. Changeset: 2e925f26 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/2e925f263d5a9a69f21e0c12bd71242fdff084cd Stats: 45 lines in 2 files changed: 41 ins; 0 del; 4 mod 8329726: Use non-short forward jumps in lightweight locking Reviewed-by: shade, kvn, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/18657 From rehn at openjdk.org Tue Apr 9 15:04:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 9 Apr 2024 15:04:19 GMT Subject: Integrated: 8328614: hsdis: dlsym can't find decode symbol In-Reply-To: References: Message-ID: On Wed, 20 Mar 2024 16:17:36 GMT, Robbin Ehn wrote: > Hi, please consider. > > [8327045](https://bugs.openjdk.org/browse/JDK-8327045) hide these symbols. > Tested with gcc and clang, and llvm and binutils backend. > > I didn't find any use of the "DLL_ENTRY", so I removed it. > > Thanks, Robbin This pull request has now been integrated. 
Changeset: 1e02a13a Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/1e02a13a7f02a6fe9aac38b93935bcc238f7d227 Stats: 18 lines in 5 files changed: 8 ins; 8 del; 2 mod 8328614: hsdis: dlsym can't find decode symbol Reviewed-by: ihse, luhenry, mli ------------- PR: https://git.openjdk.org/jdk/pull/18400 From tholenstein at openjdk.org Tue Apr 9 15:06:42 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 9 Apr 2024 15:06:42 GMT Subject: RFR: JDK-8324950: IGV: save the state to a file [v18] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### What's new > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and open `graphs.xml` > opens the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when opening the `graphs.xml`: > > graph > > A new `` introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - Open allows the user to open an XML file. In IGV there is either no XML opened `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. 
>
>
> import:export
>
> - `Import`: Allows the ...

Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision:

 - Merge branch 'JDK-8324950' of github.com:tobiasholenstein/jdk into JDK-8324950
 - fix diffgraph

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17630/files
  - new: https://git.openjdk.org/jdk/pull/17630/files/ed4cb76b..9bf13ba5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=16-17

  Stats: 16 lines in 1 file changed: 5 ins; 9 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17630.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630

PR: https://git.openjdk.org/jdk/pull/17630

From galder at openjdk.org Tue Apr 9 15:54:06 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 9 Apr 2024 15:54:06 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7]
In-Reply-To:
References:
Message-ID: <9Gv-j8cJhIONZvbYT_V473KE4_T6FPVjvNW4x9iKpU0=.713d0cba-f297-4af3-b744-cc0310e616b4@github.com>

On Fri, 5 Apr 2024 02:16:55 GMT, Dean Long wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>>
>>  - Merge branch 'master' into topic.0131.c1-array-clone
>>  - Merge branch 'master' into topic.0131.c1-array-clone
>>  - Reserve necessary frame map space for clone use cases
>>  - 8302850: C1 primitive array clone intrinsic in graph
>>
>>    * Combine array length, new type array and arraycopy for clone in c1 graph.
>>    * Add OmitCheckFlags to skip arraycopy checks.
>>    * Instantiate ArrayCopyStub only if necessary.
>>    * Avoid zeroing newly created arrays for clone.
>>    * Add array null after c1 clone compilation test.
>>    * Pass force reexecute to intrinsic via value stack.
>>      This is needed to be able to deoptimize correctly this intrinsic.
>>    * When new type array or array copy are used for the clone intrinsic,
>>      their state needs to be based on the state before for deoptimization
>>      to work as expected.
>>  - Revert "8302850: Primitive array copy C1 intrinsic for aarch64 and x86"
>>
>>    This reverts commit fe5d916724614391a685bbef58ea939c84197d07.
>>  - 8302850: Link code emit infos for null check and alloc array
>>  - 8302850: Null check array before getting its length
>>
>>    * Added a jtreg test to verify the null check works.
>>      Without the fix this test fails with a SEGV crash.
>>  - 8302850: Force reexecuting clone in case of a deoptimization
>>
>>    * Copy state including locals for clone
>>      so that reexecution works as expected.
>>  - 8302850: Avoid instantiating array copy stub for clone use cases
>>  - 8302850: Primitive array copy C1 intrinsic for aarch64 and x86
>>
>>    * Clone calls that involve Phi nodes are not supported.
>>    * Add unimplemented stubs for other platforms.
>
> I think we could eventually relax the requirement that receiver_klass be loaded, at least for object arrays, but for simplicity my patch will follow the existing behavior.

@dean-long I tried your patch and my test worked fine with it. I had one doubt about the following:

    Value recv = apop();
    apush(recv);

Wouldn't you prefer to just peek the top of the stack rather than pop it to push it again? I would expect the former to be cheaper to do?

> My patch still needs some work.

What else does it need?
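
The pop-then-push versus peek question above can be illustrated with a minimal sketch. This is plain Java on a `java.util.ArrayDeque`, not the C1 `GraphBuilder` code; the class and method names (`PeekVsPopPush`, `readTopByPopPush`, `readTopByPeek`) are hypothetical and chosen only for illustration. It shows that both approaches read the same value and leave the stack unchanged, while the peek variant never modifies the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PeekVsPopPush {
    // Reads the top element by popping it and pushing it straight back.
    static <T> T readTopByPopPush(Deque<T> stack) {
        T top = stack.pop();
        stack.push(top);
        return top;
    }

    // Reads the top element without modifying the stack at all.
    static <T> T readTopByPeek(Deque<T> stack) {
        return stack.peek();
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("arg");
        stack.push("receiver");

        String viaPopPush = readTopByPopPush(stack);
        String viaPeek = readTopByPeek(stack);

        // Both reads see the same element and the stack depth is unchanged.
        System.out.println(viaPopPush + " " + viaPeek + " " + stack.size());
    }
}
```

Whether an equivalent peek-style accessor is the better fit in the actual patch is a question for the patch author; the sketch only shows that the two forms are interchangeable as far as the observed value and the resulting stack contents go.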
------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2045534567 From duke at openjdk.org Tue Apr 9 15:56:15 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 9 Apr 2024 15:56:15 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com> References: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com> Message-ID: On Mon, 25 Mar 2024 06:19:42 GMT, Emanuel Peter wrote: >> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into licm >> - @run driver -> @run main >> - Add tests for add/sub reassociation >> - Merge branch 'master' into licm >> - Make inputs deterministic. Make size an arg. Fix comments. Formatting. >> - Update test to utilize @setup method for arguments >> - Merge branch 'master' into licm >> - Add correctness test for some random tests with random inputs >> - Add some correctness tests where we do reassociate >> - Remove unused TestInfo parameter. Have some tests exit mid-loop. >> - ... and 7 more: https://git.openjdk.org/jdk/compare/1a3ba540...32cb9c0d > > Code looks good, running testing now... Ping me again in 2 days if I don't report back by then ;) @eme64 could you take another pass at this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17375#issuecomment-2045539108 From kvn at openjdk.org Tue Apr 9 16:00:17 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Apr 2024 16:00:17 GMT Subject: RFR: 8329967: Build failure after JDK-8329628 Message-ID: [JDK-8329629](https://bugs.openjdk.org/browse/JDK-8329629) changes added this new code to codeCache.cpp few hours before I integrated [JDK-8329628](https://bugs.openjdk.org/browse/JDK-8329628). GitHub automatic merge did not detect conflict because it was new code. ------------- Commit messages: - 8329967: Build failure after JDK-8329628 Changes: https://git.openjdk.org/jdk/pull/18700/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18700&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329967 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18700/head:pull/18700 PR: https://git.openjdk.org/jdk/pull/18700 From mli at openjdk.org Tue Apr 9 16:02:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 9 Apr 2024 16:02:02 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Tue, 9 Apr 2024 13:55:22 GMT, Andrew Haley wrote: >> [foo.zip](https://github.com/openjdk/jdk/files/14919424/foo.zip) > > Does that work for you? Thanks for the sample code. I modify the current test a bit by using your code, i.e. 
change from a two-level nested loop to a single do-while loop, as below (let's call it the `new test`):

    @Test
    static boolean test(int testInt) {
        float testFloat = Float.intBitsToFloat(testInt);
        return Math.round(testFloat) != golden_round(testFloat);
    }

    @Run(test = "test")
    static void test_rounds(RunInfo runInfo) {
        for (int i = 0; i < 1000; i++) {
            test(i);
        }
        if (runInfo.isWarmUp()) {
            return;
        }

        boolean runTest = true; // set to false to skip the exhaustive loop
        if (!runTest) return;

        // Walk every 32-bit pattern once; testInt wraps back to 0 at the end.
        int testInt = 0;
        boolean fail = false;
        do {
            fail |= test(testInt);
        } while (++testInt != 0);
        if (fail) {
            throw new RuntimeException();
        }
    }

It still took more than 5 minutes to finish the test; if I set `runTest = false`, it takes seconds.
So most of the time is spent in the do-while loop in `test_rounds`, which carries the `@Run` annotation in the new test. I'm not sure how the `@Run` annotation works, but it seems to be the reason why it's slower than a pure while loop (as in your sample code). But we need the annotations in the test (see below).

There are still some gaps between this new test and the current test:
* we do not yet verify the IR node (`IRNode.ROUND_VF`); to verify it, we need to put (part of) the test into a nested loop, put this loop in a function (`test_round` in the current test), and annotate this function with `@IR` to verify the IR node.

Or maybe there are other ways to implement this test that meet the requirements below? Currently I'm not sure.
1. run in a minute, as we want it to be an automatic test,
2. verify the Math.round (intrinsic) result,
3. verify IR node (`IRNode.ROUND_VF`) generation,
4. make sure all the verification is done after the warmup.
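
For comparison with the framework-based test above, here is a minimal standalone sketch of the same exhaustive check without any test annotations. It is only an illustration under stated assumptions: `ExhaustiveRoundCheck`, `goldenRound` and `countMismatches` are hypothetical names, and `goldenRound` is an assumed reference (round half up, computed in double precision), not the `golden_round` from the actual test. It covers requirement 2 only; it does not verify IR nodes and makes no claim about which code path the JIT picks.

```java
public class ExhaustiveRoundCheck {
    // Assumed reference: nearest int, ties toward positive infinity.
    // The narrowing cast maps NaN to 0 and saturates out-of-range values,
    // which matches the special cases documented for Math.round(float).
    static int goldenRound(float f) {
        return (int) Math.floor((double) f + 0.5);
    }

    // Checks `count` consecutive bit patterns starting at `firstBits`.
    // The int pattern wraps around on overflow, so count = 1L << 32
    // visits every float exactly once.
    static long countMismatches(int firstBits, long count) {
        long mismatches = 0;
        int bits = firstBits;
        for (long i = 0; i < count; i++, bits++) {
            float f = Float.intBitsToFloat(bits);
            if (Math.round(f) != goldenRound(f)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Default: all 2^32 patterns; pass a smaller count for a quick run.
        long count = args.length > 0 ? Long.parseLong(args[0]) : 1L << 32;
        System.out.println("mismatches: " + countMismatches(0, count));
    }
}
```

Timing a loop like this against the annotated version could help separate the cost of the loop itself from any overhead that comes from how the framework invokes the `@Run` method.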
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1557906817 From thartmann at openjdk.org Tue Apr 9 16:06:09 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Apr 2024 16:06:09 GMT Subject: RFR: 8329967: Build failure after JDK-8329628 In-Reply-To: References: Message-ID: <77Js7uoYOOyS6B0xm9ybK3mkFfD0u7IKyumx-f5ZfSU=.eeb21c27-896e-4c4b-95c1-d4f60c8ca288@github.com> On Tue, 9 Apr 2024 15:55:38 GMT, Vladimir Kozlov wrote: > [JDK-8329629](https://bugs.openjdk.org/browse/JDK-8329629) changes added this new code to codeCache.cpp few hours before I integrated [JDK-8329628](https://bugs.openjdk.org/browse/JDK-8329628). GitHub automatic merge did not detect conflict because it was new code. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18700#pullrequestreview-1989477789 From shade at openjdk.org Tue Apr 9 16:06:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 9 Apr 2024 16:06:09 GMT Subject: RFR: 8329967: Build failure after JDK-8329628 In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 15:55:38 GMT, Vladimir Kozlov wrote: > [JDK-8329629](https://bugs.openjdk.org/browse/JDK-8329629) changes added this new code to codeCache.cpp few hours before I integrated [JDK-8329628](https://bugs.openjdk.org/browse/JDK-8329628). GitHub automatic merge did not detect conflict because it was new code. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18700#pullrequestreview-1989478914 From dcubed at openjdk.org Tue Apr 9 16:06:10 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 9 Apr 2024 16:06:10 GMT Subject: RFR: 8329967: Build failure after JDK-8329628 In-Reply-To: References: Message-ID: <0BgvQKN0GkFUqo13dlvC_gHSj9WKNH1eQ3M1dqxD7R8=.5c951f96-4f8f-4f06-8951-546707f2441c@github.com> On Tue, 9 Apr 2024 15:55:38 GMT, Vladimir Kozlov wrote: > [JDK-8329629](https://bugs.openjdk.org/browse/JDK-8329629) changes added this new code to codeCache.cpp few hours before I integrated [JDK-8329628](https://bugs.openjdk.org/browse/JDK-8329628). GitHub automatic merge did not detect conflict because it was new code. Thumbs up on the change and this is a trivial change. It looks simple enough. How was it tested? ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18700#pullrequestreview-1989479141 From kvn at openjdk.org Tue Apr 9 16:06:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Apr 2024 16:06:10 GMT Subject: RFR: 8329967: Build failure after JDK-8329628 In-Reply-To: <0BgvQKN0GkFUqo13dlvC_gHSj9WKNH1eQ3M1dqxD7R8=.5c951f96-4f8f-4f06-8951-546707f2441c@github.com> References: <0BgvQKN0GkFUqo13dlvC_gHSj9WKNH1eQ3M1dqxD7R8=.5c951f96-4f8f-4f06-8951-546707f2441c@github.com> Message-ID: On Tue, 9 Apr 2024 16:02:10 GMT, Daniel D. Daugherty wrote: > It looks simple enough. How was it tested? Local build and I started tier1 in mach5. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18700#issuecomment-2045561317 From kvn at openjdk.org Tue Apr 9 16:24:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Apr 2024 16:24:11 GMT Subject: RFR: 8329967: Build failure after JDK-8329628 In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 15:55:38 GMT, Vladimir Kozlov wrote: > [JDK-8329629](https://bugs.openjdk.org/browse/JDK-8329629) changes added this new code to codeCache.cpp few hours before I integrated [JDK-8329628](https://bugs.openjdk.org/browse/JDK-8329628). GitHub automatic merge did not detect conflict because it was new code. 
Looks like few builds passed in GHA and my testing in mach5. I consider it as verification of the fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18700#issuecomment-2045593290 From kvn at openjdk.org Tue Apr 9 16:24:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Apr 2024 16:24:11 GMT Subject: Integrated: 8329967: Build failure after JDK-8329628 In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 15:55:38 GMT, Vladimir Kozlov wrote: > [JDK-8329629](https://bugs.openjdk.org/browse/JDK-8329629) changes added this new code to codeCache.cpp few hours before I integrated [JDK-8329628](https://bugs.openjdk.org/browse/JDK-8329628). GitHub automatic merge did not detect conflict because it was new code. This pull request has now been integrated. Changeset: b80ba085 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/b80ba0851841a8490e61371ac4ef3514dc6eddf5 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8329967: Build failure after JDK-8329628 Reviewed-by: thartmann, shade, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/18700 From stuefe at openjdk.org Tue Apr 9 16:43:32 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 9 Apr 2024 16:43:32 GMT Subject: RFR: JDK-8329656: assertion failed in MAP_ARCHIVE_MMAP_FAILURE path: Invalid immediate -5 0 Message-ID: It fixes an embarrassing OOB memory access when rolling the dice to get a random class space location on aarch64. Thanks to @calvinccheung for finding this bug. The fix is to make all relevant variables unsigned, thus preventing negative overflow. 
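
As a generic illustration of why switching to unsigned arithmetic prevents a negative result when picking a random position, here is a minimal Java sketch. It is not the HotSpot code (which is C++ and is not shown in this message); `UnsignedSlot`, `signedSlot` and `unsignedSlot` are hypothetical names, and the values are made up. A signed remainder keeps the sign of the dividend, so a random value with the top bit set yields a negative index, while interpreting the same bits as unsigned always yields a value in `[0, numSlots)`.

```java
public class UnsignedSlot {
    // Signed remainder: the result has the sign of `random`, so it can be negative.
    static int signedSlot(int random, int numSlots) {
        return random % numSlots;
    }

    // Treats the same 32 bits as an unsigned value; result is in [0, numSlots).
    static int unsignedSlot(int random, int numSlots) {
        return Integer.remainderUnsigned(random, numSlots);
    }

    public static void main(String[] args) {
        int random = -7;      // made-up random value with the top bit set
        int numSlots = 16;    // made-up number of candidate positions

        System.out.println(signedSlot(random, numSlots));    // prints -7
        System.out.println(unsignedSlot(random, numSlots));  // prints 9
    }
}
```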
-------------

Commit messages:
 - Fix overflow

Changes: https://git.openjdk.org/jdk/pull/18698/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18698&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8329656
  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/18698.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18698/head:pull/18698

PR: https://git.openjdk.org/jdk/pull/18698

From mli at openjdk.org  Tue Apr 9 17:23:35 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 9 Apr 2024 17:23:35 GMT
Subject: RFR: 8321010: RISC-V: C2 RoundVF [v4]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you have a review on this patch to add RoundVF/RoundDF intrinsics?
> Thanks!
>
> ## Tests
>
> test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java
>
> test/jdk/java/lang/Math/RoundTests.java

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:

 - restore round mode back to rne
 - Merge branch 'master' into round-F+D-v
 - fix minors
 - merge master
 - fix space
 - add tests
 - add test cases
 - v2: (src + 0.5) + rdn
 - Fix corner cases
 - Merge branch 'master' into round-F+D-v
 - ... and 2 more: https://git.openjdk.org/jdk/compare/21867c92...b7081bc9

-------------

Changes: https://git.openjdk.org/jdk/pull/17745/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17745&range=03
  Stats: 242 lines in 7 files changed: 238 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/17745.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17745/head:pull/17745

PR: https://git.openjdk.org/jdk/pull/17745

From mli at openjdk.org  Tue Apr 9 17:23:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 9 Apr 2024 17:23:36 GMT
Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Apr 2024 11:32:52 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>>
>>  - merge master
>>  - fix space
>>  - add tests
>>  - add test cases
>>  - v2: (src + 0.5) + rdn
>>  - Fix corner cases
>>  - Merge branch 'master' into round-F+D-v
>>  - refine code
>>  - RoundVF/D: Initial commit
>
> src/hotspot/cpu/riscv/riscv_v.ad line 3670:
>
>> 3668: instruct vround_f(vReg dst, vReg src, fRegF tmp, vRegMask_V0 v0) %{
>> 3669:   match(Set dst (RoundVF src));
>> 3670:   effect(TEMP_DEF dst, TEMP tmp);
>
> You might want to add `TEMP v0` in effect as v0 is clobbered in `java_round_float_v`. Similar for `vround_d`.

Thanks for catching! Fixed.

> src/hotspot/cpu/riscv/riscv_v.ad line 3675:
>
>> 3673:     __ csrwi(CSR_FRM, C2_MacroAssembler::rdn);
>> 3674:     BasicType bt = Matcher::vector_element_basic_type(this);
>> 3675:     __ vsetvli_helper(bt, Matcher::vector_length(this));
>
> I think the code will be more readable if you put `csrwi` and `vsetvli_helper` instructions in `java_round_float_v`. We can add `BasicType bt` as parameter for `java_round_float_v`. Similar for vround_d.

done.
> src/hotspot/cpu/riscv/riscv_v.ad line 3678:
>
>> 3676:     __ java_round_float_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg),
>> 3677:                          as_FloatRegister($tmp$$reg));
>> 3678:     __ csrwi(CSR_FRM, C2_MacroAssembler::rne);
>
> I don't think it's necessary to restore `CSR_FRM` to `rne` after `java_round_float_v`. As I remembered, we always set the required rounding mode in other places where it makes an effect. Similar for vround_d.

Seems it should be unnecessary, but without it,
1. there will be a test failure, e.g. test_suba, which uses the `vsub` intrinsic and does not set the rounding mode explicitly.
2. an assert at `src/hotspot/os/linux/os_linux.cpp:1948` is triggered at the end of the program (when calling System.exit in the test).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1558048408
PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1558048493
PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1558048561

From vlivanov at openjdk.org  Tue Apr 9 18:08:09 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 9 Apr 2024 18:08:09 GMT
Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg
In-Reply-To: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
Message-ID:

On Tue, 9 Apr 2024 09:48:54 GMT, Roland Westrelin wrote:

> The crash occurs when a virtual call is devirtualized late. Inlining
> is not attempted then. So no new inlining diagnostic message is
> produced which causes the assert failure. There's some valuable
> information that can be reported though (the call is
> devirtualized).

Looks good. Thanks for fixing it.

I'll submit it for testing.
-------------

PR Review: https://git.openjdk.org/jdk/pull/18685#pullrequestreview-1989780112

From sviswanathan at openjdk.org  Tue Apr 9 18:10:59 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 9 Apr 2024 18:10:59 GMT
Subject: RFR: 8329254: optimize integral reverse operations on x86 GFNI target.
In-Reply-To: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com>
References: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com>
Message-ID:

On Thu, 28 Mar 2024 11:41:21 GMT, Jatin Bhateja wrote:

> - Efficient GFNI based instruction sequence to compute integral reverse operation was added along with JEP-426 (VectorAPI 4th Incubation). https://bugs.openjdk.org/browse/JDK-8284960
>
> - However, the CPUID based feature detection for GFNI was incorrectly performed under AVX512 check, fixing it shows roughly 2X performance improvement for Integer/Long.reverse APIs on E-core targets (MTL+).
>
>
> BaseLine:
> Benchmark         (size)  Mode  Cnt  Score  Error  Units
> Integers.reverse     500  avgt    2  0.120         us/op
> Longs.reverse        500  avgt    2  0.221         us/op
>
> Withopt:
> Benchmark         (size)  Mode  Cnt  Score  Error  Units
> Integers.reverse     500  avgt    2  0.050         us/op
> Longs.reverse        500  avgt    2  0.086         us/op
>
>
> Kindly review.
>
> Best Regards,
> Jatin

@jatin-bhateja Thanks a lot for putting this PR together.
The register class for the following two instructs in x86_64.ad also needs to change:
From:
instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **regF** xtmp1, **regF** xtmp2, rRegL rtmp, rFlagsReg cr)
instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **regD** xtmp1, **regD** xtmp2, rRegL rtmp, rFlagsReg cr)
To:
instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **vlRegF** xtmp1, **vlRegF** xtmp2, rRegL rtmp, rFlagsReg cr)
instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **vlRegD** xtmp1, **vlRegD** xtmp2, rRegL rtmp, rFlagsReg cr)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18530#issuecomment-2045808500

From vlivanov at openjdk.org  Tue Apr 9 19:06:59 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 9 Apr 2024 19:06:59 GMT
Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg
In-Reply-To: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
Message-ID:

On Tue, 9 Apr 2024 09:48:54 GMT, Roland Westrelin wrote:

> The crash occurs when a virtual call is devirtualized late. Inlining
> is not attempted then. So no new inlining diagnostic message is
> produced which causes the assert failure. There's some valuable
> information that can be reported though (the call is
> devirtualized).

test/hotspot/jtreg/compiler/print/TestPrintInliningLateVirtualCall.java line 28:

> 26:  * @bug 8327741
> 27:  * @summary JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg
> 28:  * @run main/othervm -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:+PrintInlining TestPrintInliningLateVirtualCall

The test misses `-XX:+UnlockDiagnosticVMOptions` flag:

Error: VM option 'PrintInlining' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.
Error: The unlock option must precede 'PrintInlining'.
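
For reference, a sketch of how the header could look with the unlock flag placed ahead of `-XX:+PrintInlining`, as the error message requires. Only the `@bug`, `@summary` and `@run` flags come from the quoted header; the exact position of the new flag is one valid choice among several, and the `@test` tag and class body are hypothetical placeholders rather than the actual test.

```java
/*
 * @test
 * @bug 8327741
 * @summary JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg
 * @run main/othervm -XX:+UnlockDiagnosticVMOptions -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:+PrintInlining TestPrintInliningLateVirtualCall
 */
public class TestPrintInliningLateVirtualCall {
    // Placeholder workload (assumption): the real test exercises a late
    // devirtualized call; this body only keeps the sketch compilable.
    static int work(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        int sum = 0;
        // Loop long enough that a JIT compiler would normally kick in.
        for (int i = 0; i < 20_000; i++) {
            sum += work(i);
        }
        System.out.println(sum);
    }
}
```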
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18685#discussion_r1558155155

From dlong at openjdk.org  Tue Apr 9 19:47:09 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 9 Apr 2024 19:47:09 GMT
Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers
In-Reply-To:
References:
Message-ID:

On Mon, 25 Mar 2024 19:01:17 GMT, Steve Dohrmann wrote:

> Add instruction encoding support for Intel APX extended general-purpose registers:
>
> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html.
>
> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used.

How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2045936864

From ccheung at openjdk.org  Wed Apr 10 00:31:08 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Wed, 10 Apr 2024 00:31:08 GMT
Subject: RFR: JDK-8329656: assertion failed in MAP_ARCHIVE_MMAP_FAILURE path: Invalid immediate -5 0
In-Reply-To:
References:
Message-ID:

On Tue, 9 Apr 2024 15:06:40 GMT, Thomas Stuefe wrote:

> It fixes an embarrassing OOB memory access when rolling the dice to get a random class space location on aarch64.
>
> Thanks to @calvinccheung for finding this bug.
>
> The fix is to make all relevant variables unsigned, thus preventing negative overflow.

Looks good. Thanks!

-------------

Marked as reviewed by ccheung (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18698#pullrequestreview-1990497114

From sjayagond at openjdk.org  Wed Apr 10 03:15:02 2024
From: sjayagond at openjdk.org (Sidraya Jayagond)
Date: Wed, 10 Apr 2024 03:15:02 GMT
Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Apr 2024 15:10:25 GMT, Sidraya Jayagond wrote:

>> Fix sign extension on 4 byte load from argument stack slot to GPR.
>
> Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:
>
>   copyright header

@TheRealMDoerr please review this patch.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18601#issuecomment-2046469601

From fyang at openjdk.org  Wed Apr 10 03:24:12 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 10 Apr 2024 03:24:12 GMT
Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Apr 2024 17:20:07 GMT, Hamlin Li wrote:

>> src/hotspot/cpu/riscv/riscv_v.ad line 3678:
>>
>>> 3676:     __ java_round_float_v(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg),
>>> 3677:                          as_FloatRegister($tmp$$reg));
>>> 3678:     __ csrwi(CSR_FRM, C2_MacroAssembler::rne);
>>
>> I don't think it's necessary to restore `CSR_FRM` to `rne` after `java_round_float_v`. As I remembered, we always set the required rounding mode in other places where it makes an effect. Similar for vround_d.
>
> Seems it should be unnecessary, but without it,
> 1. there will be test failure. e.g. test_suba which uses `vsub_fp` intrinsic, and it does not set rounding mode explicitly, also `vsub_fp`, `vdiv_fp`, and so on.
> 2. an assert at `src/hotspot/os/linux/os_linux.cpp:1948` which is triggered at the end of program (when calling System.exit in the test).

Ah, I see.
I don't think it's safe for `vsub_fp`, `vdiv_fp`, etc to depend on some uncertain dynamic rounding mode on Java code entry. And it seems that the RISC-V ISA spec doesn't even specify a default dynamic rounding mode. This reminds me that we might be lacking some pieces of the puzzle. If we want to keep the `RNE` (Round to Nearest) rounding mode for Java code (like you do in this PR), we will also need to save and restore the default floating-point control state when we enter and leave Java code preferring `RNE`. Something similar for aarch64: https://bugs.openjdk.org/browse/JDK-8319973. Then we can remove the existing settings of rounding mode in file riscv_v.ad (grep "csrwi(CSR_FRM"), which should be good for performance. I would suggest we fix this in another PR before we continue with this one. Let me know if you are interested : -)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1558783467

From chagedorn at openjdk.org  Wed Apr 10 06:19:17 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 10 Apr 2024 06:19:17 GMT
Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14]
In-Reply-To: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com>
References: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com>
Message-ID:

On Mon, 25 Mar 2024 06:19:42 GMT, Emanuel Peter wrote:

>> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
>>
>>  - Merge branch 'master' into licm
>>  - @run driver -> @run main
>>  - Add tests for add/sub reassociation
>>  - Merge branch 'master' into licm
>>  - Make inputs deterministic. Make size an arg. Fix comments. Formatting.
>>  - Update test to utilize @setup method for arguments
>>  - Merge branch 'master' into licm
>>  - Add correctness test for some random tests with random inputs
>>  - Add some correctness tests where we do reassociate
>>  - Remove unused TestInfo parameter. Have some tests exit mid-loop.
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/81410e0b...32cb9c0d
>
> Code looks good, running testing now... Ping me again in 2 days if I don't report back by then ;)

> @eme64 could you take another pass at this?

Just to let you know, he will be back next week.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17375#issuecomment-2046610072

From dlong at openjdk.org  Wed Apr 10 06:58:12 2024
From: dlong at openjdk.org (Dean Long)
Date: Wed, 10 Apr 2024 06:58:12 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7]
In-Reply-To:
References:
Message-ID:

On Fri, 5 Apr 2024 02:16:55 GMT, Dean Long wrote:

>> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:
>>
>>  - Merge branch 'master' into topic.0131.c1-array-clone
>>  - Merge branch 'master' into topic.0131.c1-array-clone
>>  - Reserve necessary frame map space for clone use cases
>>  - 8302850: C1 primitive array clone intrinsic in graph
>>
>>    * Combine array length, new type array and arraycopy for clone in c1 graph.
>>    * Add OmitCheckFlags to skip arraycopy checks.
>>    * Instantiate ArrayCopyStub only if necessary.
>>    * Avoid zeroing newly created arrays for clone.
>>    * Add array null after c1 clone compilation test.
>>    * Pass force reexecute to intrinsic via value stack.
>>    This is needed to be able to deoptimize correctly this intrinsic.
>>    * When new type array or array copy are used for the clone intrinsic,
>>    their state needs to be based on the state before for deoptimization
>>    to work as expected.
>>  - Revert "8302850: Primitive array copy C1 intrinsic for aarch64 and x86"
>>
>>    This reverts commit fe5d916724614391a685bbef58ea939c84197d07.
>>  - 8302850: Link code emit infos for null check and alloc array
>>  - 8302850: Null check array before getting its length
>>
>>    * Added a jtreg test to verify the null check works.
>>    Without the fix this test fails with a SEGV crash.
>>  - 8302850: Force reexecuting clone in case of a deoptimization
>>
>>    * Copy state including locals for clone
>>    so that reexecution works as expected.
>>  - 8302850: Avoid instantiating array copy stub for clone use cases
>>  - 8302850: Primitive array copy C1 intrinsic for aarch64 and x86
>>
>>    * Clone calls that involve Phi nodes are not supported.
>>    * Add unimplemented stubs for other platforms.
>
> I think we could eventually relax the requirement that receiver_klass be loaded, at least for object arrays, but for simplicity my patch will follow the existing behavior.

> @dean-long I tried your patch and my test worked fine with it. I had one doubt about the following:
>
> ```
> Value recv = apop();
> apush(recv);
> ```
>
> Wouldn't you prefer to just peek the top of the stack rather than pop it to push it again? I would expect the former to be cheaper to do?

Sure, feel free to improve it.

> > My patch still needs some work.
>
> What else does it need?

I fixed the issue I had, so it should be good to test out now. Thanks.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2046658406

From roland at openjdk.org  Wed Apr 10 07:21:15 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 10 Apr 2024 07:21:15 GMT
Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg [v2]
In-Reply-To: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
Message-ID: <-79vJfqi9JL5-ut-4ipu7hzvwiDZx_8aYB4dhOP-ODk=.c48c5dc6-c49c-4afd-9163-e7df9f39ba04@github.com>

> The crash occurs when a virtual call is devirtualized late. Inlining
> is not attempted then. So no new inlining diagnostic message is
> produced which causes the assert failure. There's some valuable
> information that can be reported though (the call is
> devirtualized).

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  +UnlockDiagnosticVMOptions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18685/files
  - new: https://git.openjdk.org/jdk/pull/18685/files/0e7ef52c..1bd94447

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18685&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18685&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18685.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18685/head:pull/18685

PR: https://git.openjdk.org/jdk/pull/18685

From roland at openjdk.org  Wed Apr 10 07:21:15 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 10 Apr 2024 07:21:15 GMT
Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg [v2]
In-Reply-To:
References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com>
Message-ID:

On Tue, 9 Apr 2024 19:04:40 GMT, Vladimir Ivanov wrote:
>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   +UnlockDiagnosticVMOptions
>
> test/hotspot/jtreg/compiler/print/TestPrintInliningLateVirtualCall.java line 28:
>
>> 26:  * @bug 8327741
>> 27:  * @summary JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg
>> 28:  * @run main/othervm -XX:-BackgroundCompilation -XX:+PrintCompilation -XX:+PrintInlining TestPrintInliningLateVirtualCall
>
> The test misses `-XX:+UnlockDiagnosticVMOptions` flag:
>
> Error: VM option 'PrintInlining' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.
> Error: The unlock option must precede 'PrintInlining'.

Right. Fixed in new commit.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18685#discussion_r1558969625

From fyang at openjdk.org  Wed Apr 10 07:34:10 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 10 Apr 2024 07:34:10 GMT
Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2]
In-Reply-To:
References:
Message-ID: <7Cx4jgZ678c3UAcArxmIyr-qm9xB136mRybsaOEtWv0=.ce17294a-41b7-45f3-97e2-489851a51fb4@github.com>

On Sat, 6 Apr 2024 02:24:04 GMT, Fei Yang wrote:

>> ArsenyBochkarev has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available.
>
> I witnessed performance regression on unmatched board when count > 2048.
> JMH numbers:
>
> Before:
> Benchmark                      (count)   Mode  Cnt     Score    Error   Units
> TestAdler32.testAdler32Update       64  thrpt   25  1050.761 ± 54.862  ops/ms
> TestAdler32.testAdler32Update      128  thrpt   25   953.858 ± 42.102  ops/ms
> TestAdler32.testAdler32Update      256  thrpt   25   821.011 ± 21.154  ops/ms
> TestAdler32.testAdler32Update      512  thrpt   25   624.207 ± 19.724  ops/ms
> TestAdler32.testAdler32Update     1024  thrpt   25   436.040 ±  5.875  ops/ms
> TestAdler32.testAdler32Update     2048  thrpt   25   265.020 ±  3.058  ops/ms
> TestAdler32.testAdler32Update     5012  thrpt   25   124.934 ±  0.799  ops/ms
> TestAdler32.testAdler32Update     8192  thrpt   25    70.026 ±  0.243  ops/ms
> TestAdler32.testAdler32Update    16384  thrpt   25    35.885 ±  0.055  ops/ms
> TestAdler32.testAdler32Update    32768  thrpt   25    16.883 ±  0.027  ops/ms
> TestAdler32.testAdler32Update    65536  thrpt   25     7.648 ±  0.006  ops/ms
>
> After:
> Benchmark                      (count)   Mode  Cnt     Score    Error   Units
> TestAdler32.testAdler32Update       64  thrpt   25  4360.280 ± 39.921  ops/ms
> TestAdler32.testAdler32Update      128  thrpt   25  2766.595 ± 16.027  ops/ms
> TestAdler32.testAdler32Update      256  thrpt   25  1634.373 ±  5.412  ops/ms
> TestAdler32.testAdler32Update      512  thrpt   25   880.028 ±  1.463  ops/ms
> TestAdler32.testAdler32Update     1024  thrpt   25   457.724 ±  0.296  ops/ms
> TestAdler32.testAdler32Update     2048  thrpt   25   233.605 ±  0.072  ops/ms
> TestAdler32.testAdler32Update     5012  thrpt   25    96.610 ±  0.020  ops/ms
> TestAdler32.testAdler32Update     8192  thrpt   25    59.275 ±  0.012  ops/ms
> TestAdler32.testAdler32Update    16384  thrpt   25    29.726 ±  0.004  ops/ms
> TestAdler32.testAdler32Update    32768  thrpt   25    14.736 ±  0.009  ops/ms
> TestAdler32.testAdler32Update    65536  thrpt   25     6.658 ±  0.002  ops/ms

> @RealFYang Hi, thanks for pointing out! To achieve additional acceleration, I did a vectorization and re-measured performance on Kendryte K230 with RVV 1.0 enabled:

That's great to hear! I was not aware that it could run a full-featured Linux system. May I ask what kind of Linux distro are you running with?

> It seems to me that there's a huge room for improvement in the current implementation.

Have you finished improving this with RVV 1.0? I can take another look when that is done.

> BTW, the data I used as a comparison from T-Head board was recorded a few months ago. Is it the code generation that has improved significantly? Or it's just me making some kind of mistake in measurements?

I am not sure what you mean. But I don't think there is a big change in this part?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2046729405

From roland at openjdk.org  Wed Apr 10 07:42:20 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 10 Apr 2024 07:42:20 GMT
Subject: RFR: 8328822: C2: "negative trip count?" assert failure in profile predicate code
Message-ID:

The assert failure is caused by:

ABS(min_jint) = min_jint


Given the `ABS` is part of a floating computation, the fix I propose
is to cast the value to float before the `ABS`.

-------------

Commit messages:
 - test
 - test
 - fix

Changes: https://git.openjdk.org/jdk/pull/18707/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18707&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8328822
  Stats: 65 lines in 2 files changed: 64 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18707.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18707/head:pull/18707

PR: https://git.openjdk.org/jdk/pull/18707

From mdoerr at openjdk.org  Wed Apr 10 08:27:09 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 10 Apr 2024 08:27:09 GMT
Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Apr 2024 15:10:25 GMT, Sidraya Jayagond wrote:

>> Fix sign extension on 4 byte load from argument stack slot to GPR.
>
> Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:
>
>   copyright header

LGTM.

-------------

Marked as reviewed by mdoerr (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18601#pullrequestreview-1991083800

From shade at openjdk.org  Wed Apr 10 08:51:09 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 10 Apr 2024 08:51:09 GMT
Subject: RFR: 8328822: C2: "negative trip count?"
 assert failure in profile predicate code
In-Reply-To:
References:
Message-ID:

On Wed, 10 Apr 2024 07:37:20 GMT, Roland Westrelin wrote:

> The assert failure is caused by:
>
> ABS(min_jint) = min_jint
>
>
> Given the `ABS` is part of a floating computation, the fix I propose
> is to cast the value to float before the `ABS`.

Looks good and simple enough to me.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18707#pullrequestreview-1991133565

From rcastanedalo at openjdk.org  Wed Apr 10 08:54:00 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 10 Apr 2024 08:54:00 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v18]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Apr 2024 15:06:42 GMT, Tobias Holenstein wrote:

>> The current workflow in IGV is the following:
>> 1) import an XML file with graphs or send via network
>> 2) open one or more graphs in a tab
>> 3) extract a set of nodes to be displayed in the tab
>> 4) close IGV and start from 1) again
>>
>> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
>> ### The new workflow
>>
>> When opening IGV the user gets an empty workspace without any opened files.
>> - Graphs can be sent via the network to IGV
>> - Graph can be opened from an XML file
>> [image: empty]
>>
>> Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml`
>> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`:
>>
>> [image: graph]
>>
>> A new `` is introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes:
>>
>> ...
>> ...
>> ...
>> ...
>>
>> The workspace menu is restructured:
>>
>> [image: open]
>>
>> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time:
>>
>> [image: save]
>>
>> - `Save..` saves the current opened xml file. Create a new file if no file is opened.
>> - `Save as...` save the current graphs as a copy to an xml file.
>> Note: there is no...
>
> Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge branch 'JDK-8324950' of github.com:tobiasholenstein/jdk into JDK-8324950
>  - fix diffgraph

Hi Toby, this feature seems very useful, thanks for developing it! I very much prefer the current model that saves the state into the graph XML file compared to the original proposal. A few comments (this is not a full review yet):

- The export/import icons I see differ from those shown in the PR description:

![outline](https://github.com/openjdk/jdk/assets/8792647/946952d4-4ddc-495f-b5cd-22adf9721839)

- I find the distinction between 'Save' and 'Export' a bit unclear from a user perspective. My suggestion would be to either remove 'Export' (one can achieve the same result by removing the unnecessary graphs and then saving all remaining ones) or rename 'Save', 'Save as...', and 'Export' with 'Save all', 'Save all as...', and 'Save selected as...' (and using similar icons for the three), if that is the only difference between saving and exporting.

- It would be nice to make more tab-specific state persistent, for example which view is displayed (sea of nodes, CFG, etc.) or whether neighboring nodes of extracted nodes are shown. This could be done here or in a future RFE.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2046941480

From sjayagond at openjdk.org  Wed Apr 10 10:15:11 2024
From: sjayagond at openjdk.org (Sidraya Jayagond)
Date: Wed, 10 Apr 2024 10:15:11 GMT
Subject: RFR: 8329545: [s390x] Fix garbage value being passed in Argument Register [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Apr 2024 15:10:25 GMT, Sidraya Jayagond wrote:

>> Fix sign extension on 4 byte load from argument stack slot to GPR.
>
> Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision:
>
>   copyright header

Thanks for the reviews. I integrate now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18601#issuecomment-2047125279

From sjayagond at openjdk.org  Wed Apr 10 10:15:11 2024
From: sjayagond at openjdk.org (Sidraya Jayagond)
Date: Wed, 10 Apr 2024 10:15:11 GMT
Subject: Integrated: 8329545: [s390x] Fix garbage value being passed in Argument Register
In-Reply-To:
References:
Message-ID:

On Wed, 3 Apr 2024 11:40:44 GMT, Sidraya Jayagond wrote:

> Fix sign extension on 4 byte load from argument stack slot to GPR.

This pull request has now been integrated.

Changeset: e0fd6c4c
Author:    Sidraya Jayagond
Committer: Amit Kumar
URL:       https://git.openjdk.org/jdk/commit/e0fd6c4c9e30ef107ea930c8ecc449842ae4b8d4
Stats:     2 lines in 1 file changed: 0 ins; 0 del; 2 mod

8329545: [s390x] Fix garbage value being passed in Argument Register

Reviewed-by: amitkumar, mdoerr

-------------

PR: https://git.openjdk.org/jdk/pull/18601

From chagedorn at openjdk.org  Wed Apr 10 10:40:10 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 10 Apr 2024 10:40:10 GMT
Subject: RFR: 8328822: C2: "negative trip count?"
 assert failure in profile predicate code
In-Reply-To:
References:
Message-ID:

On Wed, 10 Apr 2024 07:37:20 GMT, Roland Westrelin wrote:

> The assert failure is caused by:
>
> ABS(min_jint) = min_jint
>
>
> Given the `ABS` is part of a floating computation, the fix I propose
> is to cast the value to float before the `ABS`.

Looks good to me, too.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18707#pullrequestreview-1991348810

From tholenstein at openjdk.org  Wed Apr 10 10:42:11 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Wed, 10 Apr 2024 10:42:11 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v18]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Apr 2024 08:51:44 GMT, Roberto Castañeda Lozano wrote:

> Hi Toby, this feature seems very useful, thanks for developing it! I very much prefer the current model that saves the state into the graph XML file compared to the original proposal. A few comments (this is not a full review yet):

Thanks for the comments!

> * The export/import icons I see differ from those shown in the PR description:
>
> ![outline](https://private-user-images.githubusercontent.com/8792647/321167807-946952d4-4ddc-495f-b5cd-22adf9721839.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTI3NDQ3MjYsIm5iZiI6MTcxMjc0NDQyNiwicGF0aCI6Ii84NzkyNjQ3LzMyMTE2NzgwNy05NDY5NTJkNC00ZGRjLTQ5NWYtYjVjZC0yMmFkZjk3MjE4MzkucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQxMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MTBUMTAyMDI2WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9ZTUyMzA4NWQ3NzljNGM3NDJhY2I0OGQzYTJmOGY1Zjg5MmVlODcwN2I5YmI3ODNmYWE5NzI2NzZkYzY0NDE0MCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.x5VxRqOAxE-ZWBjP9GlmvJacfQBMgrY4tAMTvvUtSuE)

Strange.
How did you checkout the PR? Seems like the newly added files `export.png` and `open.png` can not be found. (E.g. when applying as a patch like `git apply --index 8324950.diff` binary (png) files are not added)
- Can you try:
_Checkout this PR locally:_
`$ git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630`
`$ git checkout pull/17630`

> * I find the distinction between 'Save' and 'Export' a bit unclear from a user perspective. My suggestion would be to either remove 'Export' (one can achieve the same result by removing the unnecessary graphs and then saving all remaining ones) or rename 'Save', 'Save as...', and 'Export' with 'Save all', 'Save all as...', and 'Save selected as...' (and using similar icons for the three), if that is the only difference between saving and exporting.

Right, I understand that it can be a bit unclear. I would suggest to drop the `Export` option and move `Import` next after `Open`:
[image: bar]

> * It would be nice to make more tab-specific state persistent, for example which view is displayed (sea of nodes, CFG, etc.) or whether neighboring nodes of extracted nodes are shown. This could be done here or in a future RFE.

Yes, this would be nice. Also IGV could reopen the last opened XML at startup. But I suggest a future RFE for this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2047180083

From tholenstein at openjdk.org  Wed Apr 10 11:19:18 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Wed, 10 Apr 2024 11:19:18 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v19]
In-Reply-To:
References:
Message-ID:

> The current workflow in IGV is the following:
> 1) import an XML file with graphs or send via network
> 2) open one or more graphs in a tab
> 3) extract a set of nodes to be displayed in the tab
> 4) close IGV and start from 1) again
>
> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
> ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml` > shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: there is no autosave and IGV also does not ask if you want to save changes when closing it. > > impo...

Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:

  remove ExportAction.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17630/files
  - new: https://git.openjdk.org/jdk/pull/17630/files/9bf13ba5..1fd52c13

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=18
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=17-18

  Stats: 141 lines in 6 files changed: 4 ins; 124 del; 13 mod
  Patch: https://git.openjdk.org/jdk/pull/17630.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630

PR: https://git.openjdk.org/jdk/pull/17630

From amitkumar at openjdk.org  Wed Apr 10 11:26:16 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 10 Apr 2024 11:26:16 GMT
Subject: RFR: 8330011: [s390x] update block-comments to make code consistent
Message-ID: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com>

This change doesn't (shouldn't) affect runtime behavior, so I haven't run any tests, but I have performed builds.

-------------

Commit messages:
 - updates block_comments

Changes: https://git.openjdk.org/jdk/pull/18710/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18710&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330011
  Stats: 27 lines in 2 files changed: 2 ins; 0 del; 25 mod
  Patch: https://git.openjdk.org/jdk/pull/18710.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18710/head:pull/18710

PR: https://git.openjdk.org/jdk/pull/18710

From tholenstein at openjdk.org  Wed Apr 10 11:41:37 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Wed, 10 Apr 2024 11:41:37 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v20]
In-Reply-To: <Nk6ZYyrpjT8uN85iqYZ7kcVS4P0ocTtBSjumbYKaIKs=.4a3f768e-32c3-4bc8-892e-a96111e970b9@github.com>
References: <Nk6ZYyrpjT8uN85iqYZ7kcVS4P0ocTtBSjumbYKaIKs=.4a3f768e-32c3-4bc8-892e-a96111e970b9@github.com>
Message-ID: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com>

> The current workflow in IGV is the following:
> 1) import an XML file with graphs or send via network
> 2) open one or more graphs in a tab
> 3) extract a set of nodes to be displayed in the tab
> 4) close IGV and start from 1) again
> 
> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
> ### The new workflow
>  
> When opening IGV the user gets an empty workspace without any opened files. 
> - Graphs can be sent via the network to IGV
> - Graph can be opened from an XML file
> [image: empty]
> 
> Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml`
> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`:
> 
> [image: graph]
> 
> A new `` is introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes:
> 
> [XML example stripped by the archive]
> 
> The workspace menu is restructured:
> 
> [image: open]
> 
> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time:
> 
> [image: save]
> 
> - `Save..` saves the current opened xml file. Create a new file if no file is opened.
> - `Save as...` save the current graphs as a copy to an xml file.
> Note: there is no autosave and IGV also does not ask if you want to save changes when closing it.
> 
> impo...

Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:

  make methods in Printer static

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17630/files
  - new: https://git.openjdk.org/jdk/pull/17630/files/1fd52c13..d1723a89

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=19
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=18-19

  Stats: 17 lines in 3 files changed: 0 ins; 7 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/17630.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630

PR: https://git.openjdk.org/jdk/pull/17630

From rcastanedalo at openjdk.org  Wed Apr 10 11:41:37 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 10 Apr 2024 11:41:37 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v18]
In-Reply-To: <lqvbu1TNpwG9hX-cPOOwijNSXGOK_xkpitKyTsU1z1s=.ed98ecae-571c-43a3-8148-bc406bed1700@github.com>
References: <Nk6ZYyrpjT8uN85iqYZ7kcVS4P0ocTtBSjumbYKaIKs=.4a3f768e-32c3-4bc8-892e-a96111e970b9@github.com>
 <kYQN7uqpQW9NmQ5ej1o3G4KBmUw_FEU_6zPmbnNicpo=.3663b231-c5d0-4526-b485-8553d21c6315@github.com>
 <UgxBA-o_uPk12MzNTRkSXEIlDLkBaqJLoxMbdH_VAOs=.75b0bc9a-2cab-4f27-991e-29367e0b90c3@github.com>
 <lqvbu1TNpwG9hX-cPOOwijNSXGOK_xkpitKyTsU1z1s=.ed98ecae-571c-43a3-8148-bc406bed1700@github.com>
Message-ID: <lsVQesQIuYJRo85MPzUi2RAXHK-uCA2iEDnnt7KzfIs=.a0db7df2-b83b-44ef-ba62-55c412391f69@github.com>

On Wed, 10 Apr 2024 10:38:44 GMT, Tobias Holenstein <tholenstein at openjdk.org> wrote:

> Strange. How did you checkout the PR? Seems like the newly added files export.png and open.png can not be found. (E.g. when applying as a patch like git apply --index 8324950.diff binary (png) files are not added) - Can you try:

My bad, I did check the PR out via a plain-text diff file.

> I would suggest to drop the Export option and move Import next after

Sounds good to me.

> Yes, this would be nice. Also IGV could reopen the last opened XML at startup. But I suggest a future RFE for this.

Fair enough :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2047300447

From mark.reinhold at oracle.com  Wed Apr 10 12:22:36 2024
From: mark.reinhold at oracle.com (Mark Reinhold)
Date: Wed, 10 Apr 2024 12:22:36 +0000
Subject: New candidate JEP: 475: Late Barrier Expansion for G1
Message-ID: <20240410082234.818097717@eggemoggin.niobe.net>

https://openjdk.org/jeps/475

  Summary: Simplify the implementation of the G1 garbage collector's
  barriers, which record information about application memory accesses,
  by shifting their expansion from early in the C2 JIT's compilation
  pipeline to later.

- Mark

From thartmann at openjdk.org  Wed Apr 10 12:49:33 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 10 Apr 2024 12:49:33 GMT
Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register
 for jump target
Message-ID: <Lz39hifZ_MfyDVJJGCfnt7gst_mpVQgL7ieapO2ePIk=.247c6593-70f9-4fed-a987-cc65c7d85984@github.com>

Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub:
https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264

With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`:
https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 


028 mov R29, 0x0000ffff78cc0080 # ptr

[...]

098     # pop frame 16
        ldp  lr, rfp, [sp,#0]              <- Epilog kills rfp (and lr + sp)
        add  sp, sp, #16

[...]

0a0 br R29 # R12 holds method


As a result, we jump to a

References: <7Cx4jgZ678c3UAcArxmIyr-qm9xB136mRybsaOEtWv0=.ce17294a-41b7-45f3-97e2-489851a51fb4@github.com>
Message-ID: 

On Wed, 10 Apr 2024 07:31:28 GMT, Fei Yang wrote:

> > @RealFYang Hi, thanks for pointing out! To achieve additional acceleration, I did a vectorization and re-measured performance on Kendryte K230 with RVV 1.0 enabled:
> 
> That's great to hear! I was not aware that it could run a full-featured Linux system. May I ask what kind of Linux distro are you running with?

It's Debian, https://www.remlab.net/op/k230-canmv-debian.shtml

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2047495798

From mli at openjdk.org  Wed Apr 10 13:10:11 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 10 Apr 2024 13:10:11 GMT
Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3]
In-Reply-To: 
References: 
Message-ID: <3wamGp9toFZEr7IO54NC4VOU8dAfpL2WJyWTSNv0m_s=.ebec8482-610a-4f92-9f42-5fe79b41dd23@github.com>

On Wed, 10 Apr 2024 03:21:35 GMT, Fei Yang wrote:

>> Seems it should be unnecessary, but without it,
>> 1. there will be test failure. e.g. test_suba which uses `vsub_fp` intrinsic, and it does not set rounding mode explicitly, also `vsub_fp`, `vdiv_fp`, and so on.
>> 2. an assert at `src/hotspot/os/linux/os_linux.cpp:1948` which is triggered at the end of program (when calling System.exit in the test).
> 
> Ah, I see. I don't think it's safe for `vsub_fp`, `vdiv_fp`, etc to depend on some uncertain dynamic rounding mode on Java code entry. And it seems that the RISC-V spec even doesn't specify a default dynamic rounding mode in `frm` at the ISA level. This reminds me that we might be lacking some pieces of the puzzle.
> 
> If we want to keep `RNE` (Round to Nearest) as a default dynamic rounding mode for Java code (like you do in this PR), we will also need to save and restore the floating-point control state preferring `RNE` when we enter and leave Java code. Something similar for aarch64: https://bugs.openjdk.org/browse/JDK-8319973. This will also help eliminate existing explicit settings of dynamic rounding mode to `RNE` in file riscv_v.ad (grep "csrwi(CSR_FRM"), which should be good for performance. I would suggest we fix this in another PR before we continue with this one. Let me know if you are interested : -)

Thanks for discussion.
Sure. Let me do some investigation and fix it first.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1559407042

From rcastanedalo at openjdk.org  Wed Apr 10 13:26:21 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 10 Apr 2024 13:26:21 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v20]
In-Reply-To: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com>
References: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com>
Message-ID: <6RCayiDG-bebsPJplZiepH3VaW1L66YyAP0JFeA1GwE=.4c0b61b8-6c5d-4ea7-a65d-a66d4ac31a6c@github.com>

On Wed, 10 Apr 2024 11:41:37 GMT, Tobias Holenstein wrote:

>> The current workflow in IGV is the following:
>> 1) import an XML file with graphs or send via network
>> 2) open one or more graphs in a tab
>> 3) extract a set of nodes to be displayed in the tab
>> 4) close IGV and start from 1) again
>> 
>> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
>> ### The new workflow
>> 
>> When opening IGV the user gets an empty workspace without any opened files.
>> - Graphs can be sent via the network to IGV
>> - Graph can be opened from an XML file
>> [image: empty]
>> 
>> Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml`
>> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`:
>> 
>> [image: graph]
>> 
>> A new `` is introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes:
>> 
>> [XML example stripped by the archive]
>> 
>> The workspace menu is restructured:
>> 
>> [image: open]
>> 
>> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time:
>> 
>> [image: save]
>> 
>> - `Save..` saves the current opened xml file. Create a new file if no file is opened.
>> - `Save as...` save the current graphs as a copy to an xml file.
>> Note: there is no...
> 
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
> 
>   make methods in Printer static

Looks better, thanks! A couple of suggestions, for consistency between the File menu and the Outline icons.

src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/actions/RemoveAllAction.java line 46:

> 44: 
> 45:     public RemoveAllAction() {
> 46:         putValue(Action.SHORT_DESCRIPTION, "clear workspace");

Suggestion:

        putValue(Action.SHORT_DESCRIPTION, "Clear workspace");

src/utils/IdealGraphVisualizer/Coordinator/src/main/resources/com/sun/hotspot/igv/coordinator/actions/Bundle.properties line 5:

> 3: CTL_DiffGraphAction=Difference to current graph
> 4: CTL_RemoveAction=Remove selected graphs and groups
> 5: CTL_RemoveAllAction=Remove all graphs and groups

Suggestion:

CTL_RemoveAllAction=Clear workspace

-------------

PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-1991667525
PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1559421218
PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1559426709

From rcastanedalo at openjdk.org  Wed Apr 10 13:44:15 2024
From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano)
Date: Wed, 10 Apr 2024 13:44:15 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v20]
In-Reply-To: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com>
References: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com>
Message-ID: 

On Wed, 10 Apr 2024 11:41:37 GMT, Tobias Holenstein wrote:

>> The current workflow in IGV is the following:
>> 1) import an XML file with graphs or send via network
>> 2) open one or more graphs in a tab
>> 3) extract a set of nodes to be displayed in the tab
>> 4) close IGV and start from 1) again
>> 
>> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
>> ### The new workflow
>> 
>> When opening IGV the user gets an empty workspace without any opened files.
>> - Graphs can be sent via the network to IGV
>> - Graph can be opened from an XML file
>> [image: empty]
>> 
>> Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml`
>> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`:
>> 
>> [image: graph]
>> 
>> A new `` is introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes:
>> 
>> [XML example stripped by the archive]
>> 
>> The workspace menu is restructured:
>> 
>> [image: open]
>> 
>> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time:
>> 
>> [image: save]
>> 
>> - `Save..` saves the current opened xml file. Create a new file if no file is opened.
>> - `Save as...` save the current graphs as a copy to an xml file.
>> Note: there is no...
> 
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
> 
>   make methods in Printer static

When a graph file with several graphs is opened (e.g. [three-graphs.zip](https://github.com/openjdk/jdk/files/14932810/three-graphs.zip)), clicking on the tabs of the opened graphs does not highlight the corresponding entries in the Outline tree (as done when the graphs are freshly imported). Would it be possible to have the same behavior regardless of whether the displayed graphs are opened or imported?

-------------

PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-1991733261

From roland at openjdk.org  Wed Apr 10 13:46:20 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Wed, 10 Apr 2024 13:46:20 GMT
Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902
Message-ID: 

After range check elimination, a cast in the main loop becomes top
because the type of its input (that depends on the iv phi) and the
type recorded in the cast do not intersect. This is a case that's
expected to be caught by assert predicates but, in this particular
case, no assert predicate constant folds.

The stride for the loop is -2. The iv phi type is `min+1..0`

As a consequence, the init value for the main loop has type int.

The range check that causes the issue is for array access:

lArrFld[i11 + 1] = 6;


The main loop is unrolled once. The second access in the loop is at
`i11 - 1` which has type `min..-1`. The range check cast at that
access becomes top. The assert predicates operate on an init value
that has the shape:


(CastII (AddI pre_loop_iv -2) int)


and type int.

That `CastII` is inserted by `PhaseIdealLoop::cast_incr_before_loop()`.

The assert predicate for the first iteration in the main loop is for
index:


(AddI (CastII (AddI pre_loop_iv -2) int) 1)


And for the second:


(AddI (CastII (AddI pre_loop_iv -2) int) -1)


Both have type int so the assert predicate can't constant fold.

I initially fixed this by changing the type of the cast from int to
the type of the iv phi:


(AddI (CastII (AddI pre_loop_iv -2) min+1..0) -1)


That allows the assert predicate for the second iteration to constant
fold. But I was then worried narrowing the type of the cast would
cause issues going forward so instead, I propose proceeding as in
8282592 and have assert predicates skip over the CastII (that part of
8282592 was later undone):


(AddI (AddI pre_loop_iv -2) 1)


which allows the assert predicate for the first iteration in the main
loop to constant fold.

The change from 8282592 caused issues because we used to narrow the
type of a cast based on the condition that guards it. That was removed
by 8319372.

-------------

Commit messages:
 - fix & test

Changes: https://git.openjdk.org/jdk/pull/18724/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18724&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8325494
  Stats: 67 lines in 2 files changed: 67 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18724.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18724/head:pull/18724

PR: https://git.openjdk.org/jdk/pull/18724

From chagedorn at openjdk.org  Wed Apr 10 14:14:01 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 10 Apr 2024 14:14:01 GMT
Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902
In-Reply-To: 
References: 
Message-ID: 

On Wed, 10 Apr 2024 13:41:11 GMT, Roland Westrelin wrote:

> After range check elimination, a cast in the main loop becomes top
> because the type of its input (that depends on the iv phi) and the
> type recorded in the cast do not intersect. This is a case that's
> expected to be caught by assert predicates but, in this particular
> case, no assert predicate constant folds.
> 
> The stride for the loop is -2. The iv phi type is `min+1..0`
> 
> As a consequence, the init value for the main loop has type int.
> 
> The range check that causes the issue is for array access:
> 
> lArrFld[i11 + 1] = 6;
> 
> 
> The main loop is unrolled once. The second access in the loop is at
> `i11 - 1` which has type `min..-1`. The range check cast at that
> access becomes top. The assert predicates operate on an init value
> that has the shape:
> 
> 
> (CastII (AddI pre_loop_iv -2) int)
> 
> 
> and type int.
> 
> That `CastII` is inserted by `PhaseIdealLoop::cast_incr_before_loop()`.
> 
> The assert predicate for the first iteration in the main loop is for
> index:
> 
> 
> (AddI (CastII (AddI pre_loop_iv -2) int) 1)
> 
> 
> And for the second:
> 
> 
> (AddI (CastII (AddI pre_loop_iv -2) int) -1)
> 
> 
> Both have type int so the assert predicate can't constant fold.
> 
> I initially fixed this by changing the type of the cast from int to
> the type of the iv phi:
> 
> 
> (AddI (CastII (AddI pre_loop_iv -2) min+1..0) -1)
> 
> 
> That allows the assert predicate for the second iteration to constant
> fold. But I was then worried narrowing the type of the cast would
> cause issues going forward so instead, I propose proceeding as in
> 8282592 and have assert predicates skip over the CastII (that part of
> 8282592 was later undone):
> 
> 
> (AddI (AddI pre_loop_iv -2) 1)
> 
> 
> which allows the assert predicate for the first iteration in the main
> loop to constant fold.
> 
> The change from 8282592 caused issues because we used to narrow the
> type of a cast based on the condition that guards it. That was removed
> by 8319372.

I agree with that. I thought about the same now that JDK-8319372 is in. Two minor comments, otherwise, looks good!

src/hotspot/share/opto/loopTransform.cpp line 2019:

> 2017:     // skip over the cast added by PhaseIdealLoop::cast_incr_before_loop() when pre/post/main loops are created because
> 2018:     // it can get in the way of type propagation
> 2019:     assert(((CastIINode*)init)->carry_dependency() && loop_head->skip_assertion_predicates_with_halt() == init->in(0), "casted iv phi from pre loop expected");

You can use `is_CastII()` and `as_CastII()`.

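The type reasoning in the quoted description can be illustrated with a small interval model. This is only a sketch under stated assumptions: `Range`, `FULL_INT`, `VALID_INDEX` and `secondAccess` are hypothetical names, not C2 code; a range is widened to the full int range whenever adding a constant could overflow, and `[0, max]` stands in for the type recorded in a range check cast. With the iv phi type `min+1..0` the index `i11 - 1` has type `min..-1` and its intersection with the valid index range is empty (top), while the same computation on an int-typed value stays int and nothing folds.

```java
public class IntRangeSketch {
    // Simplified stand-in for a C2 integer type: the closed range [lo, hi].
    static final class Range {
        final long lo;
        final long hi;

        Range(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        // Adding a constant keeps a precise range unless the sum could leave
        // the 32-bit domain; in that case this model falls back to full int.
        Range add(int c) {
            long newLo = lo + c;
            long newHi = hi + c;
            if (newLo < Integer.MIN_VALUE || newHi > Integer.MAX_VALUE) {
                return FULL_INT;
            }
            return new Range(newLo, newHi);
        }

        // Intersection of two ranges; an empty result plays the role of "top".
        Range intersect(Range other) {
            return new Range(Math.max(lo, other.lo), Math.min(hi, other.hi));
        }

        boolean isEmpty() {
            return lo > hi;
        }

        @Override
        public String toString() {
            return isEmpty() ? "top" : "[" + lo + ", " + hi + "]";
        }
    }

    static final Range FULL_INT = new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);
    // Assumption: a valid array index is modelled as [0, max].
    static final Range VALID_INDEX = new Range(0, Integer.MAX_VALUE);

    // Type of the index of the second access (i11 - 1), given the type of i11.
    static Range secondAccess(Range iv) {
        return iv.add(-1);
    }

    public static void main(String[] args) {
        Range ivPhi = new Range(Integer.MIN_VALUE + 1L, 0);
        // Precise iv phi type: the index is always negative, the cast is top.
        System.out.println("from iv phi type:    " + secondAccess(ivPhi).intersect(VALID_INDEX));
        // Int-typed init value: the result stays wide, so nothing constant folds.
        System.out.println("from int-typed cast: " + secondAccess(FULL_INT).intersect(VALID_INDEX));
    }
}
```

The point of the model is the mismatch between the two lines it prints: the data path sees the narrow type and dies, while a predicate computed from an int-typed value cannot prove the same thing.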
test/hotspot/jtreg/compiler/loopopts/TestAssertPredicateDoesntConstantFold.java line 32:

> 30:  */
> 31: 
> 32: public class TestAssertPredicateDoesntConstantFold {

I suggest moving this test to the other predicate tests in `compiler/predicates`

Nit:
Suggestion:

public class TestAssertionPredicateDoesntConstantFold {

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18724#pullrequestreview-1991798819
PR Review Comment: https://git.openjdk.org/jdk/pull/18724#discussion_r1559506640
PR Review Comment: https://git.openjdk.org/jdk/pull/18724#discussion_r1559508534

From tholenstein at openjdk.org  Wed Apr 10 14:14:29 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Wed, 10 Apr 2024 14:14:29 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v21]
In-Reply-To: 
References: 
Message-ID: 

> The current workflow in IGV is the following:
> 1) import an XML file with graphs or send via network
> 2) open one or more graphs in a tab
> 3) extract a set of nodes to be displayed in the tab
> 4) close IGV and start from 1) again
> 
> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
> ### The new workflow
> 
> When opening IGV the user gets an empty workspace without any opened files.
> - Graphs can be sent via the network to IGV
> - Graph can be opened from an XML file
> [image: empty]
> 
> Unzipping this [example.zip](https://github.com/openjdk/jdk/files/14905764/example.zip) and opening `graphs.xml`
> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`:
> 
> [image: graph]
> 
> A new `` is introduced in `graphs.xml` that stores the opened graphs and their hidden (visible) nodes:
> 
> [XML example stripped by the archive]
> 
> The workspace menu is restructured:
> 
> [image: open]
> 
> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time:
> 
> [image: save]
> 
> - `Save..` saves the current opened xml file. Create a new file if no file is opened.
> - `Save as...` save the current graphs as a copy to an xml file.
> Note: there is no autosave and IGV also does not ask if you want to save changes when closing it.
> 
> impo...

Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision:

 - Update src/utils/IdealGraphVisualizer/Coordinator/src/main/resources/com/sun/hotspot/igv/coordinator/actions/Bundle.properties
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>
 - Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/actions/RemoveAllAction.java
   
   Co-authored-by: Roberto Castañeda Lozano <robcasloz at users.noreply.github.com>

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17630/files
  - new: https://git.openjdk.org/jdk/pull/17630/files/d1723a89..d1389af4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=19-20

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17630.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630

PR: https://git.openjdk.org/jdk/pull/17630

From mdoerr at openjdk.org  Wed Apr 10 14:25:09 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 10 Apr 2024 14:25:09 GMT
Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer
 register for jump target
In-Reply-To: <Lz39hifZ_MfyDVJJGCfnt7gst_mpVQgL7ieapO2ePIk=.247c6593-70f9-4fed-a987-cc65c7d85984@github.com>
References: <Lz39hifZ_MfyDVJJGCfnt7gst_mpVQgL7ieapO2ePIk=.247c6593-70f9-4fed-a987-cc65c7d85984@github.com>
Message-ID: <jALzv74RMnPNXc6wQFYKbEnKLL8nW_eidnjwsEX61wI=.80ceec21-ccf8-4913-88db-52d5d0d9d823@github.com>

On Wed, 10 Apr 2024 12:34:11 GMT, Tobias Hartmann <thartmann at openjdk.org> wrote:

> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub:
> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264
> 
> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`:
> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 
> 
> 
> 028 mov R29, 0x0000ffff78cc0080 # ptr
> 
> [...]
> 
> 098     # pop frame 16
>         ldp  lr, rfp, [sp,#0]              <- Epilog kills rfp (and lr + sp)
>         add  sp, sp, #16
> 
> [...]
> 
> 0a0 br R29 # R12 holds method
> 
> 
> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt).
> 
> On x86, we use `no_rbp_RegP` instead:
> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470
> 
> I implemented the same on AArch64.
> 
> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look?
> 
> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention?
> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE:
> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87
> 
> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated.
> 
> Thanks,
> Tobias

I don't think PPC64 or s390 are affected. These platforms don't have a frame pointer register. Both currently use a fixed register for the jump target (the same one as for inline caches). The test passes on PPC64. Thanks for pinging me!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2047694105

From chagedorn at openjdk.org  Wed Apr 10 14:44:59 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 10 Apr 2024 14:44:59 GMT
Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2]
In-Reply-To: 
References: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com>
Message-ID: 

On Thu, 4 Apr 2024 15:11:15 GMT, Christian Hagedorn wrote:

>> If doing this in `GraphKit` doesn't work well, maybe it should be done in `SubTypeCheckNode::Value`? This way it would apply at all stages of compilation? (That assumes we don't care of what happens if `ExpandSubTypeCheckAtParseTime` is true).
> 
>> > Should we go back to the previously suggested version without CastPP?
>> 
>> Isn't there a risk with that one too? if `superklass` is a `TypeNode` and its input changes to a constant for instance.
> 
> I'm afraid you're right. This could probably happen, too.
> 
>> If doing this in `GraphKit` doesn't work well, maybe it should be done in `SubTypeCheckNode::Value`? This way it would apply at all stages of compilation? (That assumes we don't care of what happens if `ExpandSubTypeCheckAtParseTime` is true).
> 
> I've first thought of doing it in `SubTypeCheckNode::Value()` but assumed we can get away with handling it in `GraphKit`. But as now figured out, this comes with new problems and does not seem to be safe. I will try to undo my current fix idea in `GraphKit` and do it in `SubTypeCheckNode::Value()` instead. This should work (not yet sure though what to do with `ExpandSubTypeCheckAtParseTime` and if it's easy to fix - otherwise, we could move forward with the proposal to remove it for good).

I could not find a test that shows a benefit by doing the improved check with code from `try_improve()` inside `SubTypeCheckNode::Value()`. The original patch only showed a win for mainline when we have two identical `SubTypeCheckNodes` such that they can common up with an improved constant. But this cannot be achieved when having the code in `SubTypeCheckNode::Value()`.

I therefore suggest to revert back to the original version to directly plug in a better constant, if we find one with `try_improve()`, and just skip the other non-constant cases with `LoadKlass` etc. For the Valhalla bug, I can do the more sophisticated fix to improve `SubTypeCheckNode::Value()` with code from `try_improve()` but it does not seem worth it for mainline.

What do you think?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1559576777

From tholenstein at openjdk.org  Wed Apr 10 15:04:19 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Wed, 10 Apr 2024 15:04:19 GMT
Subject: RFR: JDK-8324950: IGV: save the state to a file [v22]
In-Reply-To: 
References: 
Message-ID: 

> The current workflow in IGV is the following:
> 1) import an XML file with graphs or send via network
> 2) open one or more graphs in a tab
> 3) extract a set of nodes to be displayed in the tab
> 4) close IGV and start from 1) again
> 
> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
> ### The new workflow
> 
> When opening IGV the user gets an empty workspace without any opened files.
> - Graphs can be sent via the network to IGV
> - Graph can be opened from an XML file
> [image: empty]
> 
> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14933957/example_graph.zip) and opening `graphs.xml`
> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`:
> 
> [image: graph]
> 
> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes:
> 
> [XML example stripped by the archive]
> 
> The workspace menu is restructured:
> 
> [image: open]
> 
> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time:
> 
> [image: save]
> 
> - `Save..` saves the current opened xml file. Create a new file if no file is opened.
> - `Save as...` save the current graphs as a copy to an xml file.
> Note: there is no autosave and IGV also does not ask if you want to save changes when closing it.
> > References: Message-ID: On Tue, 9 Apr 2024 15:06:40 GMT, Thomas Stuefe wrote: > It fixes an embarrassing OOB memory access when rolling the dice to get a random class space location on aarch64. > > Thanks to @calvinccheung for finding this bug. > > The fix is to make all relevant variables unsigned, thus preventing negative overflow. @MBaesken, could you take a look? Change is tiny. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18698#issuecomment-2047932240 From aph at openjdk.org Wed Apr 10 16:28:10 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 10 Apr 2024 16:28:10 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 12:34:11 GMT, Tobias Hartmann wrote: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] > > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. 
See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. > > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias src/hotspot/cpu/aarch64/aarch64.ad line 16193: > 16191: // TailJump below removes the return address. > 16192: // Don't use rfp for the jump_target because the > 16193: // MachEpilogNode that is inserted above will kill it. This is a somewhat opaque comment. Please say that a `MachEpilogNode` has already been emitted, so the current rfp contains a value that belongs to the caller of this method, so it must not be used. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1559758272 From aph at openjdk.org Wed Apr 10 17:16:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 10 Apr 2024 17:16:11 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Tue, 9 Apr 2024 15:59:20 GMT, Hamlin Li wrote: >> Does that work for you? > > Thanks for the sample code. > > I modify the current test a bit by using your code, i.e. change from 2 level nested loop to single while loop as below, let's call it `new test` > > @Test > static boolean test(int testInt) { > float testFloat = Float.intBitsToFloat(testInt); > return Math.round(testFloat) != golden_round(testFloat); > } > > @Run(test = "test") > static void test_rounds(RunInfo runInfo) { > for (int i = 0; i < 1000; i++) { > test(i); > } > if (runInfo.isWarmUp()) { > return; > } > boolean runTest = true; // modify here to have try. > if (!runTest) return; > int testInt = 0; > boolean fail = false; > do { > fail |= test(testInt); > } while (++testInt != 0); > if (fail) { > throw new RuntimeException(); > } > } > > > It still took more than 5 minutes to finish the test; if I assign `runTest = false`, it will take seconds. So most of time is spent on the while loop in `test_rounds` with `@Run` annotation in new test, I'm not sure how the annotation @Run works, but seems that's the reason why it's slower than a pure while loop (in your sample code). But we need the annotations in the test (check below). 
> There are still some gaps between this new test and the current test: > * we do not yet verify the IR node (`IRNode.ROUND_VF`); to verify it, we need to put (part of) the test into a nested loop, put this loop in a function (`test_round` in the current test), and annotate this function with `@IR` to verify the IR node. > > Or maybe there are other ways to implement this test and meet the requirements below? Currently I'm not sure. > 1. run in a minute, as we want it to be an automatic test, > 2. verify Math.round (intrinsic) result, > 3. verify IR node (`IRNode.ROUND_VF`) generation, > 4. make sure all the verification is done after the warmup. Yes, I see. The `@Run` annotation runs it all with the IR framework, which you don't really need for an exhaustive test over the 32-bit range. I think I may have been rather misled by the "Add exhaustive tests for Math.round intrinsics" title, which this PR doesn't do. I strongly suggest that you do add an exhaustive test for the 32-bit range, without using the `@Run` annotation, just the bare code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1559818903 From aph at openjdk.org Wed Apr 10 17:21:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 10 Apr 2024 17:21:11 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Wed, 10 Apr 2024 17:13:22 GMT, Andrew Haley wrote: >> Thanks for the sample code. >> >> I modified the current test a bit by using your code, i.e. 
change from 2 level nested loop to single while loop as below, let's call it `new test` >> >> @Test >> static boolean test(int testInt) { >> float testFloat = Float.intBitsToFloat(testInt); >> return Math.round(testFloat) != golden_round(testFloat); >> } >> >> @Run(test = "test") >> static void test_rounds(RunInfo runInfo) { >> for (int i = 0; i < 1000; i++) { >> test(i); >> } >> if (runInfo.isWarmUp()) { >> return; >> } >> boolean runTest = true; // modify here to have try. >> if (!runTest) return; >> int testInt = 0; >> boolean fail = false; >> do { >> fail |= test(testInt); >> } while (++testInt != 0); >> if (fail) { >> throw new RuntimeException(); >> } >> } >> >> >> It still took more than 5 minutes to finish the test; if I assign `runTest = false`, it will take seconds. So most of time is spent on the while loop in `test_rounds` with `@Run` annotation in new test, I'm not sure how the annotation @Run works, but seems that's the reason why it's slower than a pure while loop (in your sample code). But we need the annotations in the test (check below). >> There are still some gaps between this new test and current test: >> * we still not yet verify IR Node (`IRNode.ROUND_VF`); to verify it, we need to put the (part of) test into a nested loop, and put this loop in a function (`test_round` in current test), and annotate this function with `@IR` to verify the IR node. >> >> Or maybe there are other ways to implement this test and qualify below requirements? Currently I'm not sure. >> 1. run in a minute, as we want it to be an automatic test, >> 2. verify Math.round (intrinsic) result, >> 3. verify IR node (`IRNode.ROUND_VF) generation, >> 4. make sure all the verification is done after the warmup. > > Yes, I see. The `@Run` annotation runs it all with the IR framework, which you don't really need for an exhaustive test over the 32-bit range. 
> > I think I may have been rather misled by the "Add exhaustive tests for Math.round intrinsics" title, which this PR doesn't do. I strongly suggest that you do add an exhaustive test for the 32-bit range, without using the `@Run` annotation, just the bare code. And my test only works on the scalar version, of course. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1559823545 From iklam at openjdk.org Wed Apr 10 18:11:09 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 10 Apr 2024 18:11:09 GMT Subject: RFR: JDK-8329656: assertion failed in MAP_ARCHIVE_MMAP_FAILURE path: Invalid immediate -5 0 In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 15:06:40 GMT, Thomas Stuefe wrote: > It fixes an embarrassing OOB memory access when rolling the dice to get a random class space location on aarch64. > > Thanks to @calvinccheung for finding this bug. > > The fix is to make all relevant variables unsigned, thus preventing negative overflow. LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18698#pullrequestreview-1992374057 From vlivanov at openjdk.org Wed Apr 10 18:19:01 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Apr 2024 18:19:01 GMT Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg [v2] In-Reply-To: <-79vJfqi9JL5-ut-4ipu7hzvwiDZx_8aYB4dhOP-ODk=.c48c5dc6-c49c-4afd-9163-e7df9f39ba04@github.com> References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com> <-79vJfqi9JL5-ut-4ipu7hzvwiDZx_8aYB4dhOP-ODk=.c48c5dc6-c49c-4afd-9163-e7df9f39ba04@github.com> Message-ID: On Wed, 10 Apr 2024 07:21:15 GMT, Roland Westrelin wrote: >> The crash occurs when a virtual call is devirtualized late. Inlining >> is not attempted then. So no new inlining diagnostic message is >> produced which causes the assert failure. 
There's some valuable >> information that can be reported though (the call is >> devirtualized). > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > +UnlockDiagnosticVMOptions Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18685#pullrequestreview-1992391179 From jbhateja at openjdk.org Wed Apr 10 18:40:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Apr 2024 18:40:11 GMT Subject: RFR: 8329254: optimize integral reverse operations on x86 GFNI target. In-Reply-To: References: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com> Message-ID: On Tue, 9 Apr 2024 18:08:27 GMT, Sandhya Viswanathan wrote: > @jatin-bhateja Thanks a lot for putting this PR together. The register class for the following two instructs in x86_64.ad also need change: From: instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **regF** xtmp1, **regF** xtmp2, rRegL rtmp, rFlagsReg cr) instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **regD** xtmp1, **regD** xtmp2, rRegL rtmp, rFlagsReg cr) > > To: instruct bytes_reversebit_int_gfni(rRegI dst, rRegI src, **vlRegF** xtmp1, **vlRegF** xtmp2, rRegL rtmp, rFlagsReg cr) instruct bytes_reversebit_long_gfni(rRegL dst, rRegL src, **vlRegD** xtmp1, **vlRegD** xtmp2, rRegL rtmp, rFlagsReg cr) Hi @sviswa7 , GFNI is supported on Icelake+ CPUs, with regD/F register classes we select entire range of registers xmm1-31 on AVX512 targets which gives freedom to assembler to auto-promote instruction to EVEX encoding if allocator assigned a register from higher register bank, in this case since instruction operands are 128 bit registers, in principle an autopromotion on AVX512 target will only be feasible if target support VL, but given that all AVX512 GFNI targets support vector length orthogonality hence we should be good to go. 
For non AVX512 targets with GFNI we anyways deal with lower register bank. I still agree that it's good to be strict than keeping loose ends, given that cloud instances can be tuned to enable custom feature sets. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18530#issuecomment-2048208836 From mli at openjdk.org Wed Apr 10 18:44:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Apr 2024 18:44:12 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Wed, 10 Apr 2024 17:18:00 GMT, Andrew Haley wrote: >> Yes, I see. The `@Run` annotation runs it all with the IR framework, which you don't really need for an exhaustive test over the 32-bit range. >> >> I think I may have been rather misled by the "Add exhaustive tests for Math.round intrinsics" title, which this PR doesn't do. I strongly suggest that you do add an exhaustive test for the 32-bit range, without using the `@Run` annotation, just the bare code. > > And my test only works on the scalar version, of course. Do you mean add an exhaustive test for the 32-bit range for the vector instrinsic or scalar one? The only issue I have for an exhaustive test for the 32-bit range is that it takes too long time to be an automatic test, and we don't want to add an manual test as it seldomly runs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1559908931 From jbhateja at openjdk.org Wed Apr 10 19:01:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Apr 2024 19:01:35 GMT Subject: RFR: 8329254: optimize integral reverse operations on x86 GFNI target. 
[v2] In-Reply-To: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com> References: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com> Message-ID: > - Efficient GFNI based instruction sequence to compute integral reverse operation was added along with JEP-426 (VectorAPI 4th Incubation). https://bugs.openjdk.org/browse/JDK-8284960 > > - However, the CPUID based feature detection for GFNI was incorrectly performed under AVX512 check, fixing it shows roughly 2X performance improvement for Integer/Long.reverse APIs on E-core targets (MTL+). > > > BaseLine: > Benchmark (size) Mode Cnt Score Error Units > Integers.reverse 500 avgt 2 0.120 us/op > Longs.reverse 500 avgt 2 0.221 us/op > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > Integers.reverse 500 avgt 2 0.050 us/op > Longs.reverse 500 avgt 2 0.086 us/op > > > Kindly review. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comment resolution. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18530/files - new: https://git.openjdk.org/jdk/pull/18530/files/08e83564..3f18ba84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18530&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18530&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18530/head:pull/18530 PR: https://git.openjdk.org/jdk/pull/18530 From kxu at openjdk.org Wed Apr 10 20:07:17 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 10 Apr 2024 20:07:17 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Fri, 5 Apr 2024 19:03:54 GMT, Dean Long wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. >> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Do we also handle the reverse, int-typed parallel iv in a long counted loop? > > On a related topic, I noticed that checks for is_range_check_if seem to require the type to match the loop type, but I wonder if that could be relaxed. @dean-long No, it doesn't at this stage. C2 never supported parallel iv of any type in long counted loops. However, I'm working on a patch adding supports for long and int iv in long counted loops. Since there are code dependencies, I'd prefer to have this pr merged first. 
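For readers following the parallel iv discussion, here is a minimal sketch of the loop shape in question: an int counted loop carrying a long-typed variable that advances by a fixed stride. The class name, method names and the stride of 3 are made up for illustration and are not taken from the PR; whether C2 rewrites the loop to exactly the closed form shown is an assumption, not something stated in this thread.

```java
public class ParallelIvSketch {
    // Int counted loop: `i` is the primary induction variable.
    // `acc` is a long-typed parallel iv: it grows by a constant
    // stride on every iteration, in lock step with `i`.
    static long withLoop(int n) {
        long acc = 0L;
        for (int i = 0; i < n; i++) {
            acc += 3L;
        }
        return acc;
    }

    // The value of `acc` expressed directly in terms of the trip count.
    // This is the kind of expression a compiler can substitute once it
    // recognizes `acc` as parallel to `i` (hypothetical illustration).
    static long closedForm(int n) {
        return n > 0 ? 3L * n : 0L;
    }

    public static void main(String[] args) {
        int n = 1_000;
        System.out.println(withLoop(n) == closedForm(n));
    }
}
```

The reverse case raised above (an int-typed parallel iv in a long counted loop) would swap the types of `i` and `acc`.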
------------- PR Comment: https://git.openjdk.org/jdk/pull/18489#issuecomment-2048346726 From dlong at openjdk.org Wed Apr 10 20:12:19 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Apr 2024 20:12:19 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v10] In-Reply-To: <7T3WhLLml-T5hk5lVwlAar0aMd5sZZOnamLPu4BdKXg=.3f5bf2f0-b93b-4353-8cd6-26c2b03b2033@github.com> References: <7T3WhLLml-T5hk5lVwlAar0aMd5sZZOnamLPu4BdKXg=.3f5bf2f0-b93b-4353-8cd6-26c2b03b2033@github.com> Message-ID: On Mon, 8 Apr 2024 16:00:36 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. 
I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. >> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > some formatting suggestions from @shipilev > > Co-authored-by: Aleksey Shipilëv Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18505#pullrequestreview-1992590125 From dlong at openjdk.org Wed Apr 10 20:12:19 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Apr 2024 20:12:19 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: <1tnvXHkbK3bwbuFMyXb9mUAKIgAFvfcA3v3pOR9PQRw=.0218898e-b611-4168-8b5e-5edf4782b563@github.com> References: <1tnvXHkbK3bwbuFMyXb9mUAKIgAFvfcA3v3pOR9PQRw=.0218898e-b611-4168-8b5e-5edf4782b563@github.com> Message-ID: On Mon, 8 Apr 2024 16:26:29 GMT, Joshua Cao wrote: > I would have preferred if all escape-based on optimizations of barriers were just done in one place. It sounds like there is still some cleanup that could be done in this area. Is it worth a separate RFE? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1560002089 From sviswanathan at openjdk.org Wed Apr 10 20:16:17 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 10 Apr 2024 20:16:17 GMT Subject: RFR: 8329254: optimize integral reverse operations on x86 GFNI target. [v2] In-Reply-To: References: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com> Message-ID: On Wed, 10 Apr 2024 19:01:35 GMT, Jatin Bhateja wrote: >> - Efficient GFNI based instruction sequence to compute integral reverse operation was added along with JEP-426 (VectorAPI 4th Incubation). https://bugs.openjdk.org/browse/JDK-8284960 >> >> - However, the CPUID based feature detection for GFNI was incorrectly performed under AVX512 check, fixing it shows roughly 2X performance improvement for Integer/Long.reverse APIs on E-core targets (MTL+). 
>> >> >> BaseLine: >> Benchmark (size) Mode Cnt Score Error Units >> Integers.reverse 500 avgt 2 0.120 us/op >> Longs.reverse 500 avgt 2 0.221 us/op >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> Integers.reverse 500 avgt 2 0.050 us/op >> Longs.reverse 500 avgt 2 0.086 us/op >> >> >> Kindly review. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comment resolution. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18530#pullrequestreview-1992595041 From aph at openjdk.org Wed Apr 10 20:23:19 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 10 Apr 2024 20:23:19 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Wed, 10 Apr 2024 18:41:29 GMT, Hamlin Li wrote: > The only issue I have for an exhaustive test for the 32-bit range is that it takes too long time to be an automatic test, and we don't want to add an manual test as it seldomly runs. But it takes a long time because you use the @Run attribute, surely. If you ran that test just as a test, without the IR framework, it'd be fine. 
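A minimal sketch of what "just the bare code" could look like, with no IR framework involved. This is not the sample code referred to earlier in the thread: `referenceRound` is a hypothetical stand-in for the `golden_round` of the test under review (whose body is not shown here), implemented as floor(x + 0.5) in double precision, and the class and method names are invented for illustration.

```java
public class RoundExhaustiveSketch {
    // Hypothetical reference: floor(x + 0.5) computed in double, then
    // narrowed with Java's (int) cast, which maps NaN to 0 and saturates
    // at Integer.MIN_VALUE / Integer.MAX_VALUE.
    static int referenceRound(float f) {
        return (int) Math.floor((double) f + 0.5);
    }

    // Compares Math.round against the reference for `count` consecutive
    // float bit patterns starting at `firstBits` (wrapping on overflow).
    static long countMismatches(int firstBits, long count) {
        long mismatches = 0;
        int bits = firstBits;
        for (long i = 0; i < count; i++) {
            float f = Float.intBitsToFloat(bits++);
            if (Math.round(f) != referenceRound(f)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Pass 4294967296 (2^32) to cover every float bit pattern;
        // the default is kept small so a casual run finishes quickly.
        long count = args.length > 0 ? Long.parseLong(args[0]) : 1L << 20;
        long bad = countMismatches(0, count);
        System.out.println("checked " + count + " bit patterns, mismatches: " + bad);
        if (bad != 0) {
            throw new RuntimeException("Math.round mismatch");
        }
    }
}
```

A plain loop like this exercises only the scalar `Math.round`; it says nothing about `IRNode.ROUND_VF` generation, which would still need the IR framework.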
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1560012625 From dlong at openjdk.org Wed Apr 10 21:16:46 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Apr 2024 21:16:46 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 12:34:11 GMT, Tobias Hartmann wrote: > I also wondered if R29 shouldn't be a callee-save (SOE) register in the C calling convention? Maybe for documentation purposes, but I don't think it would have any effect on generated code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2048446464 From duke at openjdk.org Wed Apr 10 22:21:43 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 10 Apr 2024 22:21:43 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v9] In-Reply-To: References: <1tnvXHkbK3bwbuFMyXb9mUAKIgAFvfcA3v3pOR9PQRw=.0218898e-b611-4168-8b5e-5edf4782b563@github.com> Message-ID: On Wed, 10 Apr 2024 20:09:04 GMT, Dean Long wrote: >>> This case and the next case could use a more detailed explanation. We have 4 different possible inputs: >> {StoreStore, Release} x {w/ Precedent, w/o Precedent} and 2 possible outcomes: worklist or record_for_optimizer. >> >> We can eliminate barriers when it's precedent is an escaping object. If the barrier does not have a precedent, we cannot elide it, which is why we don't include it in the worklist / `record_for_optimizer`. >> >> I think its confusing because StoreStore barriers are optimized in `escape.cpp`, while `Release` barriers are optimized in [memnode.cpp](https://github.com/openjdk/jdk/blob/115f4193eb39d8469ac8127e38798a3f041c22e0/src/hotspot/share/opto/memnode.cpp#L3431). I would have preferred if all escape-based on optimizations of barriers were just done in one place. 
>> >>> Previously, I believe this optimization did not apply to the end-of-ctor-with-final barrier, but now it does. >> >> This is correct. End of ctor did not have `StoreStore` barriers. They had `Release` barriers, which escape analysis already handles. We have to check `n->req() > MemBarNode::Precedent`, or else we run into assertion errors [here](https://github.com/openjdk/jdk/blob/9ac3b77d0d69227ded6ef3843ebf5c18ceee37b5/src/hotspot/share/opto/escape.cpp#L2590) > >> I would have preferred if all escape-based optimizations of barriers were just done in one place. > > It sounds like there is still some cleanup that could be done in this area. Is it worth a separate RFE? Yes, I think so. Created https://bugs.openjdk.org/browse/JDK-8330062 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1560117526 From dchuyko at openjdk.org Wed Apr 10 22:24:53 2024 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 10 Apr 2024 22:24:53 GMT Subject: RFR: 8330061: Cleanup: follow code heaps order in CodeCache initialization and logging, code heap info in logs Message-ID: This is an additional tiny cleanup after CodeCache::initialize_heaps refactoring (JDK-8311248). CodeCache::initialize_heaps: code heaps info is printed in code heaps order, final size adjustments and flags are made in code heaps order. CodeCache::allocate: assertion message contains blob type. CodeCache::print_trace: name of the heap containing the method is printed. Testing: jtreg test/hotspot/jtreg/compiler/codecache, tier1, tier2. 
------------- Commit messages: - check_min_size(non-nmethod) order - Code cache log cleanups Changes: https://git.openjdk.org/jdk/pull/18732/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18732&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330061 Stats: 31 lines in 1 file changed: 19 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/18732.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18732/head:pull/18732 PR: https://git.openjdk.org/jdk/pull/18732 From jkarthikeyan at openjdk.org Thu Apr 11 01:53:17 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 11 Apr 2024 01:53:17 GMT Subject: RFR: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. Message-ID: This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. Thoughts and reviews would be appreciated! 
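To make the probability argument concrete, here is a small sketch that counts how often the branch of an if-based min/max reduction is taken. It is not the code of `TestIfMinMax`: the class and helper names, the initial accumulator values and the exact reduction shape are assumptions for illustration. Only the array length of 512 (implied by the 1/512 figure), the 0.01 threshold, and the 1..50 / -1..-50 seeding come from the description above.

```java
import java.util.Random;

public class IfMinMaxInputSketch {
    static final int SIZE = 512;                 // implied by the 1/512 figure
    static final double CMOVE_THRESHOLD = 0.01;  // threshold quoted above

    // 1..50, then -1..-50, then random values for the remaining slots.
    static int[] seededInput(Random r) {
        int[] a = new int[SIZE];
        for (int i = 0; i < 50; i++) {
            a[i] = i + 1;
            a[50 + i] = -(i + 1);
        }
        for (int i = 100; i < SIZE; i++) {
            a[i] = r.nextInt();
        }
        return a;
    }

    // Iterations in which an if-based max reduction takes its branch.
    static int maxUpdates(int[] a) {
        int max = Integer.MIN_VALUE;
        int taken = 0;
        for (int v : a) {
            if (v > max) {
                max = v;
                taken++;
            }
        }
        return taken;
    }

    // Iterations in which an if-based min reduction takes its branch.
    static int minUpdates(int[] a) {
        int min = Integer.MAX_VALUE;
        int taken = 0;
        for (int v : a) {
            if (v < min) {
                min = v;
                taken++;
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        // Unlucky unseeded input: the first element is already the minimum.
        int[] unlucky = new int[SIZE];
        unlucky[0] = Integer.MIN_VALUE;
        double p = (double) minUpdates(unlucky) / SIZE;
        System.out.println("unlucky taken fraction below threshold: " + (p < CMOVE_THRESHOLD));

        int[] seeded = seededInput(new Random());
        double q = (double) minUpdates(seeded) / SIZE;
        System.out.println("seeded taken fraction above threshold: " + (q > CMOVE_THRESHOLD));
    }
}
```

With the seeded prefix, the ascending run forces max updates and the descending run forces min updates, so in this sketch neither reduction can end up with a taken fraction near 1/512 regardless of the random seed.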
------------- Commit messages: - Modify IR test Changes: https://git.openjdk.org/jdk/pull/18734/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18734&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329531 Stats: 49 lines in 1 file changed: 29 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/18734.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18734/head:pull/18734 PR: https://git.openjdk.org/jdk/pull/18734 From amitkumar at openjdk.org Thu Apr 11 03:37:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 11 Apr 2024 03:37:41 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 12:34:11 GMT, Tobias Hartmann wrote: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] > > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. 
[good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. > > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias I have tested on s390x, test passes there as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2048872064 From thartmann at openjdk.org Thu Apr 11 05:14:57 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 05:14:57 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] > > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. > > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Adjusted comments according to review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18716/files - new: https://git.openjdk.org/jdk/pull/18716/files/5a8b47ce..13b80fa7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18716&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18716&range=00-01 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18716/head:pull/18716 PR: https://git.openjdk.org/jdk/pull/18716 From thartmann at openjdk.org Thu Apr 11 05:14:57 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 05:14:57 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 16:25:37 GMT, Andrew Haley wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjusted comments according to review > > src/hotspot/cpu/aarch64/aarch64.ad line 16193: > >> 16191: // TailJump below removes the return address. >> 16192: // Don't use rfp for the jump_target because the >> 16193: // MachEpilogNode that is inserted above will kill it. > > This is a somewhat opaque comment. 
Please say that a `MachEpilogNode` has already been emitted, so the current rfp contains a value that belongs to the caller of this method, so it must not be used. Thanks Andrew. I adjusted the comments, please let me know what you think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560442288 From thartmann at openjdk.org Thu Apr 11 05:25:46 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 05:25:46 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 14:22:14 GMT, Martin Doerr wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). 
>> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > I don't think PPC64 or s390 are affected. These platforms don't have a frame pointer register. Both currently use a fixed register for the jump target (the same one as for inline caches). > The test passes on PPC64. Thanks for pinging me! @TheRealMDoerr , @offamitkumar thanks for checking! > Both currently use a fixed register for the jump target (the same one as for inline caches). Looks to me as if no fixed register but `iRegP` is used for `jump_target`: https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/s390/s390.ad#L9505 But the `MachEpilog` code does look safe, as you said, there is no frame pointer register. RISC-V and ARM32 look affected to me though. Waiting for confirmation. Please note that `TestTailCallInArrayCopyStub.java` needs [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) to trigger the issue. 
> Maybe for documentation purposes, but I don't think it would have any effect on generated code. Thanks for checking. @dean-long, @theRealAph any preferences? Or should we leave it as is? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2048947292 From stuefe at openjdk.org Thu Apr 11 05:26:45 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 Apr 2024 05:26:45 GMT Subject: RFR: 8329656: assertion failed in MAP_ARCHIVE_MMAP_FAILURE path: Invalid immediate -5 0 In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 18:07:54 GMT, Ioi Lam wrote: >> It fixes an embarrassing OOB memory access when rolling the dice to get a random class space location on aarch64. >> >> Thanks to @calvinccheung for finding this bug. >> >> The fix is to make all relevant variables unsigned, thus preventing negative overflow. > > LGTM Thanks @iklam and @calvinccheung ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18698#issuecomment-2048947701 From stuefe at openjdk.org Thu Apr 11 05:26:45 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 Apr 2024 05:26:45 GMT Subject: Integrated: 8329656: assertion failed in MAP_ARCHIVE_MMAP_FAILURE path: Invalid immediate -5 0 In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 15:06:40 GMT, Thomas Stuefe wrote: > It fixes an embarrassing OOB memory access when rolling the dice to get a random class space location on aarch64. > > Thanks to @calvinccheung for finding this bug. > > The fix is to make all relevant variables unsigned, thus preventing negative overflow. This pull request has now been integrated. 
Changeset: d9c84e76 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/d9c84e763a0880d33586dbb8dc90b66ede030444 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8329656: assertion failed in MAP_ARCHIVE_MMAP_FAILURE path: Invalid immediate -5 0 Reviewed-by: ccheung, iklam ------------- PR: https://git.openjdk.org/jdk/pull/18698 From dlong at openjdk.org Thu Apr 11 05:59:41 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 Apr 2024 05:59:41 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 21:14:18 GMT, Dean Long wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). 
>> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > >> I also wondered if R29 shouldn't be a callee-save (SOE) register in the C calling convention? > > Maybe for documentation purposes, but I don't think it would have any effect on generated code. > Thanks for checking. @dean-long, @theRealAph any preferences? Or should we leave it as is? I would leave it. I found this code that makes it sound like NS is better: https://github.com/openjdk/jdk/blob/70944ca54ad0090c734bb5b3082beb33450c4877/src/hotspot/share/opto/lcm.cpp#L910 because we know how to find oops even if they are in the FP register, so there is no need to kill it across a non-leaf runtime call. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2048975841 From chagedorn at openjdk.org Thu Apr 11 06:25:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Apr 2024 06:25:41 GMT Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg [v2] In-Reply-To: <-79vJfqi9JL5-ut-4ipu7hzvwiDZx_8aYB4dhOP-ODk=.c48c5dc6-c49c-4afd-9163-e7df9f39ba04@github.com> References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com> <-79vJfqi9JL5-ut-4ipu7hzvwiDZx_8aYB4dhOP-ODk=.c48c5dc6-c49c-4afd-9163-e7df9f39ba04@github.com> Message-ID: On Wed, 10 Apr 2024 07:21:15 GMT, Roland Westrelin wrote: >> The crash occurs when a virtual call is devirtualized late. Inlining >> is not attempted then. So no new inlining diagnostic message is >> produced which causes the assert failure. There's some valuable >> information that can be reported though (the call is >> devirtualized). > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > +UnlockDiagnosticVMOptions Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18685#pullrequestreview-1993361620 From jbhateja at openjdk.org Thu Apr 11 06:31:51 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 11 Apr 2024 06:31:51 GMT Subject: Integrated: 8329254: optimize integral reverse operations on x86 GFNI target. In-Reply-To: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com> References: <1i51xczi3Q5WG46f6dBmgkBzrKIo4aHi4M5t54ElymA=.4cc9f7ed-533a-480f-9177-cb3f534fa36c@github.com> Message-ID: On Thu, 28 Mar 2024 11:41:21 GMT, Jatin Bhateja wrote: > - Efficient GFNI based instruction sequence to compute integral reverse operation was added along with JEP-426 (VectorAPI 4th Incubation). 
https://bugs.openjdk.org/browse/JDK-8284960
>
> - However, the CPUID based feature detection for GFNI was incorrectly performed under AVX512 check, fixing it shows roughly 2X performance improvement for Integer/Long.reverse APIs on E-core targets (MTL+).
>
>
> BaseLine:
> Benchmark         (size)  Mode  Cnt  Score  Error  Units
> Integers.reverse     500  avgt    2  0.120         us/op
> Longs.reverse        500  avgt    2  0.221         us/op
>
> Withopt:
> Benchmark         (size)  Mode  Cnt  Score  Error  Units
> Integers.reverse     500  avgt    2  0.050         us/op
> Longs.reverse        500  avgt    2  0.086         us/op
>
>
> Kindly review.
>
> Best Regards,
> Jatin

This pull request has now been integrated.

Changeset: b04b3047
Author: Jatin Bhateja
URL: https://git.openjdk.org/jdk/commit/b04b3047ff5c5526bdf47925210e2a35ca191e6e
Stats: 6 lines in 2 files changed: 2 ins; 2 del; 2 mod

8329254: optimize integral reverse operations on x86 GFNI target.

Reviewed-by: sviswanathan

-------------

PR: https://git.openjdk.org/jdk/pull/18530

From chagedorn at openjdk.org Thu Apr 11 06:32:54 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 11 Apr 2024 06:32:54 GMT
Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates
In-Reply-To:
References:
Message-ID:

On Wed, 10 Apr 2024 13:19:11 GMT, Christian Hagedorn wrote:

> This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments.
>
> #### Background
>
> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE.
>
> Thanks,
> Christian

src/hotspot/share/opto/loopTransform.cpp line 1399:

> 1397: // Is 'n' a node that can be found on the input chain of a Template Assertion Predicate bool (i.e. between a Template
> 1398: // Assertion Predicate If node and the OpaqueLoop* nodes)?
> 1399: static bool is_part_of_template_assertion_predicate_bool(Node* n) {

Only used by the now dead `subgraph_has_opaque()` method.

src/hotspot/share/opto/loopTransform.cpp line 1440:

> 1438: for (uint i = 0; i < wq.size(); i++) {
> 1439: Node* n = wq.at(i);
> 1440: if (TemplateAssertionPredicateExpressionNode::valid_opcode(n)) {

`valid_opcode()` also includes the `OpaqueLoop*` nodes which was not the case for `is_part_of_template_assertion_predicate_bool()`. Therefore needed to adapt the code below to handle that.

src/hotspot/share/opto/predicates.cpp line 326:

> 324: return node->is_Opaque1();
> 325: };
> 326: DataNodesOnPathsToTargets data_nodes_on_path_to_targets(TemplateAssertionPredicateExpressionNode::valid_opcode,

Moved `TemplateAssertionPredicateExpression::maybe_contains()` to `TemplateAssertionPredicateExpressionNode::valid_opcode()` which suited better.

src/hotspot/share/opto/predicates.hpp line 295:

> 293: // The expression itself can belong to no, one, or two Template Assertion Predicates:
> 294: // - None: This node is already dead (i.e. we replaced the Bool condition of the Template Assertion Predicate).
> 295: // - Two: A OpaqueLoopInitNode could be part of two Template Assertion Predicates.

This is a little odd but I did not want to change that. This can only happen when we have not cloned a Template Assertion Predicate Expression, yet (when cloning, we will not common the `OpaqueLoopInitNodes`). Maybe we could create a separate `OpaqueLoopInitNode` for the init and the last value Template Assertion Predicate at some point - but that would just beautify this code here and does not currently offer any other benefit. On top of that (even though rather minor), we have additional nodes that are actually not required to make Template Assertion Predicates work. So, I'm not sure if we really want that.

src/hotspot/share/opto/split_if.cpp line 418:

> 416: // untouched copy that is still recognized by the Template Assertion Predicate matching code.
> 417: void PhaseIdealLoop::clone_template_assertion_predicate_expression_down(Node* node) {
> 418: if (!TemplateAssertionPredicateExpressionNode::is_valid(node)) {

Covers previous `subgraph_has_opaque()` call to check whether this node is part of a Template Assertion Predicate Expression.

test/hotspot/jtreg/compiler/predicates/assertion/TestSplitIfCloningDown.java line 44:

> 42: package compiler.predicates.assertion;
> 43:
> 44: public class TestSplitIfCloningDown {

Just a sanity test that exercises the refactored code. There was no assert or anything alike when running the test before this refactoring.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1559425203
PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1559426759
PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1559428056
PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1559461421
PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1559429400
PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1559431474

From chagedorn at openjdk.org Thu Apr 11 06:32:53 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 11 Apr 2024 06:32:53 GMT
Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates
Message-ID:

This is another patch split off https://github.com/openjdk/jdk/pull/16877.
It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. #### Background The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. Thanks, Christian ------------- Commit messages: - 8330004: Refactor cloning down code in Split If for Template Assertion Predicates Changes: https://git.openjdk.org/jdk/pull/18723/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18723&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330004 Stats: 319 lines in 6 files changed: 236 ins; 65 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/18723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18723/head:pull/18723 PR: https://git.openjdk.org/jdk/pull/18723 From thartmann at openjdk.org Thu Apr 11 06:41:42 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 06:41:42 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 05:14:57 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
>> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted comments according to review Okay, let's leave it then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2049016562 From fyang at openjdk.org Thu Apr 11 06:41:42 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Apr 2024 06:41:42 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 06:37:22 GMT, Tobias Hartmann wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjusted comments according to review > > Okay, let's leave it then. @TobiHartmann : Yes, you are right. This issue also triggers on linux-riscv64 if I force allocating the frame pointer (x8) for `jump_target`. And I have prepared a similar fix. Could you please add it? Thanks. 
[18716-riscv.diff.txt](https://github.com/openjdk/jdk/files/14942269/18716-riscv.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2049018151 From fyang at openjdk.org Thu Apr 11 06:46:44 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Apr 2024 06:46:44 GMT Subject: RFR: 8329258: [AArch64] TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 05:14:57 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). 
>> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted comments according to review src/hotspot/cpu/aarch64/aarch64.ad line 16193: > 16191: // TailJump below removes the return address. > 16192: // Don't use rfp for 'jump_target' because a MachEpilogNode has already been > 16193: // emitted just above the TailCall and it will reset rbp to the caller state. 
Suggestion: s/rbp/rfp/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560504863 From thartmann at openjdk.org Thu Apr 11 07:07:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 07:07:26 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v3] In-Reply-To: References: Message-ID: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] > > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. 
> > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Added RISCV patch and fixed comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18716/files - new: https://git.openjdk.org/jdk/pull/18716/files/13b80fa7..160ea006 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18716&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18716&range=01-02 Stats: 31 lines in 3 files changed: 27 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18716/head:pull/18716 PR: https://git.openjdk.org/jdk/pull/18716 From thartmann at openjdk.org Thu Apr 11 07:07:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 07:07:26 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 06:38:51 GMT, Fei Yang wrote: >> Okay, let's leave it then. > > @TobiHartmann : Yes, you are right. This issue also triggers on linux-riscv64 if I force allocating the frame pointer (x8) for `jump_target`. And I have prepared a similar fix. Could you please add it? Thanks. 
> [18716-riscv.diff.txt](https://github.com/openjdk/jdk/files/14942269/18716-riscv.diff.txt) Thanks for the review and the patch, @RealFYang! I added it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2049045111 From thartmann at openjdk.org Thu Apr 11 07:07:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 07:07:27 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 06:43:46 GMT, Fei Yang wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjusted comments according to review > > src/hotspot/cpu/aarch64/aarch64.ad line 16193: > >> 16191: // TailJump below removes the return address. >> 16192: // Don't use rfp for 'jump_target' because a MachEpilogNode has already been >> 16193: // emitted just above the TailCall and it will reset rbp to the caller state. > > Suggestion: s/rbp/rfp/ Good catch! Will fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560518406 From fyang at openjdk.org Thu Apr 11 07:07:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Apr 2024 07:07:27 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 06:58:39 GMT, Tobias Hartmann wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 16193: >> >>> 16191: // TailJump below removes the return address. >>> 16192: // Don't use rfp for 'jump_target' because a MachEpilogNode has already been >>> 16193: // emitted just above the TailCall and it will reset rbp to the caller state. >> >> Suggestion: s/rbp/rfp/ > > Good catch! Will fix. But the comment here says `it will reset rbp to the caller state`. I think `rbp` is x86-specific, right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560518883 From roland at openjdk.org Thu Apr 11 07:29:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 07:29:51 GMT Subject: RFR: 8328822: C2: "negative trip count?" assert failure in profile predicate code In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 08:48:19 GMT, Aleksey Shipilev wrote: >> The assert failure is caused by: >> >> ABS(min_jint) = min_jint >> >> >> Given the `ABS` is part of a floating computation, the fix I propose >> is to cast the value to float before the `ABS`. > > Looks good and simple enough to me. @shipilev @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18707#issuecomment-2049086855 From roland at openjdk.org Thu Apr 11 07:29:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 07:29:52 GMT Subject: Integrated: 8328822: C2: "negative trip count?" assert failure in profile predicate code In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 07:37:20 GMT, Roland Westrelin wrote: > The assert failure is caused by: > > ABS(min_jint) = min_jint > > > Given the `ABS` is part of a floating computation, the fix I propose > is to cast the value to float before the `ABS`. This pull request has now been integrated. Changeset: 2ceeb6c0 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/2ceeb6c00135310b7bdabacb92d26d81de525240 Stats: 65 lines in 2 files changed: 64 ins; 0 del; 1 mod 8328822: C2: "negative trip count?" 
assert failure in profile predicate code Reviewed-by: shade, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18707 From roland at openjdk.org Thu Apr 11 07:30:47 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 07:30:47 GMT Subject: RFR: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg [v2] In-Reply-To: References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com> <-79vJfqi9JL5-ut-4ipu7hzvwiDZx_8aYB4dhOP-ODk=.c48c5dc6-c49c-4afd-9163-e7df9f39ba04@github.com> Message-ID: On Wed, 10 Apr 2024 18:16:27 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> +UnlockDiagnosticVMOptions > > Looks good. @iwanowww @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18685#issuecomment-2049084676 From roland at openjdk.org Thu Apr 11 07:30:48 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 07:30:48 GMT Subject: Integrated: 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg In-Reply-To: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com> References: <1IchVXnMWtG_vBzoYomuETqne1hcfjjpKc8g8y1-Znc=.48be9cd0-4cab-4853-9c5f-993152c9a805@github.com> Message-ID: On Tue, 9 Apr 2024 09:48:54 GMT, Roland Westrelin wrote: > The crash occurs when a virtual call is devirtualized late. Inlining > is not attempted then. So no new inlining diagnostic message is > produced which causes the assert failure. There's some valuable > information that can be reported though (the call is > devirtualized). This pull request has now been integrated. 
Changeset: 7df49262 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/7df492627b933f48750985c26de69be3f86115cb Stats: 85 lines in 2 files changed: 85 ins; 0 del; 0 mod 8327741: JVM crash in hotspot/share/opto/compile.cpp - failed: missing inlining msg Reviewed-by: vlivanov, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18685 From rcastanedalo at openjdk.org Thu Apr 11 07:31:50 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Apr 2024 07:31:50 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v3] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 07:07:26 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs.
[good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added RISCV patch and fixed comment Looks good, thanks for fixing this! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18716#pullrequestreview-1993466220 From tholenstein at openjdk.org Thu Apr 11 07:35:43 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Apr 2024 07:35:43 GMT Subject: RFR: 8324950: IGV: save the state to a file [v20] In-Reply-To: References: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com> Message-ID: On Wed, 10 Apr 2024 13:41:24 GMT, Roberto Castañeda Lozano wrote: > When a graph file with several graphs is opened (e.g.
[three-graphs.zip](https://github.com/openjdk/jdk/files/14932810/three-graphs.zip)), clicking on the tabs of the opened graphs does not highlight the corresponding entries in the Outline tree (as done when the graphs are freshly imported). Would it be possible to have the same behavior regardless of whether the displayed graphs are opened or imported? It should be fixed now. The entries in the Outline tree are computed lazily, so they don't exist until the folder is expanded. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2049098247 From tholenstein at openjdk.org Thu Apr 11 07:35:44 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Apr 2024 07:35:44 GMT Subject: RFR: 8324950: IGV: save the state to a file [v22] In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 15:04:19 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open one or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graphs can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14933957/example_graph.zip) and opening `graphs.xml` >> shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ...
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> >> save >> >> - `Save..` saves the current opened xml file. Create a new file if no file is opened. >> - `Save as...` save the current graphs as a copy to an xml file. >> Note: there... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - Fix: highlight nodes from imported tabs > - visibleNodes instead of hiddenNodes I updated the PR: save the visibleNodes instead of the hiddenNodes. This is way more memory efficient. Especially for large graphs. PR is ready for review again! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2049102830 From thartmann at openjdk.org Thu Apr 11 07:45:45 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 07:45:45 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v3] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 07:07:26 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
>> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added RISCV patch and fixed comment Thanks for the review, Roberto! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2049117089 From roland at openjdk.org Thu Apr 11 07:56:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 07:56:42 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2] In-Reply-To: References: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> Message-ID: On Wed, 10 Apr 2024 14:42:19 GMT, Christian Hagedorn wrote: >>> > Should we go back to the previously suggested version without CastPP? >>> >>> Isn't there a risk with that one too? if `superklass` is a `TypeNode` and its input changes to a constant for instance. >> >> I'm afraid you're right. This could probably happen, too. >> >>> If doing this in `GraphKit` doesn't work well, maybe it should be done in `SubTypeCheckNode::Value`? This way it would apply all all stages of compilation? (That assumes we don't care of what happens if `ExpandSubTypeCheckAtParseTime` is true). >> >> I've first thought of doing it in `SubTypeCheckNode::Value()` but assumed we can get away with handling it in `GraphKit`. But as now figured out, this comes with new problems and does not seem to be safe. I will try to undo my current fix idea in `GraphKit` and do it in `SubTypeCheckNode::Value()` instead. 
This should work (not yet sure though what to do with `ExpandSubTypeCheckAtParseTime` and if it's easy to fix - otherwise, we could move forward with the proposal to remove it for good). > > I could not find a test that shows a benefit by doing the improved check with code from `try_improve()` inside `SubTypeCheckNode::Value()`. The original patch only showed a win for mainline when we have two identical `SubTypeCheckNodes` such that they can common up with an improved constant. But this cannot be achieved when having the code in `SubTypeCheckNode::Value()`. > > I therefore suggest to revert back to the original version to directly plug in a better constant, if we find one with `try_improve()`, and just skip the other non-constant cases with `LoadKlass` etc. > > For the Valhalla bug, I can do the more sophisticated fix to improve `SubTypeCheckNode::Value()` with code from `try_improve()` but it does not seem worth for mainline. What do you think? Sounds good. Thanks for giving it a try. Since you mention fixing this in valhalla, do you expect it makes a difference there but not in mainline? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1560594083 From roland at openjdk.org Thu Apr 11 08:25:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 08:25:56 GMT Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 [v2] In-Reply-To: References: Message-ID: > After range check elimination, a cast in the main loop becomes top > because the type of its input (that depends on the iv phi) and the > type recorded in the cast do not intersect. This is a case that's > expected to be caught by assert predicates but, in this particular > case, no assert predicate constant folds. > > The stride for the loop is -2. The iv phi type is `min+1..0` > > As a consequence, the init value for the main loop has type int. 
> > The range check that causes the issue is for array access: > > lArrFld[i11 + 1] = 6; > > > The main loop is unrolled once. The second access in the loop is at > `i11 - 1` which has type `min..-1`. The range check cast at that > access becomes top. The assert predicate operates on an init value > that has the shape: > > > (CastII (AddI pre_loop_iv -2) int) > > > and type int. > > That `CastII` is inserted by `PhaseIdealLoop::cast_incr_before_loop()`. > > The assert predicate for the first iteration in the main loop is for > index: > > > (AddI (CastII (AddI pre_loop_iv -2) int) 1) > > > And for the second: > > > (AddI (CastII (AddI pre_loop_iv -2) int) -1) > > > Both have type int so the assert predicate can't constant fold. > > I initially fixed this by changing the type of the cast from int to > the type of the iv phi: > > > (AddI (CastII (AddI pre_loop_iv -2) min+1..0) -1) > > > That allows the assert predicate for the second iteration to constant > fold. But I was then worried narrowing the type of the cast would > cause issues going forward so instead, I propose proceeding as in > 8282592 and have assert predicates skip over the CastII (that part of > 8282592 was later undone): > > > (AddI (AddI pre_loop_iv -2) 1) > > > which allows the assert predicate for the first iteration in the main > loop to constant fold. > > The change from 8282592 caused issues because we used to narrow the > type of a cast based on the condition that guards it. That was removed > by 8319372.
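The type reasoning in the description above can be made concrete with a toy interval model. This is only a sketch under stated assumptions, not HotSpot code: `IntRange` and `AssertPredicateSketch` are hypothetical names, and the rule that an add falls back to the full int range whenever a bound could overflow is an assumed simplification of how C2 types an `AddI`. It shows why adding -1 to a value typed as plain int gives no usable information, while adding -1 to a value typed `min+1..0` yields `min..-1`, which is provably a negative index.

```java
// Toy model of an integer value range. Illustration only; the names and the
// overflow rule are assumptions, not the actual C2 TypeInt implementation.
final class IntRange {
    final int lo;
    final int hi;

    // The unconstrained "int" type: every 32-bit value is possible.
    static final IntRange INT = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

    IntRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        this.lo = lo;
        this.hi = hi;
    }

    // Range of (x + c) for x in this range. If either bound would overflow,
    // conservatively fall back to the full int range.
    IntRange add(int c) {
        long newLo = (long) lo + c;
        long newHi = (long) hi + c;
        if (newLo < Integer.MIN_VALUE || newHi > Integer.MAX_VALUE) {
            return INT;
        }
        return new IntRange((int) newLo, (int) newHi);
    }

    // True if every value in the range is negative, i.e. an array access
    // with this index would always fail its range check.
    boolean alwaysNegative() {
        return hi < 0;
    }

    @Override
    public String toString() {
        return "[" + lo + ".." + hi + "]";
    }
}

final class AssertPredicateSketch {
    public static void main(String[] args) {
        // Type of the iv phi from the description: min+1..0.
        IntRange ivPhi = new IntRange(Integer.MIN_VALUE + 1, 0);

        // Init value typed as plain int: adding -1 may overflow, so the
        // result stays "int" and nothing can be decided.
        System.out.println(IntRange.INT.add(-1).alwaysNegative()); // prints false

        // Init value typed min+1..0: adding -1 gives min..-1, always negative.
        System.out.println(ivPhi.add(-1).alwaysNegative());        // prints true
    }
}
```

In this model the first case corresponds to the predicate that cannot constant fold, and the second to the narrowed cast that the description says was tried first.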
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18724/files - new: https://git.openjdk.org/jdk/pull/18724/files/3531b4a2..3ce25154 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18724&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18724&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18724.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18724/head:pull/18724 PR: https://git.openjdk.org/jdk/pull/18724 From roland at openjdk.org Thu Apr 11 08:25:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 08:25:56 GMT Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 [v2] In-Reply-To: References: Message-ID: <4b7IVFrmcQyhGuy6ijyUsT3aGJVCWiuFDnFNFDSTd8A=.76f00032-d6ba-400d-88b6-50c7794a03fb@github.com> On Wed, 10 Apr 2024 14:11:08 GMT, Christian Hagedorn wrote: > I agree with that. I thought about same now that JDK-8319372 is in. > > Two minor comments, otherwise, looks good! Thanks for reviewing this. I pushed an update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18724#issuecomment-2049176215 From aph at openjdk.org Thu Apr 11 08:28:52 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 11 Apr 2024 08:28:52 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v3] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 07:07:26 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
>> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added RISCV patch and fixed comment src/hotspot/cpu/aarch64/aarch64.ad line 16193: > 16191: // TailJump below removes the return address. > 16192: // Don't use rfp for 'jump_target' because a MachEpilogNode has already been > 16193: // emitted just above the TailCall and it will reset rfp to the caller state. Suggestion: // emitted just above the TailCall which has reset rfp to the caller state. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560634878 From rcastanedalo at openjdk.org Thu Apr 11 08:36:42 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Apr 2024 08:36:42 GMT Subject: RFR: 8324950: IGV: save the state to a file [v20] In-Reply-To: References: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com> Message-ID: On Thu, 11 Apr 2024 07:31:04 GMT, Tobias Holenstein wrote: > It should be fixed now. The entries in the Outline tree are computed lazily, so they don't exist until the folder is expanded. Thanks, that works for me now! > I updated the PR: save the visibleNodes instead of the hiddenNodes. This is way more memory efficient. Especially for large graphs. One issue with that is that now graphs are opened with a fixed set of visible nodes rather than "all nodes are visible".
This works for the graph that is directly shown when the file is open, but if you step into the next graph in the group, new nodes will be hidden. To reproduce this, you can open [one-graph.zip](https://github.com/openjdk/jdk/files/14943396/one-graph.zip) and click on "Show next graph of current group". In the next graph, nodes 5, 7, 25, 26, and 27 are hidden unexpectedly. One (hopefully simple) fix would be to show all nodes if `visibleNodes` is absent from the XML file or introduce an attribute to indicate that all nodes should be visible, something like: This would also lead to more compact XML files for graphs without hidden nodes. Another solution would be to express the graph state as a chain of actions (e.g. ``, ``, etc.). Then one could determine which chain of actions is more compact for a given graph state. This would also open for sending "commands" to IGV in the future, e.g. via gdb. But this is probably best left for future work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2049197703 From chagedorn at openjdk.org Thu Apr 11 08:41:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Apr 2024 08:41:43 GMT Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 08:25:56 GMT, Roland Westrelin wrote: >> After range check elimination, a cast in the main loop becomes top >> because the type of its input (that depends on the iv phi) and the >> type recorded in the cast do not intersect. This is a case that's >> expected to be caught by assert predicates but, in this particular >> case, no assert predicate constant folds. >> >> The stride for the loop is -2. The iv phi type is `min+1..0` >> >> As a consequence, the init value for the main loop has type int. 
>> >> The range check that causes the issue is for array access: >> >> lArrFld[i11 + 1] = 6; >> >> >> The main loop is unrolled once. The second access in the loop is at >> `i11 - 1` which has type `min..-1`. The range check cast at that >> access becomes top. The assert predicate operates on an init value >> that has the shape: >> >> >> (CastII (AddI pre_loop_iv -2) int) >> >> >> and type int. >> >> That `CastII` is inserted by `PhaseIdealLoop::cast_incr_before_loop()`. >> >> The assert predicate for the first iteration in the main loop is for >> index: >> >> >> (AddI (CastII (AddI pre_loop_iv -2) int) 1) >> >> >> And for the second: >> >> >> (AddI (CastII (AddI pre_loop_iv -2) int) -1) >> >> >> Both have type int so the assert predicate can't constant fold. >> >> I initially fixed this by changing the type of the cast from int to >> the type of the iv phi: >> >> >> (AddI (CastII (AddI pre_loop_iv -2) min+1..0) -1) >> >> >> That allows the assert predicate for the second iteration to constant >> fold. But I was then worried narrowing the type of the cast would >> cause issues going forward so instead, I propose proceeding as in >> 8282592 and have assert predicates skip over the CastII (that part of >> 8282592 was later undone): >> >> >> (AddI (AddI pre_loop_iv -2) 1) >> >> >> which allows the assert predicate for the first iteration in the main >> loop to constant fold. >> >> The change from 8282592 caused issues because we used to narrow the >> type of a cast based on the condition that guards it. That was removed >> by 8319372. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the update, that looks good! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18724#pullrequestreview-1993609601 From dfenacci at openjdk.org Thu Apr 11 10:24:41 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 11 Apr 2024 10:24:41 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote: > Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. > > I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. > > Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! src/hotspot/share/opto/type.cpp line 441: > 439: // Map the boolean result of Type::cmp into a comparator result that CmpKey expects. > 440: auto type_cmp = [](const void* t1, const void* t2) -> int32_t { > 441: return Type::cmp((Type*) t1, (Type*) t2) ? 0 : 1; Wouldn't `return !Type::cmp((Type*) t1, (Type*) t2);` be enough? (though it might actually result in the same compiled code) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1560574540 From thartmann at openjdk.org Thu Apr 11 10:31:09 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 10:31:09 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: > Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. 
The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: > https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 > > With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 > > > 028 mov R29, 0x0000ffff78cc0080 # ptr > > [...] > > 098 # pop frame 16 > ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) > add sp, sp, #16 > > [...] > > 0a0 br R29 # R12 holds method > > > As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). > > On x86, we use `no_rbp_RegP` instead: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 > > I implemented the same on AArch64. > > I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? > > I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? 
> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 > > `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Comment adjustment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18716/files - new: https://git.openjdk.org/jdk/pull/18716/files/160ea006..7a25848c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18716&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18716&range=02-03 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18716/head:pull/18716 PR: https://git.openjdk.org/jdk/pull/18716 From thartmann at openjdk.org Thu Apr 11 10:31:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 10:31:10 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v3] In-Reply-To: References: Message-ID: <1AbxzYb7vzvw7LQj2rFZljecpekLGS-WFjcidODeC8s=.c7c17a23-6d14-488b-b418-6733950c9643@github.com> On Thu, 11 Apr 2024 08:26:34 GMT, Andrew Haley wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Added RISCV patch and fixed comment > > src/hotspot/cpu/aarch64/aarch64.ad line 16193: > >> 16191: // TailJump below removes the return address. >> 16192: // Don't use rfp for 'jump_target' because a MachEpilogNode has already been >> 16193: // emitted just above the TailCall and it will reset rfp to the caller state. 
> > Suggestion: > > // emitted just above the TailCall which has reset rfp to the caller state. Thanks. I updated the comments accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560793987 From thartmann at openjdk.org Thu Apr 11 10:31:10 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 10:31:10 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 06:59:11 GMT, Fei Yang wrote: >> Good catch! Will fix. > > But the comment here says `it will reset rbp to the caller state`. I think `rbp` is x86-specific, right? Right, I noticed in the meantime and updated my comment above. Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18716#discussion_r1560794499 From mli at openjdk.org Thu Apr 11 10:49:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 11 Apr 2024 10:49:45 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Wed, 10 Apr 2024 20:20:24 GMT, Andrew Haley wrote: >> Do you mean add an exhaustive test for the 32-bit range for the vector instrinsic or scalar one? >> The only issue I have for an exhaustive test for the 32-bit range is that it takes too long time to be an automatic test, and we don't want to add an manual test as it seldomly runs. > >> The only issue I have for an exhaustive test for the 32-bit range is that it takes too long time to be an automatic test, and we don't want to add an manual test as it seldomly runs. > > But it takes a long time because you use the @Run attribute, surely. 
If you ran that test just as a test, without the IR framework, it'd be fine. Yes, you might be right. But previously @eme64 asked to change from a plain test to a test using the IR framework. So I'm not sure what to do now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1560813613 From epeter at openjdk.org Thu Apr 11 10:55:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Apr 2024 10:55:42 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: On Wed, 10 Apr 2024 20:20:24 GMT, Andrew Haley wrote: >> Do you mean add an exhaustive test for the 32-bit range for the vector intrinsic or scalar one? >> The only issue I have for an exhaustive test for the 32-bit range is that it takes too long to be an automatic test, and we don't want to add a manual test as it seldom runs. > >> The only issue I have for an exhaustive test for the 32-bit range is that it takes too long to be an automatic test, and we don't want to add a manual test as it seldom runs. > > But it takes a long time because you use the @Run attribute, surely. If you ran that test just as a test, without the IR framework, it'd be fine. @theRealAph I'm out of office, so I don't have much time to think this through. But maybe we want both, a slower IR test which ensures we have the desired IR (with random input values), and also a non-IR test that is faster and checks the correct results more exhaustively?
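A minimal sketch of what the non-IR half of that suggestion might look like, assuming a plain Java test is acceptable. It is not the test from PR 17753: the class and method names (`RoundExhaustiveSketch`, `referenceRound`, `countMismatches`) are hypothetical. The reference relies on the documented behavior of `Math.round(float)` (closest `int`, ties toward positive infinity), modeled here as `floor(x + 0.5)` computed in `double`.

```java
final class RoundExhaustiveSketch {
    // Reference result: floor(x + 0.5) evaluated in double, followed by a
    // saturating cast to int (NaN becomes 0, out-of-range values clamp).
    static int referenceRound(float f) {
        return (int) Math.floor((double) f + 0.5);
    }

    // Checks 'count' float bit patterns, starting at 'firstBits' and advancing
    // by 'stride' (wrapping around), and returns how many results disagree.
    static long countMismatches(int firstBits, int stride, long count) {
        long mismatches = 0;
        int bits = firstBits;
        for (long i = 0; i < count; i++) {
            float f = Float.intBitsToFloat(bits);
            if (Math.round(f) != referenceRound(f)) {
                mismatches++;
            }
            bits += stride;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Hypothetical switch: pass "exhaustive" to walk all 2^32 bit patterns.
        boolean exhaustive = args.length > 0 && args[0].equals("exhaustive");
        long mismatches = exhaustive
                ? countMismatches(0, 1, 1L << 32)      // every 32-bit pattern
                : countMismatches(0, 4099, 1L << 20);  // quick sampled run
        System.out.println("mismatches: " + mismatches);
    }
}
```

Because the loop is an ordinary method rather than an IR-framework `@Run` method, the exhaustive walk is a single tight loop over bit patterns; whether that is fast enough for an automatic tier would still need to be measured.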
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1560819665 From mli at openjdk.org Thu Apr 11 10:56:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 11 Apr 2024 10:56:42 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3] In-Reply-To: <3wamGp9toFZEr7IO54NC4VOU8dAfpL2WJyWTSNv0m_s=.ebec8482-610a-4f92-9f42-5fe79b41dd23@github.com> References: <3wamGp9toFZEr7IO54NC4VOU8dAfpL2WJyWTSNv0m_s=.ebec8482-610a-4f92-9f42-5fe79b41dd23@github.com> Message-ID: On Wed, 10 Apr 2024 13:07:47 GMT, Hamlin Li wrote: >> Ah, I see. I don't think it's safe for `vsub_fp`, `vdiv_fp`, etc. to depend on some uncertain dynamic rounding mode on Java code entry. And it seems that the RISC-V spec doesn't even specify a default dynamic rounding mode in `frm` at the ISA level. This reminds me that we might be lacking some pieces of the puzzle. >> >> If we want to keep `RNE` (Round to Nearest) as a default dynamic rounding mode for Java code (like you do in this PR), we will also need to save and restore the floating-point control state preferring `RNE` when we enter and leave Java code. Something similar for aarch64: https://bugs.openjdk.org/browse/JDK-8319973. This will also help eliminate existing explicit settings of dynamic rounding mode to `RNE` in file riscv_v.ad (grep "csrwi(CSR_FRM"), which should be good for performance. I would suggest we fix this in another PR before we continue with this one. Let me know if you are interested : -) > > Thanks for the discussion. > Sure. Let me do some investigation and fix it first.
tracked by https://bugs.openjdk.org/browse/JDK-8330094 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1560820723 From gcao at openjdk.org Thu Apr 11 11:27:00 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 11 Apr 2024 11:27:00 GMT Subject: RFR: 8330095: RISC-V: Remove obsolete vandn_vi instruction Message-ID: Hi, We notice that the `vandn_vi` instruction is defined in the current code and is not used anywhere, it is not available in the riscv-crypto Release-1.0.0 manual. The `vandn_vi` instruction is present in earlier riscv-crypto manual, but the `vandnvi` has been removed from the https://github.com/riscv/riscv-crypto/commit/82a02f09668adb18dfee5dfc45a0ce7d3af10103 commit. ### Testing - [x] fastdebug build successfully ------------- Commit messages: - RISC-V: Remove obsolete vandn_vi instruction Changes: https://git.openjdk.org/jdk/pull/18737/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18737&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330095 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18737.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18737/head:pull/18737 PR: https://git.openjdk.org/jdk/pull/18737 From fyang at openjdk.org Thu Apr 11 11:27:01 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Apr 2024 11:27:01 GMT Subject: RFR: 8330095: RISC-V: Remove obsolete vandn_vi instruction In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 11:17:49 GMT, Gui Cao wrote: > Hi, We notice that the `vandn_vi` instruction is defined in the current code and is not used anywhere, it is not available in the riscv-crypto Release-1.0.0 manual. The `vandn_vi` instruction is present in earlier riscv-crypto manual, but the `vandnvi` has been removed from the https://github.com/riscv/riscv-crypto/commit/82a02f09668adb18dfee5dfc45a0ce7d3af10103 commit. > > ### Testing > - [x] fastdebug build successfully LGTM. 
------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18737#pullrequestreview-1993919936 From gcao at openjdk.org Thu Apr 11 11:27:01 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 11 Apr 2024 11:27:01 GMT Subject: RFR: 8330095: RISC-V: Remove obsolete vandn_vi instruction In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 11:17:49 GMT, Gui Cao wrote: > Hi, We notice that the `vandn_vi` instruction is defined in the current code and is not used anywhere, it is not available in the riscv-crypto Release-1.0.0 manual. The `vandn_vi` instruction is present in earlier riscv-crypto manual, but the `vandnvi` has been removed from the https://github.com/riscv/riscv-crypto/commit/82a02f09668adb18dfee5dfc45a0ce7d3af10103 commit. > > ### Testing > - [x] fastdebug build successfully @robehn Could you please take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18737#issuecomment-2049465196 From galder at openjdk.org Thu Apr 11 11:51:42 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 11 Apr 2024 11:51:42 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 06:54:59 GMT, Dean Long wrote: > Sure, feel free to improve it. I've just commented in https://github.com/openjdk/jdk/pull/18642/files#r1560900520, what do you think? > I fixed the issue I had, so it should be good to test out now. Thanks. Ok, so how would we integrate your changes? Do I just merge the commits in your https://github.com/dean-long/jdk/tree/pr/17667 branch? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2049523083 From galder at openjdk.org Thu Apr 11 12:15:58 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 11 Apr 2024 12:15:58 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same Message-ID: Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those. `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. I've run hotspot compiler tests successfully on x86_64. ------------- Commit messages: - Added int/long min/max optimization tests and renamed test - Adjust to coding style - Override MaxNode::Identity to detect optimize same nodes - MAX nodes should be 0 after optimization has been applied - Only apply optimization to mix/max nodes - Adjust test expectations following optimization - Add Identity optimization for when the inputs are the same - Adjust test to pass in same parameter to both sides - Added IR test for how the current code works Changes: https://git.openjdk.org/jdk/pull/18738/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18738&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323429 Stats: 132 lines in 5 files changed: 131 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18738.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18738/head:pull/18738 PR: https://git.openjdk.org/jdk/pull/18738 From roland at openjdk.org Thu Apr 11 12:28:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 12:28:42 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 12:10:38 GMT, 
Galder Zamarreño wrote: > Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. > > It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those. > > `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. > > I've run hotspot compiler tests successfully on x86_64. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18738#pullrequestreview-1994058981 From mdoerr at openjdk.org Thu Apr 11 12:38:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 11 Apr 2024 12:38:42 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 14:22:14 GMT, Martin Doerr wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...]
>> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > I don't think PPC64 or s390 are affected. These platforms don't have a frame pointer register. Both currently use a fixed register for the jump target (the same one as for inline caches). > The test passes on PPC64. Thanks for pinging me! > @TheRealMDoerr , @offamitkumar thanks for checking! > > > Both currently use a fixed register for the jump target (the same one as for inline caches). 
> > Looks to me as if no fixed register but `iRegP` is used for `jump_target`: > > https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/s390/s390.ad#L9505 > > But the `MachEpilog` code does look safe, as you said, there is no frame pointer register. > Oh, right. I had looked at the wrong operand. But it's still safe because there is no frame pointer register on these 2 platforms. The test has passed on PPC64 with both patches applied. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2049600003 From thartmann at openjdk.org Thu Apr 11 12:47:45 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Apr 2024 12:47:45 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: <2JBeygIYbebsErqrUSBlwW1NRxa-lCDFxXRTOqslD6A=.66c06a7f-dc0d-4828-b391-41dec4703daf@github.com> On Thu, 11 Apr 2024 10:31:09 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] 
>> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] >> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Comment adjustment Thanks again for confirming, Martin. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2049615330 From chagedorn at openjdk.org Thu Apr 11 13:16:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Apr 2024 13:16:43 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same In-Reply-To: References: Message-ID: <9Vsbiejiv5vNDem0-33IMcdhxz4IMVRABDWM0nrh2eE=.5af9c9c7-d30c-4201-8a68-67096133cc7e@github.com> On Thu, 11 Apr 2024 12:10:38 GMT, Galder Zamarreño wrote: > Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. > > It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those. > > `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. > > I've run hotspot compiler tests successfully on x86_64. Otherwise, the fix looks good! test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxOpt.java line 26: > 24: /** > 25: * @test > 26: * @bug 8287087 Suggestion: * @bug 8323429 test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxOpt.java line 27: > 25: * @test > 26: * @bug 8287087 > 27: * @summary ... You should add a summary. test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxOpt.java line 29: > 27: * @summary ... > 28: * @library /test/lib / > 29: * @requires vm.compiler2.enabled I don't think this `@requires` is required - we can also run the test without C2. test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxOpt.java line 52: > 50: private static int testIntMin(int v) { > 51: return Math.min(v, v); > 52: } You could also add `@Check` methods to sanity check that we actually get 42 back.
This could be done like this (you can also have a look at [other examples](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/CheckedTestExample.java)): @Check(test = "testIntMin") public static void checkTest2(int result) { if (result != 42) { throw new RuntimeException("Incorrect result: " + result); } } test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 822: > 820: public static final String MAX = PREFIX + "MAX" + POSTFIX; > 821: static { > 822: beforeMatchingNameRegex(MAX, "Max(I|L)"); For completeness, we should probably also add `F` and `D` here. I've quickly checked the usages of `MAX` and we only seem to be using it in `failOn` conditions. So, this should be safe to do. ------------- PR Review: https://git.openjdk.org/jdk/pull/18738#pullrequestreview-1994139787 PR Review Comment: https://git.openjdk.org/jdk/pull/18738#discussion_r1560995816 PR Review Comment: https://git.openjdk.org/jdk/pull/18738#discussion_r1560995511 PR Review Comment: https://git.openjdk.org/jdk/pull/18738#discussion_r1560996229 PR Review Comment: https://git.openjdk.org/jdk/pull/18738#discussion_r1561002832 PR Review Comment: https://git.openjdk.org/jdk/pull/18738#discussion_r1560993923 From tholenstein at openjdk.org Thu Apr 11 13:26:01 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Apr 2024 13:26:01 GMT Subject: RFR: 8324950: IGV: save the state to a file [v23] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open one or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files.
> - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14933957/example_graph.zip) and opening `graphs.xml` > shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: there is no autosave and IGV also does not ask if you want to save changes when closing it. 
> > ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/43192aa8..d2a0ff75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=21-22 Stats: 197 lines in 8 files changed: 38 ins; 66 del; 93 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From jkarthikeyan at openjdk.org Thu Apr 11 13:31:42 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 11 Apr 2024 13:31:42 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: <1PH75HtUP_ugSlwAmF1u_f9FUz9Gb8MPLVyZCtXQ6w8=.7a7dfa9c-d572-48bc-90a5-237ddf581c52@github.com> On Thu, 11 Apr 2024 07:37:19 GMT, Damon Fenacci wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! > > src/hotspot/share/opto/type.cpp line 441: > >> 439: // Map the boolean result of Type::cmp into a comparator result that CmpKey expects. >> 440: auto type_cmp = [](const void* t1, const void* t2) -> int32_t { >> 441: return Type::cmp((Type*) t1, (Type*) t2) ? 0 : 1; > > Wouldn't `return !Type::cmp((Type*) t1, (Type*) t2);` be enough? 
(though it might actually result in the same compiled code) I think this would work too, but I wanted to avoid the implicit boolean to integer conversion. I can make this change if it would be fine, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1561026656 From tholenstein at openjdk.org Thu Apr 11 13:40:45 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Apr 2024 13:40:45 GMT Subject: RFR: 8324950: IGV: save the state to a file [v20] In-Reply-To: References: <3whxRsxSB9_1GfXWyjsN_I3lAWcokSqFAetk4qlkyCg=.63bf448b-6ffa-43a0-bc64-e3cb0e9e7366@github.com> Message-ID: On Thu, 11 Apr 2024 08:33:58 GMT, Roberto Castañeda Lozano wrote: > One issue with that is that now graphs are opened with a fixed set of visible nodes rather than "all nodes are visible". This works for the graph that is directly shown when the file is open, but if you step into the next graph in the group, new nodes will be hidden. To reproduce this, you can open [one-graph.zip](https://github.com/openjdk/jdk/files/14943396/one-graph.zip) and click on "Show next graph of current group". In the next graph, nodes 5, 7, 25, 26, and 27 are hidden unexpectedly. I opted for your proposal introducing an attribute that indicates if all nodes should be visible: - Show all nodes and ignore nodes in (should be empty anyway) - Show no nodes - Show only nodes "0" ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2049714393 From roland at openjdk.org Thu Apr 11 13:56:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Apr 2024 13:56:49 GMT Subject: RFR: 8330106: C2: VectorInsertNode::make() shouldn't call ConINode::make() directly Message-ID: This is a minor issue that I ran into at some point with JDK-8275202: calling `PhaseValues::intcon()` is required so the node is properly entered in the GVN's hash table and its type is properly recorded in the GVN's type table.
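The general idea behind that requirement can be illustrated with a toy model. This is only a sketch of hash-consing through a factory, written in Java rather than HotSpot's C++; it is not HotSpot code, and `GvnInternSketch`, `Phase`, `ConI`, and `typeOf` are hypothetical names chosen for illustration. A node obtained through the factory is canonical and has its type recorded, while a node constructed directly is unknown to both tables.

```java
import java.util.HashMap;
import java.util.Map;

final class GvnInternSketch {
    // Hypothetical stand-in for an integer constant node.
    static final class ConI {
        final int value;
        ConI(int value) { this.value = value; }
    }

    // Hypothetical stand-in for a value-numbering phase with two tables.
    static final class Phase {
        private final Map<Integer, ConI> hashTable = new HashMap<>();
        private final Map<ConI, String> typeTable = new HashMap<>();

        // Factory: returns the canonical node for 'value' and records its
        // type the first time the value is requested.
        ConI intcon(int value) {
            return hashTable.computeIfAbsent(value, v -> {
                ConI node = new ConI(v);
                typeTable.put(node, "int:" + v);
                return node;
            });
        }

        // Returns null when the phase has never seen this node.
        String typeOf(ConI node) {
            return typeTable.get(node);
        }
    }

    public static void main(String[] args) {
        Phase phase = new Phase();
        ConI viaFactory = phase.intcon(7);
        ConI direct = new ConI(7); // bypasses both tables
        System.out.println(viaFactory == phase.intcon(7));
        System.out.println(phase.typeOf(viaFactory));
        System.out.println(phase.typeOf(direct));
    }
}
```

In this model the directly constructed node is neither shared with equal constants nor typed, which is the kind of inconsistency the patch description attributes to calling `ConINode::make()` directly.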
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/18742/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18742&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330106 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18742.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18742/head:pull/18742 PR: https://git.openjdk.org/jdk/pull/18742 From eosterlund at openjdk.org Thu Apr 11 14:56:47 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 11 Apr 2024 14:56:47 GMT Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub considers the length of live registers when spilling registers [v4] In-Reply-To: References: Message-ID: On Wed, 20 Mar 2024 03:55:33 GMT, Joshua Zhu wrote: >> Currently ZGC C2 load barrier stub saves the whole live register regardless of what size of register is live on aarch64. >> Considering the size of SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits, >> even the use of a floating point may cause the maximum 2048 bits stack occupied. >> Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in ZGC C2 load barrier stub. >> >> In a floating point case on 2048 bits SVE machine, the following ZLoadBarrierStubC2 >> >> >> ...... >> 0x0000ffff684cfad8: stp x15, x18, [sp, #80] >> 0x0000ffff684cfadc: sub sp, sp, #0x100 >> 0x0000ffff684cfae0: str z16, [sp] >> 0x0000ffff684cfae4: add x1, x13, #0x10 >> 0x0000ffff684cfae8: mov x0, x16 >> ;; 0xFFFF803F5414 >> 0x0000ffff684cfaec: mov x8, #0x5414 // #21524 >> 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16 >> 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfaf8: blr x8 >> 0x0000ffff684cfafc: mov x16, x0 >> 0x0000ffff684cfb00: ldr z16, [sp] >> 0x0000ffff684cfb04: add sp, sp, #0x100 >> 0x0000ffff684cfb08: ptrue p7.b >> 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16] >> ...... 
>> >> >> could be optimized into: >> >> >> ...... >> 0x0000ffff684cfa50: stp x15, x18, [sp, #80] >> 0x0000ffff684cfa54: str d16, [sp, #-16]! // extra 8 bytes to align 16 bytes in push_fp() >> 0x0000ffff684cfa58: add x1, x13, #0x10 >> 0x0000ffff684cfa5c: mov x0, x16 >> ;; 0xFFFF7FA942A8 >> 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064 >> 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16 >> 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfa6c: blr x8 >> 0x0000ffff684cfa70: mov x16, x0 >> 0x0000ffff684cfa74: ldr d16, [sp], #16 >> 0x0000ffff684cfa78: ptrue p7.b >> 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16] >> ...... >> >> >> Besides the above benefit, when we know what size of register is live, >> we could remove the unnecessary caller save in ZGC C2 load barrier stub when we meet C-ABI SOE fp registers. >> >> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Add more output for easy debugging once the jtreg test case fails This looks good to me and seems to follow a similar design to what I did on x86_64 vectors. Thanks for doing this! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17977#pullrequestreview-1994438128 From kvn at openjdk.org Thu Apr 11 17:41:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Apr 2024 17:41:42 GMT Subject: RFR: 8330106: C2: VectorInsertNode::make() shouldn't call ConINode::make() directly In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:51:29 GMT, Roland Westrelin wrote: > This is a minor issue that I ran into at some point with JDK-8275202: > calling `PhaseValues::intcon()` is required so the node is properly > entered in the GVN's hash table and its type is properly recorded in > the GVN's type table. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18742#pullrequestreview-1994795918 From dlong at openjdk.org Thu Apr 11 17:44:44 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 Apr 2024 17:44:44 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 11:49:21 GMT, Galder Zamarreño wrote: > Ok, so how would we integrate your changes? Do I just merge the commits in your https://github.com/dean-long/jdk/tree/pr/17667 branch? Yes, I think that should work. If that doesn't add me as a contributor, then I can be added manually. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2050190437 From luhenry at openjdk.org Thu Apr 11 21:02:41 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 11 Apr 2024 21:02:41 GMT Subject: RFR: 8330095: RISC-V: Remove obsolete vandn_vi instruction In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 11:17:49 GMT, Gui Cao wrote: > Hi, We notice that the `vandn_vi` instruction is defined in the current code and is not used anywhere, it is not available in the riscv-crypto Release-1.0.0 manual. The `vandn_vi` instruction is present in earlier riscv-crypto manual, but the `vandnvi` has been removed from the https://github.com/riscv/riscv-crypto/commit/82a02f09668adb18dfee5dfc45a0ce7d3af10103 commit. > > ### Testing > - [x] fastdebug build successfully Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18737#pullrequestreview-1995427980 From matsaave at openjdk.org Thu Apr 11 21:19:49 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 11 Apr 2024 21:19:49 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow Message-ID: A misplaced memory barrier causes a very intermittent crash on some aarch64 systems.
This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. ------------- Commit messages: - Corrected comments - Merge branch 'master' into membar_8327647 - Removed unneeded push/pop - Merge branch 'master' into membar_8327647 - Replace use of r0 with noreg - Added membars after load_field_entry() calls - 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow Changes: https://git.openjdk.org/jdk/pull/18477/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327647 Stats: 23 lines in 3 files changed: 14 ins; 5 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18477/head:pull/18477 PR: https://git.openjdk.org/jdk/pull/18477 From duke at openjdk.org Thu Apr 11 21:19:49 2024 From: duke at openjdk.org (SUN Guoyun) Date: Thu, 11 Apr 2024 21:19:49 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:41:02 GMT, Matias Saavedra Silva wrote: > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3085: > 3083: __ push(r0); > 3084: // R1: field offset, R2: TOS, R3: flags > 3085: load_resolved_field_entry(r2, r2, r0, r1, r3); It is useless to use r0 here, so can we change it to noreg and eliminate the use of push(r0)/pop(r0)?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1538673162 From fyang at openjdk.org Thu Apr 11 21:19:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Apr 2024 21:19:50 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 06:54:17 GMT, SUN Guoyun wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3085: > >> 3083: __ push(r0); >> 3084: // R1: field offset, R2: TOS, R3: flags >> 3085: load_resolved_field_entry(r2, r2, r0, r1, r3); > > It is useless to use r0 here, so can we change it to noreg and eliminate the use of push(r0)/pop(r0)? I also noticed this today while looking at the code and I was testing the following change: [unnecessary-tos-load-v2.diff.txt](https://github.com/openjdk/jdk/files/14758318/unnecessary-tos-load-v2.diff.txt) @matias9927 : Can you add this extra change while you are at it? I think we should fix this for both performance and correctness reasons. The 8-byte push/pop would violate the AArch64 & RISC-V ABI which specifies that alignment of SP should always be 16 bytes [1][2].
[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch32-and-aarch64 [2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1539132938 From matsaave at openjdk.org Thu Apr 11 21:19:50 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 11 Apr 2024 21:19:50 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 12:29:26 GMT, Fei Yang wrote: >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3085: >> >>> 3083: __ push(r0); >>> 3084: // R1: field offset, R2: TOS, R3: flags >>> 3085: load_resolved_field_entry(r2, r2, r0, r1, r3); >> >> It is useless to use r0 here, so can we change it to noreg and eliminate the use of push(r0)/pop(r0)? > > I also noticed this today while looking at the code and I was testing the following change: > [unnecessary-tos-load-v2.diff.txt](https://github.com/openjdk/jdk/files/14758318/unnecessary-tos-load-v2.diff.txt) > > @matias9927 : Can you add this extra change while you are at it? I think we should fix this for both performance and correctness reasons. The 8-byte push/pop would violate the AArch64 & RISC-V ABI which specifies that alignment of SP should always be 16 bytes [1][2]. > > [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch32-and-aarch64 > [2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc Noted! I haven't been able to replicate the crash yet, but I think it will be worth moving forward with this patch as a matter of correctness.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1559947691 From fyang at openjdk.org Thu Apr 11 21:19:50 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 Apr 2024 21:19:50 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 19:16:42 GMT, Matias Saavedra Silva wrote: >> I also noticed this today while looking at the code and I was testing the following change: >> [unnecessary-tos-load-v2.diff.txt](https://github.com/openjdk/jdk/files/14758318/unnecessary-tos-load-v2.diff.txt) >> >> @matias9927 : Can you add this extra change while you are at it? I think we should fix this for both performance and correctness reasons. The 8-byte push/pop would violate the AArch64 & RISC-V ABI which specifies that alignment of SP should always be 16 bytes [1][2]. >> >> [1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/using-the-stack-in-aarch32-and-aarch64 >> [2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Noted! I haven't been able to replicate the crash yet, but I think it will be worth moving forward with this patch as a matter of correctness. Note that `noreg` is currently not properly handled / considered in `load_resolved_field_entry`. And it doesn't look nice to me to only check if `tos_state` is `noreg` in this function. That's why I replaced the call to `load_resolved_field_entry` with separate loads in my proposed fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1560311323 From matsaave at openjdk.org Thu Apr 11 21:27:42 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 11 Apr 2024 21:27:42 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 01:25:01 GMT, Fei Yang wrote: >> Noted!
I haven't been able to replicate the crash yet, but I think it will be worth moving forward with this patch as a matter of correctness. > > Note that `noreg` is currently not properly handled / considered in `load_resolved_field_entry`. And it doesn't look nice to me to only check if `tos_state` is `noreg` in this function. That's why I replaced the call to `load_resolved_field_entry` with separate loads in my proposed fix. I didn't notice your message until just now, but you're right. I would prefer to use `load_resolved_field_entry` when possible but there are places where fields are loaded individually so I think it's fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1561766415 From coleenp at openjdk.org Thu Apr 11 21:52:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 11 Apr 2024 21:52:42 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:41:02 GMT, Matias Saavedra Silva wrote: > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Changes requested by coleenp (Reviewer). src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2360: > 2358: if (tos_state != noreg) { > 2359: __ load_unsigned_byte(tos_state, Address(cache, in_bytes(ResolvedFieldEntry::type_offset()))); > 2360: } This handling of tos_state seems fine to me. Add a comment that the caller might not want to set type_offset as tos. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2559: > 2557: // Must prevent reordering of the following cp cache loads with bytecode load > 2558: __ membar(MacroAssembler::LoadLoad); > 2559: I'm wondering if this can be in load_field_entry at the end so we don't miss any callers.
It might be a bit redundant with the ldar in the resolve_cache_and_index_for_field(), but that's for only the first time the field is resolved and in the interpreter, should not be an issue for performance. ------------- PR Review: https://git.openjdk.org/jdk/pull/18477#pullrequestreview-1995535220 PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1561790322 PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1561787370 From fyang at openjdk.org Fri Apr 12 02:16:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 12 Apr 2024 02:16:47 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: <87z7AJuQT6nk_zmOD9YTWnE7KSK0ivHYfK-kEyWnQhw=.901ef276-0f8c-4cdf-933b-a0eb45f14dc1@github.com> On Thu, 11 Apr 2024 21:25:25 GMT, Matias Saavedra Silva wrote: >> Note that `noreg` is currently not properly handled / considered in `load_resolved_field_entry`. And it doesn't look nice to me to only check if `tos_state` is `noreg` in this function. That's why I replaced the call to `load_resolved_field_entry` with separate loads in my proposed fix. > > I didn't notice your message until just now, but you're right. I would prefer to use `load_resolved_field_entry` when possible but there are places where fields are loaded individually so I think it's fine. All right.
Another place in `TemplateTable::fast_accessfield` which might be worth turning into a `load_resolved_field_entry` call with `noreg` for `tos_state` for consistency [1][2]: [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L3169-L3170 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3139-L3140 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1561928814 From jzhu at openjdk.org Fri Apr 12 02:31:43 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Fri, 12 Apr 2024 02:31:43 GMT Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub considers the length of live registers when spilling registers [v4] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 14:54:09 GMT, Erik Österlund wrote: >> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more output for easy debugging once the jtreg test case fails > > This looks good to me and seems to follow a similar design to what I did on x86_64 vectors. Thanks for doing this! Thanks a lot for the review! @fisk ------------- PR Comment: https://git.openjdk.org/jdk/pull/17977#issuecomment-2050854420 From jbhateja at openjdk.org Fri Apr 12 04:01:05 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 12 Apr 2024 04:01:05 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads Message-ID: - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results in formation of an abstract type phi node at JVM state convergence points. - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results in an assertion failure while computing the source lane count.
- For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout. - For vector write access, this may prevent value forwarding, which may result in subsequent redundant loads, but preventing intrinsification failure will offset that cost. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads Changes: https://git.openjdk.org/jdk/pull/18749/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18749&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329555 Stats: 74 lines in 2 files changed: 73 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18749.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18749/head:pull/18749 PR: https://git.openjdk.org/jdk/pull/18749 From jbhateja at openjdk.org Fri Apr 12 04:03:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 12 Apr 2024 04:03:41 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads In-Reply-To: References: Message-ID: <4pASLVtC0nnzUqxQMZ95INhuEVXu36Ma_g7TbYO_kmA=.f24bf7f5-bd82-4204-a13b-27328b13d49f@github.com> On Fri, 12 Apr 2024 03:57:10 GMT, Jatin Bhateja wrote: > - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results in formation of an abstract type phi node at JVM state convergence points. > - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results in an assertion failure while computing the source lane count.
> - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout. > - For vector write access, this may prevent value forwarding, which may result in subsequent redundant loads, but preventing intrinsification failure will offset that cost. > > Kindly review and share your feedback. > > Best Regards, > Jatin Graph shape describing the problem. ![image](https://github.com/openjdk/jdk/assets/59989778/beb2c8f3-42d0-4015-bcab-5aa05540d793) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18749#issuecomment-2050927571 From thartmann at openjdk.org Fri Apr 12 06:17:41 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Apr 2024 06:17:41 GMT Subject: RFR: 8330106: C2: VectorInsertNode::make() shouldn't call ConINode::make() directly In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:51:29 GMT, Roland Westrelin wrote: > This is a minor issue that I ran into at some point with JDK-8275202: > calling `PhaseValues::intcon()` is required so the node is properly > entered in the GVN's hash table and its type is properly recorded in > the GVN's type table. Looks good. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18742#pullrequestreview-1995954072 From dfenacci at openjdk.org Fri Apr 12 07:07:42 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 12 Apr 2024 07:07:42 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: <1PH75HtUP_ugSlwAmF1u_f9FUz9Gb8MPLVyZCtXQ6w8=.7a7dfa9c-d572-48bc-90a5-237ddf581c52@github.com> References: <1PH75HtUP_ugSlwAmF1u_f9FUz9Gb8MPLVyZCtXQ6w8=.7a7dfa9c-d572-48bc-90a5-237ddf581c52@github.com> Message-ID: On Thu, 11 Apr 2024 13:29:05 GMT, Jasmine Karthikeyan wrote: >> src/hotspot/share/opto/type.cpp line 441: >> >>> 439: // Map the boolean result of Type::cmp into a comparator result that CmpKey expects. >>> 440: auto type_cmp = [](const void* t1, const void* t2) -> int32_t { >>> 441: return Type::cmp((Type*) t1, (Type*) t2) ? 0 : 1; >> >> Wouldn't `return !Type::cmp((Type*) t1, (Type*) t2);` be enough? (though it might actually result in the same compiled code) > > I think this would work too, but I wanted to avoid the implicit boolean to integer conversion. I can make this change if it would be fine, though. Fair point. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1562109544 From roland at openjdk.org Fri Apr 12 07:19:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Apr 2024 07:19:45 GMT Subject: RFR: 8330106: C2: VectorInsertNode::make() shouldn't call ConINode::make() directly In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 17:38:44 GMT, Vladimir Kozlov wrote: >> This is a minor issue that I ran into at some point with JDK-8275202: >> calling `PhaseValues::intcon()` is required so the node is properly >> entered in the GVN's hash table and its type is properly recorded in >> the GVN's type table. > > Good. 
@vnkozlov @TobiHartmann thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18742#issuecomment-2051159809 From roland at openjdk.org Fri Apr 12 07:19:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Apr 2024 07:19:46 GMT Subject: Integrated: 8330106: C2: VectorInsertNode::make() shouldn't call ConINode::make() directly In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:51:29 GMT, Roland Westrelin wrote: > This is a minor issue that I ran into at some point with JDK-8275202: > calling `PhaseValues::intcon()` is required so the node is properly > entered in the GVN's hash table and its type is properly recorded in > the GVN's type table. This pull request has now been integrated. Changeset: bde3fc0c Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/bde3fc0c03c87d1f2605ae6bb84c33fadb7aa865 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod 8330106: C2: VectorInsertNode::make() shouldn't call ConINode::make() directly Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18742 From thomas.stuefe at gmail.com Fri Apr 12 07:30:56 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Fri, 12 Apr 2024 09:30:56 +0200 Subject: Enable compiler memory limits by default? Message-ID: Hi, Issues like https://bugs.openjdk.org/browse/JDK-8330103 show that compiler memory consumption can be an issue. Since https://bugs.openjdk.org/browse/JDK-8318016, we have an optional per-compilation memory limit. If we reach that limit, one of two things (configurable) happens: we either assert or abort the compilation. These memory limits build on the compiler memory statistic added with https://bugs.openjdk.org/browse/JDK-8317683. Enabling memory limits also enables memory statistics. Some ideas: 1) We could enable a reasonable memory limit per default for debug builds. Preferably combined with the assert option.
That way, we run all tests on a debug VM with memory limits enabled. If there are pathological compilations during testing, we will notice them. (I don't know if we would notice them today; even if testers let JVMs run with outside ulimits, these limits are typically very high to allow for the total expected memory consumption of the test JVM). Such a memory limit could be set at whatever we feel is pathological, e.g., several hundred MB. Even set at 1GB, we would hopefully see cases like 8318016 in our tests. 2) If we don't want (1), we could at least enable memory statistics by default for debug builds and print it out to hs-err files. 3) We could also enable memory limits in release builds and bail out of the compilations. A small cost is involved, probably negligible: on Arena enlargement, we increase several thread local counters. Unfortunately, there is a small risk, too, in that bailout paths in C2 may be broken, leading to follow-up errors. We fixed them all, I think, but there is a remaining risk. OTOH, using up excessive amounts of memory is also not optimal. What do you think? Would this make sense? If (1) makes sense to you, what limit would be reasonable? Cheers, Thomas From stuefe at openjdk.org Fri Apr 12 07:32:55 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 12 Apr 2024 07:32:55 GMT Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2 Message-ID: See JBS description. This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa.
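As a rough illustration of the idea described in this RFR (not the code in the PR), keeping the most recent record per method and per compiler could be sketched as below. `CompilerKind`, `MemStatRecord` and `MemStatTable` are hypothetical names invented for this example; they are not HotSpot types, and the real implementation may be structured quite differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not HotSpot types.
enum class CompilerKind { C1, C2 };

struct MemStatRecord {
  std::size_t peak_bytes;
};

class MemStatTable {
 public:
  // Store the most recent compilation for this (method, compiler) pair.
  // A later compilation by the same compiler overwrites the entry;
  // a compilation by the other compiler gets its own entry.
  void record(const std::string& method, CompilerKind kind, std::size_t peak_bytes) {
    _records[std::make_pair(method, kind)].peak_bytes = peak_bytes;
  }

  // Returns nullptr if nothing was recorded for this pair.
  const MemStatRecord* find(const std::string& method, CompilerKind kind) const {
    auto it = _records.find(std::make_pair(method, kind));
    return it == _records.end() ? nullptr : &it->second;
  }

 private:
  // Keyed by method AND compiler, so C1 and C2 records never collide.
  std::map<std::pair<std::string, CompilerKind>, MemStatRecord> _records;
};
```

With a table keyed by method alone, a C2 recompilation would replace the earlier C1 entry; the composite key is what keeps both visible.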
------------- Commit messages: - start Changes: https://git.openjdk.org/jdk/pull/18740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18740&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330103 Stats: 31 lines in 1 file changed: 19 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/18740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18740/head:pull/18740 PR: https://git.openjdk.org/jdk/pull/18740 From stuefe at openjdk.org Fri Apr 12 07:32:55 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 12 Apr 2024 07:32:55 GMT Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2 In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:03:17 GMT, Thomas Stuefe wrote: > See JBS description. > > This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa. Ping @cl4es ------------- PR Comment: https://git.openjdk.org/jdk/pull/18740#issuecomment-2051177805 From tobias.hartmann at oracle.com Fri Apr 12 07:52:11 2024 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 12 Apr 2024 09:52:11 +0200 Subject: Enable compiler memory limits by default? In-Reply-To: References: Message-ID: Hi Thomas, On 12.04.24 09:30, Thomas Stüfe wrote: > Issues like https://bugs.openjdk.org/browse/JDK-8330103 > show that compiler memory consumption can be an issue. I think you linked the wrong issue here. Thanks, Tobias From thomas.stuefe at gmail.com Fri Apr 12 08:04:17 2024 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Fri, 12 Apr 2024 10:04:17 +0200 Subject: Enable compiler memory limits by default? In-Reply-To: References: Message-ID: Indeed, thanks for noticing.
I meant https://bugs.openjdk.org/browse/JDK-8327247 "C2 uses up to 2GB of RAM to compile complex string concat in extreme cases" Cheers, Thomas On Fri, Apr 12, 2024 at 9:52 AM Tobias Hartmann wrote: > Hi Thomas, > > On 12.04.24 09:30, Thomas Stüfe wrote: > > Issues like https://bugs.openjdk.org/browse/JDK-8330103 > > show that compiler memory > consumption can be an issue. > > I think you linked the wrong issue here. > > Thanks, > Tobias > From gcao at openjdk.org Fri Apr 12 08:47:41 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 12 Apr 2024 08:47:41 GMT Subject: RFR: 8330095: RISC-V: Remove obsolete vandn_vi instruction In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 11:23:32 GMT, Fei Yang wrote: >> Hi, We notice that the `vandn_vi` instruction is defined in the current code and is not used anywhere, it is not available in the riscv-crypto Release-1.0.0 manual. The `vandn_vi` instruction is present in earlier riscv-crypto manual, but the `vandnvi` has been removed from the https://github.com/riscv/riscv-crypto/commit/82a02f09668adb18dfee5dfc45a0ce7d3af10103 commit. >> >> ### Testing >> - [x] fastdebug build successfully > > LGTM. @RealFYang @luhenry : Thanks for the review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18737#issuecomment-2051310245 From rcastanedalo at openjdk.org Fri Apr 12 08:59:43 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Apr 2024 08:59:43 GMT Subject: RFR: 8324950: IGV: save the state to a file [v23] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:26:01 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open one or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ...
> > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > add Thanks for addressing my questions and comments so far, Toby! One more high-level question: currently, it seems the graph state is restored only for graphs from opened graph files, but not for graphs that are imported (e.g. from other graph files or from the network). Would it be hard to restore also the state from imported graphs? This is probably not a high-priority use case, but would be nice to e.g. enable sending graphs from gdb that are opened directly and where e.g. one node is highlighted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2051331699 From chagedorn at openjdk.org Fri Apr 12 10:05:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Apr 2024 10:05:15 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v3] In-Reply-To: References: Message-ID: > While working on a [Valhalla bug](https://bugs.openjdk.org/browse/JDK-8321734), I've noticed that a `SubTypeCheckNode` for a `checkcast` does not take a unique concrete sub class `X` of an abstract class `A` as klass constant in the sub type check. Instead, it uses the abstract klass constant: > > > abstract class A {} > class X extends A {} > > A x = (A)object; // Emits SubTypeCheckNode(object, A), but could have used X instead of A. > > However, the `CheckCastPP` result already uses the improved instance type ptr `X` (i.e. 
`toop` which was improved from `A` by calling `try_improve()` to get the unique concrete sub class): > https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3257-L3261 > https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3363 > > We should also plug in a unique concrete sub class constant in the `SubTypeCheckNode` which could be beneficial to fold away redundant sub type checks (see test cases). > > This fix is required to completely fix the bug in Valhalla (this is only one of the broken cases). In Valhalla, the graph ends up being broken because a `CheckCastPP` node is folded because of an impossible type but the `SubTypeCheckNode` is not due to not using the improved unique concrete sub class constant for the `checkcast`. I don't think that there is currently a bug in mainline because of this limitation - it just blocks some optimizations. I'm therefore upstreaming this fix to mainline since it can be beneficial to have this fix here as well (see test cases). 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Revert "using improved type for non-constants" + add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18515/files - new: https://git.openjdk.org/jdk/pull/18515/files/660c0dec..d7719279 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18515&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18515&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18515.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18515/head:pull/18515 PR: https://git.openjdk.org/jdk/pull/18515 From chagedorn at openjdk.org Fri Apr 12 10:05:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Apr 2024 10:05:15 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v2] In-Reply-To: References: <5uF4mGEHHBhL_V5pOPSXbggBpBBjrVd96S6s6GJUZCk=.c834b0d1-3d83-48ef-af6f-678a5e9d5702@github.com> Message-ID: On Thu, 11 Apr 2024 07:53:46 GMT, Roland Westrelin wrote: >> I could not find a test that shows a benefit by doing the improved check with code from `try_improve()` inside `SubTypeCheckNode::Value()`. The original patch only showed a win for mainline when we have two identical `SubTypeCheckNodes` such that they can common up with an improved constant. But this cannot be achieved when having the code in `SubTypeCheckNode::Value()`. >> >> I therefore suggest to revert back to the original version to directly plug in a better constant, if we find one with `try_improve()`, and just skip the other non-constant cases with `LoadKlass` etc. >> >> For the Valhalla bug, I can do the more sophisticated fix to improve `SubTypeCheckNode::Value()` with code from `try_improve()` but it does not seem worth for mainline. What do you think? > > Sounds good. Thanks for giving it a try. 
> Since you mention fixing this in valhalla, do you expect it makes a difference there but not in mainline? Ok great! I had another look at Valhalla and I think the constant improvement fix is all that is needed. I reverted the last commit to go back to the original patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18515#discussion_r1562334305 From roland at openjdk.org Fri Apr 12 10:05:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Apr 2024 10:05:15 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v3] In-Reply-To: References: Message-ID: <4O-4WBFHrb7LDQAdWzbhQrFHBAnDWtZNBfl1RBAOBaU=.d5445e1b-2c09-4da7-9ed0-53c6f94422ba@github.com> On Fri, 12 Apr 2024 10:01:38 GMT, Christian Hagedorn wrote: >> While working on a [Valhalla bug](https://bugs.openjdk.org/browse/JDK-8321734), I've noticed that a `SubTypeCheckNode` for a `checkcast` does not take a unique concrete sub class `X` of an abstract class `A` as klass constant in the sub type check. Instead, it uses the abstract klass constant: >> >> >> abstract class A {} >> class X extends A {} >> >> A x = (A)object; // Emits SubTypeCheckNode(object, A), but could have used X instead of A. >> >> However, the `CheckCastPP` result already uses the improved instance type ptr `X` (i.e. `toop` which was improved from `A` by calling `try_improve()` to get the unique concrete sub class): >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3257-L3261 >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3363 >> >> We should also plug in a unique concrete sub class constant in the `SubTypeCheckNode` which could be beneficial to fold away redundant sub type checks (see test cases). >> >> This fix is required to completely fix the bug in Valhalla (this is only one of the broken cases). 
In Valhalla, the graph ends up being broken because a `CheckCastPP` node is folded because of an impossible type but the `SubTypeCheckNode` is not due to not using the improved unique concrete sub class constant for the `checkcast`. I don't think that there is currently a bug in mainline because of this limitation - it just blocks some optimizations. I'm therefore upstreaming this fix to mainline since it can be beneficial to have this fix here as well (see test cases). >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Revert "using improved type for non-constants" + add comment Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18515#pullrequestreview-1996386688 From chagedorn at openjdk.org Fri Apr 12 10:17:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Apr 2024 10:17:44 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v3] In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 10:05:15 GMT, Christian Hagedorn wrote: >> While working on a [Valhalla bug](https://bugs.openjdk.org/browse/JDK-8321734), I've noticed that a `SubTypeCheckNode` for a `checkcast` does not take a unique concrete sub class `X` of an abstract class `A` as klass constant in the sub type check. Instead, it uses the abstract klass constant: >> >> >> abstract class A {} >> class X extends A {} >> >> A x = (A)object; // Emits SubTypeCheckNode(object, A), but could have used X instead of A. >> >> However, the `CheckCastPP` result already uses the improved instance type ptr `X` (i.e. 
`toop` which was improved from `A` by calling `try_improve()` to get the unique concrete sub class): >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3257-L3261 >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3363 >> >> We should also plug in a unique concrete sub class constant in the `SubTypeCheckNode` which could be beneficial to fold away redundant sub type checks (see test cases). >> >> This fix is required to completely fix the bug in Valhalla (this is only one of the broken cases). In Valhalla, the graph ends up being broken because a `CheckCastPP` node is folded because of an impossible type but the `SubTypeCheckNode` is not due to not using the improved unique concrete sub class constant for the `checkcast`. I don't think that there is currently a bug in mainline because of this limitation - it just blocks some optimizations. I'm therefore upstreaming this fix to mainline since it can be beneficial to have this fix here as well (see test cases). >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Revert "using improved type for non-constants" + add comment Thanks Roland for your review and the discussion! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18515#issuecomment-2051473937 From gcao at openjdk.org Fri Apr 12 11:44:45 2024 From: gcao at openjdk.org (Gui Cao) Date: Fri, 12 Apr 2024 11:44:45 GMT Subject: Integrated: 8330095: RISC-V: Remove obsolete vandn_vi instruction In-Reply-To: References: Message-ID: <_dLC6Caibb8BRwe7TpcEvftc4fqULL1reWr7tjKI0Zw=.5e44d5e0-066d-4aa1-ac8a-dd5ece260927@github.com> On Thu, 11 Apr 2024 11:17:49 GMT, Gui Cao wrote: > Hi, We notice that the `vandn_vi` instruction is defined in the current code and is not used anywhere, it is not available in the riscv-crypto Release-1.0.0 manual. 
> The `vandn_vi` instruction was present in earlier riscv-crypto manuals, but it was removed in commit https://github.com/riscv/riscv-crypto/commit/82a02f09668adb18dfee5dfc45a0ce7d3af10103.
>
> ### Testing
> - [x] fastdebug build successfully

This pull request has now been integrated.

Changeset: 77a217df
Author: Gui Cao
Committer: Fei Yang
URL: https://git.openjdk.org/jdk/commit/77a217df6000190cf73a1ca42a42cbcec42fb60f
Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod

8330095: RISC-V: Remove obsolete vandn_vi instruction

Reviewed-by: fyang, luhenry

-------------

PR: https://git.openjdk.org/jdk/pull/18737

From roland at openjdk.org Fri Apr 12 11:49:54 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Apr 2024 11:49:54 GMT
Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false
Message-ID: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com>

This is another small change from something I ran into while working on 8275202. `CMoveNode::Value` can be improved when the condition is known to be always true or false. That doesn't affect IGVN (as the `CMove` is removed in that case) but it can be useful for passes that propagate types, such as CCP. In the IR tests, the backbranch of the loop is never taken when the root of the compilation is `test1`. With the change, CCP can eliminate it. Without it, CCP can't.
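
The idea described above can be illustrated outside of HotSpot. The following is only a minimal sketch under stated assumptions: `Range`, `meet`, `filter` and `cmove_value` are hypothetical stand-ins invented for illustration, not C2 classes or functions, and integer ranges play the role of C2 types. It shows that a conditional move whose condition is known to be constant can take the type of the selected input alone, instead of a type covering both inputs.

```cpp
#include <algorithm>
#include <cassert>

// Toy stand-in for an integer type: the closed range [lo, hi].
// (Hypothetical; not a HotSpot class.)
struct Range {
  int lo;
  int hi;
  bool operator==(const Range& other) const {
    return lo == other.lo && hi == other.hi;
  }
};

// Smallest range covering both inputs; plays the role of combining the
// types of the two data inputs when the condition is unknown.
Range meet(Range a, Range b) {
  return Range{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Intersection with a bound; plays the role of filtering a result by the
// node's own declared type. Assumes the two ranges overlap.
Range filter(Range a, Range bound) {
  return Range{std::max(a.lo, bound.lo), std::min(a.hi, bound.hi)};
}

// Hypothetical model of the value computation for a conditional move.
// cond is the range of the condition: {0,0} means always false,
// {1,1} means always true, anything else means unknown.
Range cmove_value(Range cond, Range if_false, Range if_true, Range own_type) {
  const Range always_false{0, 0};
  const Range always_true{1, 1};
  if (cond == always_false) {
    return filter(if_false, own_type);  // only the false input can be selected
  }
  if (cond == always_true) {
    return filter(if_true, own_type);   // only the true input can be selected
  }
  return filter(meet(if_false, if_true), own_type);  // either input is possible
}
```

With inputs `[0, 10]` and `[20, 30]`, an unknown condition yields `[0, 30]`, while a condition known to be true yields the narrower `[20, 30]`. A narrower result of this kind is what lets a type-propagating pass prove more about the code that consumes the value.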
------------- Commit messages: - test & fix Changes: https://git.openjdk.org/jdk/pull/18757/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18757&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330163 Stats: 78 lines in 2 files changed: 78 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18757/head:pull/18757 PR: https://git.openjdk.org/jdk/pull/18757 From bkilambi at openjdk.org Fri Apr 12 12:03:57 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 12 Apr 2024 12:03:57 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: > Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. > > No effects on other platforms. 
> > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). > > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > Final code is as below: > > Before: > ` fadda z17.s, p7/m, z17.s, z16.s > ` > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > > > > [Test] > Full jtreg passed on AArch64 and x86. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Revert to previous indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18034/files - new: https://git.openjdk.org/jdk/pull/18034/files/1156ef39..71a86deb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034 PR: https://git.openjdk.org/jdk/pull/18034 From coleenp at openjdk.org Fri Apr 12 12:06:41 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 Apr 2024 12:06:41 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: <87z7AJuQT6nk_zmOD9YTWnE7KSK0ivHYfK-kEyWnQhw=.901ef276-0f8c-4cdf-933b-a0eb45f14dc1@github.com> References: <87z7AJuQT6nk_zmOD9YTWnE7KSK0ivHYfK-kEyWnQhw=.901ef276-0f8c-4cdf-933b-a0eb45f14dc1@github.com> Message-ID: On Fri, 12 Apr 2024 02:11:55 GMT, Fei Yang 
wrote:

>> I didn't notice your message until just now, but you're right. I would prefer to use `load_resolved_field_entry` when possible but there are places where fields are loaded individually so I think it's fine.
>
> All right. Another place in `TemplateTable::fast_accessfield` which might be worth turning into a `load_resolved_field_entry` call with `noreg` for `tos_state` for consistency [1][2]:
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L3169-L3170
> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3139-L3140

I think this change should be limited to fixing the membar issue and the push/pop issue, with any further refactoring left to a new change if more is wanted, but maybe this could call `load_resolved_field_entry` with `noreg` for `tos_state` too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1562456477

From roland at openjdk.org Fri Apr 12 12:30:54 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 12 Apr 2024 12:30:54 GMT
Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node()
Message-ID:

Another set of changes from 8275202. There are cases in superword where new nodes are not assigned control. I believe they are harmless currently because superword is the last pass of optimizations. I also cleaned up the code so it always uses `register_new_node()`. There are a couple of places where `intcon()` should be used.
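
To make the "new nodes are not assigned control" point concrete, here is a minimal sketch under stated assumptions. `Node`, `LoopPhase` and `place_like` are hypothetical toy types and helpers written for illustration only; the toy tracks nothing but a node-to-control map and does not reproduce anything else the real `PhaseIdealLoop` does. It shows the difference between a node that was merely created and one that went through a registration call.

```cpp
#include <cassert>
#include <unordered_map>

// Toy node; identity is all that matters here. (Hypothetical type.)
struct Node {
  int id;
};

// Toy model of a phase that records which control node each data node
// belongs to. (Hypothetical; only the control map is modeled.)
class LoopPhase {
 public:
  bool has_ctrl(const Node* n) const { return _ctrl.count(n) != 0; }

  // Throws std::out_of_range if n was never registered.
  const Node* get_ctrl(const Node* n) const { return _ctrl.at(n); }

  // Single entry point for new nodes, so that none is left without control.
  void register_new_node(const Node* n, const Node* ctrl) { _ctrl[n] = ctrl; }

 private:
  std::unordered_map<const Node*, const Node*> _ctrl;
};

// Registers fresh at the same control as an already registered node and
// reports whether the two now share their control.
bool place_like(LoopPhase& phase, const Node* fresh, const Node* like) {
  phase.register_new_node(fresh, phase.get_ctrl(like));
  return phase.get_ctrl(fresh) == phase.get_ctrl(like);
}
```

In this toy, a node that is created but never passed to `register_new_node` makes `has_ctrl` return false and `get_ctrl` throw. That is harmless only as long as no later code asks for its control, which mirrors the "last pass of optimizations" remark above.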
-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/18760/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18760&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8330165
Stats: 40 lines in 1 file changed: 1 ins; 17 del; 22 mod
Patch: https://git.openjdk.org/jdk/pull/18760.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18760/head:pull/18760

PR: https://git.openjdk.org/jdk/pull/18760

From dfenacci at openjdk.org Fri Apr 12 12:48:42 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Fri, 12 Apr 2024 12:48:42 GMT
Subject: RFR: 8329194: Cleanup Type::cmp definition and usage
In-Reply-To:
References:
Message-ID:

On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote:

> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value.
>
> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it.
>
> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated!

Looks good (I also tested tier1-5).

-------------

Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/18533#pullrequestreview-1996823520 From chagedorn at openjdk.org Fri Apr 12 12:48:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Apr 2024 12:48:42 GMT Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() In-Reply-To: References: Message-ID: <_lXinX6Mdd_z8o22ZobW1VHWOJXqRtjiRHYPKucIKrw=.ac2a7300-e788-4888-8df0-6d0130d8b691@github.com> On Fri, 12 Apr 2024 12:26:10 GMT, Roland Westrelin wrote: > Another set of changes from 8275202. There are cases in superword > where new nodes are not assigned control. I believe they are harmless > currently because superword is the last pass of optimizations. I also > cleaned up the code so it always uses `register_new_node()`. There are > a couple places where `intcon()` should be used. src/hotspot/share/opto/superword.cpp line 2557: > 2555: const TypeVect* vt = TypeVect::make(bt, vlen); > 2556: VectorNode* mask = new VectorMaskCmpNode(bol_test, cmp_in1, cmp_in2, bol_test_node, vt); > 2557: phase()->register_new_node(mask, phase()->get_ctrl(p->at(0))); Good refactoring. 
Since you're using this pattern quite often, I was wondering whether we should have a separate method `register_new_node_with_ctrl_of()` (or something like that) that does:

    void PhaseIdealLoop::register_new_node_with_ctrl_of(Node* new_node, Node* ctrl_of) {
      register_new_node(new_node, get_ctrl(ctrl_of));
    }

And then:

    phase()->register_new_node_with_ctrl_of(mask, p->at(0));

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18760#discussion_r1562501615

From rcastanedalo at openjdk.org Fri Apr 12 13:18:01 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 12 Apr 2024 13:18:01 GMT
Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes
Message-ID: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com>

This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change:

import java.lang.invoke.VarHandle;
import java.lang.invoke.MethodHandles;

public class Example {

    static class Outer {
        Object f;
    }

    static final VarHandle fVarHandle;
    static {
        MethodHandles.Lookup l = MethodHandles.lookup();
        try {
            fVarHandle = l.findVarHandle(Outer.class, "f", Object.class);
        } catch (Exception e) {
            throw new Error(e);
        }
    }

    static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) {
        return fVarHandle.compareAndSet(o, oldVal, newVal);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            Outer o = new Outer();
            Object oldVal = new Object();
            o.f = oldVal;
            Object newVal = new Object();
            testCompareAndSwap(o, oldVal, newVal);
        }
    }
}

Before this changeset, issuing this command:

$ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP

gives the following dump:

R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20)

After this changeset, we get:

R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20)

Note the additional `barrier(strong )` field in the second dump.

**Testing:** tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode).
------------- Commit messages: - Dump barrier information for all Mach nodes, not just MachType ones, to include CompareAndSwap/WeakCompareAndSwap matches Changes: https://git.openjdk.org/jdk/pull/18754/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18754&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330153 Stats: 11 lines in 1 file changed: 6 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18754/head:pull/18754 PR: https://git.openjdk.org/jdk/pull/18754 From bulasevich at openjdk.org Fri Apr 12 13:26:45 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 12 Apr 2024 13:26:45 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 10:31:09 GMT, Tobias Hartmann wrote: >> Applying @danielogh's patch (see description of [JDK-8329258](https://bugs.openjdk.org/browse/JDK-8329258)) to enable `StressGCM` / `StressLCM` for stub compilations triggers a crash on AArch64. The problem is that the register allocator uses `R29` (`rfp`) which is usually used for the frame pointer to hold the `TailCall` `exc_target` when generating the `_slow_arraycopy_Java` stub: >> https://github.com/openjdk/jdk/blob/6dfb8120c270a76fcba5a5c3c9ad91da3282d5fa/src/hotspot/share/opto/generateOptoStub.cpp#L258-L264 >> >> With `StressGCM` / `StressLCM` the register initialization is scheduled early and `R29` is corrupted by the `MachEpilogNode` which is opaque to the register allocator and inserted right before the `TailCall`: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/share/opto/output.cpp#L320-L326 >> >> >> 028 mov R29, 0x0000ffff78cc0080 # ptr >> >> [...] >> >> 098 # pop frame 16 >> ldp lr, rfp, [sp,#0] <- Epilog kills rfp (and lr + sp) >> add sp, sp, #16 >> >> [...] 
>> >> 0a0 br R29 # R12 holds method >> >> >> As a result, we jump to a "garbage" location. See [bad code](https://bugs.openjdk.org/secure/attachment/108835/BAD_slow_arraycopy_Java.txt) vs. [good code](https://bugs.openjdk.org/secure/attachment/108834/GOOD_slow_arraycopy_Java.txt). >> >> On x86, we use `no_rbp_RegP` instead: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L2564-L2566 https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L12470 >> >> I implemented the same on AArch64. >> >> I think other platforms are affected as well but I don't have the hardware to test there. @offamitkumar (S390), @TheRealMDoerr (PPC), @RealFYang (RISC-V), @bulasevich (ARM32), could you please have a look? >> >> I also wondered if `R29` shouldn't be a callee-save (SOE) register in the C calling convention? >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/aarch64/aarch64.ad#L139-L140 On x86_64, `RBP` is SOE: >> https://github.com/openjdk/jdk/blob/b49ba426a721db5926ac1b45d573d468389d479c/src/hotspot/cpu/x86/x86_64.ad#L86-L87 >> >> `TestTailCallInArrayCopyStub.java` will only work once [JDK-8330016](https://bugs.openjdk.org/browse/JDK-8330016) is integrated. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Comment adjustment > I think other platforms are affected as well but I don't have the hardware to test there. > @bulasevich (ARM32), could you please have a look? Hi. I checked ARM32. R11 (FP) is a common register that is not just dedicated solely to the frame pointer. And with a given test and patch I can not reproduce SIGSEGV on ARM32 platform. So I think ARM32 is not affected. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2051756609 From tholenstein at openjdk.org Fri Apr 12 13:30:11 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 12 Apr 2024 13:30:11 GMT Subject: RFR: 8324950: IGV: save the state to a file [v24] In-Reply-To: References: Message-ID: <9SQYt9HRD_KG3Q2WfzXDcQvQaZrrEAZzcy-L8lC1ulc=.216b15c8-c887-41a1-b338-81e8c1d6b0e7@github.com> > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: re-add setInvokeLater to Parser.java and remove RequestProcessor from Server.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/d2a0ff75..5be2c7d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=22-23 Stats: 236 lines in 5 files changed: 76 ins; 104 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From chagedorn at openjdk.org Fri Apr 12 13:35:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Apr 2024 13:35:42 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false In-Reply-To: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> Message-ID: <7BhOe0d8HRgv2VA9gHARD7-07oNHvkzyZabEGp0HKOI=.33cba0e6-58c7-4f0e-ac4a-91c9c26eb6c4@github.com> On Fri, 12 Apr 2024 11:45:05 GMT, Roland Westrelin wrote: > This is another small change from something I ran into while working > on 8275202. `CMoveNode::Value` can be improved when the condition is > known to be always true or false. That doesn't affect IGVN (as the > `CMove` is removed in that case) but it can be useful for passes that > propagates types such as CCP. In the IR tests, the backbranch of the > loop is never taken when the root of the compilation is `test1`. With > the change, CCP can eliminate it. Without, it can't. Looks good to me. 
src/hotspot/share/opto/movenode.cpp line 176:

> 174:   }
> 175:   if (phase->type(in(Condition)) == TypeInt::ONE) {
> 176:     return phase->type(in(IfTrue))->filter(_type); // Always pick right(true) input

Suggestion:

    return phase->type(in(IfFalse))->filter(_type); // Always pick left (false) input
  }
  if (phase->type(in(Condition)) == TypeInt::ONE) {
    return phase->type(in(IfTrue))->filter(_type); // Always pick right (true) input

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18757#pullrequestreview-1997063700
PR Review Comment: https://git.openjdk.org/jdk/pull/18757#discussion_r1562559591

From tholenstein at openjdk.org Fri Apr 12 13:40:42 2024
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Fri, 12 Apr 2024 13:40:42 GMT
Subject: RFR: 8324950: IGV: save the state to a file [v23]
In-Reply-To:
References:
Message-ID:

On Fri, 12 Apr 2024 08:57:25 GMT, Roberto Castañeda Lozano wrote:

> Thanks for addressing my questions and comments so far, Toby!
>
> One more high-level question: currently, it seems the graph state is restored only for graphs from opened graph files, but not for graphs that are imported (e.g. from other graph files or from the network). Would it be hard to restore also the state from imported graphs? This is probably not a high-priority use case, but would be nice to e.g. enable sending graphs from gdb that are opened directly and where e.g. one node is highlighted.

If graphs are sent over the network with `` they are automatically opened. But this feature is currently not available in C2. Regarding manually importing a graph I prefer to not automatically open the Tabs.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2051778317 From tholenstein at openjdk.org Fri Apr 12 13:40:43 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 12 Apr 2024 13:40:43 GMT Subject: RFR: 8324950: IGV: save the state to a file [v24] In-Reply-To: <9SQYt9HRD_KG3Q2WfzXDcQvQaZrrEAZzcy-L8lC1ulc=.216b15c8-c887-41a1-b338-81e8c1d6b0e7@github.com> References: <9SQYt9HRD_KG3Q2WfzXDcQvQaZrrEAZzcy-L8lC1ulc=.216b15c8-c887-41a1-b338-81e8c1d6b0e7@github.com> Message-ID: On Fri, 12 Apr 2024 13:30:11 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. 
>>
>> save
>>
>> - `Save..` saves the current opened xml file. Create a ...
>
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>
> re-add setInvokeLater to Parser.java and remove RequestProcessor from Server.java

I updated the code because I found a bug: sometimes I get the error `event dispatch thread (EDT) is being executed on a non-EDT thread`. Therefore I re-added the required `invokeLater` to Parser.java.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2051782064

From jkarthikeyan at openjdk.org Fri Apr 12 14:06:42 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 12 Apr 2024 14:06:42 GMT
Subject: RFR: 8329194: Cleanup Type::cmp definition and usage
In-Reply-To:
References:
Message-ID:

On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote:

> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value.
>
> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it.
>
> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated!

Great, thank you for the review and testing!
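
The adapter mentioned in the quoted description (a boolean equality function wrapped in a lambda so it can still serve as a comparator) can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: `ToyType`, `types_equal`, `CmpFn` and `dict_cmp` are hypothetical names, and the comparator signature taking two `const void*` arguments is an assumption about what a dictionary-style container might expect.

```cpp
#include <cassert>

// Toy stand-in for a compiler type. (Hypothetical; not a HotSpot class.)
struct ToyType {
  int base;
  int width;
};

// Boolean equality, in the spirit of the reworked comparison: true means
// the two types are equal.
bool types_equal(const ToyType* a, const ToyType* b) {
  return a->base == b->base && a->width == b->width;
}

// Comparator shape assumed for a dictionary-style container:
// returns 0 when the keys are equal and non-zero otherwise.
using CmpFn = int (*)(const void*, const void*);

// A captureless lambda converts implicitly to a plain function pointer,
// so it can adapt the boolean result to the 0 / non-zero convention.
const CmpFn dict_cmp = [](const void* a, const void* b) -> int {
  return types_equal(static_cast<const ToyType*>(a),
                     static_cast<const ToyType*>(b)) ? 0 : 1;
};
```

Call sites can then read `if (types_equal(a, b))` directly, while a container that only understands the older 0-means-equal convention receives `dict_cmp` instead.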
------------- PR Comment: https://git.openjdk.org/jdk/pull/18533#issuecomment-2051826574 From roland at openjdk.org Fri Apr 12 14:34:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Apr 2024 14:34:01 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> Message-ID: <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> > This is another small change from something I ran into while working > on 8275202. `CMoveNode::Value` can be improved when the condition is > known to be always true or false. That doesn't affect IGVN (as the > `CMove` is removed in that case) but it can be useful for passes that > propagates types such as CCP. In the IR tests, the backbranch of the > loop is never taken when the root of the compilation is `test1`. With > the change, CCP can eliminate it. Without, it can't. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/movenode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18757/files - new: https://git.openjdk.org/jdk/pull/18757/files/1442e178..c11fd9e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18757&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18757&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18757/head:pull/18757 PR: https://git.openjdk.org/jdk/pull/18757 From chagedorn at openjdk.org Fri Apr 12 14:43:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Apr 2024 14:43:41 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> Message-ID: On Fri, 12 Apr 2024 14:34:01 GMT, Roland Westrelin wrote: >> This is another small change from something I ran into while working >> on 8275202. `CMoveNode::Value` can be improved when the condition is >> known to be always true or false. That doesn't affect IGVN (as the >> `CMove` is removed in that case) but it can be useful for passes that >> propagates types such as CCP. In the IR tests, the backbranch of the >> loop is never taken when the root of the compilation is `test1`. With >> the change, CCP can eliminate it. Without, it can't. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/movenode.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18757#pullrequestreview-1997414559 From roland at openjdk.org Fri Apr 12 14:58:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Apr 2024 14:58:56 GMT Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() [v2] In-Reply-To: References: Message-ID: > Another set of changes from 8275202. There are cases in superword > where new nodes are not assigned control. I believe they are harmless > currently because superword is the last pass of optimizations. I also > cleaned up the code so it always uses `register_new_node()`. There are > a couple places where `intcon()` should be used. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18760/files - new: https://git.openjdk.org/jdk/pull/18760/files/f85804cc..fabd851a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18760&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18760&range=00-01 Stats: 20 lines in 5 files changed: 3 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18760/head:pull/18760 PR: https://git.openjdk.org/jdk/pull/18760 From roland at openjdk.org Fri Apr 12 14:58:57 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Apr 2024 14:58:57 GMT Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() [v2] In-Reply-To: <_lXinX6Mdd_z8o22ZobW1VHWOJXqRtjiRHYPKucIKrw=.ac2a7300-e788-4888-8df0-6d0130d8b691@github.com> References: 
<_lXinX6Mdd_z8o22ZobW1VHWOJXqRtjiRHYPKucIKrw=.ac2a7300-e788-4888-8df0-6d0130d8b691@github.com> Message-ID: On Fri, 12 Apr 2024 12:45:36 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/superword.cpp line 2557: > >> 2555: const TypeVect* vt = TypeVect::make(bt, vlen); >> 2556: VectorNode* mask = new VectorMaskCmpNode(bol_test, cmp_in1, cmp_in2, bol_test_node, vt); >> 2557: phase()->register_new_node(mask, phase()->get_ctrl(p->at(0))); > > Good refactoring. Since you're using this pattern quite often, I was just wondering, if we should have a separate method `register_new_node_with_ctrl_of()` (or something like that) that does: > > PhaseIdealLoop::register_new_node_with_ctrl_of(Node* new_node, Node* ctrl_of) { > register_new_node(new_node, get_ctrl(ctrl_of)); > } > > And then: > > phase()->register_new_node_with_ctrl_of(mask, p->at(0)); Good idea. I updated the change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18760#discussion_r1562670215 From kvn at openjdk.org Fri Apr 12 17:06:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 17:06:41 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v3] In-Reply-To: References: Message-ID: <2sRdjT4dSzCdpg6yliS2Wk0DHfGHJNXEjTfvyaA-nQU=.035d2682-938b-4892-92bd-a08148a725f5@github.com> On Fri, 12 Apr 2024 10:05:15 GMT, Christian Hagedorn wrote: >> While working on a [Valhalla bug](https://bugs.openjdk.org/browse/JDK-8321734), I've noticed that a `SubTypeCheckNode` for a `checkcast` does not take a unique concrete sub class `X` of an abstract class `A` as klass constant in the sub type check. 
Instead, it uses the abstract klass constant: >> >> >> abstract class A {} >> class X extends A {} >> >> A x = (A)object; // Emits SubTypeCheckNode(object, A), but could have used X instead of A. >> >> However, the `CheckCastPP` result already uses the improved instance type ptr `X` (i.e. `toop` which was improved from `A` by calling `try_improve()` to get the unique concrete sub class): >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3257-L3261 >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3363 >> >> We should also plug in a unique concrete sub class constant in the `SubTypeCheckNode` which could be beneficial to fold away redundant sub type checks (see test cases). >> >> This fix is required to completely fix the bug in Valhalla (this is only one of the broken cases). In Valhalla, the graph ends up being broken because a `CheckCastPP` node is folded because of an impossible type but the `SubTypeCheckNode` is not due to not using the improved unique concrete sub class constant for the `checkcast`. I don't think that there is currently a bug in mainline because of this limitation - it just blocks some optimizations. I'm therefore upstreaming this fix to mainline since it can be beneficial to have this fix here as well (see test cases). >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Revert "using improved type for non-constants" + add comment Looks good. Good find! ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18515#pullrequestreview-1997991651 From kvn at openjdk.org Fri Apr 12 17:07:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 17:07:43 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> Message-ID: On Fri, 12 Apr 2024 14:34:01 GMT, Roland Westrelin wrote: >> This is another small change from something I ran into while working >> on 8275202. `CMoveNode::Value` can be improved when the condition is >> known to be always true or false. That doesn't affect IGVN (as the >> `CMove` is removed in that case) but it can be useful for passes that >> propagates types such as CCP. In the IR tests, the backbranch of the >> loop is never taken when the root of the compilation is `test1`. With >> the change, CCP can eliminate it. Without, it can't. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/movenode.cpp > > Co-authored-by: Christian Hagedorn Nice. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18757#pullrequestreview-1997999686 From kvn at openjdk.org Fri Apr 12 18:08:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 18:08:42 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes In-Reply-To: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Fri, 12 Apr 2024 10:30:17 GMT, Roberto Castañeda Lozano wrote: > This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292). 
> > The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change: > > > import java.lang.invoke.VarHandle; > import java.lang.invoke.MethodHandles; > > public class Example { > static class Outer { > Object f; > } > > static final VarHandle fVarHandle; > static { > MethodHandles.Lookup l = MethodHandles.lookup(); > try { > fVarHandle = l.findVarHandle(Outer.class, "f", Object.class); > } catch (Exception e) { > throw new Error(e); > } > } > > static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) { > return fVarHandle.compareAndSet(o, oldVal, newVal); > } > > public static void main(String[] args) { > for (int i = 0; i < 10_000; i++) { > Outer o = new Outer(); > Object oldVal = new Object(); > o.f = oldVal; > Object newVal = new Object(); > testCompareAndSwap(o, oldVal, newVal); > } > } > } > > > Before this changeset, issuing this command: > > > $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP > > > gives the following dump: > > > R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20) > > > After this changeset, we get: > > > R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68... Can you add a simple IR test to show case it? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18754#pullrequestreview-1998148779 From kvn at openjdk.org Fri Apr 12 18:30:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 18:30:41 GMT Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2 In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:03:17 GMT, Thomas Stuefe wrote: > See JBS description. > > This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa. Looks fine. Can you give an example of new output? ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18740#pullrequestreview-1998202306 From dlong at openjdk.org Fri Apr 12 19:28:42 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 12 Apr 2024 19:28:42 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 13:23:46 GMT, Boris Ulasevich wrote: > > I think other platforms are affected as well but I don't have the hardware to test there. > > @bulasevich (ARM32), could you please have a look? > > Hi. I checked ARM32. R11 (FP) is a common register that is not just dedicated solely to the frame pointer. And with a given test and patch I can not reproduce SIGSEGV on ARM32 platform. So I think ARM32 is not affected. It looks like ARM32 is safe because the rules use specific, "bound" registers. If they used something generic like iRegP, I think the callee-saved R11 would still be in danger of getting trashed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2052403066 From sgibbons at openjdk.org Fri Apr 12 20:05:07 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 12 Apr 2024 20:05:07 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception Message-ID: Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. ------------- Commit messages: - Add UnsafeCopyMemoryMark around arraycopy_avx3_large() Changes: https://git.openjdk.org/jdk/pull/18766/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18766&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330185 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18766/head:pull/18766 PR: https://git.openjdk.org/jdk/pull/18766 From qamai at openjdk.org Fri Apr 12 20:11:42 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Apr 2024 20:11:42 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> Message-ID: 
<079UKoZpe1eMTvJIIuKDqh6uSf1g3Cu8dUm5hkVddnA=.2cf0c555-5c29-4891-b4b6-866c11d7f080@github.com> On Fri, 12 Apr 2024 14:34:01 GMT, Roland Westrelin wrote: >> This is another small change from something I ran into while working >> on 8275202. `CMoveNode::Value` can be improved when the condition is >> known to be always true or false. That doesn't affect IGVN (as the >> `CMove` is removed in that case) but it can be useful for passes that >> propagates types such as CCP. In the IR tests, the backbranch of the >> loop is never taken when the root of the compilation is `test1`. With >> the change, CCP can eliminate it. Without, it can't. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/movenode.cpp > > Co-authored-by: Christian Hagedorn Can we check `Identity` during `PhaseCCP` instead? I see other inferences such as `AndINode` that may benefit from it. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18757#issuecomment-2052456649 From kvn at openjdk.org Fri Apr 12 22:05:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 22:05:50 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. > > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. What about `generate_conjoint_copy_avx3_masked`? 
------------- PR Review: https://git.openjdk.org/jdk/pull/18766#pullrequestreview-1998506993 From sgibbons at openjdk.org Fri Apr 12 22:15:46 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 12 Apr 2024 22:15:46 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 22:03:32 GMT, Vladimir Kozlov wrote: > What about `generate_conjoint_copy_avx3_masked`? There is protection within that procedure. See line 861. Although `ucme_exit_pc` is set and not used, there is no special case as in `generate_disjoint_copy_avx3_masked()`. All of the subordinate procedures within `generate_conjoint_copy_avx3_masked` are protected with a mark. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18766#issuecomment-2052626778 From kvn at openjdk.org Fri Apr 12 22:21:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 22:21:51 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: <5YZOJDVt9DR0XQ8gFPcThAvroQ1Nv7CP2dDyV5TFc6w=.58667d75-96df-492e-86d8-d2cb06624f1f@github.com> On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. > > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. 
Yes, it looks like it was an oversight in the implementation of these two AVX-512 arraycopy methods: [#61](https://github.com/openjdk/jdk/pull/61) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18766#issuecomment-2052631928 From kvn at openjdk.org Fri Apr 12 22:29:46 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 22:29:46 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. > > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18766#pullrequestreview-1998536495 From sgibbons at openjdk.org Fri Apr 12 22:50:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 12 Apr 2024 22:50:40 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. 
> > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18766#issuecomment-2052650618 From kvn at openjdk.org Fri Apr 12 23:33:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Apr 2024 23:33:40 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. > > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. I will wait second review before sponsoring. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18766#issuecomment-2052693606 From sviswanathan at openjdk.org Fri Apr 12 23:55:41 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 12 Apr 2024 23:55:41 GMT Subject: RFR: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. 
> > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. Looks good to me as well. Thanks for fixing this. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18766#pullrequestreview-1998747585 From sgibbons at openjdk.org Sat Apr 13 00:51:47 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 13 Apr 2024 00:51:47 GMT Subject: Integrated: 8330185: Potential uncaught unsafe memory copy exception In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 20:00:35 GMT, Scott Gibbons wrote: > Adding an `UnsafeCopyMemoryMark` in `generate_disjoint_copy_avx3_masked()` to protect against SIGBUS in `arraycopy_avx3_large()`. I discovered this by code inspection, and the missing memory mark is inconsistent with all other generators. I do not have a testcase to generate such an exception. I think this may have been a copy/paste error by the original contributor as evidenced by the variable `ucme_exit_pc` having been set but never used. > > I have not seen a VM crash that can be attributed to this, but adding the mark is the correct behavior for its prevention. This pull request has now been integrated. Changeset: b9ef9f66 Author: Scott Gibbons Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/b9ef9f667ef9d4052c9d6dfec763b94d331dc04d Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8330185: Potential uncaught unsafe memory copy exception Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/18766 From stuefe at openjdk.org Sat Apr 13 05:57:40 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 13 Apr 2024 05:57:40 GMT Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2 In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 18:28:07 GMT, Vladimir Kozlov wrote: > Looks fine. Can you give an example of new output? Thanks, @vnkozlov . 
Certainly: [x.txt](https://github.com/openjdk/jdk/files/14965446/x.txt) The difference is subtle. Notice how we now see e.g. `StringCoding::countPositives` twice in the summary listing (lines 21 and 69), once for C1, once for C2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18740#issuecomment-2053516245 From kvn at openjdk.org Sat Apr 13 18:50:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 13 Apr 2024 18:50:41 GMT Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2 In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:03:17 GMT, Thomas Stuefe wrote: > See JBS description. > > This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18740#issuecomment-2053729285 From fjiang at openjdk.org Sun Apr 14 08:12:03 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Sun, 14 Apr 2024 08:12:03 GMT Subject: RFR: 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 Message-ID: Hi, please review this fix that adds additional CMove match rules for the riscv port. [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) introduces more conditional moves which reduce merges used by CmpP/CmpN. However, there is no match rule for CMove with CmpP/N on riscv, resulting in the `bad AD file` crash. After this fix, the following five tests would pass without any crashes. 
Testing: - [x] compiler/eliminateAutobox/TestDoubleBoxing.java - [x] compiler/eliminateAutobox/TestFloatBoxing.java - [x] compiler/eliminateAutobox/TestLongBoxing.java - [x] compiler/eliminateAutobox/TestIntBoxing.java - [x] compiler/eliminateAutobox/TestShortBoxing.java - [ ] tier1~3 (linux-riscv64, release) ------------- Commit messages: - RISC-V: Add extra match rule for CMoveI and CMove after JDK-8316991 Changes: https://git.openjdk.org/jdk/pull/18774/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18774&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330213 Stats: 68 lines in 1 file changed: 68 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18774.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18774/head:pull/18774 PR: https://git.openjdk.org/jdk/pull/18774 From jzhu at openjdk.org Mon Apr 15 03:32:50 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Mon, 15 Apr 2024 03:32:50 GMT Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers [v4] In-Reply-To: References: Message-ID: <6iifh18HJLCDWzb32dhVfpTcMjzXoXtSlmtY8ZoYzHc=.5b4491d7-3836-4312-bff1-49ef68ec15fa@github.com> On Wed, 20 Mar 2024 03:55:33 GMT, Joshua Zhu wrote: >> Currently ZGC C2 load barrier stub saves the whole live register regardless of what size of register is live on aarch64. >> Considering the size of SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits, >> even the use of a floating point may cause the maximum 2048 bits stack occupied. >> Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in ZGC C2 load barrier stub. >> >> In a floating point case on 2048 bits SVE machine, the following ZLoadBarrierStubC2 >> >> >> ...... 
>> 0x0000ffff684cfad8: stp x15, x18, [sp, #80] >> 0x0000ffff684cfadc: sub sp, sp, #0x100 >> 0x0000ffff684cfae0: str z16, [sp] >> 0x0000ffff684cfae4: add x1, x13, #0x10 >> 0x0000ffff684cfae8: mov x0, x16 >> ;; 0xFFFF803F5414 >> 0x0000ffff684cfaec: mov x8, #0x5414 // #21524 >> 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16 >> 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfaf8: blr x8 >> 0x0000ffff684cfafc: mov x16, x0 >> 0x0000ffff684cfb00: ldr z16, [sp] >> 0x0000ffff684cfb04: add sp, sp, #0x100 >> 0x0000ffff684cfb08: ptrue p7.b >> 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16] >> ...... >> >> >> could be optimized into: >> >> >> ...... >> 0x0000ffff684cfa50: stp x15, x18, [sp, #80] >> 0x0000ffff684cfa54: str d16, [sp, #-16]! // extra 8 bytes to align 16 bytes in push_fp() >> 0x0000ffff684cfa58: add x1, x13, #0x10 >> 0x0000ffff684cfa5c: mov x0, x16 >> ;; 0xFFFF7FA942A8 >> 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064 >> 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16 >> 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfa6c: blr x8 >> 0x0000ffff684cfa70: mov x16, x0 >> 0x0000ffff684cfa74: ldr d16, [sp], #16 >> 0x0000ffff684cfa78: ptrue p7.b >> 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16] >> ...... >> >> >> Besides the above benefit, when we know what size of register is live, >> we could remove the unnecessary caller save in ZGC C2 load barrier stub when we meet C-ABI SOE fp registers. >> >> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Add more output for easy debugging once the jtreg test case fails Waiting for another review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17977#issuecomment-2054722588 From fyang at openjdk.org Mon Apr 15 04:19:41 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 15 Apr 2024 04:19:41 GMT Subject: RFR: 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 In-Reply-To: References: Message-ID: On Sun, 14 Apr 2024 07:58:48 GMT, Feilong Jiang wrote: > Hi, please review this fix that adds additional CMove match rules for the riscv port. > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) introduces more conditional moves which reduce merges used by CmpP/CmpN. However, there is no match rule for CMove with CmpP/N on riscv, resulting in the `bad AD file` crash. > > After this fix, the following five tests would pass without any crashes. > > Testing: > - [x] compiler/eliminateAutobox/TestDoubleBoxing.java > - [x] compiler/eliminateAutobox/TestFloatBoxing.java > - [x] compiler/eliminateAutobox/TestLongBoxing.java > - [x] compiler/eliminateAutobox/TestIntBoxing.java > - [x] compiler/eliminateAutobox/TestShortBoxing.java > - [x] tier1~3 (linux-riscv64, release) > - [x] hotspot:tier1 (linux-riscv64, fastdebug) Looks good. Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18774#pullrequestreview-1999987087 From chagedorn at openjdk.org Mon Apr 15 06:28:48 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Apr 2024 06:28:48 GMT Subject: RFR: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class [v3] In-Reply-To: References: Message-ID: <4J2eEv0ElGzydIpCYhtNnaDXC81rJwp0pVZC2shxW_4=.abd74ff9-1f45-4d76-b883-caa5ec1a0289@github.com> On Fri, 12 Apr 2024 10:05:15 GMT, Christian Hagedorn wrote: >> While working on a [Valhalla bug](https://bugs.openjdk.org/browse/JDK-8321734), I've noticed that a `SubTypeCheckNode` for a `checkcast` does not take a unique concrete sub class `X` of an abstract class `A` as klass constant in the sub type check. Instead, it uses the abstract klass constant: >> >> >> abstract class A {} >> class X extends A {} >> >> A x = (A)object; // Emits SubTypeCheckNode(object, A), but could have used X instead of A. >> >> However, the `CheckCastPP` result already uses the improved instance type ptr `X` (i.e. `toop` which was improved from `A` by calling `try_improve()` to get the unique concrete sub class): >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3257-L3261 >> https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3363 >> >> We should also plug in a unique concrete sub class constant in the `SubTypeCheckNode` which could be beneficial to fold away redundant sub type checks (see test cases). >> >> This fix is required to completely fix the bug in Valhalla (this is only one of the broken cases). In Valhalla, the graph ends up being broken because a `CheckCastPP` node is folded because of an impossible type but the `SubTypeCheckNode` is not due to not using the improved unique concrete sub class constant for the `checkcast`. 
I don't think that there is currently a bug in mainline because of this limitation - it just blocks some optimizations. I'm therefore upstreaming this fix to mainline since it can be beneficial to have this fix here as well (see test cases). >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Revert "using improved type for non-constants" + add comment Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18515#issuecomment-2055552723 From chagedorn at openjdk.org Mon Apr 15 06:28:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Apr 2024 06:28:49 GMT Subject: Integrated: 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class In-Reply-To: References: Message-ID: <4pepgxx12lLzvn0rOUMwbDa8NdoBevtxlQl_qSuZIe0=.e5978c8a-2d93-48ad-b275-6204878e023a@github.com> On Wed, 27 Mar 2024 15:17:36 GMT, Christian Hagedorn wrote: > While working on a [Valhalla bug](https://bugs.openjdk.org/browse/JDK-8321734), I've noticed that a `SubTypeCheckNode` for a `checkcast` does not take a unique concrete sub class `X` of an abstract class `A` as klass constant in the sub type check. Instead, it uses the abstract klass constant: > > > abstract class A {} > class X extends A {} > > A x = (A)object; // Emits SubTypeCheckNode(object, A), but could have used X instead of A. > > However, the `CheckCastPP` result already uses the improved instance type ptr `X` (i.e. 
`toop` which was improved from `A` by calling `try_improve()` to get the unique concrete sub class): > https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3257-L3261 > https://github.com/openjdk/jdk/blob/614db2ea9e10346475eef34629eab54878aa482d/src/hotspot/share/opto/graphKit.cpp#L3363 > > We should also plug in a unique concrete sub class constant in the `SubTypeCheckNode` which could be beneficial to fold away redundant sub type checks (see test cases). > > This fix is required to completely fix the bug in Valhalla (this is only one of the broken cases). In Valhalla, the graph ends up being broken because a `CheckCastPP` node is folded because of an impossible type but the `SubTypeCheckNode` is not due to not using the improved unique concrete sub class constant for the `checkcast`. I don't think that there is currently a bug in mainline because of this limitation - it just blocks some optimizations. I'm therefore upstreaming this fix to mainline since it can be beneficial to have this fix here as well (see test cases). > > Thanks, > Christian This pull request has now been integrated. 
Changeset: b486709b Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/b486709b0627cfb4cf428a6508ef7c5b14e6da57 Stats: 85 lines in 2 files changed: 78 ins; 0 del; 7 mod 8328480: C2: SubTypeCheckNode in checkcast should use the klass constant of a unique concrete sub class Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18515 From epeter at openjdk.org Mon Apr 15 06:29:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 06:29:48 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v16] In-Reply-To: References: Message-ID: On Mon, 8 Apr 2024 18:42:43 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > formatting > > Co-authored-by: Christian Hagedorn @chhagedorn for the suggestions, I like the updates! @caojoshua approved again :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-2000103864 From chagedorn at openjdk.org Mon Apr 15 06:45:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Apr 2024 06:45:41 GMT Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() [v2] In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 14:58:56 GMT, Roland Westrelin wrote: >> Another set of changes from 8275202. There are cases in superword >> where new nodes are not assigned control. I believe they are harmless >> currently because superword is the last pass of optimizations. I also >> cleaned up the code so it always uses `register_new_node()`. There are >> a couple places where `intcon()` should be used. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review That looks good to me, thanks for additionally updating other uses of this pattern! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18760#pullrequestreview-2000126959 From epeter at openjdk.org Mon Apr 15 06:52:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 06:52:47 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Thu, 21 Mar 2024 22:48:01 GMT, Vladimir Kozlov wrote: >>> * No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. We can merge the 4 stores on the normal path, where all RC's pass. But we have to remove all old stores from that path. But the `RC[1], RC[2], RC[3]` false paths need some of those stores. So the only way I see is to duplicate all stores for the branches, so that we are sure that they sink out into the trap-paths. >> >> I also think you need to duplicate stores. 
My opinion is that we want to stick with the simpler cases (your first and second bullets) unless it's obvious it doesn't cover all use cases. It's always possible to revisit the optimization down the road if it's observed that there are cases that are not covered. > >> > ``` >> > * No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. We can merge the 4 stores on the normal path, where all RC's pass. But we have to remove all old stores from that path. But the `RC[1], RC[2], RC[3]` false paths need some of those stores. So the only way I see is to duplicate all stores for the branches, so that we are sure that they sink out into the trap-paths. >> > ``` >> >> I also think you need to duplicate stores. My opinion is that we want to stick with the simpler cases (your first and second bullets) unless it's obvious it doesn't cover all use cases. It's always possible to revisit the optimization down the road if it's observed that there are cases that are not covered. > > I completely agree with Roland. @vnkozlov > Can we detect presence of RangeCheck which may cause us to move some stores on fail path and bailout the optimization. I don't think it is frequent case. I assume you will get RC on each store or not at all ("main" part of counted loop). Am I wrong here? I don't remember, does C2 optimize RangeCheck nodes in linear code (it does in loops)? I know about 2 relevant optimizations that remove / move RangeChecks: - RCE (RangeCheck Elimination from loops): hoist all RangeCheck before the loop. That way, there are no RangeChecks left in the loop, and there would be no RangeChecks between the stores we are merging. - RangeCheck Smearing: this also applies in straight-line code, outside of loops. See `RangeCheckNode::Ideal`. 
Example: RangeCheck[i+0] Store[i+0] RangeCheck[i+1] <--- replaced with i+3 ("smearing" to cover all RC below) Store[i+1] RangeCheck[i+2] <--- removed Store[i+2] RangeCheck[i+3] <--- removed Store[i+3] becomes: RangeCheck[i+0] Store[i+0] RangeCheck[i+3] <--- the RangeCheck that remains between the first and the rest of the consecutive (and adjacent) stores. Store[i+1] Store[i+2] Store[i+3] I think the use-cases from @cl4es are often in straight-line code. Therefore we should cover the "smearing" case where exactly 1 RC remains in the sequence. What you can also see in `RangeCheckNode::Ideal`: if we ever trap (or often enough, I don't remember) in one of the RangeChecks, then we disable `phase->C->allow_range_check_smearing()`. Then we don't do the smearing, and all the RC remain in the sequence. At that point, my optimization would fail since it sees more than 1 RC in the sequence. Does that make sense? I should probably add this information in the comments, so that it is clear why we worry about a single RC at all. People are probably going to wonder like you: "I assume you will get RC on each store or not at all". 
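To make the discussion above concrete, here is a minimal plain-Java sketch of the straight-line store pattern in question. The class and method names are illustrative only and do not come from the patch; whether C2 actually merges these stores depends on the change under review. The point of the sketch is the Java-level behaviour that any merged store must preserve: when a later range check fails, the earlier stores must already be visible, which is why the stores would have to be duplicated onto the trap paths.

```java
import java.util.Arrays;

public class MergeStoresSketch {
    // Four adjacent byte stores in straight-line code. Conceptually each store
    // has its own RangeCheck; with smearing, one check may remain in the sequence.
    static void storeIntLE(byte[] a, int i, int v) {
        a[i]     = (byte) v;
        a[i + 1] = (byte) (v >> 8);
        a[i + 2] = (byte) (v >> 16);
        a[i + 3] = (byte) (v >> 24);
    }

    // Returns true if all four stores completed, false if a range check failed.
    static boolean tryStore(byte[] a, int i, int v) {
        try {
            storeIntLE(a, i, v);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ok = new byte[4];
        System.out.println(tryStore(ok, 0, 0x04030201) + " " + Arrays.toString(ok));

        // The array is one element too short: the fourth store traps, but the
        // first three stores have already happened and must remain observable.
        byte[] tooShort = new byte[3];
        System.out.println(tryStore(tooShort, 0, 0x04030201) + " " + Arrays.toString(tooShort));
    }
}
```

In the second call the exception is thrown only at the fourth store, so the first three bytes are still written. A merged 4-byte store is therefore only valid on the path where every range check passes.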
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2055667649 From rcastanedalo at openjdk.org Mon Apr 15 07:04:11 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 Apr 2024 07:04:11 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: > This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292). 
> > The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change: > > > import java.lang.invoke.VarHandle; > import java.lang.invoke.MethodHandles; > > public class Example { > static class Outer { > Object f; > } > > static final VarHandle fVarHandle; > static { > MethodHandles.Lookup l = MethodHandles.lookup(); > try { > fVarHandle = l.findVarHandle(Outer.class, "f", Object.class); > } catch (Exception e) { > throw new Error(e); > } > } > > static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) { > return fVarHandle.compareAndSet(o, oldVal, newVal); > } > > public static void main(String[] args) { > for (int i = 0; i < 10_000; i++) { > Outer o = new Outer(); > Object oldVal = new Object(); > o.f = oldVal; > Object newVal = new Object(); > testCompareAndSwap(o, oldVal, newVal); > } > } > } > > > Before this changeset, issuing this command: > > > $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP > > > gives the following dump: > > > R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20) > > > After this changeset, we get: > > > R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Add example ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18754/files - new: https://git.openjdk.org/jdk/pull/18754/files/b856d41f..409a1ef4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18754&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18754&range=00-01 Stats: 80 lines in 2 files changed: 80 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18754/head:pull/18754 PR: https://git.openjdk.org/jdk/pull/18754 From rcastanedalo at openjdk.org Mon Apr 15 07:04:11 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 Apr 2024 07:04:11 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Fri, 12 Apr 2024 18:05:42 GMT, Vladimir Kozlov wrote: > Can you add a simple IR test to showcase it? Done, the new test fails before this change and passes afterwards. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18754#issuecomment-2055722466 From rcastanedalo at openjdk.org Mon Apr 15 07:10:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 Apr 2024 07:10:45 GMT Subject: RFR: 8324950: IGV: save the state to a file [v23] In-Reply-To: References: Message-ID: <2nCpYppXtjiIgnah_DL7faHmNOXmuS7XQAA8yYnIN4Y=.44e3508c-dc45-438c-a2e8-1d23ff2b25b0@github.com> On Fri, 12 Apr 2024 13:36:21 GMT, Tobias Holenstein wrote: > If graphs are sent over the network with they are automatically opened. But this feature is currently not available in C2. Regarding manually importing a graph I prefer to not automatically open the Tabs. I see, thanks for the clarification. 
I am fine with this decision, the main use case I see is for when graphs are imported over the network. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2055753979 From epeter at openjdk.org Mon Apr 15 07:10:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:10:48 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 12:03:57 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). 
>> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Revert to previous indentation A few more cosmetic things ;) src/hotspot/cpu/aarch64/aarch64_vector.ad line 2856: > 2854: %} > 2855: > 2856: // reduction addF Suggestion: I think comment could be removed, it seems redundant. src/hotspot/cpu/aarch64/aarch64_vector.ad line 2861: > 2859: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered > 2860: // add reduction). > 2861: predicate(Matcher::vector_length(n->in(2)) == 2 && !n->as_Reduction()->requires_strict_order()); Would it make sense to change `reduce_add2F_neon` to something like `reduce_non_strict_order_add2F_neon`, just so that it is a bit clearer when one reads the opto-assembly output? src/hotspot/share/opto/vectorIntrinsics.cpp line 1739: > 1737: Node* init = ReductionNode::make_identity_con_scalar(gvn(), opc, elem_bt); > 1738: Node* value = opd; > 1739: Suggestion: assert(mask != nullptr || !is_masked_op, "Masked op needs the mask value never null"); ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18034#pullrequestreview-2000147984 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565263550 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565262784 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565271555 From epeter at openjdk.org Mon Apr 15 07:10:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:10:49 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 06:56:06 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert to previous indentation > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2861: > >> 2859: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered >> 2860: // add reduction). >> 2861: predicate(Matcher::vector_length(n->in(2)) == 2 && !n->as_Reduction()->requires_strict_order()); > > Would it make sense to change `reduce_add2F_neon` to something like `reduce_non_strict_order_add2F_neon`, just so that it is a bit clearer when one reads the opto-assembly output? Similarly, I would put `strict_order` for the cases where that applies. > src/hotspot/share/opto/vectorIntrinsics.cpp line 1739: > >> 1737: Node* init = ReductionNode::make_identity_con_scalar(gvn(), opc, elem_bt); >> 1738: Node* value = opd; >> 1739: > > Suggestion: > > > assert(mask != nullptr || !is_masked_op, "Masked op needs the mask value never null"); This would restore the assert mentioned above. 
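As background for why this PR distinguishes strictly ordered from non-strictly ordered add-reductions: floating-point addition is not associative, so regrouping the lanes of a reduction can change the result. The following is a minimal plain-Java sketch of that effect. It does not use the Vector API, and the class and method names are illustrative assumptions, not code from the patch.

```java
public class ReductionOrderSketch {
    // Strict left-to-right order, as a scalar loop would compute it.
    static float sumInOrder(float[] v) {
        float acc = 0.0f;
        for (float x : v) {
            acc += x;
        }
        return acc;
    }

    // Pairwise regrouping of exactly four elements: (v0 + v1) + (v2 + v3).
    // This models one possible non-strict evaluation order.
    static float sumPairwise4(float[] v) {
        return (v[0] + v[1]) + (v[2] + v[3]);
    }

    public static void main(String[] args) {
        // 1e8f is exactly representable, but adding 1f to it is lost to
        // rounding because neighbouring floats near 1e8 are 8 apart.
        float[] v = {1e8f, 1f, -1e8f, 1f};
        System.out.println("in order: " + sumInOrder(v));
        System.out.println("pairwise: " + sumPairwise4(v));
    }
}
```

With this input the in-order sum keeps the final `1f` while the pairwise grouping absorbs both small terms into the large ones, so the two results differ. That is the kind of difference a node has to account for when it requires strict order.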
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565265234 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565271912 From epeter at openjdk.org Mon Apr 15 07:10:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:10:50 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: On Tue, 5 Mar 2024 08:20:24 GMT, Bhavana Kilambi wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 1740: >> >>> 1738: Node* value = nullptr; >>> 1739: if (mask == nullptr) { >>> 1740: assert(!is_masked_op, "Masked op needs the mask value never null"); >> >> This assert may be missed after your refactor. But it seems not really matter. > > Yes, the conditions of `mask != nullptr` should take care of that. It would not hurt to add an assert, see my other suggestion below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565269700 From epeter at openjdk.org Mon Apr 15 07:27:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:27:01 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci wrote: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. 
> While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. Can you please fix the whitespace issues, it is a bit difficult to read now on GitHub ;) Issues with indentation Looking at `LoadVectorGatherNode::Identity`, I see that you first call `LoadVectorNode::Identity`, which actually is not defined, so it goes back to `LoadNode::Identity`. So far so good. This calls `MemNode::can_see_stored_value` to find a corresponding store which writes to the same address. And then it replaces our load with the input value of that store, so we can avoid the load. 
It seems that your code would now disallow such a case, because you always check that the `Ideal` node you get back is of the same type as the `this` node. Am I right about that? Is that intended? I think your checks should also not be done **after** we already create a new node, but ideally before we create the new node. That is the normal pattern I see everywhere. So then you would probably have to dig into `MemNode::can_see_stored_value` and other places, to see where you would need to do your additional checks. There are already some checks for vector-type there, so that would be one intuitive starting-point. I see that `LoadVectorNode` has no additional slots, but then `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` simply do `add_req`. Why not just compare these extra slots (as many as there are) in `LoadVectorNode::Identity`, as well as checking that their `Opcode` is the same? Would that not be simpler and accomplish the same? src/hotspot/share/opto/vectornode.cpp line 1148: > 1146: Node* StoreVectorMaskedNode::Identity(PhaseGVN* phase) { > 1147: Node* value = StoreVectorNode::Identity(phase); > 1148: if ((value != this) && (value->is_StoreVectorMasked()) && (in(MemNode::ValueIn + 1)->eqv_uncast(value->in(MemNode::ValueIn + 1)))) { `MemNode::ValueIn + 1`: is there no direct value for it? If not: maybe create one? 
src/hotspot/share/opto/vectornode.cpp line 1166: > 1164: if ((value != this) && (value->is_LoadVectorGatherMasked()) && > 1165: (in(MemNode::ValueIn)->eqv_uncast(value->in(MemNode::ValueIn))) && > 1166: (in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore)))) { Suggestion: if ((value != this) && (value->is_LoadVectorGatherMasked()) && (in(MemNode::ValueIn)->eqv_uncast(value->in(MemNode::ValueIn))) && (in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore)))) { src/hotspot/share/opto/vectornode.cpp line 1176: > 1174: if ((value != this) && (value->is_StoreVectorScatterMasked()) && > 1175: (in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore))) && > 1176: (in(MemNode::OffsetsMask)->eqv_uncast(value->in(MemNode::OffsetsMask)))) { Suggestion: if ((value != this) && (value->is_StoreVectorScatterMasked()) && (in(MemNode::OopStore)->eqv_uncast(value->in(MemNode::OopStore))) && (in(MemNode::OffsetsMask)->eqv_uncast(value->in(MemNode::OffsetsMask)))) { test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 43: > 41: * @modules jdk.incubator.vector > 42: * > 43: * @run main/othervm -XX:UseAVX=3 compiler.vectorapi.VectorLoadGatherFoldingTest I would also make sure that you don't require platform specific things here. This could also run on any other platform that supports the Vector API, right? And you will need at least one run without any flags. test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 43: > 41: * @modules jdk.incubator.vector > 42: * > 43: * @run main/othervm compiler.vectorapi.VectorGatherMaskFoldingTest Suggestion: * @run main compiler.vectorapi.VectorGatherMaskFoldingTest I think you only need the `othervm` if you also use flags in the `@run` statement. test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 46: > 44: */ > 45: > 46: public class VectorLoadGatherFoldingTest { I assume you will also do a similar test for `Store`? 
It would for example be nice to see IR tests that verify that some Stores/Loads are folded together (where it is ok), and others are not folded, because they have different masks. Would it make sense to make this a IR test? ------------- PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-1951243369 Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-1962674696 PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-1962719711 PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2011673208 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1533442547 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540697985 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540697640 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1533465824 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540692865 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1533464469 From dfenacci at openjdk.org Mon Apr 15 07:27:01 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Apr 2024 07:27:01 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Thu, 21 Mar 2024 08:46:27 GMT, Emanuel Peter wrote: > Can you please fix the whitespace issues, it is a bit difficult to read now on GitHub ;) Sorry, I let them slip in. Fixed. > I think you checks should also not be done **after** we already create a new node, but ideally before we create the new node. That is the normal pattern I see everywhere. So then you would probably have to dig into `MemNode::can_see_stored_value` and other places, to see where you would need to do your additional checks. There are already some checks for vector-type there, so that would be one intuitive starting-point. 
I've removed the `Identity` overrides and changed the checks in `MemNode::can_see_stored_value` as you suggested (`Identity` methods were also a bit too restrictive). I also had to add checks to `StoreNode::Identity` here https://github.com/openjdk/jdk/blob/7eb78e332080df3890b66ca04338a4ba69af45eb/src/hotspot/share/opto/memnode.cpp#L2799-L2814 > src/hotspot/share/opto/vectornode.cpp line 1148: > >> 1146: Node* StoreVectorMaskedNode::Identity(PhaseGVN* phase) { >> 1147: Node* value = StoreVectorNode::Identity(phase); >> 1148: if ((value != this) && (value->is_StoreVectorMasked()) && (in(MemNode::ValueIn + 1)->eqv_uncast(value->in(MemNode::ValueIn + 1)))) { > > `MemNode::ValueIn + 1` is there no direct value for it? If not: maybe create one? Thanks for the comments @eme64. I've created one more constant for the "+ 2" case (`OffsetsMask`) and reused the one there for the "+ 1" case (`OopStore`). > test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 43: > >> 41: * @modules jdk.incubator.vector >> 42: * >> 43: * @run main/othervm -XX:UseAVX=3 compiler.vectorapi.VectorLoadGatherFoldingTest > > I would also make sure that you don't require platform specific things here. This could also run on any other platform that supports the Vector API, right? And you will need at least one run without any flags. You're right. I've relaxed the require (the max vector size shouldn't be relevant) and added `sve`. I'm not sure if there is a way to make it more generic (all platforms that support vectors?). > test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 46: > >> 44: */ >> 45: >> 46: public class VectorLoadGatherFoldingTest { > > I assume you will also do a similar test for `Store`? > It would for example be nice to see IR tests that verify that some Stores/Loads are folded together (where it is ok), and others are not folded, because they have different masks. > Would it make sense to make this a IR test? 
Yep, there are actually `Store` tests already but I forgot to adapt the name. Fixed now. I've tried to use the IR framework to check for folded/non folded nodes but couldn't make it reliable enough (in the end it didn't test more than the current test). So I decided to go back to the actual "regression" test which reproduces the original issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2011750958 PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2044298493 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540681271 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540682167 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540681799 From dfenacci at openjdk.org Mon Apr 15 07:27:01 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Apr 2024 07:27:01 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled Message-ID: # Issue When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. # Causes On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. 
The check for identity happens in https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. # Solution `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. ------------- Commit messages: - JDK-8325520: fix load gather mask avx condition - JDK-8325520: fix tests for small species - JDK-8325520: add store tests - JDK-8325520: fix copyright notices - JDK-8325520: remove trailing whitespaces - JDK-8325520: use IR framework in tests - JDK-8325520: handle same offsets/masks in store identity - JDK-8325520: extend Store/LoadNode::Identity instead of overriding - JDK-8325520: fix test for no-vector case - Update src/hotspot/share/opto/vectornode.cpp - ... 
and 9 more: https://git.openjdk.org/jdk/compare/c5c866aa...e331b5c5 Changes: https://git.openjdk.org/jdk/pull/18347/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325520 Stats: 1008 lines in 5 files changed: 1005 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From epeter at openjdk.org Mon Apr 15 07:27:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:27:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 09:14:57 GMT, Emanuel Peter wrote: > It seems that your code would now disallow such a case, because you always check that the Ideal node you get back is of the same type as the this node. Am I right about that? Is that intended? If I am right that you would have made such a optimization impossible, that probably means that our tests don't have an IR test that cover this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually. > I've tried to use the IR framework to check for folded/non folded nodes but couldn't make it reliable enough (in the end it didn't test more than the current test). So I decided to go back to the actual "regression" test which reproduces the original issue. Do you know what is the issue with reliability for the IR rules? Why did it not always work? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2022281797 From dfenacci at openjdk.org Mon Apr 15 07:27:02 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Apr 2024 07:27:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 09:17:28 GMT, Emanuel Peter wrote: > If I am right that you would have made such a optimization impossible, that probably means that our tests don't have an IR test that cover this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually. I've transformed the tests to add IR tests as well. The issue with them seems to be related with [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) (basically there are some missing cleanups when performing late inlining). So, for now the tests force a cleanup at every step (`-XX:+IncrementalInlineForceCleanup`). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2044306155 From epeter at openjdk.org Mon Apr 15 07:27:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:27:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 07:20:39 GMT, Damon Fenacci wrote: >>> It seems that your code would now disallow such a case, because you always check that the Ideal node you get back is of the same type as the this node. Am I right about that? Is that intended? >> >> If I am right that you would have made such a optimization impossible, that probably means that our tests don't have an IR test that cover this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually. 
>> >>> I've tried to use the IR framework to check for folded/non folded nodes but couldn't make it reliable enough (in the end it didn't test more than the current test). So I decided to go back to the actual "regression" test which reproduces the original issue. >> >> Do you know what is the issue with reliability for the IR rules? Why did it not always work? > >> If I am right that you would have made such a optimization impossible, that probably means that our tests don't have an IR test that cover this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually. > > I've transformed the tests to add IR tests as well. The issue with them seems to be related with [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) (basically there are some missing cleanups when performing late inlining). So, for now the tests force a cleanup at every step (`-XX:+IncrementalInlineForceCleanup`). @dafedafe Nice, I think this already looks much better. Let me know if/when you want me to look at it again ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2055553058 From dfenacci at openjdk.org Mon Apr 15 07:27:02 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Apr 2024 07:27:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 07:20:39 GMT, Damon Fenacci wrote: >>> It seems that your code would now disallow such a case, because you always check that the Ideal node you get back is of the same type as the this node. Am I right about that? Is that intended? >> >> If I am right that you would have made such a optimization impossible, that probably means that our tests don't have an IR test that cover this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. 
You will probably also have to run this patch through performance testing eventually. >> >>> I've tried to use the IR framework to check for folded/non folded nodes but couldn't make it reliable enough (in the end it didn't test more than the current test). So I decided to go back to the actual "regression" test which reproduces the original issue. >> >> Do you know what is the issue with reliability for the IR rules? Why did it not always work? > >> If I am right that you would have made such a optimization impossible, that probably means that our tests don't have an IR test that cover this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually. > > I've transformed the tests to add IR tests as well. The issue with them seems to be related with [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) (basically there are some missing cleanups when performing late inlining). So, for now the tests force a cleanup at every step (`-XX:+IncrementalInlineForceCleanup`). > @dafedafe Nice, I think this already looks much better. Let me know if/when you want me to look at it again ;) Thanks a lot @eme64! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2055816672 From epeter at openjdk.org Mon Apr 15 07:27:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 07:27:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: <5B6paSAbvwgJ6QsSaKBrc_gWfXUnefRiuMOO43VWcL4=.89d2a2de-1046-4ff6-86b4-04d2b7ab71e0@github.com> On Wed, 27 Mar 2024 08:43:41 GMT, Damon Fenacci wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 43: >> >>> 41: * @modules jdk.incubator.vector >>> 42: * >>> 43: * @run main/othervm -XX:UseAVX=3 compiler.vectorapi.VectorLoadGatherFoldingTest >> >> I would also make sure that you don't require platform specific things here. This could also run on any other platform that supports the Vector API, right? And you will need at least one run without any flags. > > You're right. I've relaxed the require (the max vector size shouldn't be relevant) and added `sve`. I'm not sure if there is a way to make it more generic (all platforms that support vectors?). Do you even need to restrict the test to these platforms, or is it ok to run the tests on any platform? Because the Vector API can be used on any platform, we may just use scalar operation instead. But that needs to be tested as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540688549 From dfenacci at openjdk.org Mon Apr 15 07:27:02 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 15 Apr 2024 07:27:02 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: <5B6paSAbvwgJ6QsSaKBrc_gWfXUnefRiuMOO43VWcL4=.89d2a2de-1046-4ff6-86b4-04d2b7ab71e0@github.com> References: <5B6paSAbvwgJ6QsSaKBrc_gWfXUnefRiuMOO43VWcL4=.89d2a2de-1046-4ff6-86b4-04d2b7ab71e0@github.com> Message-ID: On Wed, 27 Mar 2024 08:49:03 GMT, Emanuel Peter wrote: >> You're right. 
I've relaxed the require (the max vector size shouldn't be relevant) and added `sve`. I'm not sure if there is a way to make it more generic (all platforms that support vectors?). > > Do you even need to restrict the test to these platforms, or is it ok to run the tests on any platform? > Because the Vector API can be used on any platform, we may just use scalar operation instead. But that needs to be tested as well. True. It might test the Vector API so often but in the end we need to test the scalar too. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1540697076 From chagedorn at openjdk.org Mon Apr 15 07:30:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Apr 2024 07:30:44 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote: > Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. > > I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. > > Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! Overall a good improvement and makes it more intuitive. A few comments. 
src/hotspot/cpu/x86/x86.ad line 10010:

> 10008:     const MachNode* mask1 = static_cast<const MachNode*>(this->in(this->operand_index($src1)));
> 10009:     const MachNode* mask2 = static_cast<const MachNode*>(this->in(this->operand_index($src2)));
> 10010:     assert(Type::cmp(mask1->bottom_type(), mask2->bottom_type()), "");

While at it, you could add a message like "should be false" for good practice.

src/hotspot/share/opto/node.cpp line 3014:

> 3012: }
> 3013: bool TypeNode::cmp(const Node& n) const {
> 3014:   return Type::cmp(_type, ((TypeNode&)n)._type);

While at it, you can replace the cast with `as_Type()`:

Suggestion:

  return Type::cmp(_type, n.as_Type()->_type);

src/hotspot/share/opto/type.cpp line 444:

> 442:   };
> 443:
> 444:   _shared_type_dict = new (shared_type_arena) Dict((CmpKey) type_cmp, (Hash) Type::uhash, shared_type_arena, 128);

Couldn't you make `type_cmp` a `CmpKey` instead of `auto` and then remove the cast here? On the other hand, you could probably also just remove the `CmpKey` cast here, but it might be more explicit to change the lambda type as well.

src/hotspot/share/opto/type.hpp line 219:

> 217:   static const Type *make(enum TYPES);
> 218:   // Test for equivalence of types
> 219:   static int cmp( const Type *const t1, const Type *const t2 );

Was it required to remove the `const`s here?
------------- PR Review: https://git.openjdk.org/jdk/pull/18533#pullrequestreview-2000181356 PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565282766 PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565287006 PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565290581 PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565293370 From redestad at openjdk.org Mon Apr 15 07:53:41 2024 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 15 Apr 2024 07:53:41 GMT Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2 In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 13:03:17 GMT, Thomas Stuefe wrote: > See JBS description. > > This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa. Nice - thanks for doing this! ------------- Marked as reviewed by redestad (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18740#pullrequestreview-2000255710 From tholenstein at openjdk.org Mon Apr 15 07:58:02 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 15 Apr 2024 07:58:02 GMT Subject: RFR: 8324950: IGV: save the state to a file [v25] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. 
> - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Fix2: highlight nodes from imported tabs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/5be2c7d6..4fd0b3dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=23-24 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From epeter at openjdk.org Mon Apr 15 08:06:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 08:06:16 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v34] In-Reply-To: References: Message-ID: <6H2EcwspprCy-iXcjP68kKTXLBPQmg1Zul4yLV66NAU=.ce2ac7ec-064c-417c-bb3d-a05d23df49df@github.com> > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts).
We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergeable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ...
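For readers without the linked test at hand, the two source patterns named in the description (adjacent constant stores, and a value split by shifts) might look roughly like the following Java sketch. The class and method names are hypothetical and are not taken from `TestMergeStores.java`. Whether C2 actually merges these stores depends on the JDK build and platform, so the sketch only shows the shape of the code and checks its semantics.

```java
// Hypothetical illustration only; names are not from TestMergeStores.java.
class MergeStoresSketch {
    // Four adjacent byte stores of constants: in principle a candidate
    // for being combined into a single int store of one larger constant.
    static void storeConstants(byte[] a, int off) {
        a[off + 0] = (byte) 0x01;
        a[off + 1] = (byte) 0x02;
        a[off + 2] = (byte) 0x03;
        a[off + 3] = (byte) 0x04;
    }

    // Eight adjacent byte stores of one long, split by shifts and truncation
    // (little-endian order): in principle a candidate for a single long store.
    static void storeLongLE(byte[] a, int off, long v) {
        a[off + 0] = (byte) (v >>> 0);
        a[off + 1] = (byte) (v >>> 8);
        a[off + 2] = (byte) (v >>> 16);
        a[off + 3] = (byte) (v >>> 24);
        a[off + 4] = (byte) (v >>> 32);
        a[off + 5] = (byte) (v >>> 40);
        a[off + 6] = (byte) (v >>> 48);
        a[off + 7] = (byte) (v >>> 56);
    }

    // Reassembles the long from its little-endian bytes, to check the stores.
    static long loadLongLE(byte[] a, int off) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (a[off + i] & 0xFFL);
        }
        return v;
    }

    public static void main(String[] args) {
        byte[] a = new byte[12];
        storeConstants(a, 0);
        storeLongLE(a, 4, 0x1122334455667788L);
        System.out.println(Long.toHexString(loadLongLE(a, 4)));
    }
}
```

In both methods the stores target adjacent elements of the same array, use the same store type, and sit under the same control flow, which are the preconditions the description lists for forming a chain.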
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix copyright, add comment about RCE and RC smearing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/c85cce1d..d622c579 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=32-33 Stats: 20 lines in 2 files changed: 19 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Mon Apr 15 08:56:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 08:56:44 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci wrote: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. 
The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. @dafedafe Nice work! I left a few comments ;) I wonder if there cannot be some shared method now, since the code in both cases looks quite similar. src/hotspot/share/opto/memnode.cpp line 1173: > 1171: if (in_vt != out_vt) { > 1172: return nullptr; > 1173: } I see there is a vector type check here. Do we not need that for the code in `StoreNode::Identity`? "Normal" stores like `StoreB` and `StoreI` have the type implicit, but for vector nodes, this type is hidden in the `vect_type()`, so I suspect you need to check it. I imagine a scenario where we store a float-vector, and then read from the same address as int-vector. Is that ok, or would we need a ReinterpretCast node? I'm not sure, but it would be worth trying to create some tests to check that. 
src/hotspot/share/opto/memnode.cpp line 2834: > 2832: if (is_StoreVectorScatter()) { > 2833: const Node* offsets = as_StoreVectorScatter()->in(StoreVectorScatterNode::Offsets); > 2834: if (val->is_LoadVectorGather() && offsets->eqv_uncast(val->as_LoadVectorGather()->in(LoadVectorGatherNode::Offsets))) { Suggestion: const Node* offsets_store = as_StoreVectorScatter()->in(StoreVectorScatterNode::Offsets); const Node* offsets_load = val->as_LoadVectorGather()->in(LoadVectorGatherNode::Offsets); if (offsets_store->eqv_uncast(offsets_load)) { As explained below, `val->as_Load()->store_Opcode() == Opcode()` (with the fix I described) would imply that `val->is_LoadVectorGather()`. src/hotspot/share/opto/memnode.cpp line 2858: > 2856: } else { > 2857: result = mem; > 2858: } You already have the condition `val->as_Load()->store_Opcode() == Opcode()`. So once you have `is_StoreVectorScatter()`, I think `val->is_LoadVectorGather()` is implied, no? Oh wow. I think actually that this is another bug here: we only have `virtual int store_Opcode() const { return Op_StoreVector; }` for `LoadVectorNode`, but not for all masked/gather/scatter vector nodes! I think that should be fixed. That would also simplify your code. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2000347459 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1565402819 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1565385954 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1565382548 From epeter at openjdk.org Mon Apr 15 08:56:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 08:56:45 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 08:49:47 GMT, Emanuel Peter wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. 
`LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in >> https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 >> This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. >> >> # Solution >> `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. >> >> The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. 
>> >> Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. > > src/hotspot/share/opto/memnode.cpp line 1173: > >> 1171: if (in_vt != out_vt) { >> 1172: return nullptr; >> 1173: } > > I see there is a vector type check here. Do we not need that for the code in `StoreNode::Identity`? "Normal" stores like `StoreB` and `StoreI` have the type implicit, but for vector nodes, this type is hidden in the `vect_type()`, so I suspect you need to check it. > > I imagine a scenario where we store a float-vector, and then read from the same address as int-vector. Is that ok, or would we need a ReinterpretCast node? I'm not sure, but it would be worth trying to create some tests to check that. Can you actually do that: store a float-vector to an int-array? Or is that maybe only possible with Unsafe somehow? Or maybe completely impossible? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1565405665 From rcastanedalo at openjdk.org Mon Apr 15 09:03:43 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 15 Apr 2024 09:03:43 GMT Subject: RFR: 8324950: IGV: save the state to a file [v25] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 07:58:02 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. 
>> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Fix2: highlight nodes from imported tabs There is an issue with saved difference graph states. 
If I open [diff.zip](https://github.com/openjdk/jdk/files/14976481/diff.zip) (which I just created by importing some graphs, opening one of them, and diffing it against another one), I get the following assertion error: [INFO] java.lang.AssertionError [INFO] at com.sun.hotspot.igv.util.RangeSliderModel.setPositions(RangeSliderModel.java:101) [INFO] at com.sun.hotspot.igv.coordinator.OutlineTopComponent.lambda$loadContext$2(OutlineTopComponent.java:481) [INFO] at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318) [INFO] at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773) [INFO] at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720) [INFO] at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714) [INFO] at java.base/java.security.AccessController.doPrivileged(AccessController.java:399) [INFO] at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86) [INFO] at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742) [INFO] at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:136) [INFO] [catch] at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203) [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124) [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113) [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109) [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) [INFO] at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90) I imagine fully supporting saving and restoring the diff state would require quite a lot of additional complexity, both in IGV and in the XML files. Maybe this is not a very important use case, and we could just not support it? 
------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-2000405455 From epeter at openjdk.org Mon Apr 15 09:40:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 09:40:40 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: <7cd6td6cZkTGrcJB0mxfuWES6qQy5XnlNQLzOZIAfCc=.08aa4a1a-8858-43d9-97e4-d5245f6f0fa9@github.com> On Mon, 25 Mar 2024 19:01:17 GMT, Steve Dohrmann wrote: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Also: what if the `UseAPX` is enabled, but the hardware does not support the feature? Don't we usually then automatically disable the flag, if the feature is not present? We do that with `UseAVX` for example. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2056385811 From epeter at openjdk.org Mon Apr 15 09:40:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 09:40:41 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 19:44:26 GMT, Dean Long wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? And I agree with @dean-long : you need to have some test for this. At least some test should have the flag enabled. Something that stresses the registers, and verifies the results could be an idea. Also: I suspect you would want to put this flag into the IR-framework whitelist, since it has no effect on the IR, right? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2056391104 From epeter at openjdk.org Mon Apr 15 09:44:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 09:44:40 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:01:17 GMT, Steve Dohrmann wrote: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Can the APX features be simulated, maybe even with SDE? Now you made the flag EXPERIMENTAL and by default false. What is the roadmap with this? It is generally not great to have default false flags, because the code underneath will just slowly rot and become broken. Is there a plan to eventually make it default true? What stops us from doing that already now? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2056398463 From epeter at openjdk.org Mon Apr 15 09:57:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 09:57:42 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 07:15:14 GMT, Christian Hagedorn wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! > > src/hotspot/cpu/x86/x86.ad line 10010: > >> 10008: const MachNode* mask1 = static_cast(this->in(this->operand_index($src1))); >> 10009: const MachNode* mask2 = static_cast(this->in(this->operand_index($src2))); >> 10010: assert(Type::cmp(mask1->bottom_type(), mask2->bottom_type()), ""); > > While at it, you could add a message like "should be false" for good practice. Well, I would say it should be "types must be equal". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565503611 From epeter at openjdk.org Mon Apr 15 09:57:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 09:57:43 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote: > Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. > > I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. > > Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! src/hotspot/share/opto/type.hpp line 219: > 217: static const Type *make(enum TYPES); > 218: // Test for equivalence of types > 219: static bool cmp(const Type* t1, const Type* t2); I wonder if this will be a bit confusing to people now. Usually, `cmp` methods tend to have an output that allows sorting, i.e. `0` for equals, and either larger or smaller than `0`, depending on which one is larger. For a `bool` return, the name `equals` would be more adequate. But this would also require lots of changes in the code-base, and maybe make backports harder. Actually, I'd be worried about backports now in general: Imagine someone now writes `if (Type::cmp(...))`. And someone else backports this, since the patch seems to cleanly apply. `if (Type::cmp(...))` is also correct C++ before your change. But the semantics now have changed: with your patch, the `if` succeeds if the types are equal. 
Without your patch, the if succeeds if the types are not equal. Yikes. This will lead to some very subtle bugs and annoying debugging in old JDK versions. Hence, I would think you have to rename the method to `equals` and change all usages accordingly. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565500661 From chagedorn at openjdk.org Mon Apr 15 10:24:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Apr 2024 10:24:42 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 09:53:11 GMT, Emanuel Peter wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! > > src/hotspot/share/opto/type.hpp line 219: > >> 217: static const Type *make(enum TYPES); >> 218: // Test for equivalence of types >> 219: static bool cmp(const Type* t1, const Type* t2); > > I wonder if this will be a bit confusing to people now. Usually, `cmp` methods tend to have an output that allows sorting, i.e. `0` for equals, and either larger or smaller than `0`, depending on which one is larger. > > For a `bool` return, the name `equals` would be more adequate. But this would also require lots of changes in the code-base, and maybe make backports harder. 
> > Actually, I'd be worried about backports now in general: > Imagine someone now writes `if (Type::cmp(...))`. And someone else backports this, since the patch seems to cleanly apply. `if (Type::cmp(...))` is also correct C++ before your change. > > But the semantics now have changed: with your patch, the `if` succeeds if the types are equal. Without your patch, the if succeeds if the types are not equal. Yikes. This will lead to some very subtle bugs and annoying debugging in old JDK versions. > > Hence, I would think you have to rename the method to `equals` and change all usages accordingly. > > What do you think? That's actually a good point about the backports. I was also wondering if `equals` is better due to the `cmp` convention of using ints but thought it's okay. But with backports in mind, it gives a stronger case to actually go with `equals` - so, I agree with your suggestion @eme64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565546151 From epeter at openjdk.org Mon Apr 15 10:33:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 10:33:44 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote: > Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. > > I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. > > Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! 
src/hotspot/share/opto/type.hpp line 223: > 221: // Variant that drops the speculative part of the types > 222: bool higher_equal(const Type* t) const { > 223: return cmp(meet(t), t->remove_speculative()); Also: can you explain the comment above the method here? If it is true that `cmp` only gave back `0 or 1`, and now only a bool, then how does that tell us anything about "higher" or "lower" in the lattice? Are we not constrained to equal or not equal, where if we have not equal, we don't know if it is higher or lower? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1565557412 From bkilambi at openjdk.org Mon Apr 15 10:56:45 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 15 Apr 2024 10:56:45 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 06:56:48 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert to previous indentation > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2856: > >> 2854: %} >> 2855: >> 2856: // reduction addF > > Suggestion: > > > I think comment could be removed, it seems redundant. Well, under "Vector reduction add" title, we have sub titles for reductions of various types. For ex. "reduction addI" for integers, "reduction addL" for Longs etc. It might be good to follow this format and also keep "reduction addF" for floats and "reduction addD" for Doubles. What do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565585973 From epeter at openjdk.org Mon Apr 15 10:56:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 10:56:45 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 10:51:40 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 2856: >> >>> 2854: %} >>> 2855: >>> 2856: // reduction addF >> >> Suggestion: >> >> >> I think comment could be removed, it seems redundant. > > Well, under "Vector reduction add" title, we have sub titles for reductions of various types. For ex. "reduction addI" for integers, "reduction addL" for Longs etc. It might be good to follow this format and also keep "reduction addF" for floats and "reduction addD" for Doubles. What do you think? I would vote to eventually remove all of them, they really don't add value. But I don't care too much, so feel free to leave them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565588539 From bkilambi at openjdk.org Mon Apr 15 10:56:47 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 15 Apr 2024 10:56:47 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: References: Message-ID: <8JskXwV2uQd9XdsghcdUdIntfni3GeOlGgDrfeWakGE=.6c9d96b9-5cc2-4387-9cc4-8b67ccfb6acd@github.com> On Mon, 15 Apr 2024 06:58:32 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 2861: >> >>> 2859: // example, this rule can be reached from the VectorAPI (which allows for non-strictly ordered >>> 2860: // add reduction). 
>>> 2861: predicate(Matcher::vector_length(n->in(2)) == 2 && !n->as_Reduction()->requires_strict_order()); >> >> Would it make sense to change `reduce_add2F_neon` to something like `reduce_non_strict_order_add2F_neon`, just so that it is a bit clearer when one reads the opto-assembly output? > > Similarly, I would put `strict_order` for the cases where that applies. Hi, thank you for your comments. I will make these changes in the next PS. Maybe it's a good idea to add "non_strict_order" as not everyone might check the ad file to read comments and for a quick glance in an opto assembly output for example, it might be helpful to understand what these instructions mean. >> src/hotspot/share/opto/vectorIntrinsics.cpp line 1739: >> >>> 1737: Node* init = ReductionNode::make_identity_con_scalar(gvn(), opc, elem_bt); >>> 1738: Node* value = opd; >>> 1739: >> >> Suggestion: >> >> >> assert(mask != nullptr || !is_masked_op, "Masked op needs the mask value never null"); > > This would restore the assert mentioned above. Thanks. Makes sense. I will update this in the next PS. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565582824 PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565586373 From epeter at openjdk.org Mon Apr 15 10:56:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Apr 2024 10:56:47 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v5] In-Reply-To: <8JskXwV2uQd9XdsghcdUdIntfni3GeOlGgDrfeWakGE=.6c9d96b9-5cc2-4387-9cc4-8b67ccfb6acd@github.com> References: <8JskXwV2uQd9XdsghcdUdIntfni3GeOlGgDrfeWakGE=.6c9d96b9-5cc2-4387-9cc4-8b67ccfb6acd@github.com> Message-ID: On Mon, 15 Apr 2024 10:48:50 GMT, Bhavana Kilambi wrote: >> Similarly, I would put `strict_order` for the cases where that applies. > > Hi, thank you for your comments. I will make these changes in the next PS. 
Maybe it's a good idea to add "non_strict_order" as not everyone might check the ad file to read comments and for a quick glance in an opto assembly output for example, it might be helpful to understand what these instructions mean.

Exactly!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1565585073

From galder at openjdk.org Mon Apr 15 12:31:59 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 15 Apr 2024 12:31:59 GMT
Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v2]
In-Reply-To:
References:
Message-ID: <-7e1hzsSV0lNwIZvT6e0CNo9947mo_ZrJFct65az_kc=.b3fe2630-9048-44fc-8f71-4c44635f6859@github.com>

> Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned.
>
> It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those.
>
> `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization.
>
> I've run hotspot compiler tests successfully on x86_64.

Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:

  Small IR test fixes

  * Fixed bug ID number.
  * Added test summary.
  * Removed unnecessary @requires.
  * Added @Check methods to verify optimizations return the expected result.
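
To make the identity above concrete, here is a minimal Java sketch. It is not the IR test from this PR, and the class and method names are made up for illustration. It shows the source-level shape the optimization targets (min/max called with the same value for both inputs) and checks that simply returning that value agrees with `Math.min`/`Math.max`, including for NaN and negative zero.

```java
public class MinMaxSameInputSketch {
    // Hypothetical helpers: each passes the same value as both inputs,
    // which is the shape that can be folded to just `v`.
    static int maxInt(int v) { return Math.max(v, v); }
    static long minLong(long v) { return Math.min(v, v); }
    static float maxFloat(float v) { return Math.max(v, v); }
    static double minDouble(double v) { return Math.min(v, v); }

    // Double.compare treats NaN as equal to NaN and distinguishes -0.0 from 0.0,
    // so it is a stricter check than `==` for this purpose.
    static boolean sameDouble(double a, double b) { return Double.compare(a, b) == 0; }
    static boolean sameFloat(float a, float b) { return Float.compare(a, b) == 0; }

    public static void main(String[] args) {
        double[] doubles = { 1.5, -0.0, 0.0, Double.NaN, Double.NEGATIVE_INFINITY };
        for (double d : doubles) {
            if (!sameDouble(minDouble(d), d)) throw new AssertionError("double: " + d);
        }
        float[] floats = { 2.5f, -0.0f, Float.NaN, Float.POSITIVE_INFINITY };
        for (float f : floats) {
            if (!sameFloat(maxFloat(f), f)) throw new AssertionError("float: " + f);
        }
        if (maxInt(Integer.MIN_VALUE) != Integer.MIN_VALUE) throw new AssertionError("int");
        if (minLong(Long.MAX_VALUE) != Long.MAX_VALUE) throw new AssertionError("long");
        System.out.println("min/max with identical inputs returned the input in all cases");
    }
}
```

Running this under `-Xint` and with C2 enabled should give the same result either way; whether the fold actually happens can only be seen in the IR, which is what the PR's IR test checks.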
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18738/files
  - new: https://git.openjdk.org/jdk/pull/18738/files/e8f64095..f3d20ced

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18738&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18738&range=00-01

  Stats: 61 lines in 2 files changed: 57 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/18738.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18738/head:pull/18738

PR: https://git.openjdk.org/jdk/pull/18738

From galder at openjdk.org Mon Apr 15 12:31:59 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 15 Apr 2024 12:31:59 GMT
Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v2]
In-Reply-To: <9Vsbiejiv5vNDem0-33IMcdhxz4IMVRABDWM0nrh2eE=.5af9c9c7-d30c-4201-8a68-67096133cc7e@github.com>
References: <9Vsbiejiv5vNDem0-33IMcdhxz4IMVRABDWM0nrh2eE=.5af9c9c7-d30c-4201-8a68-67096133cc7e@github.com>
Message-ID: <8SpazGG3nFEcO6GZ22-lr5XFBdWWm6PCvCjg74BHYiA=.4311039d-d62d-4ef8-8212-88005fda9a1c@github.com>

On Thu, 11 Apr 2024 13:13:10 GMT, Christian Hagedorn wrote:

>> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Small IR test fixes
>>
>>   * Fixed bug ID number.
>>   * Added test summary.
>>   * Removed unnecessary @requires.
>>   * Added @Check methods to verify optimizations return the expected result.
>
> Otherwise, the fix looks good!

@chhagedorn I've addressed all your comments in the latest commit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18738#issuecomment-2056734706

From rcastanedalo at openjdk.org Mon Apr 15 14:34:08 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 15 Apr 2024 14:34:08 GMT
Subject: RFR: 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes
Message-ID:

This (trivial?)
cleanup reuses `MemNode::barrier_data()` (added recently by [JDK-8322692](https://bugs.openjdk.org/browse/JDK-8322692)) to compute the GC barrier data to be transferred from Ideal nodes to their corresponding Mach nodes in `Matcher::ReduceInst()`.

**Testing:** tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode).

-------------

Commit messages:
 - Simplify transfer of GC barrier data from Ideal to Mach nodes

Changes: https://git.openjdk.org/jdk/pull/18784/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18784&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8330262
  Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18784.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18784/head:pull/18784

PR: https://git.openjdk.org/jdk/pull/18784

From eosterlund at openjdk.org Mon Apr 15 14:38:59 2024
From: eosterlund at openjdk.org (Erik Österlund)
Date: Mon, 15 Apr 2024 14:38:59 GMT
Subject: RFR: 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes
In-Reply-To:
References:
Message-ID:

On Mon, 15 Apr 2024 12:49:31 GMT, Roberto Castañeda Lozano wrote:

> This (trivial?) cleanup reuses `MemNode::barrier_data()` (added recently by [JDK-8322692](https://bugs.openjdk.org/browse/JDK-8322692)) to compute the GC barrier data to be transferred from Ideal nodes to their corresponding Mach nodes in `Matcher::ReduceInst()`.
>
> **Testing:** tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode).

Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18784#pullrequestreview-2001297725

From rcastanedalo at openjdk.org Mon Apr 15 14:52:02 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 15 Apr 2024 14:52:02 GMT
Subject: RFR: 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes
In-Reply-To:
References:
Message-ID:

On Mon, 15 Apr 2024 14:36:15 GMT, Erik Österlund wrote:

> Looks good.

Thanks for reviewing, Erik!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18784#issuecomment-2057046095

From stuefe at openjdk.org Mon Apr 15 15:03:46 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 15 Apr 2024 15:03:46 GMT
Subject: Integrated: 8330103: Compiler memory statistics should keep separate records for C1 and C2
In-Reply-To:
References:
Message-ID:

On Thu, 11 Apr 2024 13:03:17 GMT, Thomas Stuefe wrote:

> See JBS description.
>
> This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa.

This pull request has now been integrated.

Changeset: ddc3921c
Author: Thomas Stuefe
URL: https://git.openjdk.org/jdk/commit/ddc3921cf98b9470f597ae9bb6a4f5a043e1544f
Stats: 31 lines in 1 file changed: 19 ins; 3 del; 9 mod

8330103: Compiler memory statistics should keep separate records for C1 and C2

Reviewed-by: kvn, redestad

-------------

PR: https://git.openjdk.org/jdk/pull/18740

From stuefe at openjdk.org Mon Apr 15 15:03:45 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 15 Apr 2024 15:03:45 GMT
Subject: RFR: 8330103: Compiler memory statistics should keep separate records for C1 and C2
In-Reply-To:
References:
Message-ID:

On Mon, 15 Apr 2024 07:51:23 GMT, Claes Redestad wrote:

>> See JBS description.
>>
>> This simple enhancement changes compiler memory statistic such that we keep the record of the most recent compilation not only per method, but per method and per compiler. That way, recompiling method X with C2 will not overwrite memory statistics from a prior C1 compilation, and vice versa.
>
> Nice - thanks for doing this!

Thanks @cl4es !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18740#issuecomment-2057072418

From galder at openjdk.org Mon Apr 15 15:49:18 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 15 Apr 2024 15:49:18 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v8]
In-Reply-To:
References:
Message-ID: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com>

> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>
> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>
> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                         TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler     1234  1234     0     0
>
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>
> Thanks @rwestrel for his help shaping this up :)

Galder Zamarreño has updated the pull request incrementally with four additional commits since the last revision:

 - Peek receiver without pop/push
 - require receiver_klass to be loaded for now
 - missing check for unloaded
 - suggested cleanup

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17667/files
  - new: https://git.openjdk.org/jdk/pull/17667/files/b7ff1e6c..3af9cd9e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=06-07

  Stats: 37 lines in 1 file changed: 21 ins; 12 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/17667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667

PR: https://git.openjdk.org/jdk/pull/17667

From galder at openjdk.org Mon Apr 15 15:49:18 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Mon, 15 Apr 2024 15:49:18 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 11 Apr 2024 17:41:44 GMT, Dean Long wrote:

>>> Sure, feel free to improve it.
>>
>> I've just commented in https://github.com/openjdk/jdk/pull/18642/files#r1560900520, what do you think?
>>
>>> I fixed the issue I had, so it should be good to test out now. Thanks.
>>
>> Ok, so how would we integrate your changes? Do I just merge the commits in your https://github.com/dean-long/jdk/tree/pr/17667 branch?
>
>> Ok, so how would we integrate your changes? Do I just merge the commits in your https://github.com/dean-long/jdk/tree/pr/17667 branch?
>
> Yes, I think that should work. If that doesn't add me as a contributor, then I can be added manually.

@dean-long I've added your commits on top of the PR, and added an additional commit to include https://github.com/openjdk/jdk/pull/18642#discussion_r1560900520. I've run compiler tests limiting to C1 on x86_64 and things looked good.
I didn't test other architectures because the additional changes were limited to the shared code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2057175595 From duke at openjdk.org Mon Apr 15 15:53:06 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 15 Apr 2024 15:53:06 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v14] In-Reply-To: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com> References: <0RAuJpGYev-zLd52TE7PCDkxPoXrRT0RNEzzepwVMhc=.46718148-27a3-4741-a09a-021caae72f9f@github.com> Message-ID: On Mon, 25 Mar 2024 06:19:42 GMT, Emanuel Peter wrote: >> Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into licm >> - @run driver -> @run main >> - Add tests for add/sub reassociation >> - Merge branch 'master' into licm >> - Make inputs deterministic. Make size an arg. Fix comments. Formatting. >> - Update test to utilize @setup method for arguments >> - Merge branch 'master' into licm >> - Add correctness test for some random tests with random inputs >> - Add some correctness tests where we do reassociate >> - Remove unused TestInfo parameter. Have some tests exit mid-loop. >> - ... and 7 more: https://git.openjdk.org/jdk/compare/4f1aac95...32cb9c0d > > Code looks good, running testing now... Ping me again in 2 days if I don't report back by then ;) @eme64 Thanks for working through this with me. Could you sponsor this? 
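
For readers following this thread: the reassociation in 8323220 rewrites a comparison such as `inv1 == x + inv2`, where `inv1` and `inv2` are loop invariant, into `inv1 - inv2 == x`, so the invariant part can be computed once outside the loop. Below is a minimal Java sketch of the before and after shapes. The method names are hypothetical and this is not code from the PR; the "after" method is a hand-written equivalent of what the compiler would do internally. Java `int` arithmetic wraps, so the two forms agree even when the addition or subtraction overflows.

```java
public class ReassociateInvariantsSketch {
    // Before: the loop body adds the invariant inv2 to x on every iteration.
    static int countBefore(int[] xs, int inv1, int inv2) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // After: the invariant difference is hoisted; the loop only compares.
    static int countAfter(int[] xs, int inv1, int inv2) {
        int t = inv1 - inv2; // computed once, outside the loop
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] xs = { 1, 5, Integer.MAX_VALUE, Integer.MIN_VALUE, 5 };
        // Ordinary case and a case where both forms wrap around.
        System.out.println(countBefore(xs, 7, 2) == countAfter(xs, 7, 2));
        System.out.println(countBefore(xs, Integer.MIN_VALUE, 1)
                == countAfter(xs, Integer.MIN_VALUE, 1));
    }
}
```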
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17375#issuecomment-2057183390

From duke at openjdk.org Mon Apr 15 15:57:55 2024
From: duke at openjdk.org (Joshua Cao)
Date: Mon, 15 Apr 2024 15:57:55 GMT
Subject: Integrated: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs
In-Reply-To:
References:
Message-ID:

On Thu, 11 Jan 2024 17:41:53 GMT, Joshua Cao wrote:

> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x
> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x
> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x
>
>
> For example,
>
>
> fn(inv1, inv2)
>   while(...)
>       x = foobar()
>       if inv1 == x + inv2
>            blackhole()
>
>
> We can transform this into
>
>
> fn(inv1, inv2)
>   t = inv1 - inv2
>   while(...)
>       x = foobar()
>       if t == x
>            blackhole()
>
>
> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant
>
> Passes tier1 locally on Linux machine. Passes GHA on my fork.

This pull request has now been integrated.

Changeset: 140f5671
Author: Joshua Cao
Committer: Emanuel Peter
URL: https://git.openjdk.org/jdk/commit/140f56718bbbfc31bb0c39255c68568fad285a1f
Stats: 997 lines in 4 files changed: 976 ins; 3 del; 18 mod

8323220: Reassociate loop invariants involved in Cmps and Add/Subs

Reviewed-by: epeter, xliu, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/17375

From kvn at openjdk.org Mon Apr 15 16:54:01 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 15 Apr 2024 16:54:01 GMT
Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 12 Apr 2024 14:58:56 GMT, Roland Westrelin wrote:

>> Another set of changes from 8275202. There are cases in superword
>> where new nodes are not assigned control. I believe they are harmless
>> currently because superword is the last pass of optimizations.
I also >> cleaned up the code so it always uses `register_new_node()`. There are >> a couple places where `intcon()` should be used. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18760#pullrequestreview-2001690210 From kvn at openjdk.org Mon Apr 15 17:09:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Apr 2024 17:09:04 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 06:49:43 GMT, Emanuel Peter wrote: >>> > ``` >>> > * No RangeCheck smearing, or other CFG between the stores: `RC[0], store[0], RC[1], store[1], RC[2], store[2], RC[3], store[3]`. Not so simple. We can merge the 4 stores on the normal path, where all RC's pass. But we have to remove all old stores from that path. But the `RC[1], RC[2], RC[3]` false paths need some of those stores. So the only way I see is to duplicate all stores for the branches, so that we are sure that they sink out into the trap-paths. >>> > ``` >>> >>> I also think you need to duplicate stores. My opinion is that we want to stick with the simpler cases (your first and second bullets) unless it's obvious it doesn't cover all use cases. It's always possible to revisit the optimization down the road if it's observed that there are cases that are not covered. >> >> I completely agree with Roland. > > @vnkozlov >> Can we detect presence of RangeCheck which may cause us to move some stores on fail path and bailout the optimization. I don't think it is frequent case. I assume you will get RC on each store or not at all ("main" part of counted loop). Am I wrong here? I don't remember, does C2 optimize RangeCheck nodes in linear code (it does in loops)? 
>
> I know about 2 relevant optimizations that remove / move RangeChecks:
> - RCE (RangeCheck Elimination from loops): hoist all RangeCheck before the loop. That way, there are no RangeChecks left in the loop, and there would be no RangeChecks between the stores we are merging.
> - RangeCheck Smearing: this also applies in straight-line code, outside of loops. See `RangeCheckNode::Ideal`. Example:
>
> RangeCheck[i+0]
> Store[i+0]
> RangeCheck[i+1] <--- replaced with i+3 ("smearing" to cover all RC below)
> Store[i+1]
> RangeCheck[i+2] <--- removed
> Store[i+2]
> RangeCheck[i+3] <--- removed
> Store[i+3]
>
> becomes:
>
> RangeCheck[i+0]
> Store[i+0]
> RangeCheck[i+3] <--- the RangeCheck that remains between the first and the rest of the consecutive (and adjacent) stores.
> Store[i+1]
> Store[i+2]
> Store[i+3]
>
> I think the use-cases from @cl4es are often in straight-line code. Therefore we should cover the "smearing" case where exactly 1 RC remains in the sequence.
>
> What you can also see in `RangeCheckNode::Ideal`: if we ever trap (or often enough, I don't remember) in one of the RangeChecks, then we disable `phase->C->allow_range_check_smearing()`. Then we don't do the smearing, and all the RC remain in the sequence. At that point, my optimization would fail since it sees more than 1 RC in the sequence.
>
> Does that make sense? I should probably add this information in the comments, so that it is clear why we worry about a single RC at all. People are probably going to wonder like you: "I assume you will get RC on each store or not at all".

@eme64 thank you for looking into the C2 RC optimizations. Now it is clear why you need to check for RC. I would only suggest adjusting your new comment about the RC optimization to avoid confusion.
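To make the store sequence in the diagrams above concrete, here is a hedged Java sketch of the source-level shape being discussed: four adjacent byte stores at `i+0` .. `i+3`, each of which carries its own bounds check at the language level. The class and method names are hypothetical and not taken from the PR; the linked `TestMergeStores.java` remains the authoritative set of patterns. The sketch only checks that the four small stores are semantically equal to a single 4-byte little-endian store. It makes no claim about what C2 emits for this exact method.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    // Four consecutive byte stores of one int, split with shifts and
    // truncating casts. Each array access is bounds-checked by the language.
    static void storeSplit(byte[] a, int i, int v) {
        a[i + 0] = (byte) (v >> 0);
        a[i + 1] = (byte) (v >> 8);
        a[i + 2] = (byte) (v >> 16);
        a[i + 3] = (byte) (v >> 24);
    }

    // Reference: one 4-byte store in little-endian order at the same offset.
    static void storeWhole(byte[] a, int i, int v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putInt(i, v);
    }

    public static void main(String[] args) {
        byte[] split = new byte[8];
        byte[] whole = new byte[8];
        storeSplit(split, 2, 0x12345678);
        storeWhole(whole, 2, 0x12345678);
        if (!Arrays.equals(split, whole)) {
            throw new RuntimeException("split and whole stores disagree");
        }
        System.out.println(Arrays.toString(split));
    }
}
```

If the index `i` is too large, `storeSplit` may already have written some bytes before the failing access throws. That is the reason the thread cares about which stores sit on which side of a remaining RangeCheck.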
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2057411419 From kvn at openjdk.org Mon Apr 15 17:09:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Apr 2024 17:09:04 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v34] In-Reply-To: <6H2EcwspprCy-iXcjP68kKTXLBPQmg1Zul4yLV66NAU=.ce2ac7ec-064c-417c-bb3d-a05d23df49df@github.com> References: <6H2EcwspprCy-iXcjP68kKTXLBPQmg1Zul4yLV66NAU=.ce2ac7ec-064c-417c-bb3d-a05d23df49df@github.com> Message-ID: On Mon, 15 Apr 2024 08:06:16 GMT, Emanuel Peter wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. 
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright, add comment about RCE and RC smearing src/hotspot/share/opto/memnode.cpp line 2898: > 2896: // > 2897: // Thus, it is a common pattern that in a long chain of adjacent stores there > 2898: // remains exactly one RangeCheck, between the first and the second store. 
`remains exactly one RangeCheck` is confusing because you still have RC for `[i + 0]`. So you have 2 RCs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1566169996 From kvn at openjdk.org Mon Apr 15 17:11:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Apr 2024 17:11:41 GMT Subject: RFR: 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 12:49:31 GMT, Roberto Castañeda Lozano wrote: > This (trivial?) cleanup reuses `MemNode::barrier_data()` (added recently by [JDK-8322692](https://bugs.openjdk.org/browse/JDK-8322692)) to compute the GC barrier data to be transferred from Ideal nodes to their corresponding Mach nodes in `Matcher::ReduceInst()`. > > **Testing:** tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode). Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18784#pullrequestreview-2001738869 From kvn at openjdk.org Mon Apr 15 17:47:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Apr 2024 17:47:01 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:41:14 GMT, Vladimir Kozlov wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > src/hotspot/cpu/x86/globals_x86.hpp line 236: > >> 234: "mitigations for the Intel JCC erratum") \ >> 235: \ >> 236: product(bool, UseAPX, false, EXPERIMENTAL, \ > > Spacing to `` @steveatgh, do you plan to add code to `vm_version_x86.*` to check for the presence of AVX and set this flag accordingly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1566220173 From kvn at openjdk.org Mon Apr 15 17:47:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Apr 2024 17:47:01 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:01:17 GMT, Steve Dohrmann wrote: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. I have a few comments. src/hotspot/cpu/x86/assembler_x86.cpp line 669: > 667: // [base + disp] > 668: assert(((base_enc & 0x7) != 4), "illegal addressing mode"); > 669: if (disp == 0 && no_relocation && ((base_enc & 0x7) != 5)) { We lost information with this change. Can it be done as `is_r13_encoding(base_enc)` and `is_r12_encoding(base_enc)`? 
src/hotspot/cpu/x86/assembler_x86.hpp line 790: > 788: int vex_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, > 789: VexSimdPrefix pre, VexOpcode opc, > 790: InstructionAttr *attributes, bool src_is_gpr = false); I saw many uses of this method where `src_is_gpr` is `true`. What is the actual most common case? src/hotspot/cpu/x86/assembler_x86.hpp line 796: > 794: > 795: int simd_prefix_and_encode(XMMRegister dst, XMMRegister nds, XMMRegister src, VexSimdPrefix pre, > 796: VexOpcode opc, InstructionAttr *attributes, bool src_is_gpr = false); Same question as for `vex_prefix_and_encode`. src/hotspot/cpu/x86/globals_x86.hpp line 236: > 234: "mitigations for the Intel JCC erratum") \ > 235: \ > 236: product(bool, UseAPX, false, EXPERIMENTAL, \ Spacing to `` ------------- PR Review: https://git.openjdk.org/jdk/pull/18476#pullrequestreview-2001791276 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1566209314 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1566215878 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1566216494 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1566217674 From dlong at openjdk.org Mon Apr 15 20:21:02 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 15 Apr 2024 20:21:02 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v8] In-Reply-To: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> References: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> Message-ID: On Mon, 15 Apr 2024 15:49:18 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. 
This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                       TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler   1234  1234     0     0
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
>
> Galder Zamarreño has updated the pull request incrementally with four additional commits since the last revision:
>
> - Peek receiver without pop/push
> - require receiver_klass to be loaded for now
> - missing check for unloaded
> - suggested cleanup

Marked as reviewed by dlong (Reviewer).

It would be a good idea to ask arm/ppc/riscv/s390 port maintainers to test your changes. Also, I don't see GHA checks running. Are they enabled in your repo?

------------- PR Review: https://git.openjdk.org/jdk/pull/17667#pullrequestreview-2002075529 PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2057729138

From joshcao at amazon.com Mon Apr 15 21:02:32 2024 From: joshcao at amazon.com (Cao, Joshua) Date: Mon, 15 Apr 2024 21:02:32 +0000 Subject: NoClassDefFoundError when using compiler replay mechanism for JDK-8329797 Message-ID: <8afd46e5d0264365a8de42cea7b39585@amazon.com>

I am trying to reproduce the error in https://bugs.openjdk.org/browse/JDK-8329797 using the given replay file.
I get the following error:

```
dev-dsk-joshcao-2b-df0d3645 [home/joshcao/src/dacapobench/release]$ ~/jdk/shenandoah/build/linux-x86_64-server-slowdebug/images/jdk/bin/java -cp dacapo-23.8-chopin-RC1/jar/lib/h2/h2-2.1.214.jar -XX:+UseShenandoahGC -XX:+ReplayCompiles -XX:ReplayDataFile=maxlreplay.log -jar dacapo-23.8-chopin-RC1.jar h2
java.lang.NoClassDefFoundError: org/h2/index/Index
Error while parsing line 6 at position 33: org/h2/index/Index
java.lang.NoClassDefFoundError: org/h2/index/Index
Caused by: java.lang.ClassNotFoundException: org.h2.index.Index
    at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@23-internal/BuiltinClassLoader.java:641)
    at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@23-internal/ClassLoaders.java:188)
    at java.lang.ClassLoader.loadClass(java.base@23-internal/ClassLoader.java:528)
Failed on org/h2/index/Index
```

I got the dacapo jar file from https://github.com/dacapobench/dacapobench/releases/tag/v23.9-RC1-chopin. The `h2` classes are located in `dacapo-23.8-chopin-RC1/jar/lib/h2/h2-2.1.214.jar`. I guess the replay mechanism cannot find those classes.

Any tips on how to fix the `NoClassDefFoundError`?

From lmesnik at openjdk.org Mon Apr 15 23:07:04 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 15 Apr 2024 23:07:04 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v8] In-Reply-To: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> References: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> Message-ID:

On Mon, 15 Apr 2024 15:49:18 GMT, Galder Zamarreño wrote:

>> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>>
>> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>>
>> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                       TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler   1234  1234     0     0
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>>
>>...
>
> Galder Zamarreño has updated the pull request incrementally with four additional commits since the last revision:
>
> - Peek receiver without pop/push
> - require receiver_klass to be loaded for now
> - missing check for unloaded
> - suggested cleanup

Changes requested by lmesnik (Reviewer).

test/hotspot/jtreg/compiler/c1/TestNullArrayClone.java line 39:

> 37: public class TestNullArrayClone {
> 38:     public static void main(String[] args)
> 39:     {

Please change style to use

    public static void main(String[] args) {

instead of

    public static void main(String[] args)
    {

in the test code.

test/hotspot/jtreg/compiler/c1/TestNullArrayClone.java line 56:

> 54:             test(null);
> 55:             System.out.println("Expected NullPointerException to be thrown");
> 56:             System.exit(97);

Please throw RuntimeException instead of System.exit().
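As an illustration of the two review requests above (opening brace on the same line, and failing by exception rather than by `System.exit`), a test of this shape could look roughly as follows. This is a hypothetical sketch, not the actual `TestNullArrayClone.java`: the class name, the helper `throwsNpeOnNull`, and the exact structure are assumptions, and the jtreg header tags are omitted.

```java
public class NullCloneCheckSketch {
    // Cloning through a null reference must throw NullPointerException.
    static int[] test(int[] a) {
        return a.clone();
    }

    // Returns true only if test(null) threw NullPointerException.
    static boolean throwsNpeOnNull() {
        try {
            test(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        if (!throwsNpeOnNull()) {
            // Fail via an exception so the harness reports the failure,
            // instead of terminating the VM with System.exit.
            throw new RuntimeException("Expected NullPointerException to be thrown");
        }
    }
}
```

Throwing lets the test harness see an ordinary test failure with a stack trace, whereas an explicit exit code depends on how the harness launches and interprets the process.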
------------- PR Review: https://git.openjdk.org/jdk/pull/17667#pullrequestreview-2002311597 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1566523477 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1566522981 From sviswanathan at openjdk.org Mon Apr 15 23:54:59 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 15 Apr 2024 23:54:59 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 03:57:10 GMT, Jatin Bhateja wrote: > - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results into formation of an abstract type phi node at JVM state convergence points. > - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results into an assertion failure while computing the source lane count. > - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout. > - For vector write access, this may prevent value forwarding, which may result into subsequent redundant vector loads from same address, but preventing intensification failure will offset that cost. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin src/hotspot/share/opto/vectorIntrinsics.cpp line 1040: > 1038: > 1039: bool mismatched_ms = from_ms->get_con() && !is_mask && arr_type != nullptr && arr_type->elem()->array_element_basic_type() != elem_bt; > 1040: BasicType mem_elem_bt = LITTLE_ENDIAN_ONLY(elem_bt) BIG_ENDIAN_ONLY(arr_type->elem()->array_element_basic_type()); mismatched_ms check is missing now for BIG_ENDIAN_ONLY path ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18749#discussion_r1566554997 From jkarthikeyan at openjdk.org Tue Apr 16 01:07:00 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 16 Apr 2024 01:07:00 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 10:22:00 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/type.hpp line 219: >> >>> 217: static const Type *make(enum TYPES); >>> 218: // Test for equivalence of types >>> 219: static bool cmp(const Type* t1, const Type* t2); >> >> I wonder if this will be a bit confusing to people now. Usually, `cmp` methods tend to have an output that allows sorting, i.e. `0` for equals, and either larger or smaller than `0`, depending on which one is larger. >> >> For a `bool` return, the name `equals` would be more adequate. But this would also require lots of changes in the code-base, and maybe make backports harder. >> >> Actually, I'd be worried about backports now in general: >> Imagine someone now writes `if (Type::cmp(...))`. And someone else backports this, since the patch seems to cleanly apply. `if (Type::cmp(...))` is also correct C++ before your change. >> >> But the semantics now have changed: with your patch, the `if` succeeds if the types are equal. Without your patch, the if succeeds if the types are not equal. Yikes. This will lead to some very subtle bugs and annoying debugging in old JDK versions. 
>> >> Hence, I would think you have to rename the method to `equals` and change all usages accordingly. >> >> What do you think? > > That's actually a good point about the backports. I was also wondering if `equals` is better due to the `cmp` convention of using ints but thought it's okay. But with backports in mind, it gives a stronger case to actually go with `equals` - so, I agree with your suggestion @eme64. This is a good point, I hadn't thought about backports! In the top comment I was also thinking about naming it `Type::equals`, but I worried that it'd be too close to `Type::eq`. But with this detail I think it should be renamed as well, because of the change of semantics. That way at least it'll cause a compile error instead of silently breaking at runtime, which would be bad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1566589411 From jkarthikeyan at openjdk.org Tue Apr 16 01:15:06 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 16 Apr 2024 01:15:06 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 07:24:59 GMT, Christian Hagedorn wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! 
> > src/hotspot/share/opto/type.hpp line 219: > >> 217: static const Type *make(enum TYPES); >> 218: // Test for equivalence of types >> 219: static int cmp( const Type *const t1, const Type *const t2 ); > > Was it required to remove the `consts` here? Since the second `const` is referring to the `t1/t2` it means that the parameters can't be modified, which IIRC in the header would be functionally the same as not having the additional const. I changed it in type.cpp as well because looking at other parts of the code `const Type* val` is used a lot more often than `const Type* const val` for arguments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1566592983 From jkarthikeyan at openjdk.org Tue Apr 16 02:21:24 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 16 Apr 2024 02:21:24 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: > Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. > > I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. > > Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Rename to Type::equals, changes from code review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18533/files - new: https://git.openjdk.org/jdk/pull/18533/files/63c17d53..46a4f3fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18533&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18533&range=00-01 Stats: 21 lines in 9 files changed: 0 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/18533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18533/head:pull/18533 PR: https://git.openjdk.org/jdk/pull/18533 From jkarthikeyan at openjdk.org Tue Apr 16 02:21:24 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 16 Apr 2024 02:21:24 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 07:27:59 GMT, Christian Hagedorn wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename to Type::equals, changes from code review > > Overall a good improvement and makes it more intuitive. A few comments. Thanks for the comments @chhagedorn and @eme64! I've pushed a commit that should address the points brought up in review, and renamed the function to `Type::equals`. > src/hotspot/share/opto/node.cpp line 3014: > >> 3012: } >> 3013: bool TypeNode::cmp(const Node& n) const { >> 3014: return Type::cmp(_type, ((TypeNode&)n)._type); > > While at it, you can replace the cast with `as_Type()`: > Suggestion: > > return Type::cmp(_type, (n.as_Type()->_type); Thanks for the suggestion! I've made this change. 
> src/hotspot/share/opto/type.cpp line 444: > >> 442: }; >> 443: >> 444: _shared_type_dict = new (shared_type_arena) Dict((CmpKey) type_cmp, (Hash) Type::uhash, shared_type_arena, 128); > > Couldn't you make `type_cmp` a `CmpKey` instead of `auto` and then remove the cast here? On the other hand, you could probably also just remove the `CmpKey` cast here but it might be more explicit when also changing the lambda type. I think that's a good idea, I've changed the type of the lambda to be `CmpKey` to make it clearer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18533#issuecomment-2058112906 PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1566624863 PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1566625398 From jkarthikeyan at openjdk.org Tue Apr 16 02:21:24 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 16 Apr 2024 02:21:24 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 09:54:57 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/x86.ad line 10010: >> >>> 10008: const MachNode* mask1 = static_cast(this->in(this->operand_index($src1))); >>> 10009: const MachNode* mask2 = static_cast(this->in(this->operand_index($src2))); >>> 10010: assert(Type::cmp(mask1->bottom_type(), mask2->bottom_type()), ""); >> >> While at it, you could add a message like "should be false" for good practice. > > Well, I would say it should be "types must be equal". This is a good idea, I've made this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1566624263 From chagedorn at openjdk.org Tue Apr 16 07:00:00 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 Apr 2024 07:00:00 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v2] In-Reply-To: <-7e1hzsSV0lNwIZvT6e0CNo9947mo_ZrJFct65az_kc=.b3fe2630-9048-44fc-8f71-4c44635f6859@github.com> References: <-7e1hzsSV0lNwIZvT6e0CNo9947mo_ZrJFct65az_kc=.b3fe2630-9048-44fc-8f71-4c44635f6859@github.com> Message-ID: On Mon, 15 Apr 2024 12:31:59 GMT, Galder Zamarreño wrote: >> Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. >> >> It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those. >> >> `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. >> >> I've run hotspot compiler tests successfully on x86_64. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Small IR test fixes > > * Fixed bug ID number. > * Added test summary. > * Removed unnecessary @requires. > * Added @Check methods to verify optimizations return the expected result. Thanks for the update, looks good! I'll submit some testing. test/hotspot/jtreg/compiler/vectorization/runner/BasicDoubleOpTest.java line 243: > 241: @Test > 242: @IR(applyIfCPUFeatureOr = {"asimd", "true", "avx", "true"}, > 243: counts = {IRNode.MAX_VD, "0"}) Can be changed to `failOn` now. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18738#pullrequestreview-2002757165 PR Review Comment: https://git.openjdk.org/jdk/pull/18738#discussion_r1566816505 From epeter at openjdk.org Tue Apr 16 07:05:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Apr 2024 07:05:29 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergeable stores must have the same Opcode (which implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ...
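For illustration only, here is a minimal Java sketch of the two store patterns the quoted description talks about: consecutive byte stores of constants, and consecutive byte stores of a `long` that was split with shifts. The class and method names are made up for this note and the code is not copied from `TestMergeStores.java`; whether C2 actually merges these stores depends on the patch under review and on the platform.

```java
import java.util.Arrays;

// Hypothetical example class; names are assumptions, not taken from the PR.
final class MergeStoresSketch {
    // Eight adjacent byte stores of one long, split with shifts (little-endian order).
    // This is the shape that could, in principle, become a single long store.
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Four adjacent byte stores of constants; the constants could be combined
    // into one int constant and written with a single int store.
    static void storeConstants(byte[] a, int offset) {
        a[offset + 0] = (byte) 0xbe;
        a[offset + 1] = (byte) 0xba;
        a[offset + 2] = (byte) 0xfe;
        a[offset + 3] = (byte) 0xca;
    }

    public static void main(String[] args) {
        byte[] a = new byte[12];
        storeLongLE(a, 0, 0x0102030405060708L);
        storeConstants(a, 8);
        System.out.println(Arrays.toString(a));
    }
}
```

The Java semantics are the same with or without the optimization; only the number of machine-level stores would differ.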
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve RC comment for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/d622c579..93bf2ddc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=33-34 Stats: 8 lines in 1 file changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From rcastanedalo at openjdk.org Tue Apr 16 07:18:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 16 Apr 2024 07:18:00 GMT Subject: RFR: 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:09:19 GMT, Vladimir Kozlov wrote: > Good. Thanks, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18784#issuecomment-2058395431 From dean.long at oracle.com Tue Apr 16 07:32:36 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 16 Apr 2024 00:32:36 -0700 Subject: NoClassDefFoundError when using compiler replay mechanism for JDK-8329797 In-Reply-To: <8afd46e5d0264365a8de42cea7b39585@amazon.com> References: <8afd46e5d0264365a8de42cea7b39585@amazon.com> Message-ID: <2dab7c7e-0537-4558-9df5-4745138ce31f@oracle.com> Replay doesn't run the app, so you'll need to extract all the embedded jar files that it needs and add them to the -cp path. dl On 4/15/24 2:02 PM, Cao, Joshua wrote: > I am trying to reproduce the error in > https://bugs.openjdk.org/browse/JDK-8329797 using the given replay > file.
I get > the following error: > > ``` > dev-dsk-joshcao-2b-df0d3645 [home/joshcao/src/dacapobench/release]$ > ~/jdk/shenandoah/build/linux-x86_64-server-slowdebug/images/jdk/bin/java > -cp dacapo-23.8-chopin-RC1/jar/lib/h2/h2-2.1.214.jar > -XX:+UseShenandoahGC -XX:+ReplayCompiles > -XX:ReplayDataFile=maxlreplay.log -jar dacapo-23.8-chopin-RC1.jar h2 > java.lang.NoClassDefFoundError: org/h2/index/Index > Error while parsing line 6 at position 33: org/h2/index/Index > > java.lang.NoClassDefFoundError: org/h2/index/Index > Caused by: java.lang.ClassNotFoundException: org.h2.index.Index > at jdk.internal.loader.BuiltinClassLoader.loadClass(java.base@23-internal/BuiltinClassLoader.java:641) > at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(java.base@23-internal/ClassLoaders.java:188) > at java.lang.ClassLoader.loadClass(java.base@23-internal/ClassLoader.java:528) > > Failed on org/h2/index/Index > ``` > > I got the dacapo jar file from > https://github.com/dacapobench/dacapobench/releases/tag/v23.9-RC1-chopin. > The > `h2` classes are located in > `dacapo-23.8-chopin-RC1/jar/lib/h2/h2-2.1.214.jar`. > I guess the replay mechanism cannot find those classes. > > > Any tips on how to fix the `NoClassDefFoundError`? > > From rehn at openjdk.org Tue Apr 16 10:22:02 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 16 Apr 2024 10:22:02 GMT Subject: RFR: 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 In-Reply-To: References: Message-ID: On Sun, 14 Apr 2024 07:58:48 GMT, Feilong Jiang wrote: > Hi, please review this fix that adds additional CMove match rules for the riscv port. > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) introduces more conditional moves which reduce merges used by CmpP/CmpN. However, there is no match rule for CMove with CmpP/N on riscv, resulting in the `bad AD file` crash.
> > After this fix, the following five tests would pass without any crashes. > > Testing: > - [x] compiler/eliminateAutobox/TestDoubleBoxing.java > - [x] compiler/eliminateAutobox/TestFloatBoxing.java > - [x] compiler/eliminateAutobox/TestLongBoxing.java > - [x] compiler/eliminateAutobox/TestIntBoxing.java > - [x] compiler/eliminateAutobox/TestShortBoxing.java > - [x] tier1~3 (linux-riscv64, release) > - [x] hotspot:tier1 (linux-riscv64, fastdebug) Seems fine to me. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18774#pullrequestreview-2003228523 From epeter at openjdk.org Tue Apr 16 11:00:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Apr 2024 11:00:01 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 02:21:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Rename to Type::equals, changes from code review src/hotspot/share/opto/type.hpp line 160: > 158: > 159: static int uhash( const Type *const t ); > 160: // Structural equality check. Assumes that equals() has already compared Is this even correct? 
Because we use `eq` inside `equals`, (before your patch, we used `eq` inside `cmp`). Should this not rather mean that we should have already done the `==`? What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1567155743 From epeter at openjdk.org Tue Apr 16 11:03:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Apr 2024 11:03:43 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 02:17:47 GMT, Jasmine Karthikeyan wrote: >> Overall a good improvement and makes it more intuitive. A few comments. > > Thanks for the comments @chhagedorn and @eme64! I've pushed a commit that should address the points brought up in review, and renamed the function to `Type::equals`. @jaskarth this looks good. I am running testing again now. @merykitty do you have an opinion on this? You have done quite some work on types. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18533#issuecomment-2058819203 From tholenstein at openjdk.org Tue Apr 16 13:39:18 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 16 Apr 2024 13:39:18 GMT Subject: RFR: 8324950: IGV: save the state to a file [v26] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. 
New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: open tabs only after all graphs are loaded into group ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/4fd0b3dc..8fb022aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=24-25 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From jkarthikeyan at openjdk.org Tue Apr 16 13:56:04 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 16 Apr 2024 13:56:04 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 10:56:56 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename to Type::equals, changes from 
code review > > src/hotspot/share/opto/type.hpp line 160: > >> 158: >> 159: static int uhash( const Type *const t ); >> 160: // Structural equality check. Assumes that equals() has already compared > > Is this even correct? Because we use `eq` inside `equals`, (before your patch, we used `eq` inside `cmp`). > Should this not rather mean that we should have already done the `==`? > What do you think? I think it's still correct, because from my understanding it's referring to this line in `equals`:
```c++
if (t1->_base != t2->_base) {
  return false;             // Missed badly
}
```
`equals` only calls `eq` after this, so `eq` knows that the underlying types of both classes are the same. So `TypeInt::eq`, for example, is able to safely cast the incoming `Type* t` to a `TypeInt*` to do a structural equality check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18533#discussion_r1567406437 From roland at openjdk.org Tue Apr 16 14:14:03 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:14:03 GMT Subject: RFR: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() [v2] In-Reply-To: References: Message-ID: <0WXvfAQzvx3t87SBKSxbb_rhSzgEHqd3Ur6JyRcwbw4=.1559f0cc-2f19-493f-b180-9e0839a263b2@github.com> On Mon, 15 Apr 2024 06:43:02 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > That looks good to me, thanks for additionally updating other uses of this pattern!
@chhagedorn @vnkozlov thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18760#issuecomment-2059190775 From roland at openjdk.org Tue Apr 16 14:14:03 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:14:03 GMT Subject: Integrated: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 12:26:10 GMT, Roland Westrelin wrote: > Another set of changes from 8275202. There are cases in superword > where new nodes are not assigned control. I believe they are harmless > currently because superword is the last pass of optimizations. I also > cleaned up the code so it always uses `register_new_node()`. There are > a couple places where `intcon()` should be used. This pull request has now been integrated. Changeset: bfff02ee Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/bfff02eef68c80f623419a3f6ceb9fe3121b88f4 Stats: 51 lines in 5 files changed: 4 ins; 17 del; 30 mod 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node() Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18760 From roland at openjdk.org Tue Apr 16 14:14:47 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:14:47 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <079UKoZpe1eMTvJIIuKDqh6uSf1g3Cu8dUm5hkVddnA=.2cf0c555-5c29-4891-b4b6-866c11d7f080@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> <079UKoZpe1eMTvJIIuKDqh6uSf1g3Cu8dUm5hkVddnA=.2cf0c555-5c29-4891-b4b6-866c11d7f080@github.com> Message-ID: <2J2xtBtH3BYPMCvnnO67L44uvfUyBDtXdV12CgvfEeY=.048edff8-077d-4ece-9250-163486bc7a0e@github.com> On Fri, 12 Apr 2024 20:09:22 GMT, Quan Anh Mai wrote: > Can 
we check `Identity` during `PhaseCCP` instead? I see other inferences such as `AndINode` that may benefit from it. Can you give more details on the cases that are not covered by CCP currently? Can `Value` be extended instead of changing CCP? In any case, that would be out of the scope of this change, right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18757#issuecomment-2059196164 From roland at openjdk.org Tue Apr 16 14:17:03 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:17:03 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:24:54 GMT, Emanuel Peter wrote: >> Actually, shouldn't I have kept `-XX:-BackgroundCompilation` for this one? > > I think it would be great to have one run with absolutely no flags. @eme64 can you please explain why a run without flags make sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1567441396 From roland at openjdk.org Tue Apr 16 14:43:21 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:43:21 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: Message-ID: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. 
A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitray complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... 
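To make the cache probe described in the quoted text above a little more concrete, here is a toy Java model of a lazily allocated cache in which each key can live at one of two indexes. It is only a sketch under assumptions: the class, the field names, the cache size and the way the two indexes are derived are all hypothetical and do not mirror the JDK's `ScopedValue` implementation or the new C2 nodes.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy model only: all names here are hypothetical, chosen for this note.
final class ScopedValueCacheSketch {
    static final int SLOTS = 16;      // assumed cache size (power of two)
    private Object[] cache;           // key at 2*i, value at 2*i+1; allocated lazily
    int slowCalls;                    // number of slow-path lookups so far

    // Two candidate slots per key, derived from the identity hash (an assumption).
    static int firstIndex(Object key)  { return System.identityHashCode(key) & (SLOTS - 1); }
    static int secondIndex(Object key) { return (System.identityHashCode(key) >>> 8) & (SLOTS - 1); }

    Object get(Object key, Map<Object, Object> bindings) {
        Object[] c = cache;
        if (c != null) {                                   // 1. is the cache present?
            int i = firstIndex(key);
            if (c[2 * i] == key) return c[2 * i + 1];      // 2. probe the first index
            i = secondIndex(key);
            if (c[2 * i] == key) return c[2 * i + 1];      // 3. probe the second index
        }
        return slowGet(key, bindings);                     // miss: take the slow path
    }

    // Slow path: look the binding up and record the mapping in the cache.
    private Object slowGet(Object key, Map<Object, Object> bindings) {
        slowCalls++;
        Object value = bindings.get(key);
        if (cache == null) cache = new Object[2 * SLOTS];
        int i = firstIndex(key);
        cache[2 * i] = key;
        cache[2 * i + 1] = value;
        return value;
    }

    public static void main(String[] args) {
        Map<Object, Object> bindings = new IdentityHashMap<>();
        Object key = new Object();
        bindings.put(key, "bound value");
        ScopedValueCacheSketch probe = new ScopedValueCacheSketch();
        Object v1 = probe.get(key, bindings);   // miss: slow path fills the cache
        Object v2 = probe.get(key, bindings);   // hit: value is read from the cache
        System.out.println((v1 == v2) + ", slow calls: " + probe.slowCalls);
    }
}
```

In this model the second `get()` for the same key never reaches the slow path, which is the situation in which, per the description, the profile reports the slow path as never taken.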
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge branch 'master' into JDK-8320649 - review - test fix - test fix - Merge branch 'master' into JDK-8320649 - whitespaces - review - Merge branch 'master' into JDK-8320649 - review - 32 bit build fix - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e ------------- Changes: https://git.openjdk.org/jdk/pull/16966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=14 Stats: 2682 lines in 39 files changed: 2612 ins; 29 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Tue Apr 16 14:49:06 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:49:06 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Thu, 28 Mar 2024 09:53:41 GMT, Emanuel Peter wrote: >>> @rwestrel Great, yes just launched it. Feel free to ask in a day or 2 if I don't report back by then! >> >> @eme64 any update on testing? > > @rwestrel thanks for asking. About 10% seems to still be scheduled and have not completed, on `macosx-x64`. But the rest seems fine. I'll re-review next week :) @eme64 can you go over my replies above and let me know if they sound good to you? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2059274785 From roland at openjdk.org Tue Apr 16 14:49:06 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 16 Apr 2024 14:49:06 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: On Thu, 4 Apr 2024 13:34:35 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> test fix > > src/hotspot/share/opto/type.cpp line 617: > >> 615: TypeInstKlassPtr::OBJECT_OR_NULL = TypeInstKlassPtr::make(TypePtr::BotPTR, current->env()->Object_klass(), 0); >> 616: >> 617: const Type** fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*)); > > Suggestion: > > const Type** fgetfromcache = (const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*)); Fixed in latest commit > src/hotspot/share/opto/type.cpp line 622: > >> 620: fgetfromcache[2] = TypeAryPtr::OOPS; >> 621: TypeTuple::make(3, fgetfromcache); >> 622: const Type** fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*)); > > Suggestion: > > const Type** fsvgetresult = (const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*)); Fixed in latest commit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1567492780 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1567492883 From tholenstein at openjdk.org Tue Apr 16 15:33:03 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 16 Apr 2024 15:33:03 GMT Subject: RFR: 8324950: IGV: save the state to a file [v27] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and 
extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: don't adjust zoom level when opening a tab ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/8fb022aa..b4fb2de4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=25-26 Stats: 18 lines in 1 file changed: 2 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Tue Apr 16 15:36:44 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 16 Apr 2024 15:36:44 GMT Subject: RFR: 8324950: IGV: save the state to a file [v25] In-Reply-To: References: Message-ID: <8AsJ1ssqCodHDiDp8cn-GSIPHiUp5NHNaBH8oDZe6lI=.7261d6bb-90cb-4324-8232-62506f4b8e6f@github.com> On Mon, 15 Apr 2024 09:01:22 GMT, Roberto Castañeda Lozano wrote: > There is an issue with saved difference graph states.
If I open [diff.zip](https://github.com/openjdk/jdk/files/14976481/diff.zip) (which I just created by importing some graphs, opening one of them, and diffing it against another one), I get the following assertion error: > > ``` > [INFO] java.lang.AssertionError > [INFO] at com.sun.hotspot.igv.util.RangeSliderModel.setPositions(RangeSliderModel.java:101) > [INFO] at com.sun.hotspot.igv.coordinator.OutlineTopComponent.lambda$loadContext$2(OutlineTopComponent.java:481) > [INFO] at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318) > [INFO] at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773) > [INFO] at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720) > [INFO] at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714) > [INFO] at java.base/java.security.AccessController.doPrivileged(AccessController.java:399) > [INFO] at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86) > [INFO] at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742) > [INFO] at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:136) > [INFO] [catch] at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203) > [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124) > [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113) > [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109) > [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) > [INFO] at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90) > ``` > > I imagine fully supporting saving and restoring the diff state would require quite a lot of additional complexity, both in IGV and in the XML files. 
Maybe this is not a very important use case, and we could just not support it? Thanks for catching that! I fixed it. Difference graphs are supported as long as both graphs are in the same group. A difference graph built from two different groups is simply not saved to the XML file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2059379353 From kvn at openjdk.org Tue Apr 16 15:40:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Apr 2024 15:40:05 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 07:05:29 GMT, Emanuel Peter wrote: >> This is a feature requested by @RogerRiggs and @cl4es. >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve RC comment for Vladimir New comment is good now. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2059387459 From tholenstein at openjdk.org Tue Apr 16 15:45:20 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 16 Apr 2024 15:45:20 GMT Subject: RFR: 8324950: IGV: save the state to a file [v28] In-Reply-To: References: Message-ID: <9vVr_Jg0YDr_rWDf6AIPveEw0ML_vxMyvEWk8RIwt-U=.ab156b86-4e1b-4bb2-83c0-63dc9e5017bd@github.com> > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: support negative difference in XML ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/b4fb2de4..9196f12e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=26-27 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From qamai at openjdk.org Tue Apr 16 17:11:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Apr 2024 17:11:41 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <2J2xtBtH3BYPMCvnnO67L44uvfUyBDtXdV12CgvfEeY=.048edff8-077d-4ece-9250-163486bc7a0e@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> <079UKoZpe1eMTvJIIuKDqh6uSf1g3Cu8dUm5hkVddnA=.2cf0c555-5c29-4891-b4b6-866c11d7f080@github.com> <2J2xtBtH3BYPMCvnnO67L44uvfUyBDtXdV12CgvfEeY=.048edff8-077d-4ece-9250-163486bc7a0e@github.com> Message-ID: On Tue, 16 Apr 2024 14:12:10 GMT, Roland Westrelin wrote: >> Can we check `Identity` during `PhaseCCP` instead? I see other inferences such as `AndINode` that may benefit from it. >> >> Thanks. > >> Can we check `Identity` during `PhaseCCP` instead? I see other inferences such as `AndINode` that may benefit from it. > > Can you give more details on the cases that are not covered by CCP currently? Can `Value` be extended instead of changing CCP? > > In any case, that would be out of the scope of this change, right? 
@rwestrel > Can you give more details on the cases that are not covered by CCP currently? For example, given `a & b`, during CCP, if the current value of `a` is `-1`, then `AndINode::Value` would yield `TypeInt::INT` while we can return a more rigorous value of `b`. > Can Value be extended instead of changing CCP? In any case, that would be out of the scope of this change, right? I think that if the issue is simply improving `CMoveNode::Value` then you are right, it would be out of the scope. However, given the reasoning for the issue being that it can improve CCP, I think it would be more generalised if we can consult `Identity` during CCP instead, maybe simply changing: `const Type* new_type = n->Value(this);` into `const Type* new_type = n->Identity(this)->Value(this);` would be adequate. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/phaseX.cpp#L1810 Or even better, if we can incorporate `Identity` into `Value` of all nodes then it may have a positive impact on all use sites of `Value`, too. Cheers, Quan Anh ------------- PR Comment: https://git.openjdk.org/jdk/pull/18757#issuecomment-2059557240 From qamai at openjdk.org Tue Apr 16 17:25:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Apr 2024 17:25:04 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 02:21:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. 
If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Rename to Type::equals, changes from code review I think it is a really nice change. It is confusing to have a `cmp` function on a type with no order. Cheers, Quan Anh ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/18533#pullrequestreview-2004225430 From duke at openjdk.org Tue Apr 16 18:00:42 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 16 Apr 2024 18:00:42 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 19:44:26 GMT, Dean Long wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > How can we be confident that the encoding is correct? Would it be possible to write tests for this? Maybe one that disassembles it and compares the result to a 3rd party disassembler offline or in-process hsdis? Thank you @dean-long for the comment. I agree, automated testing is needed. I'm looking into a way to implement something such as you describe. 
A lot of manual testing was done prior to posting the PR but such testing is error-prone and is too much work if iterating an implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2059652778 From duke at openjdk.org Tue Apr 16 18:43:02 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 16 Apr 2024 18:43:02 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 09:42:22 GMT, Emanuel Peter wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Can the APX features be simulated, maybe even with SDE? > > Now you made the flag EXPERIMENTAL and by default false. What is the roadmap with this? It is generally not great to have default false flags, because the code underneath will just slowly rot and become broken. Is there a plan to eventually make it default true? What stops us from doing that already now? Thank you @eme64 for the comments. The functionality of the UseAPX flag is, as you point out, incomplete in this pull request. A subsequent PR (see JDK-8329030) will tie the logic of the flag in with a query of the hardware features. It was added in this PR thinking it could be useful for testing or debugging the encoding functionality. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2059716310 From duke at openjdk.org Tue Apr 16 19:14:44 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 16 Apr 2024 19:14:44 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:39:42 GMT, Vladimir Kozlov wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > src/hotspot/cpu/x86/assembler_x86.hpp line 790: > >> 788: int vex_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, >> 789: VexSimdPrefix pre, VexOpcode opc, >> 790: InstructionAttr *attributes, bool src_is_gpr = false); > > I saw a lot of usage of this method with `src_is_gpr` is `true`. What is actual most common case? Thanks @vnkozlov. I believe false is the most common case. If I counted right, I see 400+ calls total and 37 calls with src_is_gpr set to true. > src/hotspot/cpu/x86/assembler_x86.hpp line 796: > >> 794: >> 795: int simd_prefix_and_encode(XMMRegister dst, XMMRegister nds, XMMRegister src, VexSimdPrefix pre, >> 796: VexOpcode opc, InstructionAttr *attributes, bool src_is_gpr = false); > > Same question as for `vex_prefix_and_encode` Thanks @vnkozlov. I believe false is the most common case here too. 
Again, if my counts are close, I see ~180 calls total and 12 calls with src_is_gpr = true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567831813 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567832045 From duke at openjdk.org Tue Apr 16 19:14:44 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 16 Apr 2024 19:14:44 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:43:35 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/globals_x86.hpp line 236: >> >>> 234: "mitigations for the Intel JCC erratum") \ >>> 235: \ >>> 236: product(bool, UseAPX, false, EXPERIMENTAL, \ >> >> Spacing to `` > > @steveatgh, do you plan to add code to `vm_version_x86.*` to check for presence of AVX and setting this flag accordingly? Thanks @vnkozlov. Spacing fixed locally. Regarding the UseAPX flag, yes, a subsequent PR (see JDK-8329030) will tie the logic of the flag in with querying the hardware features. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567832686 From duke at openjdk.org Tue Apr 16 19:58:59 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Tue, 16 Apr 2024 19:58:59 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 17:33:06 GMT, Vladimir Kozlov wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > src/hotspot/cpu/x86/assembler_x86.cpp line 669: > >> 667: // [base + disp] >> 668: assert(((base_enc & 0x7) != 4), "illegal addressing mode"); >> 669: if (disp == 0 && no_relocation && ((base_enc & 0x7) != 5)) { > > We lost information with this change. Can it be done as `is_r13_encoding(base_enc)` and `is_r12_encoding(base_enc)`? Thanks for the comment. The "& 0x7" style was suggested to me by @sviswa7 as an efficient way to check for r12, r20, r28 in the assert, and for r13, r21, r29 in the if statement. I originally was comparing against each new APX register encoding. The style in the PR is concise but it can be done either way. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567874941 From kvn at openjdk.org Tue Apr 16 20:25:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Apr 2024 20:25:45 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 19:11:52 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.hpp line 790: >> >>> 788: int vex_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, >>> 789: VexSimdPrefix pre, VexOpcode opc, >>> 790: InstructionAttr *attributes, bool src_is_gpr = false); >> >> I saw a lot of usage of this method with `src_is_gpr` set to `true`. What is the actual most common case? > > Thanks @vnkozlov. I believe false is the most common case. If I counted right, I see 400+ calls total and 37 calls with src_is_gpr set to true. Good. Thank you for checking. 
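As a side illustration of the `& 0x7` check discussed in this thread, here is a minimal sketch, assuming register encodings are plain integers 0..31 once the APX registers are included. The class and method names are hypothetical and are not taken from HotSpot; the sketch only shows which encodings share the same low three bits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not HotSpot code: lists the register encodings in
// the range 0..31 whose low three bits match a given value.
final class RegEncodingMask {
    // Returns every encoding enc in [0, 32) with (enc & 0x7) == lowBits.
    static List<Integer> encodingsWithLowBits(int lowBits) {
        List<Integer> result = new ArrayList<>();
        for (int enc = 0; enc < 32; enc++) {
            if ((enc & 0x7) == lowBits) {
                result.add(enc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Low bits 4: rsp (4), r12, r20, r28.
        System.out.println(encodingsWithLowBits(4)); // [4, 12, 20, 28]
        // Low bits 5: rbp (5), r13, r21, r29.
        System.out.println(encodingsWithLowBits(5)); // [5, 13, 21, 29]
    }
}
```

A single masked comparison therefore covers one register from each group of eight, which is why the shorter form can stand in for four separate encoding comparisons, at the cost of no longer naming the registers in the code.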
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567901184 From kvn at openjdk.org Tue Apr 16 20:41:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Apr 2024 20:41:47 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 19:55:45 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 669: >> >>> 667: // [base + disp] >>> 668: assert(((base_enc & 0x7) != 4), "illegal addressing mode"); >>> 669: if (disp == 0 && no_relocation && ((base_enc & 0x7) != 5)) { >> >> We loost information with this change. Can it be done as `is_r13_encoding(base_enc)` and `is_r12_enxoding(base_enc)`? > > Thanks for the comment. The "& 0x7" style was suggested to me by @sviswa7 as a efficient way to check for r12, r20, r28 in the assert, and for r13, r21, r29 in the if statement. I originally was comparing against each new APX register encoding. The style in the PR is concise but it can be done either way. What do you think? Got it - it is for few registers check now. What is common between rsp, r12, r20, r28 registers (except encoding)? R12 is used for heap base in compressed oops and RSP is RSP. What are r20 and r28? Why they can't be used in this addressing mode? Please add comments for all lines where you replaced checks for `r*->encoding()` to say for which registers you do a check and why. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567916701 From kvn at openjdk.org Tue Apr 16 20:41:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Apr 2024 20:41:47 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 19:12:32 GMT, Steve Dohrmann wrote: >> @steveatgh, do you plan to add code to `vm_version_x86.*` to check for presence of AVX and setting this flag accordingly? 
> > Thanks @vnkozlov. > Spacing fixed locally. > Regarding the UseAPX flag, yes, a subsequent PR (see JDK-8329030) will tie the logic of the flag in with querying the hardware features. Okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1567903706 From joshcao at amazon.com Tue Apr 16 21:55:31 2024 From: joshcao at amazon.com (Cao, Joshua) Date: Tue, 16 Apr 2024 21:55:31 +0000 Subject: Adding a flag to a jtreg test Message-ID: <550d4f2da2bf4404a310d3688cd18c7b@amazon.com> I am writing a jtreg testcase that reproduces the issue in https://bugs.openjdk.org/browse/JDK-8329797 ``` diff --git a/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java b/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java index 80dda306d3b..97b92c63744 100644 --- a/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java +++ b/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java @@ -34,7 +34,7 @@ * @summary Test that if expressions are properly folded into min/max nodes * @requires os.arch != "riscv64" * @library /test/lib / - * @run main compiler.c2.irTests.TestIfMinMax + * @run main/othervm -XX:+UseShenandoahGC compiler.c2.irTests.TestIfMinMax */ public class TestIfMinMax { private static final Random RANDOM = Utils.getRandomInstance(); @@ -139,6 +139,31 @@ public long testMaxL2E(long a, long b) { return a <= b ? 
b : a; } + public class Dummy { + long l; + public Dummy(long l) { this.l = l; } + } + + @Setup + Object[] setupDummyArray() { + Dummy[] arr = new Dummy[512]; + for (int i = 0; i < 512; i++) { + arr[i] = new Dummy(RANDOM.nextLong()); + } + return new Object[] { arr }; + } + + @Test + @Arguments(setup = "setupDummyArray") + @IR(failOn = { IRNode.MAX_L }) + public long testMaxLAndBarrierInLoop(Dummy[] arr) { + long result = 0; + for (int i = 0; i < arr.length; ++i) { + result += Math.max(arr[i].l, 1); + } + return result; + } + @Setup static Object[] setupIntArrays() { int[] a = new int[512]; ``` If I run the test the usual way with `make CONF=linux-x86_64-server-fastdebug run-test TEST=test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java`, the program does not crash and the reproducer does not work. However, if I pass `-XX:+UseShenandoahGC` on the command line, the crash is reproduced: `make CONF=linux-x86_64-server-fastdebug run-test TEST=test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java JTREG="JAVA_OPTIONS=-XX:+UseShenandoahGC"`. So I guess there is an inconsistency between the command-line JVM arguments and those passed through `@run`. 
Without the command line action, the logs show ``` /home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/java \\ -Dtest.vm.opts='-XX:MaxRAMPercentage=1.25 -Dtest.boot.jdk=/home/joshcao/.sdkman/candidates/java/current -Djava.io.tmpdir=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/tmp' \\ -Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=1.25 -J-Dtest.boot.jdk=/home/joshcao/.sdkman/candidates/java/current -J-Djava.io.tmpdir=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/tmp' \\ -Dtest.compiler.opts= \\ -Dtest.java.opts= \\ -Dtest.jdk=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/jdk \\ -Dcompile.jdk=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/jdk \\ -Dtest.timeout.factor=4.0 \\ -Dtest.nativepath=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/test/hotspot/jtreg/native \\ -Dtest.root=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg \\ -Dtest.name=compiler/c2/irTests/TestIfMinMax.java \\ -Dtest.file=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java \\ -Dtest.src=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests \\ -Dtest.src.path=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests:/local/home/joshcao/jdk/jdk/test/lib:/local/home/joshcao/jdk/jdk/test/hotspot/jtreg \\ -Dtest.classes=/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/compiler/c2/irTests/TestIfMinMax.d \\ 
-Dtest.class.path=/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/compiler/c2/irTests/TestIfMinMax.d:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/test/lib:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0 \\ -Dtest.class.path.prefix=/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/compiler/c2/irTests/TestIfMinMax.d:/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/test/lib:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0 \\ -XX:MaxRAMPercentage=1.25 \\ -Dtest.boot.jdk=/home/joshcao/.sdkman/candidates/java/current \\ -Djava.io.tmpdir=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/tmp \\ -Djava.library.path=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/test/hotspot/jtreg/native \\ -XX:+UseShenandoahGC \\ com.sun.javatest.regtest.agent.MainWrapper /local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/compiler/c2/irTests/TestIfMinMax.d/main.0.jta result: Passed. Execution successful ``` When specifying `-XX:+UseShenandoahGC` on the command line, I see `-Dtest.java.opts=-XX:+UseShenandoahGC`. I would have expected `-XX:+UseShenandoahGC` to propagate to the test VM. 
Any recommendations on how to add Shenandoah flags to the jtreg test? From fyang at openjdk.org Wed Apr 17 00:09:24 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 17 Apr 2024 00:09:24 GMT Subject: RFR: 8330419: Unused code in ConnectionGraph::specialize_castpp Message-ID: <6L5j8WiAU4xDXERf8g8nt_T-CCHwQauEdjObEVxjV74=.e8942a0a-a804-47a4-8d56-6c3ad1dd51ef@github.com> Please review this small code cleanup change. I noticed that the `minus_one` local created in `ConnectionGraph::specialize_castpp`, which was added by JDK-8316991, is never used. I think it should be safe to remove this. Also renamed `boll` to `bol` to be consistent in naming with other places where we create a `BoolNode`. Testing: tier1 tested on linux-aarch64 (release & fastdebug) ------------- Commit messages: - cleanup Changes: https://git.openjdk.org/jdk/pull/18805/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18805&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330419 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18805/head:pull/18805 PR: https://git.openjdk.org/jdk/pull/18805 From duke at openjdk.org Wed Apr 17 00:47:29 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 17 Apr 2024 00:47:29 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v2] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix white space, add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/1ad335a3..3d62dce8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=00-01 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Wed Apr 17 00:47:29 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Wed, 17 Apr 2024 00:47:29 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v2] In-Reply-To: References: Message-ID: <29kqrbnmwtVnBNCChEKE2AHE9xYbP-GGNgt9s8B-05w=.feaa2739-4d42-466c-9909-73430bcf15a8@github.com> On Tue, 16 Apr 2024 20:39:08 GMT, Vladimir Kozlov wrote: >> Thanks for the comment. The "& 0x7" style was suggested to me by @sviswa7 as a efficient way to check for r12, r20, r28 in the assert, and for r13, r21, r29 in the if statement. I originally was comparing against each new APX register encoding. The style in the PR is concise but it can be done either way. What do you think? > > Got it - it is for few registers check now. What is common between rsp, r12, r20, r28 registers (except encoding)? > R12 is used for heap base in compressed oops and RSP is RSP. What are r20 and r28? Why they can't be used in this addressing mode? 
> > Please add comments for all lines where you replaced checks for `r*->encoding()` to say for which registers you do a check and why. The reason registers 12, 20, and 28 are asserted out of that else at line 668 is they are handled earlier in an else around line 646. ` } else if ((base_enc & 0x7) == 4) { // [rsp + disp] ` I added a comment to this effect and have also added comments in the 3 other places in the function where the replacement was done, indicating the registers involved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1568077107 From fjiang at openjdk.org Wed Apr 17 00:53:49 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 17 Apr 2024 00:53:49 GMT Subject: RFR: 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 04:17:08 GMT, Fei Yang wrote: >> Hi, please review this fix that adds additional CMove match rules for the riscv port. >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) introduces more conditional moves which reduce merges used by CmpP/CmpN. However, there is no match rule for CMove with CmpP/N on riscv, resulting in the `bad AD file` crash. >> >> After this fix, the following five tests would pass without any crashes. >> >> Testing: >> - [x] compiler/eliminateAutobox/TestDoubleBoxing.java >> - [x] compiler/eliminateAutobox/TestFloatBoxing.java >> - [x] compiler/eliminateAutobox/TestLongBoxing.java >> - [x] compiler/eliminateAutobox/TestIntBoxing.java >> - [x] compiler/eliminateAutobox/TestShortBoxing.java >> - [x] tier1~3 (linux-riscv64, release) >> - [x] hotspot:tier1 (linux-riscv64, fastdebug) > > Looks good. Thanks! @RealFYang @robehn -- Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18774#issuecomment-2060142289 From fjiang at openjdk.org Wed Apr 17 00:53:50 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 17 Apr 2024 00:53:50 GMT Subject: RFR: 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 In-Reply-To: References: Message-ID: On Sun, 14 Apr 2024 07:58:48 GMT, Feilong Jiang wrote: > Hi, please review this fix that adds additional CMove match rules for the riscv port. > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) introduces more conditional moves which reduce merges used by CmpP/CmpN. However, there is no match rule for CMove with CmpP/N on riscv, resulting in the `bad AD file` crash. > > After this fix, the following five tests would pass without any crashes. > > Testing: > - [x] compiler/eliminateAutobox/TestDoubleBoxing.java > - [x] compiler/eliminateAutobox/TestFloatBoxing.java > - [x] compiler/eliminateAutobox/TestLongBoxing.java > - [x] compiler/eliminateAutobox/TestIntBoxing.java > - [x] compiler/eliminateAutobox/TestShortBoxing.java > - [x] tier1~3 (linux-riscv64, release) > - [x] hotspot:tier1 (linux-riscv64, fastdebug) windows-x64 build failure seems not related. It is riscv only modification, ------------- PR Comment: https://git.openjdk.org/jdk/pull/18774#issuecomment-2060143324 From fjiang at openjdk.org Wed Apr 17 00:53:50 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 17 Apr 2024 00:53:50 GMT Subject: Integrated: 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 In-Reply-To: References: Message-ID: On Sun, 14 Apr 2024 07:58:48 GMT, Feilong Jiang wrote: > Hi, please review this fix that adds additional CMove match rules for the riscv port. > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) introduces more conditional moves which reduce merges used by CmpP/CmpN. However, there is no match rule for CMove with CmpP/N on riscv, resulting in the `bad AD file` crash. 
> > After this fix, the following five tests would pass without any crashes. > > Testing: > - [x] compiler/eliminateAutobox/TestDoubleBoxing.java > - [x] compiler/eliminateAutobox/TestFloatBoxing.java > - [x] compiler/eliminateAutobox/TestLongBoxing.java > - [x] compiler/eliminateAutobox/TestIntBoxing.java > - [x] compiler/eliminateAutobox/TestShortBoxing.java > - [x] tier1~3 (linux-riscv64, release) > - [x] hotspot:tier1 (linux-riscv64, fastdebug) This pull request has now been integrated. Changeset: c8702ede Author: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/c8702ede97437e0197340a559987ca321f67c15b Stats: 68 lines in 1 file changed: 68 ins; 0 del; 0 mod 8330213: RISC-V: C2: assert(false) failed: bad AD file after JDK-8316991 Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/18774 From kvn at openjdk.org Wed Apr 17 02:14:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Apr 2024 02:14:47 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v2] In-Reply-To: <29kqrbnmwtVnBNCChEKE2AHE9xYbP-GGNgt9s8B-05w=.feaa2739-4d42-466c-9909-73430bcf15a8@github.com> References: <29kqrbnmwtVnBNCChEKE2AHE9xYbP-GGNgt9s8B-05w=.feaa2739-4d42-466c-9909-73430bcf15a8@github.com> Message-ID: On Wed, 17 Apr 2024 00:44:09 GMT, Steve Dohrmann wrote: >> Got it - it is for few registers check now. What is common between rsp, r12, r20, r28 registers (except encoding)? >> R12 is used for heap base in compressed oops and RSP is RSP. What are r20 and r28? Why they can't be used in this addressing mode? >> >> Please add comments for all lines where you replaced checks for `r*->encoding()` to say for which registers you do a check and why. > > The reason registers 12, 20, and 28 are asserted out of that else at line 668 is they are handled earlier in an else around line 646. 
> > ` } else if ((base_enc & 0x7) == 4) { > // [rsp + disp] > ` > > I added a comment to this effect and have also added comments in the 3 other places in the function where the replacement was done, indicating the registers involved. Good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1568121201 From amitkumar at openjdk.org Wed Apr 17 03:26:59 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 17 Apr 2024 03:26:59 GMT Subject: RFR: 8330011: [s390x] update block-comments to make code consistent In-Reply-To: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> References: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> Message-ID: On Wed, 10 Apr 2024 10:03:20 GMT, Amit Kumar wrote: > It doesn't (shouldn't) affect the runtime. So I haven't run any test. But builds I have performed. @RealLucy could you review this one ? Also this is trivial change one review should be enough ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18710#issuecomment-2060275285 From lucy at openjdk.org Wed Apr 17 06:28:59 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 17 Apr 2024 06:28:59 GMT Subject: RFR: 8330011: [s390x] update block-comments to make code consistent In-Reply-To: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> References: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> Message-ID: On Wed, 10 Apr 2024 10:03:20 GMT, Amit Kumar wrote: > It doesn't (shouldn't) affect the runtime. So I haven't run any test. But builds I have performed. LGTM. It's a trivial change in my opinion: only comment texts are changed. One C++ statement is reformatted. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18710#pullrequestreview-2005145502 From christian.hagedorn at oracle.com Wed Apr 17 06:34:42 2024 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 17 Apr 2024 08:34:42 +0200 Subject: Adding a flag to a jtreg test In-Reply-To: <550d4f2da2bf4404a310d3688cd18c7b@amazon.com> References: <550d4f2da2bf4404a310d3688cd18c7b@amazon.com> Message-ID: <02d48fc8-d6fb-4352-b565-44331e4ee384@oracle.com> Hi Joshua, For an IR test, you need to pass VM flags with `runWithFlags()` or `addFlags()` [1] (also see [2] as an example). This will pass the flags to the dedicated test VM that is spawned to run the tests. Options specified in the `@run main/othervm` line are ignored by the test VM. Apart from flags added explicitly with IR framework methods (e.g. `runWithFlags()`), the test VM only takes VM flags that are passed from the outside with -javaoptions and -vmoptions to jtreg, which allows us to run the test with more flags in the CI (the IR framework additionally reads the -Dtest.java.opts and -Dtest.vm.opts properties to set up the test VM flags). It is generally recommended to use `@run driver` for IR tests to ignore other flags and not to stress the driver VM, which only spawns the test VM and performs IR matching afterward.
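To make the flag sources described above concrete, here is a toy Java model. It is not the IR framework's code: the class `TestVmFlagsSketch`, its method names, and the order in which the sources are concatenated are assumptions made purely for illustration. It only captures what is stated above: the test VM receives flags from the `test.vm.opts` and `test.java.opts` properties plus those added through framework calls such as `runWithFlags()`, while flags on the `@run` line stay with the driver VM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how test VM flags are assembled; not the real framework.
final class TestVmFlagsSketch {

    // Splits a whitespace-separated option string; null or blank yields an empty list.
    static List<String> split(String opts) {
        List<String> out = new ArrayList<>();
        if (opts == null) {
            return out;
        }
        for (String s : opts.trim().split("\\s+")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // testVmOpts / testJavaOpts: values of -Dtest.vm.opts and -Dtest.java.opts.
    // frameworkFlags: flags added via runWithFlags()/addFlags().
    // runLineFlags: flags written on the "@run main/othervm" line.
    static List<String> testVmFlags(String testVmOpts,
                                    String testJavaOpts,
                                    List<String> frameworkFlags,
                                    List<String> runLineFlags) {
        List<String> flags = new ArrayList<>();
        flags.addAll(split(testVmOpts));   // ordering here is an assumption
        flags.addAll(split(testJavaOpts));
        flags.addAll(frameworkFlags);
        // runLineFlags are deliberately not added: they only affect the driver VM.
        return flags;
    }
}
```

In this model a flag that appears only on the `@run main/othervm` line never reaches the test VM, which matches the behaviour reported in the question; adding it through the framework or through jtreg's options does.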
Best regards, Christian [1] https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md#23-test-vm-flags-and-scenarios [2] https://github.com/openjdk/jdk/blob/2fe2f3aff82f41a3b7942861e29ccbd3bcc68661/test/hotspot/jtreg/compiler/c2/irTests/TestSkeletonPredicates.java#L39-L40 On 16.04.24 23:55, Cao, Joshua wrote: > I am writing a jtreg testcase that reproduces the issue in > > https://bugs.openjdk.org/browse/JDK-8329797 > > > > ``` > > diff --git a/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java > b/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java > index 80dda306d3b..97b92c63744 100644 > --- a/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java > +++ b/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java > @@ -34,7 +34,7 @@ > ? * @summary Test that if expressions are properly folded into min/max nodes > ? * @requires os.arch != "riscv64" > ? * @library /test/lib / > - * @run main compiler.c2.irTests.TestIfMinMax > + * @run main/othervm -XX:+UseShenandoahGC compiler.c2.irTests.TestIfMinMax > ? */ > ?public class TestIfMinMax { > ? ? ?private static final Random RANDOM = Utils.getRandomInstance(); > @@ -139,6 +139,31 @@ public long testMaxL2E(long a, long b) { > ? ? ? ? ?return a <= b ? b : a; > ? ? ?} > > +? ? public class Dummy { > +? ? ? ? long l; > +? ? ? ? public Dummy(long l) { this.l = l; } > +? ? } > + > +? ? @Setup > +? ? Object[] setupDummyArray() { > +? ? ? ? Dummy[] arr = new Dummy[512]; > +? ? ? ? for (int i = 0; i < 512; i++) { > +? ? ? ? ? ? arr[i] = new Dummy(RANDOM.nextLong()); > +? ? ? ? } > +? ? ? ? return new Object[] { arr }; > +? ? } > + > +? ? @Test > +? ? @Arguments(setup = "setupDummyArray") > +? ? @IR(failOn = { IRNode.MAX_L }) > +? ? public long testMaxLAndBarrierInLoop(Dummy[] arr) { > +? ? ? ? long result = 0; > +? ? ? ? for (int i = 0; i < arr.length; ++i) { > +? ? ? ? ? ? result += Math.max(arr[i].l, 1); > +? ? ? ? } > +? ? ? ? return result; > +? ? } > + > ? ? ?@Setup > ? ? 
?static Object[] setupIntArrays() { > ? ? ? ? ?int[] a = new int[512]; > ``` > > If I?run the test usually with `make CONF=linux-x86_64-server-fastdebug run-test > TEST=test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java`, the program > does not crash and the reproducer does not work. However, if I specify to use > `-XX:+UseShenandoahGC` from the command line, the crash is successfully > reproduced `make CONF=linux-x86_64-server-fastdebug run-test > TEST=test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java > JTREG="JAVA_OPTIONS=-XX:+UseShenandoahGC`. So I guess there is inconsistency > between the command line JVM arguments and those passed through `@run`. > > > Without the command line action, the logs show > > > ``` > > /home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/java \\ > ??????? -Dtest.vm.opts='-XX:MaxRAMPercentage=1.25 > -Dtest.boot.jdk=/home/joshcao/.sdkman/candidates/java/current > -Djava.io.tmpdir=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/tmp' \\ > ??????? -Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=1.25 > -J-Dtest.boot.jdk=/home/joshcao/.sdkman/candidates/java/current > -J-Djava.io.tmpdir=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/tmp' \\ > ??????? -Dtest.compiler.opts= \\ > ??????? -Dtest.java.opts= \\ > ??????? > -Dtest.jdk=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/jdk \\ > ??????? > -Dcompile.jdk=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/jdk \\ > ??????? -Dtest.timeout.factor=4.0 \\ > ??????? > -Dtest.nativepath=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/test/hotspot/jtreg/native \\ > ??????? -Dtest.root=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg \\ > ??????? -Dtest.name=compiler/c2/irTests/TestIfMinMax.java \\ > ??????? 
> -Dtest.file=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java \\ > ??????? > -Dtest.src=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests \\ > ??????? > -Dtest.src.path=/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests:/local/home/joshcao/jdk/jdk/test/lib:/local/home/joshcao/jdk/jdk/test/hotspot/jtreg \\ > ??????? > -Dtest.classes=/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/compiler/c2/irTests/TestIfMinMax.d \\ > ??????? > -Dtest.class.path=/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/compiler/c2/irTests/TestIfMinMax.d:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/test/lib:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0 \\ > ??????? > -Dtest.class.path.prefix=/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/compiler/c2/irTests/TestIfMinMax.d:/local/home/joshcao/jdk/jdk/test/hotspot/jtreg/compiler/c2/irTests:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0/test/lib:/local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/classes/0 \\ > ??????? -XX:MaxRAMPercentage=1.25 \\ > ??????? -Dtest.boot.jdk=/home/joshcao/.sdkman/candidates/java/current \\ > ??????? 
> -Djava.io.tmpdir=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/tmp \\ > -Djava.library.path=/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/images/test/hotspot/jtreg/native \\ > -XX:+UseShenandoahGC \\ > com.sun.javatest.regtest.agent.MainWrapper > /local/home/joshcao/jdk/jdk/build/linux-x86_64-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_compiler_c2_irTests_TestIfMinMax_java/compiler/c2/irTests/TestIfMinMax.d/main.0.jta > result: Passed. Execution successful > ``` > > > When specifying `-XX:+UseShenandoahGC`, I see > `-Dtest.java.opts=-XX:+UseShenandoahGC`. I would have expected that the > `-XX:+UseShenandoahGC` would propagate to the test VM. Any recommendations on > how to add Shenandoah flags to the jtreg test? > > From amitkumar at openjdk.org Wed Apr 17 06:37:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 17 Apr 2024 06:37:01 GMT Subject: RFR: 8330011: [s390x] update block-comments to make code consistent In-Reply-To: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> References: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> Message-ID: On Wed, 10 Apr 2024 10:03:20 GMT, Amit Kumar wrote: > It doesn't (shouldn't) affect the runtime. So I haven't run any test. But builds I have performed. Thanks Lutz for review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18710#issuecomment-2060480361 From amitkumar at openjdk.org Wed Apr 17 06:37:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 17 Apr 2024 06:37:01 GMT Subject: Integrated: 8330011: [s390x] update block-comments to make code consistent In-Reply-To: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> References: <1ajcIqkPMh6uLecAhY2u5rZZmqhaHIRGtgD16VEJeu0=.567f452d-095b-40b3-9acd-229b5369ad57@github.com> Message-ID: <5pBG90WxVlXCwGeP5UDya_9PBvktgrf2iIvDPZaFOHY=.be070a4b-50f7-4383-b362-5337dff54a7c@github.com> On Wed, 10 Apr 2024 10:03:20 GMT, Amit Kumar wrote: > It doesn't (shouldn't) affect the runtime. So I haven't run any test. But builds I have performed. This pull request has now been integrated. Changeset: 01bda278 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/01bda278d6a498ca89c0bc5218680cd51a04e9d3 Stats: 27 lines in 2 files changed: 2 ins; 0 del; 25 mod 8330011: [s390x] update block-comments to make code consistent Reviewed-by: lucy ------------- PR: https://git.openjdk.org/jdk/pull/18710 From rcastanedalo at openjdk.org Wed Apr 17 06:41:02 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 17 Apr 2024 06:41:02 GMT Subject: Integrated: 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes In-Reply-To: References: Message-ID: On Mon, 15 Apr 2024 12:49:31 GMT, Roberto Casta?eda Lozano wrote: > This (trivial?) cleanup reuses `MemNode::barrier_data()` (added recently by [JDK-8322692](https://bugs.openjdk.org/browse/JDK-8322692)) to compute the GC barrier data to be transferred from Ideal nodes to their corresponding Mach nodes in `Matcher::ReduceInst()`. > > **Testing:** tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64; release and debug mode). This pull request has now been integrated. 
Changeset: 9d63fee4 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/9d63fee49c3b365e19cf492412a6b6d8c9633964 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod 8330262: C2: simplify transfer of GC barrier data from Ideal to Mach nodes Reviewed-by: eosterlund, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18784 From rcastanedalo at openjdk.org Wed Apr 17 07:08:02 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 17 Apr 2024 07:08:02 GMT Subject: RFR: 8324950: IGV: save the state to a file [v25] In-Reply-To: <8AsJ1ssqCodHDiDp8cn-GSIPHiUp5NHNaBH8oDZe6lI=.7261d6bb-90cb-4324-8232-62506f4b8e6f@github.com> References: <8AsJ1ssqCodHDiDp8cn-GSIPHiUp5NHNaBH8oDZe6lI=.7261d6bb-90cb-4324-8232-62506f4b8e6f@github.com> Message-ID: <606-XJdbtoFsY1IE48qvkaC6LgLZkaLKc0JIGlZJSMU=.c8f80bdb-a75a-453f-97b8-e0e08bbec28d@github.com> On Tue, 16 Apr 2024 15:33:20 GMT, Tobias Holenstein wrote: > Thanks for catching that! I fixed it. Difference graph are supported as long as they are in the same group. A difference graph from two different groups is just not saved to the xml. Great, that works fine now, as far as I can see. A minor related issue is that difference graphs from two different groups are not closed when the workspace is cleared. Is this expected? I would intuitively expect them to be closed as well, but I guess one can also argue that they do not belong to the workspace.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2060526662 From tholenstein at openjdk.org Wed Apr 17 07:36:16 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Apr 2024 07:36:16 GMT Subject: RFR: 8324950: IGV: save the state to a file [v29] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: close all instances of EditorTopComponent when closing workspace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/9196f12e..da669a90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=27-28 Stats: 13 lines in 2 files changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Wed Apr 17 07:36:16 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Apr 2024 07:36:16 GMT Subject: RFR: 8324950: IGV: save the state to a file [v25] In-Reply-To: <606-XJdbtoFsY1IE48qvkaC6LgLZkaLKc0JIGlZJSMU=.c8f80bdb-a75a-453f-97b8-e0e08bbec28d@github.com> References: <8AsJ1ssqCodHDiDp8cn-GSIPHiUp5NHNaBH8oDZe6lI=.7261d6bb-90cb-4324-8232-62506f4b8e6f@github.com> <606-XJdbtoFsY1IE48qvkaC6LgLZkaLKc0JIGlZJSMU=.c8f80bdb-a75a-453f-97b8-e0e08bbec28d@github.com> Message-ID: On Wed, 17 Apr 2024 07:04:57 GMT, Roberto Casta?eda Lozano wrote: > > Thanks for catching that! I fixed it. Difference graph are supported as long as they are in the same group. A difference graph from two different groups is just not saved to the xml. > > Great, that works fine now, as far as I can see. A minor related issue is that difference graphs from two different groups are not closed when the workspace is cleared. Is this expected? I would intuitively expect them to be closed as well, but I guess one can also argue that they do not belong to the workspace. Right. Implementation wise difference graphs from two groups are kind of a special case. But I agree the user would expect all tabs to be closed. 
Since we only ever have one workspace at the time, we can simply close all tabs when the workspace is closed - that's what I added in the latest commit. I think this is a good enough solution ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2060575448 From rcastanedalo at openjdk.org Wed Apr 17 08:04:46 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 17 Apr 2024 08:04:46 GMT Subject: RFR: 8324950: IGV: save the state to a file [v29] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 07:36:16 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. 
It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > close all instances of EditorTopComponent when closing workspace Looks good! src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 460: > 458: > 459: /** > 460: * Loads and opens the given a graph contexts (opened graphs and visible nodes). Suggestion: * Loads and opens the given graph context (opened graphs and visible nodes). src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 537: > 535: SwingUtilities.invokeLater(() -> { > 536: for (Node child : manager.getRootContext().getChildren().getNodes(true)) { > 537: // Nodes a lazily created. By expanding and collapsing they are all initialized Suggestion: // Nodes are lazily created. By expanding and collapsing they are all initialized src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/serialization/GraphParser.java line 2: > 1: /* > 2: * Copyright (c) 2012, 2015, Oracle and/or its affiliates. All rights reserved. This copyright header change is unnecessary. ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-2005293813 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1568384710 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1568385671 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1568390983 From galder at openjdk.org Wed Apr 17 08:16:17 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 17 Apr 2024 08:16:17 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: > > > $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > Benchmark (size) Mode Cnt Score Error Units > ArrayClone.byteArraycopy 0 avgt 15 3.476 ? 0.018 ns/op > ArrayClone.byteArraycopy 10 avgt 15 3.740 ? 0.017 ns/op > ArrayClone.byteArraycopy 100 avgt 15 7.124 ? 0.010 ns/op > ArrayClone.byteArraycopy 1000 avgt 15 39.301 ? 0.106 ns/op > ArrayClone.byteClone 0 avgt 15 3.478 ? 0.008 ns/op > ArrayClone.byteClone 10 avgt 15 3.562 ? 0.007 ns/op > ArrayClone.byteClone 100 avgt 15 5.888 ? 0.206 ns/op > ArrayClone.byteClone 1000 avgt 15 25.762 ? 0.203 ns/op > ArrayClone.intArraycopy 0 avgt 15 3.199 ? 0.016 ns/op > ArrayClone.intArraycopy 10 avgt 15 4.521 ? 0.008 ns/op > ArrayClone.intArraycopy 100 avgt 15 17.429 ? 0.039 ns/op > ArrayClone.intArraycopy 1000 avgt 15 178.432 ? 0.777 ns/op > ArrayClone.intClone 0 avgt 15 3.406 ? 0.016 ns/op > ArrayClone.intClone 10 avgt 15 4.272 ? 
0.006 ns/op > ArrayClone.intClone 100 avgt 15 13.110 ? 0.122 ns/op > ArrayClone.intClone 1000 avgt 15 113.196 ? 13.400 ns/op > > > It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. > > > $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > ... > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 > > > One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: Fix style and throw RuntimeException instead of System.exit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/3af9cd9e..ad6c51bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=07-08 Stats: 10 lines in 1 file changed: 0 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Wed Apr 17 08:16:17 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 17 Apr 2024 08:16:17 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v8] In-Reply-To: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> References: 
<833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> Message-ID: On Mon, 15 Apr 2024 15:49:18 GMT, Galder Zamarre?o wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ? 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ? 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ? 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ? 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ? 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ? 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ? 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ? 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ? 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ? 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ? 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ? 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ? 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ? 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ? 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ? 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... 
>> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarre?o has updated the pull request incrementally with four additional commits since the last revision: > > - Peek receiver without pop/push > - require receiver_klass to be loaded for now > - missing check for unloaded > - suggested cleanup > Re: Also, I don't see GHA checks running. Are they enabled in your repo? Sorry, I had disabled the workflows on my fork because I was close to running out of free resources last month. I didn't realise it would have an impact here. I've re-enabled them and they are running now. @lmesnik I've pushed a commit to address your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2060665813 From galder at openjdk.org Wed Apr 17 08:16:17 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 17 Apr 2024 08:16:17 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v8] In-Reply-To: References: <833p5_HfgEo-EnpWYT5M-8HqD2XxsNgqy9TEogeIroc=.0e738dac-5f5a-449a-98f7-5e179c35883a@github.com> Message-ID: On Mon, 15 Apr 2024 20:17:27 GMT, Dean Long wrote: > It would be a good idea to ask arm/ppc/riscv/s390 port maintainers to test your changes. Sounds good, I'll update here when I've done so. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2060667061 From roland at openjdk.org Wed Apr 17 08:24:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 08:24:10 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> <079UKoZpe1eMTvJIIuKDqh6uSf1g3Cu8dUm5hkVddnA=.2cf0c555-5c29-4891-b4b6-866c11d7f080@github.com> <2J2xtBtH3BYPMCvnnO67L44uvfUyBDtXdV12CgvfEeY=.048edff8-077d-4ece-9250-163486bc7a0e@github.com> Message-ID: On Tue, 16 Apr 2024 17:08:46 GMT, Quan Anh Mai wrote: > For example, given `a & b`, during CCP, if the current value of `a` is `-1`, then `AndINode::Value` would yield `TypeInt::INT` while we can return a more rigorous value of `b`. Ok. But in that case `AndINode::Value` could be improved, right? Are there cases you can think of where calling `Identity` would result in a type that's narrower than what `Value` could compute. > I think that if the issue is simply improving `CMoveNode::Value` then you are right, it would be out of the scope. However, given the reasoning for the issue being that it can improve CCP, I think it would be more generalised if we can consult `Identity` during CCP instead, maybe simply changing: `const Type* new_type = n->Value(this);` into `const Type* new_type = n->Identity(this)->Value(this);` would be adequate. I think it's debatable whether it's the right thing to do and, in any case, it would be quite a bit of extra work. So I intend to go with the small improvement to `CMoveNode::Value`. 
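For readers following the `a & b` example quoted above, here is a small self-contained Java sketch of the gap being discussed. It is a toy model only: `AndTypeSketch`, `Range`, `andValue` and `andValueWithIdentity` are hypothetical names, and the rules are a simplification rather than the actual `AndINode::Value` or `Identity` code. It shows that a rule keyed on constants and non-negative masks falls back to the full int range when one input is `-1`, whereas using the identity `-1 & x == x` keeps the other input's range.

```java
// Toy model of an integer range type and two ways to type "a & b".
final class AndTypeSketch {

    // Closed interval [lo, hi]; a stand-in for a compiler's int type, not TypeInt itself.
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        static final Range INT = new Range(Integer.MIN_VALUE, Integer.MAX_VALUE);

        static Range con(int c) {
            return new Range(c, c);
        }

        boolean isCon() {
            return lo == hi;
        }
    }

    // Simplified Value()-style rule: fold constants, bound by a non-negative
    // constant mask, otherwise give up and return the full int range.
    static Range andValue(Range a, Range b) {
        if (a.isCon() && b.isCon()) {
            return Range.con(a.lo & b.lo);
        }
        if (a.isCon() && a.lo >= 0) {
            return new Range(0, a.lo);
        }
        if (b.isCon() && b.lo >= 0) {
            return new Range(0, b.lo);
        }
        return Range.INT;
    }

    // Same rule, but first applies the identity (-1 & x) == x.
    static Range andValueWithIdentity(Range a, Range b) {
        if (a.isCon() && a.lo == -1) {
            return b;
        }
        if (b.isCon() && b.lo == -1) {
            return a;
        }
        return andValue(a, b);
    }
}
```

With `a` equal to the constant `-1` and `b` in `[0, 10]`, `andValue` returns the full int range in this model, while `andValueWithIdentity` returns `[0, 10]`. Whether such a case is better handled by improving the `Value` rule itself or by consulting `Identity` during CCP is exactly the open question in this thread.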
------------- PR Comment: https://git.openjdk.org/jdk/pull/18757#issuecomment-2060682334 From tholenstein at openjdk.org Wed Apr 17 09:05:37 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Apr 2024 09:05:37 GMT Subject: RFR: 8324950: IGV: save the state to a file [v30] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java Co-authored-by: Roberto Casta?eda Lozano - Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/da669a90..5e055abf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=28-29 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Wed Apr 17 09:11:13 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Apr 2024 09:11:13 GMT Subject: RFR: 8324950: IGV: save the state to a file [v31] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. 
New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update GraphParser.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/5e055abf..a4b48c43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=29-30 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From rcastanedalo at openjdk.org Wed Apr 17 09:15:01 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 17 Apr 2024 09:15:01 GMT Subject: RFR: 8324950: IGV: save the state to a file [v31] In-Reply-To: References: Message-ID: <26gcDmeciECtvVAAPWzReMIiXsiPI0dYqh2XPNCHy2Q=.a18ce9ae-947b-499f-88ea-1eaa847a89aa@github.com> On Wed, 17 Apr 2024 09:11:13 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with 
graphs or send via network >> 2) open or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update GraphParser.java Marked as reviewed by rcastanedalo (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-2005475212 From galder at openjdk.org Wed Apr 17 09:23:01 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 17 Apr 2024 09:23:01 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 08:16:17 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ±
13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix style and throw RuntimeException instead of System.exit FAO @bulasevich @TheRealMDoerr @RealFYang @RealLucy I've created [JDK-8330472](https://bugs.openjdk.org/browse/JDK-8330472) to get the changes in this PR tested in arm/ppc/riscv/s390 architectures, to make sure no regressions are introduced. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2060802202 From qamai at openjdk.org Wed Apr 17 09:42:00 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 17 Apr 2024 09:42:00 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> Message-ID: On Fri, 12 Apr 2024 14:34:01 GMT, Roland Westrelin wrote: >> This is another small change from something I ran into while working >> on 8275202.
`CMoveNode::Value` can be improved when the condition is >> known to be always true or false. That doesn't affect IGVN (as the >> `CMove` is removed in that case) but it can be useful for passes that >> propagate types such as CCP. In the IR tests, the backbranch of the >> loop is never taken when the root of the compilation is `test1`. With >> the change, CCP can eliminate it. Without, it can't. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/movenode.cpp > > Co-authored-by: Christian Hagedorn I see, please go ahead, thanks a lot for your response. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18757#issuecomment-2060837944 From mdoerr at openjdk.org Wed Apr 17 10:12:05 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 17 Apr 2024 10:12:05 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 08:16:17 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ±
0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix style and throw RuntimeException instead of System.exit src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 304: > 302: // clear rest of allocated space > 303: if (zero_array) { > 304: const Register len_zero = len; HotSpot uses 2-space indentation.
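As background to the quoted PR description, the two Java-level ways of copying a primitive array that the benchmark names suggest can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical and are not taken from the `ArrayClone` microbenchmark or from the patch, and the comments describe what a compiler may do rather than what C1 is guaranteed to do.

```java
import java.util.Arrays;

// Hypothetical illustration; not code from the PR or its benchmark.
final class CloneVsArraycopy {
    // clone(): allocation and copy form one operation, so an intrinsic
    // can in principle skip zeroing the new array before filling it.
    static int[] viaClone(int[] src) {
        return src.clone();
    }

    // Explicit allocation followed by arraycopy: `new int[n]` must yield a
    // zero-filled array, whose contents the copy then overwrites.
    static int[] viaArraycopy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3};
        // Both approaches produce equal, independent copies.
        System.out.println(Arrays.equals(viaClone(src), viaArraycopy(src)));
    }
}
```

Both methods return a fresh array with the same contents; any performance difference comes from how the JIT compiles them, which this sketch does not measure.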
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1568599376 From mdoerr at openjdk.org Wed Apr 17 10:22:45 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 17 Apr 2024 10:22:45 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 09:20:02 GMT, Galder Zamarreño wrote: > FAO @bulasevich @TheRealMDoerr @RealFYang @RealLucy I've created [JDK-8330472](https://bugs.openjdk.org/browse/JDK-8330472) to port the changes here to arm/ppc/riscv/s390. Also, the changes in this PR have been made in such a way that they only affect architectures on which the intrinsic is implemented. Would you also be able to test the changes in this PR to make sure no regressions are introduced on these archs? Thanks! I'll test it. We will need separate issues for these platforms because they are maintained by different people. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2060924304 From mdoerr at openjdk.org Wed Apr 17 10:22:46 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 17 Apr 2024 10:22:46 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 08:16:17 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy.
As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>...
> > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Fix style and throw RuntimeException instead of System.exit src/hotspot/share/c1/c1_GraphBuilder.cpp line 3656: > 3654: case vmIntrinsics::_getCharStringU : append_char_access(callee, false); return; > 3655: case vmIntrinsics::_putCharStringU : append_char_access(callee, true); return; > 3656: case vmIntrinsicID::_clone : append_alloc_array_copy(callee); return; Why do you use `vmIntrinsicID` while all others use `vmIntrinsics`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1568609663 From roland at openjdk.org Wed Apr 17 10:50:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 10:50:23 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int Message-ID: This fixes 3 calls to ABS with a min int argument. I think all of them are harmless: - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The check is for a stride of 1 or -1. - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the computation of `scaled_iters_long`, the stride is passed to `ABS()` and then implicitly cast to long. I now cast the stride to long before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` overflows the int range for all values of `LoopStripMiningIter` except 0 or 1. Those values are handled early on in that method. So for a min int stride: ``` (jlong)scaled_iters != scaled_iters_long ``` is always true and the method returns early. - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the computation of `short_scaled_iters` also calls `ABS()` with the stride as argument. But the result of that computation is only used if the test for: ``` (jlong)scaled_iters != scaled_iters_long ``` doesn't cause an early return of the method.
I reordered statements so the `ABS()` call happens after that test, which will cause an early return if the stride is min int. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/18813/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18813&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330158 Stats: 11 lines in 1 file changed: 6 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18813.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18813/head:pull/18813 PR: https://git.openjdk.org/jdk/pull/18813 From roland at openjdk.org Wed Apr 17 10:50:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 10:50:51 GMT Subject: RFR: 8330163: C2: improve CMoveNode::Value() when condition is always true or false [v2] In-Reply-To: References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> <5SOFzNeQdFyNxfdsN8zfewJXj1n5272OHSoqinL51L4=.d82dbb55-b414-4275-a3d9-9c45b70ad96a@github.com> Message-ID: On Fri, 12 Apr 2024 14:40:55 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/movenode.cpp >> >> Co-authored-by: Christian Hagedorn > > Marked as reviewed by chagedorn (Reviewer).
@chhagedorn @vnkozlov thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18757#issuecomment-2060971643 From roland at openjdk.org Wed Apr 17 10:50:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 10:50:52 GMT Subject: Integrated: 8330163: C2: improve CMoveNode::Value() when condition is always true or false In-Reply-To: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> References: <9EpIABqvg9qB0Qoli_j63OWlQM-HYNhbnPwIXYF6xG4=.6afb1ced-bd82-40b8-8fc4-686a677c5157@github.com> Message-ID: On Fri, 12 Apr 2024 11:45:05 GMT, Roland Westrelin wrote: > This is another small change from something I ran into while working > on 8275202. `CMoveNode::Value` can be improved when the condition is > known to be always true or false. That doesn't affect IGVN (as the > `CMove` is removed in that case) but it can be useful for passes that > propagate types such as CCP. In the IR tests, the backbranch of the > loop is never taken when the root of the compilation is `test1`. With > the change, CCP can eliminate it. Without, it can't. This pull request has now been integrated. Changeset: 9445047d Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/9445047d059a87d49ed0923b438d2ec49340d78e Stats: 78 lines in 2 files changed: 78 ins; 0 del; 0 mod 8330163: C2: improve CMoveNode::Value() when condition is always true or false Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18757 From galder at openjdk.org Wed Apr 17 11:08:01 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 17 Apr 2024 11:08:01 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v3] In-Reply-To: References: Message-ID: > Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. > > It includes an IR test to verify that the optimization gets applied.
The optimization applies not only to floating-point values, but also to longs and ints. The test includes tests for all of those. > > `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. > > I've run hotspot compiler tests successfully on x86_64. Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Use failOn instead of counts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18738/files - new: https://git.openjdk.org/jdk/pull/18738/files/f3d20ced..7f6e4b5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18738&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18738&range=01-02 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/18738.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18738/head:pull/18738 PR: https://git.openjdk.org/jdk/pull/18738 From galder at openjdk.org Wed Apr 17 11:08:01 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 17 Apr 2024 11:08:01 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v2] In-Reply-To: References: <-7e1hzsSV0lNwIZvT6e0CNo9947mo_ZrJFct65az_kc=.b3fe2630-9048-44fc-8f71-4c44635f6859@github.com> Message-ID: <5oZ47lXTdh_8yyB1pLuFU679jwRwmsxP_wiCjfgTyjw=.dbbd6577-380c-4114-b693-841f2a337416@github.com> On Tue, 16 Apr 2024 06:56:57 GMT, Christian Hagedorn wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Small IR test fixes >> >> * Fixed bug ID number. >> * Added test summary. >> * Removed unnecessary @requires. >> * Added @Check methods to verify optimizations return the expected result. > > Thanks for the update, looks good! I'll submit some testing. @chhagedorn I've made the `failOn` change and pushed a commit.
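For readers skimming the thread, the identity this PR relies on (min or max of a value with itself is that value) can be checked at the Java level with a small sketch. The class and method names below are hypothetical and are not taken from the PR's IR test; the sketch only demonstrates the arithmetic fact, not whether C2 actually folds a given call.

```java
// Hypothetical illustration; not the IR test from the PR.
final class MinMaxIdentity {
    // Each method passes the same input twice, which is the shape the
    // identity optimization described in the PR is meant to simplify.
    static int maxSame(int x)       { return Math.max(x, x); }
    static long minSame(long x)     { return Math.min(x, x); }
    static float minSame(float x)   { return Math.min(x, x); }
    static double maxSame(double x) { return Math.max(x, x); }

    public static void main(String[] args) {
        System.out.println(maxSame(42));
        System.out.println(minSame(-7L));
        // Floating-point corner cases keep the identity as well:
        System.out.println(maxSame(Double.NaN)); // still NaN
        System.out.println(minSame(-0.0f));      // still negative zero
    }
}
```

The floating-point cases are the interesting ones, because `Math.min` and `Math.max` give NaN and signed zeros special treatment; with identical inputs the result is still the input.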
------------- PR Comment: https://git.openjdk.org/jdk/pull/18738#issuecomment-2061001578 From yzheng at openjdk.org Wed Apr 17 12:11:14 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 17 Apr 2024 12:11:14 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v4] In-Reply-To: References: Message-ID: > Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/870a6127..c5d521dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=02-03 Stats: 53 lines in 10 files changed: 3 ins; 6 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226 From chagedorn at openjdk.org Wed Apr 17 12:23:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Apr 2024 12:23:02 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v3] In-Reply-To: References: Message-ID: <3GGBYPlmvwL1gMb_dBiiFpbx7x8F3QW6Qz_sv4Kra14=.a1d6b362-d94f-4c57-b73e-a65ed4e59cf5@github.com> On Wed, 17 Apr 2024 11:08:01 GMT, Galder Zamarre?o wrote: >> Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. >> >> It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those. >> >> `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. 
>> >> I've run hotspot compiler tests successfully on x86_64. > > Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: > > Use failOn instead of counts Testing passed, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18738#pullrequestreview-2005890959 From yzheng at openjdk.org Wed Apr 17 12:35:54 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 17 Apr 2024 12:35:54 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: > Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/c5d521dc..72ba58ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18226&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226 From yzheng at openjdk.org Wed Apr 17 12:40:01 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 17 Apr 2024 12:40:01 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 15:59:33 GMT, Damon Fenacci wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment. > > `multiply_to_len` seems to be used by `generate_squareToLen` as well for aarch64 and riscv but `zlen` is still passed in a register. 
> > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710 > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881 > > I think it might work anyway but it might be better to adapt them if only for completeness. @dafedafe @dean-long please take a look and let me know if there are further issues, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2061162283 From epeter at openjdk.org Wed Apr 17 13:34:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Apr 2024 13:34:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v13] In-Reply-To: References: Message-ID: On Fri, 5 Apr 2024 13:22:47 GMT, Roland Westrelin wrote: >> And you further repeat the comments below. I fear that if someone would eventually make changes, they would not update all comments, and then the comments diverge. > > That one doesn't duplicate the one above `transform_get_subgraph()`. It's supposed to show what change was just made to the graph by `replace_current_exit_of_get_with_halt()`. Same for the one below, it's expected to show an incremental change. It's hard to show what changes without keeping the entire structure of the code I think. Maybe you can have them side-by side, so it is easier to visually parse the difference? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1568851037 From dfenacci at openjdk.org Wed Apr 17 13:49:05 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 17 Apr 2024 13:49:05 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3] In-Reply-To: References: Message-ID: On Tue, 26 Mar 2024 15:59:33 GMT, Damon Fenacci wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment. 
> > `multiply_to_len` seems to be used by `generate_squareToLen` as well for aarch64 and riscv but `zlen` is still passed in a register. > > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710 > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881 > > I think it might work anyway but it might be better to adapt them if only for completeness. > @dafedafe @dean-long please take a look and let me know if there are further issues, thanks! Thanks @mur47x111! I noticed that you found even a few more `zlen` usages ? Did you test the change against all affected platforms? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2061301421 From epeter at openjdk.org Wed Apr 17 13:49:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Apr 2024 13:49:11 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. 
If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: > > - Merge branch 'master' into JDK-8320649 > - review > - test fix > - test fix > - Merge branch 'master' into JDK-8320649 > - whitespaces > - review > - Merge branch 'master' into JDK-8320649 > - review > - 32 bit build fix > - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e src/hotspot/share/opto/loopnode.cpp line 5293: > 5291: // ScopedValue object: either the ScopedValueGetResult and ScopedValueGetHitsInCache are from the same > 5292: // ScopedValue.get() and we remove the ScopedValueGetResult because it's only useful to optimize > 5293: // ScopedValue.get() where the slow path is taken. Or They are from difference ScopedValue.get() and we Suggestion: // ScopedValue.get() where the slow path is taken. Or they are from different ScopedValue.get() and we ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1568875169 From epeter at openjdk.org Wed Apr 17 13:55:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Apr 2024 13:55:05 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Tue, 16 Apr 2024 14:46:22 GMT, Roland Westrelin wrote: >> @rwestrel thanks for asking. About 10% seems to still be scheduled and have not completed, on `macosx-x64`. But the rest seems fine. I'll re-review next week :) > > @eme64 can you go over my replies above and let me know if they sound good to you? Thanks. Otherwise your replies sound good, thanks @rwestrel! I'm a little sick this week, and with a headache it is a bit difficult to read a long PR seriously. I hope I can do it soon though.
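As a reading aid for the multi-step cache probe described in the quoted PR text for 8320649 (check that the cache exists, check the first index, check the second index, otherwise take the slow path and insert the mapping), here is a toy model in plain Java. Everything in it is an assumption made for illustration: the class name, the cache size, the hashing and the eviction choice are hypothetical and do not reflect the real `ScopedValue` implementation or the new C2 nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy model only: sizes, hashing and names are assumptions, not JDK code.
final class ToyScopedCache {
    private static final int SLOTS = 16;              // assumed number of entries
    private final Map<Object, Object> bindings = new HashMap<>();
    private Object[] cache;                           // lazily allocated [key, value] pairs
    int slowCalls;                                    // counts slow-path lookups

    void bind(Object key, Object value) {
        bindings.put(key, value);
        cache = null;                                 // a new mapping drops the toy cache
    }

    Object get(Object key) {
        if (cache != null) {                          // step 1: is the cache present?
            int i1 = index1(key);
            if (cache[i1] == key) return cache[i1 + 1];   // step 2: first index
            int i2 = index2(key);
            if (cache[i2] == key) return cache[i2 + 1];   // step 3: second index
        }
        return slowGet(key);                          // otherwise: slow path
    }

    private Object slowGet(Object key) {
        slowCalls++;
        if (!bindings.containsKey(key)) throw new NoSuchElementException();
        Object value = bindings.get(key);
        if (cache == null) cache = new Object[2 * SLOTS];
        int i1 = index1(key);
        // Prefer the first index; fall back to the second if it is taken.
        int slot = (cache[i1] == null || cache[i1] == key) ? i1 : index2(key);
        cache[slot] = key;
        cache[slot + 1] = value;
        return value;
    }

    private static int index1(Object key) {
        return (System.identityHashCode(key) & (SLOTS - 1)) * 2;
    }

    private static int index2(Object key) {
        return ((System.identityHashCode(key) >>> 4) & (SLOTS - 1)) * 2;
    }
}
```

In this model a second `get()` for the same key is served from the cache without another slow call, which is the property that makes the repeated probe a candidate for elimination.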
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2061317091 From epeter at openjdk.org Wed Apr 17 13:59:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Apr 2024 13:59:00 GMT Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 [v2] In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 08:25:56 GMT, Roland Westrelin wrote: >> After range check elimination, a cast in the main loop becomes top >> because the type of its input (that depends on the iv phi) and the >> type recorded in the cast do not intersect. This is a case that's >> expected to be caught by assert predicates but, in this particular >> case, no assert predicate constant folds. >> >> The stride for the loop is -2. The iv phi type is `min+1..0` >> >> As a consequence, the init value for the main loop has type int. >> >> The range check that causes the issue is for array access: >> >> lArrFld[i11 + 1] = 6; >> >> >> The main loop is unrolled once. The second access in the loop is at >> `i11 - 1` which has type `min..-1`. The range check cast at that >> access becomes top. The assert predicates operates on an init value >> that has the shape: >> >> >> (CastII (AddI pre_loop_iv -2) int) >> >> >> and type int. >> >> That `CastII` is inserted by `PhaseIdealLoop::cast_incr_before_loop()`. >> >> The assert predicate for the first iteration in the main loop is for >> index: >> >> >> (AddI (CastII (AddI pre_loop_iv -2) int) 1) >> >> >> And for the second: >> >> >> (AddI (CastII (AddI pre_loop_iv -2) int) -1) >> >> >> Both have type int so the assert predicate can't constant fold. >> >> I initially fixed this by changing the type of the cast from int to >> the type of the iv phi: >> >> >> (AddI (CastII (AddI pre_loop_iv -2) min+1..0) -1) >> >> >> That allows the assert predicate for the second iteration to constant >> fold. 
But I was then worried narrowing the type of the cast would >> causes issues going forward so instead, I propose proceeding as in >> 8282592 and have assert predicates skip over the CastII (that part of >> 8282592 was later undone): >> >> >> (AddI (AddI pre_loop_iv -2) 1) >> >> >> which allows the assert predicate for the first iteration in the main >> loop to constant fold. >> >> The change from 8282592 caused issues because we used to narrow the >> type of a cast based on the condition that guards it. That was removed >> by 8319372. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Looks reasonable. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18724#pullrequestreview-2006127965 From kxu at openjdk.org Wed Apr 17 13:59:11 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 17 Apr 2024 13:59:11 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v2] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
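As a rough illustration of the loop shape the description above refers to, here is a minimal Java sketch of an int counted loop carrying a long-typed parallel induction variable. The class and method names are hypothetical and the stride of 3 is arbitrary; the thread does not say that this exact shape is what the PR transforms, and `closedForm` only shows the value the loop is equivalent to, not C2's actual output.

```java
// Hypothetical illustration; names and the stride of 3 are assumptions.
class ParallelIvSketch {
    // An int counted loop (i) carrying a long-typed parallel induction variable (x).
    static long loopForm(int n) {
        long x = 0;
        for (int i = 0; i < n; i++) {
            x += 3; // x advances in lockstep with i
        }
        return x;
    }

    // The same value expressed directly from the trip count, with no loop-carried x.
    static long closedForm(int n) {
        return n > 0 ? 3L * n : 0L;
    }

    public static void main(String[] args) {
        for (int n : new int[] {-5, 0, 1, 1000}) {
            System.out.println(n + ": " + loopForm(n) + " == " + closedForm(n));
        }
    }
}
```

The multiplication is done in 64 bits (`3L * n`) so the closed form matches the long accumulator even for trip counts where an int product would wrap.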
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: add pseudocode for subgraphs before/after the transformation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/2230c7a6..efc270ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=00-01 Stats: 20 lines in 1 file changed: 20 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From epeter at openjdk.org Wed Apr 17 14:03:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Apr 2024 14:03:01 GMT Subject: RFR: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 01:38:28 GMT, Jasmine Karthikeyan wrote: > This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. > I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. > > Thoughts and reviews would be appreciated! Looks reasonable, thanks for the fix! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18734#pullrequestreview-2006137292 From matsaave at openjdk.org Wed Apr 17 15:17:25 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 17 Apr 2024 15:17:25 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v2] In-Reply-To: References: Message-ID: > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Moved membar inside load_field_entry ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18477/files - new: https://git.openjdk.org/jdk/pull/18477/files/60925369..4f976405 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=00-01 Stats: 26 lines in 5 files changed: 6 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18477/head:pull/18477 PR: https://git.openjdk.org/jdk/pull/18477 From roland at openjdk.org Wed Apr 17 15:28:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 15:28:04 GMT Subject: RFR: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 [v2] In-Reply-To: References: Message-ID: <5UfQPl60NurAe3qQlztE4Y1lGp-CggEcq5fPIwPSGNc=.342ebca9-97b6-44df-bfa4-b85898dda62d@github.com> On Thu, 11 Apr 2024 08:39:20 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Thanks for the update, that looks good!
@chhagedorn @eme64 thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18724#issuecomment-2061537793 From roland at openjdk.org Wed Apr 17 15:28:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 15:28:05 GMT Subject: Integrated: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 13:41:11 GMT, Roland Westrelin wrote: > After range check elimination, a cast in the main loop becomes top > because the type of its input (that depends on the iv phi) and the > type recorded in the cast do not intersect. This is a case that's > expected to be caught by assert predicates but, in this particular > case, no assert predicate constant folds. > > The stride for the loop is -2. The iv phi type is `min+1..0` > > As a consequence, the init value for the main loop has type int. > > The range check that causes the issue is for array access: > > lArrFld[i11 + 1] = 6; > > > The main loop is unrolled once. The second access in the loop is at > `i11 - 1` which has type `min..-1`. The range check cast at that > access becomes top. The assert predicates operates on an init value > that has the shape: > > > (CastII (AddI pre_loop_iv -2) int) > > > and type int. > > That `CastII` is inserted by `PhaseIdealLoop::cast_incr_before_loop()`. > > The assert predicate for the first iteration in the main loop is for > index: > > > (AddI (CastII (AddI pre_loop_iv -2) int) 1) > > > And for the second: > > > (AddI (CastII (AddI pre_loop_iv -2) int) -1) > > > Both have type int so the assert predicate can't constant fold. > > I initially fixed this by changing the type of the cast from int to > the type of the iv phi: > > > (AddI (CastII (AddI pre_loop_iv -2) min+1..0) -1) > > > That allows the assert predicate for the second iteration to constant > fold. 
But I was then worried narrowing the type of the cast would > causes issues going forward so instead, I propose proceeding as in > 8282592 and have assert predicates skip over the CastII (that part of > 8282592 was later undone): > > > (AddI (AddI pre_loop_iv -2) 1) > > > which allows the assert predicate for the first iteration in the main > loop to constant fold. > > The change from 8282592 caused issues because we used to narrow the > type of a cast based on the condition that guards it. That was removed > by 8319372. This pull request has now been integrated. Changeset: 9fd78022 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/9fd78022b19149ade40f92749f0b585ecfd41410 Stats: 67 lines in 2 files changed: 67 ins; 0 del; 0 mod 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902 Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/18724 From dfenacci at openjdk.org Wed Apr 17 15:35:02 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 17 Apr 2024 15:35:02 GMT Subject: RFR: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. In-Reply-To: References: Message-ID: <-mzy0DMp5bkWncrCLpJB6ItkQFGmEB2BEyj-TWfPQsQ=.93e17425-e4af-4afd-9f07-5d52770842d2@github.com> On Thu, 11 Apr 2024 01:38:28 GMT, Jasmine Karthikeyan wrote: > This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. 
> I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. > > Thoughts and reviews would be appreciated! Thanks for fixing this @jaskarth. Quick question: I noticed that by setting `b[i] = 1;` now a few tests always set 1 on the right-hand-side of the max/min operations. Could it possibly limit their scope? ------------- PR Review: https://git.openjdk.org/jdk/pull/18734#pullrequestreview-2006378673 From kxu at openjdk.org Wed Apr 17 15:37:30 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 17 Apr 2024 15:37:30 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: > Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized. > > Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. 
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: update comments to clarify on type casting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18489/files - new: https://git.openjdk.org/jdk/pull/18489/files/efc270ec..dcd55681 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18489&range=01-02 Stats: 9 lines in 1 file changed: 7 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18489.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489 PR: https://git.openjdk.org/jdk/pull/18489 From coleenp at openjdk.org Wed Apr 17 15:37:43 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 17 Apr 2024 15:37:43 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v2] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 15:17:25 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Moved membar inside load_field_entry I think this looks really good. ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18477#pullrequestreview-2006388061 From bkilambi at openjdk.org Wed Apr 17 15:46:30 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 17 Apr 2024 15:46:30 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v6] In-Reply-To: References: Message-ID: <8xmxstkq7D_wMBI-BhUcJzoJOn2bWcsUuQtXXIv4YMk=.a1a5baee-e2af-40b2-9df9-67642d90d565@github.com> > Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). 
> > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > Final code is as below: > > Before: > ` fadda z17.s, p7/m, z17.s, z16.s > ` > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > > > > [Test] > Full jtreg passed on AArch64 and x86. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Address some more review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18034/files - new: https://git.openjdk.org/jdk/pull/18034/files/71a86deb..f38dae21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=04-05 Stats: 11 lines in 3 files changed: 5 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034 PR: https://git.openjdk.org/jdk/pull/18034 From shade at openjdk.org Wed Apr 17 15:47:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Apr 2024 15:47:42 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote: > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly casted to long. 
I now cast the stride to long > before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method. So > for a min int stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. I reordered statements > so the `ABS()` call happens after that test which will cause an early > return if the stride is min int. All right, I agree with this reasoning. Have you tried running tests with #18751 applied? ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18813#pullrequestreview-2006422011 From roland at openjdk.org Wed Apr 17 15:52:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Apr 2024 15:52:59 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote: > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly cast to long. I now cast the stride to long > before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method.
So > for a min in stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. I reordered statements > so the `ABS()` calls happens after that test which will cause an early > return if the stride is min int. Thanks for reviewing this. > Have you tried running tests with #18751 applied? I only ran the particular test that you mentioned in the bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18813#issuecomment-2061620378 From shade at openjdk.org Wed Apr 17 16:03:00 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 17 Apr 2024 16:03:00 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: <8NRY-k0RMK3tXf7TYkI9H1TVCO-PrbR0x4FMlqFypQg=.96717c57-bfb8-4c24-9b19-bb3d45d1fc8d@github.com> On Wed, 17 Apr 2024 15:49:59 GMT, Roland Westrelin wrote: > I only ran the particular test that you mentioned in the bug. All right, let me run tests with #18751 applied and see if we have any surprises. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18813#issuecomment-2061645615 From kvn at openjdk.org Wed Apr 17 16:24:48 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Apr 2024 16:24:48 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote: > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. 
> > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly casted to long. I now cast the stride to long > before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method. So > for a min in stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. I reordered statements > so the `ABS()` calls happens after that test which will cause an early > return if the stride is min int. two comments src/hotspot/share/opto/loopnode.cpp line 2969: > 2967: int scaled_iters = (int)scaled_iters_long; > 2968: if ((jlong)scaled_iters != scaled_iters_long) { > 2969: // Remove outer loop and safepoint (too few iterations) Please put more extended comment here. What you have in PR description would be nice. src/hotspot/share/opto/loopnode.cpp line 2973: > 2971: return; > 2972: } > 2973: int short_scaled_iters = LoopStripMiningIterShortLoop * ABS(stride); So stride is not MIN_INT here but the expression still can overflow. Should we use `jlong` for expression and `short_scaled_iters`? `iter_estimate` is `jlong`. 
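The min int concern discussed in this thread can be reproduced with plain integer arithmetic. The following is a minimal Java sketch, not HotSpot code: `Math.abs` stands in for HotSpot's `ABS()` (assuming `ABS()` negates a negative argument), and the helper names `scaledItersLong` and `doesNotFitInInt` are hypothetical. Note one difference: in C++ the signed overflow is undefined behavior, whereas Java defines the wrapped result, which is what makes the effect observable here.

```java
// Hypothetical illustration; helper names are assumptions, not HotSpot code.
class AbsMinIntSketch {
    // Widen the stride to 64 bits before taking the absolute value, then scale.
    static long scaledItersLong(int iters, int stride) {
        return (long) iters * Math.abs((long) stride);
    }

    // Same shape as the quoted check: true when the 64-bit value does not
    // survive a round trip through int.
    static boolean doesNotFitInInt(long value) {
        return (long) (int) value != value;
    }

    public static void main(String[] args) {
        int stride = Integer.MIN_VALUE;
        System.out.println(Math.abs(stride));          // -2147483648 (still negative)
        System.out.println(Math.abs((long) stride));   // 2147483648
        long scaled = scaledItersLong(1000, stride);
        System.out.println(doesNotFitInInt(scaled));   // true
    }
}
```

Taking the absolute value after widening gives a positive 64-bit result, and doing the multiplication in 64 bits keeps the product exact, so the round-trip comparison can detect values that do not fit in an int.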
------------- PR Review: https://git.openjdk.org/jdk/pull/18813#pullrequestreview-2006487913 PR Review Comment: https://git.openjdk.org/jdk/pull/18813#discussion_r1569112810 PR Review Comment: https://git.openjdk.org/jdk/pull/18813#discussion_r1569123977 From duke at openjdk.org Wed Apr 17 18:49:26 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 17 Apr 2024 18:49:26 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) Message-ID: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> The bug occurs when [Shenandoah optimizations reset post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. It's easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219).
The `MinL` is never created and there is no crash. Passing hotspot tier1 locally on Linux machine. ------------- Commit messages: - 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) Changes: https://git.openjdk.org/jdk/pull/18824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329797 Stats: 48 lines in 5 files changed: 44 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18824/head:pull/18824 PR: https://git.openjdk.org/jdk/pull/18824 From dlong at openjdk.org Wed Apr 17 18:55:41 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Apr 2024 18:55:41 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote: > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly casted to long. I now cast the stride to long > before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method. So > for a min in stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. 
I reordered statements > so the `ABS()` call happens after that test which will cause an early > return if the stride is min int. src/hotspot/share/opto/loopnode.cpp line 2973: > 2971: return; > 2972: } > 2973: int short_scaled_iters = LoopStripMiningIterShortLoop * ABS(stride); Isn't it true that `stride` can be MIN_INT here, if LoopStripMiningIter == 1? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18813#discussion_r1569331870 From matsaave at openjdk.org Wed Apr 17 19:17:17 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 17 Apr 2024 19:17:17 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v3] In-Reply-To: References: Message-ID: <9C3CW5yKaGY_rT2ISPfExOqUpI8o_xuh1NxFa1GI-fM=.c049c62f-4132-48a5-ac6c-38d2fe8fdfa7@github.com> > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: - Reverted change on arm32 - Merge branch 'master' into membar_8327647 - Moved membar inside load_field_entry - Corrected comments - Merge branch 'master' into membar_8327647 - Removed unneeded push/pop - Merge branch 'master' into membar_8327647 - Replace use of r0 with noreg - Added membars after load_field_entry() calls - 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18477/files - new: https://git.openjdk.org/jdk/pull/18477/files/4f976405..f612f947 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=01-02 Stats: 20681 lines in 356 files changed: 8945 ins; 10674 del; 1062 mod Patch: https://git.openjdk.org/jdk/pull/18477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18477/head:pull/18477 PR: https://git.openjdk.org/jdk/pull/18477 From dlong at openjdk.org Wed Apr 17 19:36:01 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Apr 2024 19:36:01 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comment. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670: > 4668: const Register tmp5 = r15; > 4669: const Register tmp6 = r16; > 4670: const Register tmp7 = r17; Why not minimize changes and continue to use r5 for tmp0? I see no need for r17 or to reassign all the other tmp registers. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569401544 From dlong at openjdk.org Wed Apr 17 19:48:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Apr 2024 19:48:02 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comment. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693: > 4691: const Register xlen = r1; > 4692: const Register z = r2; > 4693: const Register zlen = r3; LibraryCallKit::inline_squareToLen() is still computing zlen and passing it as the 4th arg, even though the value is unused. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702: > 4700: const Register tmp5 = r15; > 4701: const Register tmp6 = r16; > 4702: const Register tmp7 = r17; No need for r17 or sorting tmps. Make tmp0 r3, or r6, r7, etc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569419199 PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569420732 From dlong at openjdk.org Wed Apr 17 19:52:03 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Apr 2024 19:52:03 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: <4ghKQMWdpqPyMQzdFJH-IFlAwyoicv7CjNwub7XsvT8=.d8419240-b77b-4410-9da4-f1e0df9c022a@github.com> On Wed, 17 Apr 2024 19:45:02 GMT, Dean Long wrote: >> Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment. 
> > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702: > >> 4700: const Register tmp5 = r15; >> 4701: const Register tmp6 = r16; >> 4702: const Register tmp7 = r17; > > No need for r17 or sorting tmps. Make tmp0 r3, or r6, r7, etc. Also, I don't see why the code below saves and restores r4/r5. Maybe @theRealAph knows? Aren't all these registers killed across a runtime call? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569427241 From dlong at openjdk.org Wed Apr 17 20:08:00 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Apr 2024 20:08:00 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comment. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6662: > 6660: push(tmp5); > 6661: > 6662: push(xlen); There may be an opportunity here (separate RFE?) to get rid of the save/restore for these. I don't think it's necessary if this is called as part of a C2 stub. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569452818 From dlong at openjdk.org Wed Apr 17 20:14:42 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 17 Apr 2024 20:14:42 GMT Subject: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comment. 
I think you'll want to ask port maintainers for aarch64/arm/ppc/riscv/s390 to review and test those changes. There may be some opportunities for minor improvements, but those could be done later. For example, we are computing `zlen` for the squareToLen stub even though the value is unused. And both x86 and aarch64 seem to have unneeded save/restore code, even though I think all these registers are killed when called by a C2 runtime stub.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2062138149

From dlong at openjdk.org Wed Apr 17 23:32:56 2024
From: dlong at openjdk.org (Dean Long)
Date: Wed, 17 Apr 2024 23:32:56 GMT
Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int
In-Reply-To:
References:
Message-ID:

On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote:

> This fixes 3 calls to ABS with a min int argument. I think all of them
> are harmless:
>
> - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The
>   check is for a stride of 1 or -1.
>
> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the
>   computation of `scaled_iters_long`, the stride is passed to `ABS()`
>   and then implicitly cast to long. I now cast the stride to long
>   before `ABS()`. For a min int stride, `LoopStripMiningIter * stride`
>   overflows the int range for all values of `LoopStripMiningIter`
>   except 0 or 1. Those values are handled early on in that method. So
>   for a min int stride:
>   ```
>   (jlong)scaled_iters != scaled_iters_long
>   ```
>   is always true and the method returns early.
>
> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the
>   computation of `short_scaled_iters` also calls `ABS()` with the
>   stride as argument. But the result of that computation is only used
>   if the test for:
>   ```
>   (jlong)scaled_iters != scaled_iters_long
>   ```
>   doesn't cause an early return of the method. I reordered statements
>   so the `ABS()` call happens after that test which will cause an early
>   return if the stride is min int.

There's also a less obvious use of an abs() idiom in LoopLimitNode::Ideal, when it does

    stride_p = -stride_con;

if stride_con is negative. Does it make sense to fix that as part of this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18813#issuecomment-2062664375

From fyang at openjdk.org Wed Apr 17 23:40:15 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 17 Apr 2024 23:40:15 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v3]
In-Reply-To: <9C3CW5yKaGY_rT2ISPfExOqUpI8o_xuh1NxFa1GI-fM=.c049c62f-4132-48a5-ac6c-38d2fe8fdfa7@github.com>
References: <9C3CW5yKaGY_rT2ISPfExOqUpI8o_xuh1NxFa1GI-fM=.c049c62f-4132-48a5-ac6c-38d2fe8fdfa7@github.com>
Message-ID:

On Wed, 17 Apr 2024 19:17:17 GMT, Matias Saavedra Silva wrote:

>> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests.
>
> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>
>  - Reverted change on arm32
>  - Merge branch 'master' into membar_8327647
>  - Moved membar inside load_field_entry
>  - Corrected comments
>  - Merge branch 'master' into membar_8327647
>  - Removed unneeded push/pop
>  - Merge branch 'master' into membar_8327647
>  - Replace use of r0 with noreg
>  - Added membars after load_field_entry() calls
>  - 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow

The riscv part is still lacking one change:

diff --git a/src/hotspot/cpu/riscv/templateTable_riscv.cpp b/src/hotspot/cpu/riscv/templateTable_riscv.cpp
index 58f57f32b2f..e27b35e793f 100644
--- a/src/hotspot/cpu/riscv/templateTable_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateTable_riscv.cpp
@@ -2272,7 +2272,9 @@ void TemplateTable::load_resolved_field_entry(Register obj,
   __ load_unsigned_byte(flags, Address(cache, in_bytes(ResolvedFieldEntry::flags_offset())));

   // TOS state
-  __ load_unsigned_byte(tos_state, Address(cache, in_bytes(ResolvedFieldEntry::type_offset())));
+  if (tos_state != noreg) {
+    __ load_unsigned_byte(tos_state, Address(cache, in_bytes(ResolvedFieldEntry::type_offset())));
+  }

   // Klass overwrite register
   if (is_static) {

src/hotspot/cpu/riscv/templateTable_riscv.cpp line 3040:

> 3038:   __ load_field_entry(x12, x11);
> 3039:
> 3040:   // X11: field offset, X12: TOS, X13: flags

Suggestion:

`// X11: field offset, X12: field holder, X13: flags`

-------------

Changes requested by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18477#pullrequestreview-2007541639
PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1569670754

From chagedorn at openjdk.org Thu Apr 18 06:03:56 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 18 Apr 2024 06:03:56 GMT
Subject: RFR: 8330419: Unused code in ConnectionGraph::specialize_castpp
In-Reply-To: <6L5j8WiAU4xDXERf8g8nt_T-CCHwQauEdjObEVxjV74=.e8942a0a-a804-47a4-8d56-6c3ad1dd51ef@github.com>
References: <6L5j8WiAU4xDXERf8g8nt_T-CCHwQauEdjObEVxjV74=.e8942a0a-a804-47a4-8d56-6c3ad1dd51ef@github.com>
Message-ID: <72Q4nA_0HyrcVMMPJM4kKzMSNHGiF5RVLnlfyZDXT_g=.381fe7c3-67bf-4752-bc63-4b2ae7b3b9be@github.com>

On Wed, 17 Apr 2024 00:05:23 GMT, Fei Yang wrote:

> Please review this small code cleanup change.
>
> Noticed that `minus_one` local created in `ConnectionGraph::specialize_castpp` which is introduced by [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) is never used. I think it should be safe to remove this. Also renamed `boll` to `bol` to be consistent in naming with other places where we create a `BoolNode`.
>
> Testing: tier1 tested on linux-aarch64 (release & fastdebug)

Looks good and trivial.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18805#pullrequestreview-2007920559

From chagedorn at openjdk.org Thu Apr 18 06:22:58 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 18 Apr 2024 06:22:58 GMT
Subject: RFR: 8324950: IGV: save the state to a file [v31]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Apr 2024 09:11:13 GMT, Tobias Holenstein wrote:

>> The current workflow in IGV is the following:
>> 1) import an XML file with graphs or send via network
>> 2) open one or more graphs in a tab
>> 3) extract a set of nodes to be displayed in the tab
>> 4) close IGV and start from 1) again
>>
>> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file.
>> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update GraphParser.java That is a nice feature! I've tried the following out which does not seem to be working as expected: 1) Open graphs.xml -> opens with extracted nodes as shown in PR description 2) Click "Clear workspace" 3) Open graphs.xml again -> this time, it opens the entire graph without selection. Same result when I completely restart IGV in step 2) instead. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2063084948 From epeter at openjdk.org Thu Apr 18 08:06:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 08:06:12 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v6] In-Reply-To: <8xmxstkq7D_wMBI-BhUcJzoJOn2bWcsUuQtXXIv4YMk=.a1a5baee-e2af-40b2-9df9-67642d90d565@github.com> References: <8xmxstkq7D_wMBI-BhUcJzoJOn2bWcsUuQtXXIv4YMk=.a1a5baee-e2af-40b2-9df9-67642d90d565@github.com> Message-ID: <9zThozzY0xAekz17NJ2PIwa-37r8M95MM_E4lJl-Kao=.124dfbb1-814e-4ae7-8c48-da3b37d5bb42@github.com> On Wed, 17 Apr 2024 15:46:30 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. 
>> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address some more review comments src/hotspot/cpu/aarch64/aarch64_vector.ad line 2858: > 2856: // reduction addF > 2857: > 2858: instruct reduce_non_strict_order_add2F_neon(vRegF dst, vRegF fsrc, vReg vsrc) %{ Now that you have changed the name of the method, you should also change the `format` in all of the methods. 
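For readers following the thread, the non-associativity of floating-point addition that motivates the `requires_strict_order` flag in the quoted description can be reproduced with a small sketch. This is only an illustration under stated assumptions: the class and method names (`ReductionOrder`, `strictSum`, `pairwiseSum`) are hypothetical, the input values are chosen to provoke rounding, and the pairwise shape is a stand-in for what a non-strict reduction might do, not the code C2 emits.

```java
public class ReductionOrder {
    // Strictly ordered reduction: (((0 + a0) + a1) + a2) + a3, left to right.
    static float strictSum(float[] a) {
        float s = 0.0f;
        for (float v : a) {
            s += v;
        }
        return s;
    }

    // Pairwise reduction over exactly four lanes: (a0 + a1) + (a2 + a3).
    // Hypothetical stand-in for a reduction that is free to reassociate.
    static float pairwiseSum(float[] a) {
        return (a[0] + a[1]) + (a[2] + a[3]);
    }

    public static void main(String[] args) {
        // Near 1e8 the spacing between adjacent floats is 8, so adding 1.0f is lost.
        float[] a = {1e8f, 1.0f, -1e8f, 1.0f};
        System.out.println(strictSum(a));   // prints 1.0
        System.out.println(pairwiseSum(a)); // prints 0.0
    }
}
```

Both results are valid roundings of the same mathematical sum; they differ only because the additions are grouped differently, which is why a reduction that must match scalar semantics cannot be reordered freely.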
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1570212535 From bkilambi at openjdk.org Thu Apr 18 08:16:05 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 18 Apr 2024 08:16:05 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v6] In-Reply-To: <9zThozzY0xAekz17NJ2PIwa-37r8M95MM_E4lJl-Kao=.124dfbb1-814e-4ae7-8c48-da3b37d5bb42@github.com> References: <8xmxstkq7D_wMBI-BhUcJzoJOn2bWcsUuQtXXIv4YMk=.a1a5baee-e2af-40b2-9df9-67642d90d565@github.com> <9zThozzY0xAekz17NJ2PIwa-37r8M95MM_E4lJl-Kao=.124dfbb1-814e-4ae7-8c48-da3b37d5bb42@github.com> Message-ID: On Thu, 18 Apr 2024 08:03:24 GMT, Emanuel Peter wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address some more review comments > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 2858: > >> 2856: // reduction addF >> 2857: >> 2858: instruct reduce_non_strict_order_add2F_neon(vRegF dst, vRegF fsrc, vReg vsrc) %{ > > Now that you have changed the name of the method, you should also change the `format` in all of the methods. Oh no! Sorry I missed that. Will do that right away. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1570228658 From tholenstein at openjdk.org Thu Apr 18 08:19:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 08:19:21 GMT Subject: RFR: 8324950: IGV: save the state to a file [v32] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. 
> - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: open tabs in deterministic order (List instead of Set) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/a4b48c43..67e0ce00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=30-31 Stats: 17 lines in 4 files changed: 5 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From duke at openjdk.org Thu Apr 18 08:39:35 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Thu, 18 Apr 2024 08:39:35 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v6] In-Reply-To: References: Message-ID: > Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). > > ### Correctness checks > > Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. 
>
> ### Performance results on T-Head board
>
> Enabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms |
> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms |
>
> Disabled intrinsic:
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|25|100.246|1.119|ops/ms|
> |Adler32.TestAdler32.testAdler32Update|8192|t...

ArsenyBochkarev has updated the pull request incrementally with 12 additional commits since the last revision:

 - Use mv instead of li
 - Prettify function
 - Remove unnecessary zeroing of vtemp1, vtemp2
 - Remove unnecessary zeroing of v4, ..., v27
 - Remove unnecessary assert
 - Move similar unroll code to a function
 - Fix comment
 - Dispose of unnecessary arguments in accum function
 - Accelerate vectorization
   - Use two vredsum instead of vadd + vwredsum
   - Make use of more vector registers
   - Dispose of most of vsetivli instructions
 - Prettify loop remainder
 - ... and 2 more: https://git.openjdk.org/jdk/compare/8a74349c...3cf649c9

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18382/files
  - new: https://git.openjdk.org/jdk/pull/18382/files/8a74349c..3cf649c9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18382&range=04-05

  Stats: 113 lines in 1 file changed: 41 ins; 52 del; 20 mod
  Patch: https://git.openjdk.org/jdk/pull/18382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18382/head:pull/18382

PR: https://git.openjdk.org/jdk/pull/18382

From duke at openjdk.org Thu Apr 18 08:55:15 2024
From: duke at openjdk.org (ArsenyBochkarev)
Date: Thu, 18 Apr 2024 08:55:15 GMT
Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v2]
In-Reply-To: <7Cx4jgZ678c3UAcArxmIyr-qm9xB136mRybsaOEtWv0=.ce17294a-41b7-45f3-97e2-489851a51fb4@github.com>
References: <7Cx4jgZ678c3UAcArxmIyr-qm9xB136mRybsaOEtWv0=.ce17294a-41b7-45f3-97e2-489851a51fb4@github.com>
Message-ID:

On Wed, 10 Apr 2024 07:31:28 GMT, Fei Yang wrote:

>> I witnessed performance regression on unmatched board when count > 2048.
>> JMH numbers:
>>
>> Before:
>> Benchmark                      (count)   Mode  Cnt     Score    Error   Units
>> TestAdler32.testAdler32Update       64  thrpt   25  1050.761 ± 54.862  ops/ms
>> TestAdler32.testAdler32Update      128  thrpt   25   953.858 ± 42.102  ops/ms
>> TestAdler32.testAdler32Update      256  thrpt   25   821.011 ± 21.154  ops/ms
>> TestAdler32.testAdler32Update      512  thrpt   25   624.207 ± 19.724  ops/ms
>> TestAdler32.testAdler32Update     1024  thrpt   25   436.040 ±  5.875  ops/ms
>> TestAdler32.testAdler32Update     2048  thrpt   25   265.020 ±  3.058  ops/ms
>> TestAdler32.testAdler32Update     5012  thrpt   25   124.934 ±  0.799  ops/ms
>> TestAdler32.testAdler32Update     8192  thrpt   25    70.026 ±  0.243  ops/ms
>> TestAdler32.testAdler32Update    16384  thrpt   25    35.885 ±  0.055  ops/ms
>> TestAdler32.testAdler32Update    32768  thrpt   25    16.883 ±  0.027  ops/ms
>> TestAdler32.testAdler32Update    65536  thrpt   25     7.648 ±  0.006  ops/ms
>>
>> After:
>> Benchmark                      (count)   Mode  Cnt     Score    Error   Units
>> TestAdler32.testAdler32Update       64  thrpt   25  4360.280 ± 39.921  ops/ms
>> TestAdler32.testAdler32Update      128  thrpt   25  2766.595 ± 16.027  ops/ms
>> TestAdler32.testAdler32Update      256  thrpt   25  1634.373 ±  5.412  ops/ms
>> TestAdler32.testAdler32Update      512  thrpt   25   880.028 ±  1.463  ops/ms
>> TestAdler32.testAdler32Update     1024  thrpt   25   457.724 ±  0.296  ops/ms
>> TestAdler32.testAdler32Update     2048  thrpt   25   233.605 ±  0.072  ops/ms
>> TestAdler32.testAdler32Update     5012  thrpt   25    96.610 ±  0.020  ops/ms
>> TestAdler32.testAdler32Update     8192  thrpt   25    59.275 ±  0.012  ops/ms
>> TestAdler32.testAdler32Update    16384  thrpt   25    29.726 ±  0.004  ops/ms
>> TestAdler32.testAdler32Update    32768  thrpt   25    14.736 ±  0.009  ops/ms
>> TestAdler32.testAdler32Update    65536  thrpt   25     6.658 ±  0.002  ops/ms
>
>> @RealFYang Hi, thanks for pointing out! To achieve additional acceleration, I did a vectorization and re-measured performance on Kendryte K230 with RVV 1.0 enabled:
>
> That's great to hear! I was not aware that it could run a full-featured Linux system.
> May I ask what kind of Linux distro are you running with?
>
>> It seems to me that there's a huge room for improvement in the current implementation.
>
> Have you finished improving this with RVV 1.0? I can take another look when that is done.
>
>> BTW, the data I used as a comparison from T-Head board was recorded a few months ago. Is it the code generation that has improved significantly? Or it's just me making some kind of mistake in measurements?
>
> I am not sure what you mean. But I don't think there is a big change in this part?

Hi @RealFYang! Sorry for such a late reply. I was able to improve vectorization, and did the performance measurements for RVV 0.7.1 on LicheePi4 (the code in `stubGenerator` was functionally identical, but some encoding modifications were made in the `assembler_riscv` file):

Intrinsic enabled:

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
| Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 7342.196 | 3.364 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 4520.467 | 3.239 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 2555.269 | 0.929 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1355.723 | 1.178 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 705.539 | 0.626 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 360.281 | 0.131 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 148.970 | 0.079 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 180.018 | 0.153 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 90.414 | 0.136 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 59.876 | 0.263 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 35.046 | 0.074 | ops/ms |

Intrinsic disabled:

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
| Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 1319.132 | 8.605 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 1240.402 | 7.998 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1106.121 | 2.723 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 905.468 | 19.780 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 684.968 | 2.665 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 451.938 | 1.047 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 228.727 | 0.238 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 150.421 | 1.016 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 79.323 | 0.364 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 40.986 | 0.122 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 19.969 | 0.194 | ops/ms |

As for Kendryte K230, I'm not able to do full-size measurements at the moment, but I have numbers for 32768 and 65536 input lengths:

Intrinsic enabled:

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------------------------------------- | ------- | ----- | --- | -------- | ------ | ------ |
| Adler32.TestAdler32.testAdler32Update | 32768 | thrpt | 25 | 34.023 | 0.093 | ops/ms |
| Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 17.723 | 0.042 | ops/ms |

Results for disabled intrinsic are [here](https://github.com/openjdk/jdk/pull/18382#issuecomment-2045145255).

So, @RealFYang can you take another look, please?
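As background for the numbers above, here is a minimal scalar sketch of the Adler-32 checksum that the `_updateBytesAdler32` intrinsic accelerates. It is not the stub code from the PR: the class name `Adler32Sketch` and its `adler32` helper are hypothetical, and the loop reduces modulo 65521 on every byte for clarity, where an optimized implementation would presumably defer the reduction. The result is cross-checked against `java.util.zip.Adler32`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class Adler32Sketch {
    // Largest prime below 2^16, the modulus defined by Adler-32.
    static final int BASE = 65521;

    // Byte-at-a-time reference: a is the running byte sum (seeded with 1),
    // b is the running sum of all intermediate values of a.
    static long adler32(byte[] data) {
        int a = 1;
        int b = 0;
        for (byte x : data) {
            a = (a + (x & 0xff)) % BASE;
            b = (b + a) % BASE;
        }
        return ((long) b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "Wikipedia".getBytes(StandardCharsets.US_ASCII);
        Adler32 ref = new Adler32();
        ref.update(data, 0, data.length);
        // Both values are expected to be 11e60398 for this input.
        System.out.println(Long.toHexString(adler32(data)));
        System.out.println(Long.toHexString(ref.getValue()));
    }
}
```

The two running sums are what make vectorization non-trivial: `b` depends on every intermediate value of `a`, so a vector version has to weight each byte by its position within the block.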
------------- PR Comment: https://git.openjdk.org/jdk/pull/18382#issuecomment-2063362137 From epeter at openjdk.org Thu Apr 18 09:08:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 09:08:15 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 13:19:11 GMT, Christian Hagedorn wrote: > This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. > > #### Background > > The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. > > Thanks, > Christian Nice refactoring! I left a few comments/suggestions. src/hotspot/share/opto/loopTransform.cpp line 1450: > 1448: if (m != nullptr) { > 1449: wq.push(m); > 1450: } This code now looks almost identical to `TemplateAssertionPredicateExpressionNode::find_opaque_loop_nodes`, except that there we just are looking for one init/stride, and here we count them. Can we refactor this? Idea: have a callback/lambda we call on init/stride nodes. That callback can then count, or give a "terminate" return: `enum Action { Continue, Terminate };`. Or maybe you just use the count method also for checking if there are any init/stride. That is slightly more expensive, but maybe it does not matter. 
Or you give your count method a parameter that simply terminates early, and a return value true/false if we found any. Lots of options. src/hotspot/share/opto/predicates.cpp line 337: > 335: > 336: // Check if this node belongs a Template Assertion Predicate Expression (including OpaqueLoop* nodes). > 337: bool TemplateAssertionPredicateExpressionNode::find_opaque_loop_nodes(Node* node) { `find` usually tells me that you are going to return what you "find". You could name it more like what you have in the description: `is_in_expression` or `belongs_to_expression`. Also, looks like the single use of this is in `is_valid`, and that just wraps it. Is that intended? src/hotspot/share/opto/predicates.cpp line 354: > 352: } > 353: > 354: void TemplateAssertionPredicateExpressionNode::push_non_null_inputs(Unique_Node_List& list, const Node* node) { Why not make this a method in Node? `node->push_non_null_inputs(list)`. If that can be part of the header file, then it would even be efficiently inlined, I assume. We could then use it all over the place! Well, you should probably indicate that you are not traversing `in(0)`... not sure what would be an adequate name. src/hotspot/share/opto/predicates.cpp line 367: > 365: } > 366: > 367: void TemplateAssertionPredicateExpressionNode::push_outputs(Unique_Node_List& list, const Node* node) { Why not make this a method in `Node`? `node->push_outs(list)`. src/hotspot/share/opto/predicates.hpp line 297: > 295: // - Two: A OpaqueLoopInitNode could be part of two Template Assertion Predicates. > 296: // - One: In all other cases. > 297: class TemplateAssertionPredicateExpressionNode : public StackObj { I have a slight irritation that this has a `...Node` suffix. Indicates that it is a subclass of `Node`, which is not correct. But probably it is still a good name, so you can leave it. 
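The callback idea floated above for the init/stride traversal (`enum Action { Continue, Terminate }`) could look roughly like the following. This is a language-neutral illustration written in Java, not HotSpot code: `Node`, `forEachOpaque`, `count` and `hasAny` are hypothetical names, and the real C2 traversal would use `Unique_Node_List` and the actual node opcodes instead of a boolean flag.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class OpaqueWalkSketch {
    enum Action { CONTINUE, TERMINATE }

    // Hypothetical stand-in for a C2 node: a flag plus (possibly null) inputs.
    static final class Node {
        final boolean isOpaqueLoopNode;
        final List<Node> inputs = new ArrayList<>();

        Node(boolean isOpaqueLoopNode, Node... ins) {
            this.isOpaqueLoopNode = isOpaqueLoopNode;
            for (Node in : ins) {
                inputs.add(in);
            }
        }
    }

    // Visit each node reachable through inputs once; invoke the callback on
    // opaque nodes and stop the whole walk as soon as it returns TERMINATE.
    static void forEachOpaque(Node start, Function<Node, Action> callback) {
        ArrayDeque<Node> worklist = new ArrayDeque<>();
        Set<Node> seen = new HashSet<>();
        worklist.push(start);
        seen.add(start);
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (n.isOpaqueLoopNode && callback.apply(n) == Action.TERMINATE) {
                return;
            }
            for (Node in : n.inputs) {
                if (in != null && seen.add(in)) {
                    worklist.push(in);
                }
            }
        }
    }

    // Counting variant: never terminates early.
    static int count(Node start) {
        int[] counter = {0};
        forEachOpaque(start, n -> { counter[0]++; return Action.CONTINUE; });
        return counter[0];
    }

    // Existence variant: stops at the first opaque node found.
    static boolean hasAny(Node start) {
        boolean[] found = {false};
        forEachOpaque(start, n -> { found[0] = true; return Action.TERMINATE; });
        return found[0];
    }

    public static void main(String[] args) {
        Node init = new Node(true);
        Node stride = new Node(true);
        Node add = new Node(false, init, stride);
        Node cmp = new Node(false, add, null, init);
        System.out.println(count(cmp));  // prints 2
        System.out.println(hasAny(cmp)); // prints true
    }
}
```

With this shape the counting and the "is there any" queries share one traversal, and only the callback decides whether the walk stops early.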
src/hotspot/share/opto/predicates.hpp line 314: > 312: public: > 313: // Check whether the provided node is part of a Template Assertion Predicate Expression or not. > 314: static bool is_valid(Node* node) { I would rename this too. To keep it similar to `is_maybe_in_template_assertion_predicate_expression`, name it `is_in_template_assertion_predicate_expression`. Hmm. You could also shorten the names, since we know from the context that we are talking about TAPE: is_in_expression is_maybe_in_expression Also: why is there the `find_opaque_loop_nodes` method? Is it even used anywhere else? src/hotspot/share/opto/predicates.hpp line 320: > 318: // Check if the opcode of node could be found in a Template Assertion Predicate Expression. > 319: // This also provides a fast check whether a node is unrelated. > 320: static bool valid_opcode(const Node* node) { I'm not a fan of this name, it implies that nodes could be "valid" or "invalid". For one, I think this should start with `is_...`. This would be very long, but at least more accurate: `is_maybe_in_template_assertion_predicate_expression`. src/hotspot/share/opto/predicates.hpp line 344: > 342: // Expression Node belongs to. > 343: template > 344: void for_each_template_assertion_predicate(ApplyToTemplateFunction apply_to_template_function) { Suggestion: void for_each_template_assertion_predicate(ApplyToTemplateFunction apply_to_template_function) const { Would that work? src/hotspot/share/opto/predicates.hpp line 360: > 358: } > 359: assert(template_counter <= 2, "a node cannot be part of more than two templates"); > 360: assert(template_counter <= 1 || _node->is_OpaqueLoopInit(), "only OpaqueLoopInit nodes can be part of two templates"); Can this be true? Maybe there are some implicit assumptions about `_node`. But if it was for example an `AddI`, then this node could be used all over the place, and certainly could be used by many TAPE. Am I wrong? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18723#pullrequestreview-2008170891 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570266730 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570252881 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570304152 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570302342 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570317355 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570243984 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570233668 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570289388 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570311331 From epeter at openjdk.org Thu Apr 18 09:08:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 09:08:15 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates In-Reply-To: References: Message-ID: <3PFvJiUnIewzcPe_9LHMNyNRdVidXStvXNfapaueoKs=.783ca074-c94c-40d5-acda-b66ff55c94a7@github.com> On Thu, 18 Apr 2024 08:36:08 GMT, Emanuel Peter wrote: >> This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. >> >> #### Background >> >> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. 
Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. >> >> Thanks, >> Christian > > src/hotspot/share/opto/loopTransform.cpp line 1450: > >> 1448: if (m != nullptr) { >> 1449: wq.push(m); >> 1450: } > > This code now looks almost identical to `TemplateAssertionPredicateExpressionNode::find_opaque_loop_nodes`, except that there we just are looking for one init/stride, and here we count them. Can we refactor this? > > Idea: have a callback/lambda we call on init/stride nodes. That callback can then count, or give a "terminate" return: `enum Action { Continue, Terminate };`. > > Or maybe you just use the count method also for checking if there are any init/stride. That is slightly more expensive, but maybe it does not matter. Or you give your count method a parameter that simply terminates early, and a return value true/false if we found any. Lots of options. Ah wait. That is what we used to do: count to check if there are any. That is what `subgraph_has_opaque` did. > src/hotspot/share/opto/predicates.hpp line 344: > >> 342: // Expression Node belongs to. >> 343: template >> 344: void for_each_template_assertion_predicate(ApplyToTemplateFunction apply_to_template_function) { > > Suggestion: > > void for_each_template_assertion_predicate(ApplyToTemplateFunction apply_to_template_function) const { > > Would that work? I think the name `ApplyToTemplateFunction` / `apply_to_template_function` is more noisy than necessary. I would just do `Callback` and `callback`. In the for-each context it is clear what this means. > src/hotspot/share/opto/predicates.hpp line 360: > >> 358: } >> 359: assert(template_counter <= 2, "a node cannot be part of more than two templates"); >> 360: assert(template_counter <= 1 || _node->is_OpaqueLoopInit(), "only OpaqueLoopInit nodes can be part of two templates"); > > Can this be true? Maybe there are some implicit assumptions about `_node`. 
But if it was for example an `AddI`, then this node could be used all over the place, and certainly could be used by many TAPE. Am I wrong? If these asserts are correct, you probably want to add some comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570281819 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570298607 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570312418 From galder at openjdk.org Thu Apr 18 09:11:40 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Apr 2024 09:11:40 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                        TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler    1234  1234     0     0
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: - Use vmIntrinsics instead of vmIntrinsicID - Fix formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/ad6c51bf..2d8854d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=08-09 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Thu Apr 18 09:11:40 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Apr 2024 09:11:40 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:20:21 GMT, Martin
Doerr wrote: >> FAO @bulasevich @TheRealMDoerr @RealFYang @RealLucy I've created [JDK-8330472](https://bugs.openjdk.org/browse/JDK-8330472) to port the changes here to arm/ppc/riscv/s390. Also, the changes in this PR have been made in such a way that they only affect architectures on which the intrinsic is implemented. Would you also be able to test the changes in this PR to make sure no regressions are introduced on these archs? > >> FAO @bulasevich @TheRealMDoerr @RealFYang @RealLucy I've created [JDK-8330472](https://bugs.openjdk.org/browse/JDK-8330472) to port the changes here to arm/ppc/riscv/s390. Also, the changes in this PR have been made in such a way that they only affect architectures on which the intrinsic is implemented. Would you also be able to test the changes in this PR to make sure no regressions are introduced on these archs? > > Thanks! I'll test it. We will need separate issues for these platforms because they are maintained by different people. @TheRealMDoerr I've pushed a couple of commits to address your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2063393420 From galder at openjdk.org Thu Apr 18 09:32:04 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Thu, 18 Apr 2024 09:32:04 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v3] In-Reply-To: <3GGBYPlmvwL1gMb_dBiiFpbx7x8F3QW6Qz_sv4Kra14=.a1d6b362-d94f-4c57-b73e-a65ed4e59cf5@github.com> References: <3GGBYPlmvwL1gMb_dBiiFpbx7x8F3QW6Qz_sv4Kra14=.a1d6b362-d94f-4c57-b73e-a65ed4e59cf5@github.com> Message-ID: <-pqVKz8E2e4pgmF3Pu9kZBuMy6onSI8v-FlpKwhrPjY=.3625a13a-a236-45f4-b63b-d899af374e69@github.com> On Wed, 17 Apr 2024 12:19:52 GMT, Christian Hagedorn wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Use failOn instead of counts > > Testing passed, looks good!
@chhagedorn @rwestrel CI shows an error in `java/util/HashMap/WhiteBoxResizeTest.java` that doesn't look related to these changes: 2024-04-17T11:39:45.0144128Z STDOUT: 2024-04-17T11:39:45.0144587Z Error occurred during initialization of VM 2024-04-17T11:39:45.0145397Z Could not reserve enough space for 2097152KB object heap What do we do about it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18738#issuecomment-2063433762 From mdoerr at openjdk.org Thu Apr 18 09:35:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 18 Apr 2024 09:35:11 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 09:11:40 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>>
>> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
>> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
>> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
>> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
>> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
>> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
>> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
>> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
>> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
>> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
>> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
>> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
>> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
>> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
>> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
>> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
>> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>>
>> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>>
>> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> ...
>> TEST                                        TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:hotspot_compiler    1234  1234     0     0
>>
>> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Use vmIntrinsics instead of vmIntrinsicID > - Fix formatting Thanks for cleaning this up! Tests have passed on PPC64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2063439694 From epeter at openjdk.org Thu Apr 18 09:43:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 09:43:02 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: On Wed, 17 Apr 2024 15:37:30 GMT, Kangcheng Xu wrote: >> Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized.
>> >> Additionally, this ticket contributes to the resolution of [JDK-8275913](https://bugs.openjdk.org/browse/JDK-8275913). Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > update comments to clarify on type casting Nice work! I like it, but you need to improve the comments and tests. src/hotspot/share/opto/loopnode.cpp line 3921: > 3919: // int a = init2 > 3920: // for (int i = init; i < limit; i += stride) { > 3921: // a = init2 + (i - init) * (stride2 / stride) I like that you are putting comments here, I think it will help. But they seem not quite correct. If `stride2 = 2` and `stride = 4`, then the division would be rounded down to zero. Can you be more precise about the order of operations, and rounding issues? test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 24: > 22: */ > 23: > 24: package compiler.c2.irTests; Putting IR tests into the `irTests` directory is what we did at the beginning, when we assumed IR tests would not be widely adopted. But now it makes more sense to put this test where it belongs "thematically". I suggest you put it under `compiler/loopopts`, or even in a new subdirectory: `compiler/loopopts/parallel_iv`. Also the name of this test could be more expressive: `TestLongParallelIvInIntCountedLoop.java` test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 60: > 58: } > 59: > 60: private static int testIntCountedLoopWithIntIVZero(int stop) { Why do you not have a `@Test` for every test? Are you sure that these will even be compiled currently? test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 63: > 61: int a = 0; > 62: for (int i = 0; i < stop; i++) { > 63: a += 0; // we unfortunately have to repeat ourselves because the operand has to be a constant I don't understand your comment. 
Why is this test interesting? test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 102: > 100: long a = 0; > 101: for (int i = 0; i < stop; i++) { > 102: a += Long.MAX_VALUE; Can you also try a value that is just slightly over int_max and one slightly below int_min? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18489#pullrequestreview-2008308135 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570354038 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570329596 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570335355 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570337160 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570339249 From epeter at openjdk.org Thu Apr 18 09:43:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 09:43:03 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> Message-ID: <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> On Thu, 18 Apr 2024 09:25:58 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> update comments to clarify on type casting > > src/hotspot/share/opto/loopnode.cpp line 3921: > >> 3919: // int a = init2 >> 3920: // for (int i = init; i < limit; i += stride) { >> 3921: // a = init2 + (i - init) * (stride2 / stride) > > I like that you are putting comments here, I think it will help. But they seem not quite correct. > If `stride2 = 2` and `stride = 4`, then the division would be rounded down to zero. > Can you be more precise about the order of operations, and rounding issues? 
Can you also be consistent with the names all the way through your comments? I suggest you just only use `stride_con`, and not `stride`. You can use `i` and `a`, if you want. But then it would be helpful if you had two lines with identical expressions, but where you make the transition from `i` to `phi`. > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 60: > >> 58: } >> 59: >> 60: private static int testIntCountedLoopWithIntIVZero(int stop) { > > Why do you not have a `@Test` for every test? Are you sure that these will even be compiled currently? And why no IR rules for these? > test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 102: > >> 100: long a = 0; >> 101: for (int i = 0; i < stop; i++) { >> 102: a += Long.MAX_VALUE; > > Can you also try a value that is just slightly over int_max and one slightly below int_min? Generally, it would be nice if you had more cases where we are checking overflows. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570358316 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570335960 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570371373 From epeter at openjdk.org Thu Apr 18 09:43:03 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 09:43:03 GMT Subject: RFR: 8328528: C2 should optimize long-typed parallel iv in an int counted loop [v3] In-Reply-To: <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> References: <4kAmwISMCRKsdUxHZsI9SdO-8rLy7a3GbFXd2ERlpi0=.8f87e9e4-cb37-43b6-b17e-a6b424e60c83@github.com> <8zWNeJWcumovt4jcMMCbbhfQJVKDypVM2nR6xRUGx3U=.760cf413-7503-4ab2-a2c2-955f430ee4b4@github.com> Message-ID: <0YMwCJtOCiJU6gDibC6awo-iowi3wFuOKPM32sHkGRA=.34e4fec1-ffb9-4ac8-ac2e-35a1c9494020@github.com> On Thu, 18 Apr 2024 09:28:28 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 3921: >> >>> 3919: // int a = init2 >>> 3920: // 
for (int i = init; i < limit; i += stride) { >>> 3921: // a = init2 + (i - init) * (stride2 / stride) >> >> I like that you are putting comments here, I think it will help. But they seem not quite correct. >> If `stride2 = 2` and `stride = 4`, then the division would be rounded down to zero. >> Can you be more precise about the order of operations, and rounding issues? > > Can you also be consistent with the names all the way through your comments? I suggest you just only use `stride_con`, and not `stride`. You can use `i` and `a`, if you want. But then it would be helpful if you had two lines with identical expressions, but where you make the transition from `i` to `phi`. Ah. It seems that we require `stride2 / stride` to be a lossless division in the code. A comment about that limitation would be helpful. And I think you should also check if there are tests that cover cases where the division would be lossy. >> test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 60: >> >>> 58: } >>> 59: >>> 60: private static int testIntCountedLoopWithIntIVZero(int stop) { >> >> Why do you not have a `@Test` for every test? Are you sure that these will even be compiled currently? > > And why no IR rules for these? You definitely need more tests with IR rules. >> test/hotspot/jtreg/compiler/c2/irTests/TestCountedLoopIV.java line 102: >> >>> 100: long a = 0; >>> 101: for (int i = 0; i < stop; i++) { >>> 102: a += Long.MAX_VALUE; >> >> Can you also try a value that is just slightly over int_max and one slightly below int_min? > > Generally, it would be nice if you had more cases where we are checking overflows. And some with negative strides would be great too.
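To make the rounding concern concrete, here is a minimal standalone Java sketch. It is not taken from the PR and does not model what C2 actually emits; the class name `ParallelIvSketch` and the helper `matchesClosedForm` are made up for illustration. It simply runs the loop from the comment and checks the closed form `init2 + (i - init) * (stride2 / stride)` at the start of every iteration, assuming Java's truncating `long` division and wrap-around arithmetic.

```java
// Standalone illustration only; all names are hypothetical and not HotSpot code.
final class ParallelIvSketch {

    // Runs the counted loop and checks, at the start of every iteration, that the
    // accumulated parallel iv `a` equals init2 + (i - init) * (stride2 / stride),
    // where `/` is Java's truncating long division.
    // Assumes the int range [init, limit) is small enough that `i += stride`
    // does not overflow.
    static boolean matchesClosedForm(int init, int limit, int stride, long init2, long stride2) {
        if (stride == 0) {
            throw new IllegalArgumentException("stride must be non-zero");
        }
        long ratio = stride2 / stride; // truncates toward zero when the division is lossy
        long a = init2;
        for (int i = init; stride > 0 ? i < limit : i > limit; i += stride) {
            long closedForm = init2 + ((long) i - init) * ratio;
            if (a != closedForm) {
                return false;
            }
            a += stride2; // wraps around on overflow, just like the plain loop
        }
        return true;
    }

    public static void main(String[] args) {
        // Lossless ratio (4 / 2): closed form and loop agree.
        System.out.println(matchesClosedForm(0, 100, 2, 5L, 4L));
        // Lossy ratio (2 / 4 truncates to 0): closed form and loop disagree.
        System.out.println(matchesClosedForm(0, 100, 4, 5L, 2L));
        // Overflowing long stride: both sides wrap the same way.
        System.out.println(matchesClosedForm(0, 100, 1, 0L, Long.MAX_VALUE));
        // Negative strides with a lossless ratio (-6 / -2).
        System.out.println(matchesClosedForm(100, 0, -2, 7L, -6L));
    }
}
```

The cases in `main` mirror the kinds of inputs asked for above: a lossy ratio, a stride just past the int range or at `Long.MAX_VALUE`, and negative strides. They only exercise plain Java semantics, so real coverage would still need `@Test` methods with IR rules.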
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570364776 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570374047 PR Review Comment: https://git.openjdk.org/jdk/pull/18489#discussion_r1570372407 From roland at openjdk.org Thu Apr 18 10:12:05 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 18 Apr 2024 10:12:05 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 18:53:13 GMT, Dean Long wrote: >> This fixes 3 calls to ABS with a min int argument. I think all of them >> are harmless: >> >> - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The >> check is for a stride of 1 or -1. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the >> computation of `scaled_iters_long`, the stride is passed to `ABS()` >> and then implicitly cast to long. I now cast the stride to long >> before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` >> overflows the int range for all values of `LoopStripMiningIter` >> except 0 or 1. Those values are handled early on in that method. So >> for a min int stride: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> is always true and the method returns early. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the >> computation of `short_scaled_iters` also calls `ABS()` with the >> stride as argument. But the result of that computation is only used >> if the test for: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> doesn't cause an early return of the method. I reordered statements >> so the `ABS()` call happens after that test which will cause an early >> return if the stride is min int.
> > src/hotspot/share/opto/loopnode.cpp line 2973: > >> 2971: return; >> 2972: } >> 2973: int short_scaled_iters = LoopStripMiningIterShortLoop * ABS(stride); > > Isn't it true that `stride` can be MIN_INT here, if LoopStripMiningIter == 1? There's a test for ` LoopStripMiningIter == 1` earlier in the method that causes the method to return. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18813#discussion_r1570421185 From epeter at openjdk.org Thu Apr 18 10:20:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 10:20:07 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. 
As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'master' into JDK-8320649 > - review > - test fix > - test fix > - Merge branch 'master' into JDK-8320649 > - whitespaces > - review > - Merge branch 'master' into JDK-8320649 > - review > - 32 bit build fix > - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e Review of your tests. 
test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 2: > 1: /* > 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. I like the tests, there is a lot of material here. A few more ideas: - have two scoped values, and then have a sequence of `get` and `getValue` calls on them, in some random mix. And check that everything gets commoned, and the result is correct. - have a method that directly uses `get`, but also has inner scopes of `where`/`get`. Interleave these, maybe even with multiple different scoped values. And nest them with various depths. And then verify both the expected number of calls / loads, as well as the result. Also: is it possible to stuff ScopedValues into ScopedValues? That would be another interesting stress-test with lots of options. test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 24: > 22: */ > 23: > 24: package compiler.c2.irTests; Christian and I have discussed this a while back: it would be nicer to put tests where they belong thematically. For example now it would be difficult to find all ScopedValue compiler tests, some are in the `irTests` directory, some elsewhere. Hence, I suggest you put them all under `compiler/scoped_value` or similar. test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 137: > 135: MyDouble sv1 = sv.get(); > 136: notInlined(); > 137: MyDouble sv2 = sv.get(); // Doesn't optimize out (load of sv cannot common) Is this a necessary constraint, or a limitation of the optimization? Please add a corresponding comment. That would be helpful if this test all of the sudden failed the IR rule, and one has to debug. test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 185: > 183: @IR(counts = {IRNode.IF, "<= 4", IRNode.LOAD_P_OR_N, "<= 5" }) > 184: public static void testFastPath5() { > 185: Object unused = svObject.get(); // cannot be removed if result not used why? could there be some exception? please add comment why. 
test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 236: > 234: @IR(counts = {IRNode.LOAD_D, "1" }) > 235: public static double testFastPath7(boolean[] flags) { > 236: double res = 0; Suggestion: double res = 0; // hoisted here before the loop, and commoned. Would that be correct? test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 485: > 483: TestFramework.assertDeoptimizedByC2(m); > 484: // Compile again > 485: runAndCompile15(); Might it be good to do some result verification, i.e. that the `get` always returns the expected object? test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 524: > 522: MyDouble sv1 = localSV.get(); > 523: notInlined(); > 524: MyDouble sv2 = localSV.get(); // should optimize out Why does this work now, and some other cases with `notInlined` in between do not work? test/hotspot/jtreg/compiler/scoped_value/TestScopedValueBadDominatorAfterExpansion.java line 30: > 28: * @summary SIGSEGV in PhaseIdealLoop::get_early_ctrl() > 29: * @compile --enable-preview -source ${jdk.version} TestScopedValueBadDominatorAfterExpansion.java > 30: * @run main/othervm --enable-preview -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:+UseParallelGC TestScopedValueBadDominatorAfterExpansion Why is the parallel GC required here? Can you also have a run without flags, so that other GC's could be tried with this code? test/hotspot/jtreg/compiler/scoped_value/TestScopedValueBadDominatorAfterExpansion.java line 40: > 38: > 39: public static void main(String[] args) { > 40: Object o = new Object(); Would it not be nice to use 2 objects, and verify the fields afterwards, that they have the correct objects? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-2008386881 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570432089 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570395734 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570400334 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570403293 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570406718 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570419353 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570421326 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570390446 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570388963 From epeter at openjdk.org Thu Apr 18 10:20:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 10:20:08 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Thu, 18 Apr 2024 09:51:50 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'master' into JDK-8320649 >> - review >> - test fix >> - test fix >> - Merge branch 'master' into JDK-8320649 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8320649 >> - review >> - 32 bit build fix >> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e > > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 24: > >> 22: */ >> 23: >> 24: package compiler.c2.irTests; > > Christian and I have discussed this a while back: it would be nicer to put tests where they belong thematically. 
For example now it would be difficult to find all ScopedValue compiler tests, some are in the `irTests` directory, some elsewhere. Hence, I suggest you put them all under `compiler/scoped_value` or similar. Where are the already existing ScopedValue tests? > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 137: > >> 135: MyDouble sv1 = sv.get(); >> 136: notInlined(); >> 137: MyDouble sv2 = sv.get(); // Doesn't optimize out (load of sv cannot common) > > Is this a necessary constraint, or a limitation of the optimization? Please add a corresponding comment. That would be helpful if this test all of the sudden failed the IR rule, and one has to debug. If this was in a loop, the two `get` would be hoisted, and commoned, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570396797 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570410765 From chagedorn at openjdk.org Thu Apr 18 10:52:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 10:52:01 GMT Subject: RFR: 8324950: IGV: save the state to a file [v32] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 08:19:21 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. 
New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > open tabs in deterministic order (List instead of Set) As discussed offline, the confusion was due to opening the tabs in a random order each time a file is opened. The fix now looks good. Only a few minor comments - I'm not very familiar with the code. I've done some testing with IGV and the features seem to work. Great job! 
src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 328: > 326: this.clearWorkspace(); > 327: this.open(); // Reopen the OutlineTopComponent > 328: this.requestActive(); `this` can probably be removed Suggestion: clearWorkspace(); open(); // Reopen the OutlineTopComponent requestActive(); src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 388: > 386: } > 387: frame.dispose(); > 388: return false; Could be simplified to: frame.dispose(); return result == JOptionPane.YES_OPTION; src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 499: > 497: if (path == null || Files.notExists(Path.of(path))) { > 498: return; > 499: } Can `path` really be `null` here? src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/serialization/Parser.java line 405: > 403: } else { > 404: // Blocks and their nodes defined: add other nodes to an > 405: // artificial "no block" block Suggestion: // artificial "no block" block ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-2008035346 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570456302 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570458917 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570464972 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570470708 From chagedorn at openjdk.org Thu Apr 18 10:52:03 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 10:52:03 GMT Subject: RFR: 8324950: IGV: save the state to a file [v31] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 09:11:13 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. 
It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update GraphParser.java src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 130: > 128: * Obtain the OutlineTopComponent instance. Never call {@link #getDefault} directly! > 129: */ > 130: public static synchronized OutlineTopComponent findInstance() { Could this method then be private when it should never be called directly? src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 149: > 147: if (path == null) { > 148: return; > 149: } Can `path` really be `null` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570128283 PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570131672 From mli at openjdk.org Thu Apr 18 11:14:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Apr 2024 11:14:06 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle Message-ID: Hi, Can you help review the patch for the VectorLoadShuffle intrinsic? BTW, without this intrinsic, some other Vector API operations (e.g. rearrange) do not work on riscv either. 
Thanks ## Test test/jdk/jdk/incubator/vector/ test/hotspot/jtreg/compiler/vectorapi ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/18835/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18835&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321014 Stats: 42 lines in 1 file changed: 42 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18835/head:pull/18835 PR: https://git.openjdk.org/jdk/pull/18835 From mli at openjdk.org Thu Apr 18 11:55:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Apr 2024 11:55:33 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v5] In-Reply-To: References: Message-ID: > Hi, > Can you have a review on this patch to add RoundVF/RoundDF intrinsics? > Thanks! > > ## Tests > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java > > test/jdk/java/lang/Math/RoundTests.java Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' into round-F+D-v - restore round mode back to rne - Merge branch 'master' into round-F+D-v - fix minors - merge master - fix space - add tests - add test cases - v2: (src + 0.5) + rdn - Fix corner cases - ... 
and 3 more: https://git.openjdk.org/jdk/compare/b648ed0a...baca01db ------------- Changes: https://git.openjdk.org/jdk/pull/17745/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17745&range=04 Stats: 242 lines in 7 files changed: 238 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17745.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17745/head:pull/17745 PR: https://git.openjdk.org/jdk/pull/17745 From mli at openjdk.org Thu Apr 18 11:55:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Apr 2024 11:55:33 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3] In-Reply-To: References: <3wamGp9toFZEr7IO54NC4VOU8dAfpL2WJyWTSNv0m_s=.ebec8482-610a-4f92-9f42-5fe79b41dd23@github.com> Message-ID: On Thu, 11 Apr 2024 10:54:03 GMT, Hamlin Li wrote: >> Thanks for discussion. >> Sure. Let me do some investigation and fix it first. > > tracked by https://bugs.openjdk.org/browse/JDK-8330094 I have merged master (including https://github.com/openjdk/jdk/pull/18785, https://github.com/openjdk/jdk/pull/18758), and rerun the tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1570576724 From epeter at openjdk.org Thu Apr 18 12:31:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:31:19 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: <6-Yl6oBb-GdMyxY9DdqLcJbkZGxjItUvr2xHF3rFYk0=.2fd21903-990c-4d60-9ff8-0606506ba86d@github.com> On Thu, 18 Apr 2024 11:46:44 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: >> >> - Merge branch 'master' into JDK-8320649 >> - review >> - test fix >> - test fix >> - Merge branch 'master' into JDK-8320649 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8320649 >> - review >> - 32 bit build fix >> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e > > src/hotspot/share/opto/loopnode.hpp line 1801: > >> 1799: Node*&second_index, >> 1800: float &prob_cache_miss_at_first_if, float &first_if_cnt, >> 1801: float &prob_cache_miss_at_second_if, float &second_if_cnt) const; > > Suggestion: > > void find_most_likely_cache_index(const ScopedValueGetHitsInCacheNode* hits_in_cache, Node*& first_index, > Node*& second_index, > float& prob_cache_miss_at_first_if, float& first_if_cnt, > float& prob_cache_miss_at_second_if, float& second_if_cnt) const; That is also what you have at the definition. > src/hotspot/share/opto/loopopts.cpp line 3783: > >> 3781: // ScopedValueGetLoadFromCache and companion ScopedValueGetHitsInCacheNode must stay together >> 3782: move_scoped_value_nodes_to_not_peel(peel, not_peel, peel_list, sink_list, i); >> 3783: incr = false; > > Do we not have to increment the `cloned_for_outside_use`, which affects the `estimate`? Could we otherwise exhaust the node limit, by peeling a loop that is too large? > src/hotspot/share/opto/loopopts.cpp line 3997: > >> 3995: } >> 3996: >> 3997: void PhaseIdealLoop::move_scoped_value_nodes_to_not_peel(VectorSet &peel, VectorSet &not_peel, Node_List &peel_list, > > Can you please add more comments to help the reader understand? So we are not peeling in this case? Maybe rename to `move_scoped_value_nodes_to_avoid_peeling_it` > src/hotspot/share/opto/loopopts.cpp line 4010: > >> 4008: peel.remove(hits_in_cache->_idx); >> 4009: not_peel.set(hits_in_cache->_idx); >> 4010: peel_list.remove(i); > > Looks like duplicated code from the call-site. A refactoring may help. 
I think you could combine the code with the case: `if (n->in(0) == nullptr && !n->is_Load() && !n->is_CMove()) {` And then you would have this code here, as well as the `TracePartialPeeling` code shared for both. > src/hotspot/share/opto/subnode.hpp line 341: > >> 339: assert(req() == Index1, "wrong of inputs for ScopedValueGetHitsInCacheNode"); >> 340: add_req(index1); >> 341: assert(req() == Index2, "wrong of inputs for ScopedValueGetHitsInCacheNode"); > > Suggestion: > > assert(req() == Index2, "wrong number of inputs for ScopedValueGetHitsInCacheNode"); same for the others ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570567702 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570620922 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570552244 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570562435 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570441200 From epeter at openjdk.org Thu Apr 18 12:31:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:31:19 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). 
Hoisting >> a `get()` call out of a loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. 
Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'master' into JDK-8320649 > - review > - test fix > - test fix > - Merge branch 'master' into JDK-8320649 > - whitespaces > - review > - Merge branch 'master' into JDK-8320649 > - review > - 32 bit build fix > - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e Some more requests / comments. src/hotspot/share/opto/graphKit.hpp line 71: > 69: > 70: const Type* scopedValueCache_type(); > 71: Node* scopedValueCache_handle(); Should these not all be `make_` methods? src/hotspot/share/opto/graphKit.hpp line 914: > 912: Node* vector_shift_count(Node* cnt, int shift_op, BasicType bt, int num_elem); > 913: > 914: Node* scopedValueCache(); Should these not all be `make_` methods? src/hotspot/share/opto/intrinsicnode.cpp line 380: > 378: > 379: IfNode* ScopedValueGetLoadFromCacheNode::iff() const { > 380: return in(0)->in(0)->as_If(); Suggestion: return in(0)->as_IfTrue()->in(0)->as_If(); Would make verification below implied. 
src/hotspot/share/opto/intrinsicnode.cpp line 386: > 384: void ScopedValueGetLoadFromCacheNode::verify() const { > 385: // check a ScopedValueGetHitsInCache guards this ScopedValueGetLoadFromCache > 386: assert(in(0)->Opcode() == Op_IfTrue, "unexpected ScopedValueGetLoadFromCache shape"); You could remove this, if you just added `as_IfTrue` to `iff()` src/hotspot/share/opto/intrinsicnode.cpp line 390: > 388: assert(iff->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape"); > 389: assert(iff->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape"); > 390: assert(iff->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape"); Suggestion: assert(iff()->in(1)->is_Bool(), "unexpected ScopedValueGetLoadFromCache shape"); assert(iff()->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, "unexpected ScopedValueGetLoadFromCache shape"); assert(iff()->in(1)->in(1) == in(1), "unexpected ScopedValueGetLoadFromCache shape"); src/hotspot/share/opto/loopPredicate.cpp line 1561: > 1559: IfNode* iff, IfProjNode*& new_predicate_proj) { > 1560: BoolNode* bol = iff->in(1)->as_Bool(); > 1561: if (bol->in(1)->Opcode() != Op_ScopedValueGetHitsInCache){ Suggestion: if (bol->in(1)->Opcode() != Op_ScopedValueGetHitsInCache) { src/hotspot/share/opto/loopPredicate.cpp line 1627: > 1625: // It is easier to re-create the cache load subgraph rather than trying to change the inputs of the existing one to move > 1626: // it out of loops > 1627: Node* PhaseIdealLoop::scoped_value_cache_node(Node* raw_mem) { Suggestion: Node* PhaseIdealLoop::make_scoped_value_cache_node(Node* raw_mem_slice) { This is really a `make` method. Not sure about the `slice`, just an idea. 
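The reasoning behind the `as_IfTrue()` suggestion above, that a checked cast inside the accessor makes a separate shape assertion redundant, can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: the toy `Node` classes, the `in0` field standing in for `in(0)`, and the `checked_cast` helper, which throws so that the failure case can be observed. HotSpot's own `as_*` accessors rely on debug-build asserts rather than exceptions.

```cpp
#include <cassert>
#include <stdexcept>

// Toy stand-ins for C2 node classes; this is not HotSpot code.
struct Node {
  Node* in0 = nullptr;              // stand-in for the control input in(0)
  virtual ~Node() = default;
};
struct IfNode : Node {};
struct IfTrueNode : Node {};

// Hypothetical checked downcast: fails loudly when the node has the wrong class.
template <typename T>
T* checked_cast(Node* n) {
  assert(n != nullptr && "missing input");
  T* t = dynamic_cast<T*>(n);
  if (t == nullptr) {
    throw std::logic_error("unexpected node shape");
  }
  return t;
}

struct LoadFromCacheNode : Node {
  // Walks load -> IfTrue -> If and checks the class of each node on the way,
  // so callers do not need a separate "is in(0) an IfTrue?" assertion.
  IfNode* iff() const {
    IfTrueNode* proj = checked_cast<IfTrueNode>(in0);
    return checked_cast<IfNode>(proj->in0);
  }
};
```

With this shape, a `verify()` routine would only need to check properties that the accessor does not already enforce, such as the condition input of the `If`.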
src/hotspot/share/opto/loopnode.hpp line 703: > 701: bool policy_peeling(PhaseIdealLoop* phase, bool scoped_value_only); > 702: > 703: uint estimate_peeling(PhaseIdealLoop* phase, bool peel_only_if_has_scoped_value); Can we use the same name for `scoped_value_only` and `peel_only_if_has_scoped_value`? In `policy_peeling` you pass the value into `estimate_peeling`, so it seems to be the same. Somehow it does not sit well with me that we have such a special-case flag in such a high-level and general method. But I don't know a fix now. It just looks like not the best design. But that may not be your fault. Are there any alternatives? src/hotspot/share/opto/loopnode.hpp line 1801: > 1799: Node*&second_index, > 1800: float &prob_cache_miss_at_first_if, float &first_if_cnt, > 1801: float &prob_cache_miss_at_second_if, float &second_if_cnt) const; Suggestion: void find_most_likely_cache_index(const ScopedValueGetHitsInCacheNode* hits_in_cache, Node*& first_index, Node*& second_index, float& prob_cache_miss_at_first_if, float& first_if_cnt, float& prob_cache_miss_at_second_if, float& second_if_cnt) const; src/hotspot/share/opto/loopopts.cpp line 3783: > 3781: // ScopedValueGetLoadFromCache and companion ScopedValueGetHitsInCacheNode must stay together > 3782: move_scoped_value_nodes_to_not_peel(peel, not_peel, peel_list, sink_list, i); > 3783: incr = false; Do we not have to increment the `cloned_for_outside_use`, which affects the `estimate`? src/hotspot/share/opto/loopopts.cpp line 3997: > 3995: } > 3996: > 3997: void PhaseIdealLoop::move_scoped_value_nodes_to_not_peel(VectorSet &peel, VectorSet &not_peel, Node_List &peel_list, Can you please add more comments to help the reader understand? So we are not peeling in this case? src/hotspot/share/opto/loopopts.cpp line 4010: > 4008: peel.remove(hits_in_cache->_idx); > 4009: not_peel.set(hits_in_cache->_idx); > 4010: peel_list.remove(i); Looks like duplicated code from the call-site. A refactoring may help. 
src/hotspot/share/opto/multnode.cpp line 284: > 282: if (u->is_CFG()) { > 283: if (wq.size() >= path_limit) { > 284: return false; Can you add a comment why it is safe to just return `false`, even if we might have returned `true` if the limit was higher? src/hotspot/share/opto/node.hpp line 1660: > 1658: map(i, top_of_stack); > 1659: } > 1660: } Hmm. Technically, I think you now need to implement the `delete_at` also for `Node_List` and `Unique_Node_List`. Otherwise, if someone should use `delete_at`, for e `Unique_Node_List`, and expect a sane result, they will encounter strange bugs. At least, you should implement them with a `ShouldNotReachHere` or similar. src/hotspot/share/opto/subnode.hpp line 341: > 339: assert(req() == Index1, "wrong of inputs for ScopedValueGetHitsInCacheNode"); > 340: add_req(index1); > 341: assert(req() == Index2, "wrong of inputs for ScopedValueGetHitsInCacheNode"); Suggestion: assert(req() == Index2, "wrong number of inputs for ScopedValueGetHitsInCacheNode"); ------------- PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-2008454328 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570604281 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570604339 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570601339 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570600882 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570623797 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570592956 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570582483 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570577911 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570565104 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570556270 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570467750 PR Review 
Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570558913 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570461310 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570453913 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570440916 From epeter at openjdk.org Thu Apr 18 12:31:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:31:20 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: <42IjtaFnhJy9RiLE_-v4y6AZ6R3vUFJtQ0UzeLaI79I=.e48124c6-a0e2-48f1-be81-d37f8c1f7388@github.com> References: <42IjtaFnhJy9RiLE_-v4y6AZ6R3vUFJtQ0UzeLaI79I=.e48124c6-a0e2-48f1-be81-d37f8c1f7388@github.com> Message-ID: On Tue, 30 Jan 2024 09:22:08 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/node.cpp line 977: >> >>> 975: } >>> 976: >>> 977: Node* Node::find_unique_out_with(int opcode) const { >> >> Random idea: >> Would it not be nice if this method automatically casted the node to that node-class? >> Suggestions: >> - using templates: give the class name and the opcode. A bit annoying to use >> - using macros: give it the node-type name: i.e. `Add` for `AddNode`. The macro then uses the template, filling in `AddNode` and `Op_Add`. What do you think? > > Yes, it would but that out of scope for this PR. Automatic cast would be really nice. I mean you could do it in a separate prior PR. But I think now it is a bit nasty with the extra cast you require at the use-site. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570439734 From epeter at openjdk.org Thu Apr 18 12:31:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:31:20 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: <42IjtaFnhJy9RiLE_-v4y6AZ6R3vUFJtQ0UzeLaI79I=.e48124c6-a0e2-48f1-be81-d37f8c1f7388@github.com> Message-ID: On Thu, 18 Apr 2024 10:22:19 GMT, Emanuel Peter wrote: >> Yes, it would but that out of scope for this PR. > > Automatic cast would be really nice. > I mean you could do it in a separate prior PR. But I think now it is a bit nasty with the extra cast you require at the use-site. Plus, if you don't do it now, it will probably remain as it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570440246 From epeter at openjdk.org Thu Apr 18 12:31:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:31:20 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: <42IjtaFnhJy9RiLE_-v4y6AZ6R3vUFJtQ0UzeLaI79I=.e48124c6-a0e2-48f1-be81-d37f8c1f7388@github.com> Message-ID: On Thu, 18 Apr 2024 10:22:42 GMT, Emanuel Peter wrote: >> Automatic cast would be really nice. >> I mean you could do it in a separate prior PR. But I think now it is a bit nasty with the extra cast you require at the use-site. > > Plus, if you don't do it now, it will probably remain as it is. Hmm, I guess most cases are going to be `find_unique_out_with(Op_Bool)->as_Bool()`, and that is ok. 
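The typed-lookup idea discussed above (a template that pairs the node class with its opcode, so that no cast is needed at the use site) could look roughly like the following. This is a toy C++ model under stated assumptions, not HotSpot code: `Node`, `BoolNode`, `AddNode`, the `kOpcode` constant and the `find_unique_out<T>()` wrapper are hypothetical names, and the real `Node::find_unique_out_with` may differ in detail.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for C2 IR classes; none of this is HotSpot code.
enum Opcode { Op_Node, Op_Bool, Op_Add };

struct Node {
  std::vector<Node*> outs;          // users of this node
  virtual ~Node() = default;
  virtual Opcode opcode() const { return Op_Node; }

  // Untyped search: returns the only user with the given opcode, or nullptr
  // if there is no such user or more than one.
  Node* find_unique_out_with(Opcode opc) const {
    Node* res = nullptr;
    for (Node* use : outs) {
      if (use->opcode() == opc) {
        if (res != nullptr) {
          return nullptr;           // second match: not unique
        }
        res = use;
      }
    }
    return res;
  }

  // Hypothetical typed wrapper: the node class supplies its own opcode, so
  // the caller gets a correctly typed pointer without an extra cast.
  template <typename NodeT>
  NodeT* find_unique_out() const {
    Node* res = find_unique_out_with(NodeT::kOpcode);
    assert(res == nullptr || res->opcode() == NodeT::kOpcode);
    return static_cast<NodeT*>(res);
  }
};

struct BoolNode : Node {
  static constexpr Opcode kOpcode = Op_Bool;
  Opcode opcode() const override { return kOpcode; }
};

struct AddNode : Node {
  static constexpr Opcode kOpcode = Op_Add;
  Opcode opcode() const override { return kOpcode; }
};
```

With these toy classes, `n.find_unique_out<BoolNode>()` yields a `BoolNode*` directly, whereas the untyped form needs `find_unique_out_with(Op_Bool)` followed by a cast at the use site.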
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570445610 From epeter at openjdk.org Thu Apr 18 12:31:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:31:20 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 15:36:34 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/node.cpp line 988: > >> 986: return res; >> 987: } >> 988: > > Code duplication warning? > Not sure what is the best solution though. I think you could remove duplication with a simple "implement" function, that takes a parameter "want_unique". Then if you find the element, and don't want unique, you return. If you are looking for unique, you just continue and check that you don't find it again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1570458297 From luhenry at openjdk.org Thu Apr 18 12:32:01 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 18 Apr 2024 12:32:01 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 11:09:16 GMT, Hamlin Li wrote: > Hi, > Can you help review the patch for the VectorLoadShuffle intrinsic? > > BTW, without this intrinsic, some other Vector API operations (e.g. rearrange) do not work on riscv either. > > Thanks > > ## Test > test/jdk/jdk/incubator/vector/ > test/hotspot/jtreg/compiler/vectorapi src/hotspot/cpu/riscv/riscv_v.ad line 81: > 79: case Op_VectorLoadShuffle: > 80: case Op_VectorRearrange: > 81: if (vlen < 4) { Why the 4? It would be worth adding a comment to explicitly explain why. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1570620584 From tholenstein at openjdk.org Thu Apr 18 12:41:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 12:41:28 GMT Subject: RFR: 8324950: IGV: save the state to a file [v33] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... 
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/67e0ce00..582195e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=31-32 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From chagedorn at openjdk.org Thu Apr 18 12:45:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 12:45:30 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v2] In-Reply-To: References: Message-ID: <5D-pIM2L0KsUt0Jj5fZEcvgM99H1o8dRVbfGSt7R0Xc=.9535a22a-d131-47fe-9a40-5ceb2c34797c@github.com> > This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. > > #### Background > > The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Move push-inputs/outputs to Node_List. - Review Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18723/files - new: https://git.openjdk.org/jdk/pull/18723/files/3f40bd56..f072bae9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18723&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18723&range=00-01 Stats: 60 lines in 5 files changed: 24 ins; 21 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/18723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18723/head:pull/18723 PR: https://git.openjdk.org/jdk/pull/18723 From chagedorn at openjdk.org Thu Apr 18 12:45:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 12:45:30 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v2] In-Reply-To: <5D-pIM2L0KsUt0Jj5fZEcvgM99H1o8dRVbfGSt7R0Xc=.9535a22a-d131-47fe-9a40-5ceb2c34797c@github.com> References: <5D-pIM2L0KsUt0Jj5fZEcvgM99H1o8dRVbfGSt7R0Xc=.9535a22a-d131-47fe-9a40-5ceb2c34797c@github.com> Message-ID: On Thu, 18 Apr 2024 12:42:31 GMT, Christian Hagedorn wrote: >> This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. >> >> #### Background >> >> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. 
Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Move push-inputs/outputs to Node_List. > - Review Emanuel Thanks a lot for your review! I've addressed all your comments and pushed an update. I already incorporated the idea of moving the `push` methods to `Node_List`. ------------- PR Review: https://git.openjdk.org/jdk/pull/18723#pullrequestreview-2008695482 From chagedorn at openjdk.org Thu Apr 18 12:45:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 12:45:30 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v2] In-Reply-To: <3PFvJiUnIewzcPe_9LHMNyNRdVidXStvXNfapaueoKs=.783ca074-c94c-40d5-acda-b66ff55c94a7@github.com> References: <3PFvJiUnIewzcPe_9LHMNyNRdVidXStvXNfapaueoKs=.783ca074-c94c-40d5-acda-b66ff55c94a7@github.com> Message-ID: <8ONSqX2KwN7gaTgTU14UCwgcCeRr9A-OzqXWq3TZJ5Q=.ff5952b0-2ee4-4af4-aa44-16f07d9c9162@github.com> On Thu, 18 Apr 2024 08:43:51 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1450: >> >>> 1448: if (m != nullptr) { >>> 1449: wq.push(m); >>> 1450: } >> >> This code now looks almost identical to `TemplateAssertionPredicateExpressionNode::find_opaque_loop_nodes`, except that there we just are looking for one init/stride, and here we count them. Can we refactor this? >> >> Idea: have a callback/lambda we call on init/stride nodes. That callback can then count, or give a "terminate" return: `enum Action { Continue, Terminate };`. >> >> Or maybe you just use the count method also for checking if there are any init/stride. That is slightly more expensive, but maybe it does not matter. Or you give your count method a parameter that simply terminates early, and a return value true/false if we found any. Lots of options. > > Ah wait. 
That is what we used to do: count to check if there are any. That is what `subgraph_has_opaque` did. Yes, I eventually want to get rid of `count_opaque_loop_nodes()`. This is kinda an intermediate state. >> src/hotspot/share/opto/predicates.hpp line 344: >> >>> 342: // Expression Node belongs to. >>> 343: template >>> 344: void for_each_template_assertion_predicate(ApplyToTemplateFunction apply_to_template_function) { >> >> Suggestion: >> >> void for_each_template_assertion_predicate(ApplyToTemplateFunction apply_to_template_function) const { >> >> Would that work? > > I think the name `ApplyToTemplateFunction` / `apply_to_template_function` is more noisy than necessary. I would just do `Callback` and `callback`. In the for-each context it is clear what this means. Fair point, updated. >> src/hotspot/share/opto/predicates.hpp line 360: >> >>> 358: } >>> 359: assert(template_counter <= 2, "a node cannot be part of more than two templates"); >>> 360: assert(template_counter <= 1 || _node->is_OpaqueLoopInit(), "only OpaqueLoopInit nodes can be part of two templates"); >> >> Can this be true? Maybe there are some implicit assumptions about `_node`. But if it was for example an `AddI`, then this node could be used all over the place, and certainly could be used by many TAPEs. Am I wrong? > > If these asserts are correct, you probably want to add some comments. When we follow all inputs of a `TemplateAssertionPredicateExpressionNode`, we eventually end up at an `OpaqueLoop*Node`. These nodes do not common up. Therefore, each `TemplateAssertionPredicateExpressionNode` can only be part of one Template Assertion Predicate Expression. One exception is the `OpaqueLoopInitNode` itself. For convenience, the init and last value Template Assertion Predicate share this node. I can add a comment to explain these asserts further.
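The callback idea raised in this review thread (one walk over the inputs, with the callback's `Action` return value deciding whether to continue) could be sketched roughly as follows. This is only an illustration under stated assumptions: `ToyNode`, `for_each_opaque_loop_node`, `count_opaque_loop_nodes` and `has_opaque_loop_node` are hypothetical stand-ins written for this sketch, not the HotSpot `Node` class and not the code in the PR.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy stand-in for a C2 node (assumption; NOT the HotSpot Node class).
struct ToyNode {
  bool is_opaque_loop_node;            // plays the role of an OpaqueLoopInit/Stride node
  std::vector<const ToyNode*> inputs;  // may contain nullptr entries
};

// Return value of the callback, as suggested in the review comment.
enum class Action { Continue, Terminate };

// Visits every node reachable over the inputs of 'root' exactly once and
// invokes 'callback' on each opaque loop node. The walk stops early as soon
// as the callback returns Action::Terminate.
template <class Callback>
void for_each_opaque_loop_node(const ToyNode* root, Callback callback) {
  assert(root != nullptr);
  std::vector<const ToyNode*> worklist{root};
  std::unordered_set<const ToyNode*> visited{root};
  while (!worklist.empty()) {
    const ToyNode* n = worklist.back();
    worklist.pop_back();
    if (n->is_opaque_loop_node && callback(n) == Action::Terminate) {
      return;
    }
    for (const ToyNode* in : n->inputs) {
      if (in != nullptr && visited.insert(in).second) {
        worklist.push_back(in);
      }
    }
  }
}

// Counting variant: never terminates early.
inline int count_opaque_loop_nodes(const ToyNode* root) {
  int count = 0;
  for_each_opaque_loop_node(root, [&](const ToyNode*) {
    ++count;
    return Action::Continue;
  });
  return count;
}

// Existence check: terminates at the first hit.
inline bool has_opaque_loop_node(const ToyNode* root) {
  bool found = false;
  for_each_opaque_loop_node(root, [&](const ToyNode*) {
    found = true;
    return Action::Terminate;
  });
  return found;
}
```

With a single traversal routine, the counting and the "is there any" query differ only in the lambda they pass, which is the duplication the review comment was pointing at.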
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570592050 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570610236 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570616176 From chagedorn at openjdk.org Thu Apr 18 12:45:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 12:45:31 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v2] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 08:55:32 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Move push-inputs/outputs to Node_List. >> - Review Emanuel > > src/hotspot/share/opto/predicates.cpp line 354: > >> 352: } >> 353: >> 354: void TemplateAssertionPredicateExpressionNode::push_non_null_inputs(Unique_Node_List& list, const Node* node) { > > Why not make this a method in Node? `node->push_non_null_inputs(list)`. > If that can be part of the header file, then it would even be efficiently inlined, I assume. > We could then use it all over the place! > > Well, you should probably indicate that you are not traversing `in(0)`... not sure what would be an adequate name. Good point. Might it even be better suited inside `Node_List`? It sounds more like something a list should know how to do. How about going with `list.push_non_null_non_cfg_inputs_of(node)`? FWIW, there are quite a few places where we only want to put the non-cfg nodes on a node list. I suggest filing a follow-up RFE to replace those with this new method if you agree. Same for the outputs below. > src/hotspot/share/opto/predicates.cpp line 367: > >> 365: } >> 366: >> 367: void TemplateAssertionPredicateExpressionNode::push_outputs(Unique_Node_List& list, const Node* node) { > > Why not make this a method in `Node`? `node->push_outs(list)`. Same as above, better inside `Node_List`?
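To make the proposal above concrete, here is a minimal sketch of a list that knows how to push a node's non-null, non-control inputs and its outputs. Only the method name `push_non_null_non_cfg_inputs_of` is taken from the thread; `ToyNode`, `ToyUniqueNodeList`, `push_outputs_of`, `size` and `contains` are hypothetical names for illustration, and the convention that input 0 is the control input mirrors the `in(0)` remark in the review. It is not the actual `Node_List` code.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-in for a C2 node (assumption; NOT the HotSpot Node class).
struct ToyNode {
  std::vector<ToyNode*> inputs;   // inputs[0] is the control (CFG) input by convention
  std::vector<ToyNode*> outputs;  // users of this node
};

// Toy list that stores each node at most once.
class ToyUniqueNodeList {
  std::vector<ToyNode*> _nodes;
  std::unordered_set<ToyNode*> _members;

 public:
  // Appends 'n' unless it is already on the list.
  void push(ToyNode* n) {
    assert(n != nullptr);
    if (_members.insert(n).second) {
      _nodes.push_back(n);
    }
  }

  // Pushes all data inputs of 'node': starts at index 1 to skip the control
  // input and ignores null entries.
  void push_non_null_non_cfg_inputs_of(const ToyNode* node) {
    for (std::size_t i = 1; i < node->inputs.size(); i++) {
      ToyNode* in = node->inputs[i];
      if (in != nullptr) {
        push(in);
      }
    }
  }

  // Pushes all users of 'node'.
  void push_outputs_of(const ToyNode* node) {
    for (ToyNode* out : node->outputs) {
      push(out);
    }
  }

  std::size_t size() const { return _nodes.size(); }
  bool contains(ToyNode* n) const { return _members.count(n) != 0; }
};
```

Placing the helpers on the list rather than on the node keeps the deduplication and the "skip control, skip null" policy in one place, which is the argument made in the reply above.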
> src/hotspot/share/opto/predicates.hpp line 297: > >> 295: // - Two: A OpaqueLoopInitNode could be part of two Template Assertion Predicates. >> 296: // - One: In all other cases. >> 297: class TemplateAssertionPredicateExpressionNode : public StackObj { > > I have a slight irritation that this has a `...Node` suffix. Indicates that it is a subclass of `Node`, which is not correct. But probably it is still a good name, so you can leave it. I see your point. Due to a lack of having a better naming idea, I went with `...Node`. Let me think some more about it. > src/hotspot/share/opto/predicates.hpp line 314: > >> 312: public: >> 313: // Check whether the provided node is part of a Template Assertion Predicate Expression or not. >> 314: static bool is_valid(Node* node) { > > I would rename this too. To keep it similar to `is_maybe_in_template_assertion_predicate_expression`, name it `is_in_template_assertion_predicate_expression`. > > Hmm. You could also shorten the names, since we know from the context that we are talking about TAPE: > > is_in_expression > is_maybe_in_expression > > > Also: why is there the `find_opaque_loop_nodes` method? Is it even used anywhere else? I think that's a left-over from an earlier refactoring. I removed `find_opaque_loop_nodes()` and directly use `is_in_expression()`. > src/hotspot/share/opto/predicates.hpp line 320: > >> 318: // Check if the opcode of node could be found in a Template Assertion Predicate Expression. >> 319: // This also provides a fast check whether a node is unrelated. >> 320: static bool valid_opcode(const Node* node) { > > I'm not a fan of this name, it implies that nodes could be "valid" or "invalid". For one, I think this should start with `is_...`. This would be very long, but at least more accurate: `is_maybe_in_template_assertion_predicate_expression`. 
It's always a good sign when you point out the names I'm also not quite satisfied with to begin with - so that's an indicator to change it :-) I tried to be concise here but then there are really no good and precise options. Let's go with `is_maybe_in_expression()` and `is_in_expression()` as also suggested above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570596089 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570596662 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570597725 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570608887 PR Review Comment: https://git.openjdk.org/jdk/pull/18723#discussion_r1570601200 From tholenstein at openjdk.org Thu Apr 18 12:46:30 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 12:46:30 GMT Subject: RFR: 8324950: IGV: save the state to a file [v34] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open one or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... 
> > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Update src/utils/IdealGraphVisualizer/Data/src/main/java/com/sun/hotspot/igv/data/serialization/Parser.java Co-authored-by: Christian Hagedorn - Update with suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/582195e3..ea8f2389 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=32-33 Stats: 7 lines in 2 files changed: 0 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Thu Apr 18 12:46:30 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 12:46:30 GMT Subject: RFR: 8324950: IGV: save the state to a file [v31] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 07:10:12 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Update GraphParser.java > > src/utils/IdealGraphVisualizer/Coordinator/src/main/java/com/sun/hotspot/igv/coordinator/OutlineTopComponent.java line 130: > >> 128: * Obtain the OutlineTopComponent instance. 
Never call {@link #getDefault} directly! >> 129: */ >> 130: public static synchronized OutlineTopComponent findInstance() { > > Could this method then be private when it should never be called directly? sure ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17630#discussion_r1570657736 From epeter at openjdk.org Thu Apr 18 12:51:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 12:51:16 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Tue, 16 Apr 2024 14:43:21 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed).
If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'master' into JDK-8320649 > - review > - test fix > - test fix > - Merge branch 'master' into JDK-8320649 > - whitespaces > - review > - Merge branch 'master' into JDK-8320649 > - review > - 32 bit build fix > - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e I am wondering if it would make sense to have some `scoped_value.hpp/cpp`, where you can put all your new classes.
This would also allow you to put documentation about the general approach at the top of the `scoped_value.hpp` file. Currently, the code is spread all over, and it would be hard to know where one could find a good summary of the whole optimization. **For any Reviewer** Before diving into the code, make sure to study `ScopedValue::get`, it is what this optimization is all based on. So far, I have spent most of my review time on code style, and making sure that the code is understandable. It seems there is a reasonable amount of tests now, but there could always be more. The worrying part: `ScopedValue` is in no fuzzer, and so it is hard to tickle edge-cases. Since this is a lot of code, it is hard to know if there are not some subtle bugs hiding. But I've looked at the code too many times now, and can't see any bugs. A second or third reviewer should use a fresh eye, and carefully review the changes. After this round of comments is addressed, I'll approve it, and I'm stepping away, and let someone else have a turn. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2063787147 From fyang at openjdk.org Thu Apr 18 12:51:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 18 Apr 2024 12:51:15 GMT Subject: RFR: 8330419: Unused code in ConnectionGraph::specialize_castpp In-Reply-To: <72Q4nA_0HyrcVMMPJM4kKzMSNHGiF5RVLnlfyZDXT_g=.381fe7c3-67bf-4752-bc63-4b2ae7b3b9be@github.com> References: <6L5j8WiAU4xDXERf8g8nt_T-CCHwQauEdjObEVxjV74=.e8942a0a-a804-47a4-8d56-6c3ad1dd51ef@github.com> <72Q4nA_0HyrcVMMPJM4kKzMSNHGiF5RVLnlfyZDXT_g=.381fe7c3-67bf-4752-bc63-4b2ae7b3b9be@github.com> Message-ID: <482JmTBdvECSqql8csSnXRzwcsOyoE1zxHZaSllEbyo=.2f594b54-2bef-4428-9449-bd49093f4077@github.com> On Thu, 18 Apr 2024 06:01:41 GMT, Christian Hagedorn wrote: >> Please review this small code cleanup change.
>> >> Noticed that `minus_one` local created in `ConnectionGraph::specialize_castpp` which is introduced by [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) is never used. I think it should be safe to remove this. Also renamed `boll` to `bol` to be consistent in naming with other places where we create a `BoolNode`. >> >> Testing: tier1 tested on linux-aarch64 (release & fastdebug) > > Looks good and trivial. @chhagedorn : Thanks for taking a look! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18805#issuecomment-2063787002 From fyang at openjdk.org Thu Apr 18 12:51:16 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 18 Apr 2024 12:51:16 GMT Subject: Integrated: 8330419: Unused code in ConnectionGraph::specialize_castpp In-Reply-To: <6L5j8WiAU4xDXERf8g8nt_T-CCHwQauEdjObEVxjV74=.e8942a0a-a804-47a4-8d56-6c3ad1dd51ef@github.com> References: <6L5j8WiAU4xDXERf8g8nt_T-CCHwQauEdjObEVxjV74=.e8942a0a-a804-47a4-8d56-6c3ad1dd51ef@github.com> Message-ID: On Wed, 17 Apr 2024 00:05:23 GMT, Fei Yang wrote: > Please review this small code cleanup change. > > Noticed that `minus_one` local created in `ConnectionGraph::specialize_castpp` which is introduced by [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) is never used. I think it should be safe to remove this. Also renamed `boll` to `bol` to be consistent in naming with other places where we create a `BoolNode`. > > Testing: tier1 tested on linux-aarch64 (release & fastdebug) This pull request has now been integrated.
Changeset: 571e6bc3 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/571e6bc3f7d521d3be7ee1c6c32705c768645b75 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod 8330419: Unused code in ConnectionGraph::specialize_castpp Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18805 From chagedorn at openjdk.org Thu Apr 18 12:53:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 12:53:38 GMT Subject: RFR: 8324950: IGV: save the state to a file [v35] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 12:50:09 GMT, Tobias Holenstein wrote: >> The current workflow in IGV is the following: >> 1) import an XML file with graphs or send via network >> 2) open one or more graphs in a tab >> 3) extract a set of nodes to be displayed in the tab >> 4) close IGV and start from 1) again >> >> The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. >> ### The new workflow >> >> When opening IGV the user gets an empty workspace without any opened files. >> - Graphs can be sent via the network to IGV >> - Graph can be opened from an XML file >> empty >> >> Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: >> >> graph >> >> A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: >> >> >> >> >> ... >> ... >> ... >> ... >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> The workspace menu is restructured: >> >> open >> >> - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened.
It's not possible to have two XML files opened at the same time: >> - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. >> >> save >> >> - `Save..` saves the current opened xml file. Create a ... > > Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8324950' of github.com:tobiasholenstein/jdk into JDK-8324950 > - remove null check Looks good, thanks for the update! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17630#pullrequestreview-2008798528 From tholenstein at openjdk.org Thu Apr 18 12:53:38 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 12:53:38 GMT Subject: RFR: 8324950: IGV: save the state to a file [v35] In-Reply-To: References: Message-ID: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open one or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph. New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file.
In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8324950' of github.com:tobiasholenstein/jdk into JDK-8324950 - remove null check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17630/files - new: https://git.openjdk.org/jdk/pull/17630/files/ea8f2389..339b0ed3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17630&range=33-34 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17630/head:pull/17630 PR: https://git.openjdk.org/jdk/pull/17630 From tholenstein at openjdk.org Thu Apr 18 12:56:09 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 12:56:09 GMT Subject: RFR: 8324950: IGV: save the state to a file [v25] In-Reply-To: <606-XJdbtoFsY1IE48qvkaC6LgLZkaLKc0JIGlZJSMU=.c8f80bdb-a75a-453f-97b8-e0e08bbec28d@github.com> References: <8AsJ1ssqCodHDiDp8cn-GSIPHiUp5NHNaBH8oDZe6lI=.7261d6bb-90cb-4324-8232-62506f4b8e6f@github.com> <606-XJdbtoFsY1IE48qvkaC6LgLZkaLKc0JIGlZJSMU=.c8f80bdb-a75a-453f-97b8-e0e08bbec28d@github.com> Message-ID: On Wed, 17 Apr 2024 07:04:57 GMT, Roberto Castañeda Lozano wrote: >>> There is an issue with saved difference graph states.
If I open [diff.zip](https://github.com/openjdk/jdk/files/14976481/diff.zip) (which I just created by importing some graphs, opening one of them, and diffing it against another one), I get the following assertion error: >>> >>> ``` >>> [INFO] java.lang.AssertionError >>> [INFO] at com.sun.hotspot.igv.util.RangeSliderModel.setPositions(RangeSliderModel.java:101) >>> [INFO] at com.sun.hotspot.igv.coordinator.OutlineTopComponent.lambda$loadContext$2(OutlineTopComponent.java:481) >>> [INFO] at java.desktop/java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:318) >>> [INFO] at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:773) >>> [INFO] at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:720) >>> [INFO] at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:714) >>> [INFO] at java.base/java.security.AccessController.doPrivileged(AccessController.java:399) >>> [INFO] at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86) >>> [INFO] at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742) >>> [INFO] at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:136) >>> [INFO] [catch] at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203) >>> [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124) >>> [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113) >>> [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109) >>> [INFO] at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) >>> [INFO] at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90) >>> ``` >>> >>> I imagine fully supporting saving and restoring the diff state would require quite a lot of additional complexity, both in IGV and in the XML files. 
Maybe this is not a very important use case, and we could just not support it? >> >> Thanks for catching that! I fixed it. Difference graphs are supported as long as they are in the same group. A difference graph from two different groups is just not saved to the xml. > >> Thanks for catching that! I fixed it. Difference graphs are supported as long as they are in the same group. A difference graph from two different groups is just not saved to the xml. > > Great, that works fine now, as far as I can see. A minor related issue is that difference graphs from two different groups are not closed when the workspace is cleared. Is this expected? I would intuitively expect them to be closed as well, but I guess one can also argue that they do not belong to the workspace. Thanks for the reviews @robcasloz and @chhagedorn ------------- PR Comment: https://git.openjdk.org/jdk/pull/17630#issuecomment-2063798804 From tholenstein at openjdk.org Thu Apr 18 12:56:11 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Apr 2024 12:56:11 GMT Subject: Integrated: 8324950: IGV: save the state to a file In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 14:13:47 GMT, Tobias Holenstein wrote: > The current workflow in IGV is the following: > 1) import an XML file with graphs or send via network > 2) open one or more graphs in a tab > 3) extract a set of nodes to be displayed in the tab > 4) close IGV and start from 1) again > > The idea of this RFE is to save the opened graph tabs and extracted nodes of a graph in the `graph.xml` file. > ### The new workflow > > When opening IGV the user gets an empty workspace without any opened files. > - Graphs can be sent via the network to IGV > - Graph can be opened from an XML file > empty > > Unzipping this [example_graph.zip](https://github.com/openjdk/jdk/files/14946834/example_graph.zip) and opening `graphs.xml` shows the following graph.
New with this RFE is that opened graph tabs and extracted nodes are saved to the `graph.xml` file and restored when re-opening the `graphs.xml`: > > graph > > A new `` is introduced in `graphs.xml` that stores the opened graphs and their visible nodes: > > > > > ... > ... > ... > ... > > > > > > > > > > > > > > > > The workspace menu is restructured: > > open > > - `Open` allows the user to open an XML file. In IGV there is either no XML opened indicated as `untitled` or exactly one xml file opened. It's not possible to have two XML files opened at the same time: > - `Import`: Allows the user to import graphs from another XML file to the current opened XML file. > > save > > - `Save..` saves the current opened xml file. Create a new file if no file is opened. > - `Save as...` save the current graphs as a copy to an xml file. > Note: ... This pull request has now been integrated. Changeset: ec180d47 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/ec180d4782d39d50d2db3dfbe78e62a215c0a414 Stats: 1748 lines in 30 files changed: 854 ins; 632 del; 262 mod 8324950: IGV: save the state to a file Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17630 From bulasevich at openjdk.org Thu Apr 18 13:22:57 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 18 Apr 2024 13:22:57 GMT Subject: RFR: 8330061: Cleanup: follow code heaps order in CodeCache initialization and logging, code heap info in logs In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 21:51:14 GMT, Dmitry Chuyko wrote: > This is an additional tiny cleanup after CodeCache::initialize_heaps refactoring (JDK-8311248). CodeCache::initialize_heaps: code heaps info is printed in code heaps order, final size adjustments and flags are made in code heaps order. CodeCache::allocate: assertion message contains blob type. CodeCache::print_trace: name of the heap containing the method is printed. 
> > Testing: jtreg test/hotspot/jtreg/compiler/codecache, tier1, tier2. src/hotspot/share/code/codeCache.cpp line 254: > 252: size_t total = non_nmethod.size + profiled.size + non_profiled.size; > 253: if (total != cache_size && !cache_size_set) { > 254: log_info(codecache)("ReservedCodeCache size " SIZE_FORMAT "K changed to total segments size" This code is not important. Consider compacting it back to three lines src/hotspot/share/code/codeCache.cpp line 1578: > 1576: } > 1577: tty->print("CodeCache %s: addr: " INTPTR_FORMAT ", size: 0x%x", event, p2i(cb), size); > 1578: FOR_ALL_HEAPS(heap) { It would be preferable to use the get_code_heap_containing method here ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18732#discussion_r1570735977 PR Review Comment: https://git.openjdk.org/jdk/pull/18732#discussion_r1570739270 From mli at openjdk.org Thu Apr 18 13:40:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Apr 2024 13:40:30 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v2] In-Reply-To: References: Message-ID: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> > Hi, > Can you help to review the patch for instrinsic VectorLoadShuffle? > > BTW, without this intrinsic, some other vector api operation does not work as well (e.g. rearrange) on riscv. 
> > Thanks > > ## Test > test/jdk/jdk/incubator/vector/ > test/hotspot/jtreg/compiler/vectorapi Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18835/files - new: https://git.openjdk.org/jdk/pull/18835/files/db63a657..b1400b33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18835&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18835&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18835/head:pull/18835 PR: https://git.openjdk.org/jdk/pull/18835 From mli at openjdk.org Thu Apr 18 13:40:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 18 Apr 2024 13:40:30 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v2] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 12:22:12 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment > > src/hotspot/cpu/riscv/riscv_v.ad line 81: > >> 79: case Op_VectorLoadShuffle: >> 80: case Op_VectorRearrange: >> 81: if (vlen < 4) { > > Why the 4? It would be worth adding a comment to explicitly explain why. added, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1570773981 From chagedorn at openjdk.org Thu Apr 18 13:41:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 13:41:32 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v3] In-Reply-To: References: Message-ID: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> > This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. 
This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. > > #### Background > > The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18723/files - new: https://git.openjdk.org/jdk/pull/18723/files/f072bae9..00fe1bb4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18723&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18723&range=01-02 Stats: 36 lines in 2 files changed: 16 ins; 18 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18723.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18723/head:pull/18723 PR: https://git.openjdk.org/jdk/pull/18723 From epeter at openjdk.org Thu Apr 18 13:53:01 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Apr 2024 13:53:01 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v3] In-Reply-To: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> References: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> Message-ID: <48CjdWoRvf5NxSDwCw2Xf3Ml2tGQuhFVW1Ygwwf5pqc=.af11def3-95a0-465a-a299-da361ea70cc2@github.com> On Thu, 18 Apr 2024 13:41:32 GMT, Christian Hagedorn wrote: >> This is another patch split off https://github.com/openjdk/jdk/pull/16877. 
It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. >> >> #### Background >> >> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update Looks good to me now. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18723#pullrequestreview-2008967334 From luhenry at openjdk.org Thu Apr 18 14:18:58 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 18 Apr 2024 14:18:58 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v2] In-Reply-To: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> References: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> Message-ID: On Thu, 18 Apr 2024 13:40:30 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch for instrinsic VectorLoadShuffle? >> >> BTW, without this intrinsic, some other vector api operation does not work as well (e.g. rearrange) on riscv. >> >> Thanks >> >> ## Test >> test/jdk/jdk/incubator/vector/ >> test/hotspot/jtreg/compiler/vectorapi > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18835#pullrequestreview-2009042491 From chagedorn at openjdk.org Thu Apr 18 14:31:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Apr 2024 14:31:57 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v3] In-Reply-To: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> References: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> Message-ID: <-Z3Kh43WybWE9nEDlCI-xTbql4hZjM4sJtdbsunT_Ko=.fc83e85d-5c99-47c8-ba11-199a2ee8e881@github.com> On Thu, 18 Apr 2024 13:41:32 GMT, Christian Hagedorn wrote: >> This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. >> >> #### Background >> >> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update Thanks Emanuel for your review and good catch with `Unique_Node_List::push()` not overriding `Node_List::push()`! We should clean up these classes at some point.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18723#issuecomment-2064012153 From kvn at openjdk.org Thu Apr 18 14:57:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 Apr 2024 14:57:03 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v3] In-Reply-To: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> References: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> Message-ID: On Thu, 18 Apr 2024 13:41:32 GMT, Christian Hagedorn wrote: >> This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. >> >> #### Background >> >> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18723#pullrequestreview-2009144411 From bkilambi at openjdk.org Thu Apr 18 15:47:30 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 18 Apr 2024 15:47:30 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v7] In-Reply-To: References: Message-ID: > Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). 
> > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > Final code is as below: > > Before: > ` fadda z17.s, p7/m, z17.s, z16.s > ` > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > > > > [Test] > Full jtreg passed on AArch64 and x86. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Adjust format for the backend rules changed in previous commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18034/files - new: https://git.openjdk.org/jdk/pull/18034/files/f38dae21..6d25d78f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=05-06 Stats: 6 lines in 2 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18034.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034 PR: https://git.openjdk.org/jdk/pull/18034 From bkilambi at openjdk.org Thu Apr 18 15:47:30 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 18 Apr 2024 15:47:30 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v6] In-Reply-To: References: <8xmxstkq7D_wMBI-BhUcJzoJOn2bWcsUuQtXXIv4YMk=.a1a5baee-e2af-40b2-9df9-67642d90d565@github.com> <9zThozzY0xAekz17NJ2PIwa-37r8M95MM_E4lJl-Kao=.124dfbb1-814e-4ae7-8c48-da3b37d5bb42@github.com> Message-ID: On Thu, 18 Apr 2024 08:13:07 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 2858: >> >>> 2856: // reduction addF >>> 2857: >>> 2858: instruct 
reduce_non_strict_order_add2F_neon(vRegF dst, vRegF fsrc, vReg vsrc) %{ >> >> Now that you have changed the name of the method, you should also change the `format` in all of the methods. > > Oh no! Sorry I missed that. Will do that right away. I have updated the `format` for the rules I changed. Thanks for spotting it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18034#discussion_r1570999327 From shade at openjdk.org Thu Apr 18 16:59:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 18 Apr 2024 16:59:59 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v10] In-Reply-To: <7T3WhLLml-T5hk5lVwlAar0aMd5sZZOnamLPu4BdKXg=.3f5bf2f0-b93b-4353-8cd6-26c2b03b2033@github.com> References: <7T3WhLLml-T5hk5lVwlAar0aMd5sZZOnamLPu4BdKXg=.3f5bf2f0-b93b-4353-8cd6-26c2b03b2033@github.com> Message-ID: On Mon, 8 Apr 2024 16:00:36 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. >> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > some formatting suggestions from @shipilev > > Co-authored-by: Aleksey Shipilëv @TheRealMDoerr, @GoeLin -- I think you'd want to ack that covering "IRIW" parts with just a `StoreStore` is okay here.
I think it is, since we "just" want the same semantics for `volatile`-s as for `final`-s. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2064557768 From duke at openjdk.org Thu Apr 18 17:06:39 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 18 Apr 2024 17:06:39 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v11] In-Reply-To: References: Message-ID: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. 
> > I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). > > Passes hotspot tier1 locally on a Linux machine. > > ### Benchmarks > > Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. > > Baseline: > > Result "org.renaissance.jdk.streams.JmhParMnemonics.run": > N = 25 > mean = 3309.611 ±(99.9%) 86.699 ms/op > > Histogram, ms/op: > [3000.000, 3050.000) = 0 > [3050.000, 3100.000) = 4 > [3100.000, 3150.000) = 1 > [3150.000, 3200.000) = 0 > [3200.000, 3250.000) = 0 > [3250.000, 3300.000) = 0 > [3300.000, 3350.000) = 9 > [3350.000, 3400.000) = 6 > [3400.000, 3450.000) = 5 > > Percentiles, ms/op: > p(0.0000) = 3069.910 ms/op > p(50.0000) = 3348.140 ms/op > ... Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Merge branch 'master' into storestore - Merge branch 'master' into storestore - Apply suggestions from code review some formatting suggestions from @shipilev Co-authored-by: Aleksey Shipilëv - Guard everything by feature flag - Revert "Statistics for barriers generated/eliminated" This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f. - Make flag product diagnostic and guard string concat storestore by flag - Statistics for barriers generated/eliminated - global flag to turn on storestore barrier emission and membar acquires IR tests - Add micro benchmark courtesy of @shipilev - More comprehensive IR tests based on @shipilev's suggestions - ...
and 10 more: https://git.openjdk.org/jdk/compare/235ba9a7...104d733d ------------- Changes: https://git.openjdk.org/jdk/pull/18505/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=10 Stats: 610 lines in 9 files changed: 605 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18505.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505 PR: https://git.openjdk.org/jdk/pull/18505 From matsaave at openjdk.org Thu Apr 18 17:19:30 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 18 Apr 2024 17:19:30 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fei comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18477/files - new: https://git.openjdk.org/jdk/pull/18477/files/f612f947..c4789510 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18477/head:pull/18477 PR: https://git.openjdk.org/jdk/pull/18477 From sviswanathan at openjdk.org Thu Apr 18 18:29:00 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 18 Apr 2024 18:29:00 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Tue, 16 Apr 2024 00:04:15 GMT, Scott Gibbons wrote: >> This code makes an
intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add enter() and leave(); remove Windows-specific register stuff @vnkozlov Could you please review this PR from @asgibbons? Looking forward to your inputs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2064852401 From dlong at openjdk.org Thu Apr 18 20:45:56 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 18 Apr 2024 20:45:56 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote: > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly cast to long. I now cast the stride to long > before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method. So > for a min int stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early.
> > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. I reordered statements > so the `ABS()` call happens after that test which will cause an early > return if the stride is min int. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18813#pullrequestreview-2009852076 From mdoerr at openjdk.org Thu Apr 18 21:41:58 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 18 Apr 2024 21:41:58 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v10] In-Reply-To: References: <7T3WhLLml-T5hk5lVwlAar0aMd5sZZOnamLPu4BdKXg=.3f5bf2f0-b93b-4353-8cd6-26c2b03b2033@github.com> Message-ID: On Thu, 18 Apr 2024 16:57:24 GMT, Aleksey Shipilev wrote: > @TheRealMDoerr, @GoeLin -- I think you'd want to ack that covering "IRIW" parts with just a `StoreStore` is okay here. I think it is, since we "just" want the same semantics for `volatile`-s as for `final`-s. Correct, a StoreStore barrier is sufficient. Thanks for the notification! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2065370507 From mbalao at openjdk.org Fri Apr 19 00:10:05 2024 From: mbalao at openjdk.org (Martin Balao) Date: Fri, 19 Apr 2024 00:10:05 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) Message-ID: We would like to propose a fix for 8330611. To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data.
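The tail handling described above can be sketched roughly in Java. This is only an illustration under stated assumptions, not the actual stub code: the class name `CtrTailSketch` and the method `xorTail` are hypothetical, and plain array accesses stand in for the register extracts and shifts. The remaining length (fewer than 16 bytes) is tested bit by bit, so each step touches only bytes that are known to be in bounds.

```java
final class CtrTailSketch {
    // XORs the last `len` (< 16) bytes of `in`, starting at `off`, with the
    // keystream block `ks` and writes the result to `out`. Hypothetical helper:
    // the batches mirror the 8/4/2/1-byte steps selected by the bits of `len`.
    static void xorTail(byte[] in, byte[] out, int off, int len, byte[] ks) {
        int pos = 0; // bytes of the tail consumed so far
        for (int batch = 8; batch >= 1; batch >>= 1) {
            if ((len & batch) != 0) {
                // Only `batch` bytes are read here, never a full 16-byte block.
                for (int i = 0; i < batch; i++) {
                    out[off + pos + i] = (byte) (in[off + pos + i] ^ ks[pos + i]);
                }
                pos += batch;
            }
        }
    }
}
```

Because the set bits of a length below 16 sum to that length, the four conditional steps together cover exactly the tail and nothing beyond it.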
While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. This work is in collaboration with @franferrax . ------------- Commit messages: - 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) Changes: https://git.openjdk.org/jdk/pull/18849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330611 Stats: 23 lines in 3 files changed: 18 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18849/head:pull/18849 PR: https://git.openjdk.org/jdk/pull/18849 From mbalao at openjdk.org Fri Apr 19 00:12:56 2024 From: mbalao at openjdk.org (Martin Balao) Date: Fri, 19 Apr 2024 00:12:56 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data.
> > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . This is what the generated code looks like after the proposed fix:
```
0x7fffe4730bad: vmovdqu %xmm0,(%r9)
0x7fffe4730bb2: test $0x8,%r8b
0x7fffe4730bb6: je 0x7fffe4730bd3
0x7fffe4730bbc: vpextrq $0x0,%xmm0,%r13
0x7fffe4730bc2: xor (%rdi,%r12,1),%r13
0x7fffe4730bc6: mov %r13,(%rsi,%r12,1)
0x7fffe4730bca: vpsrldq $0x8,%xmm0,%xmm0
0x7fffe4730bcf: add $0x8,%r12d
0x7fffe4730bd3: test $0x4,%r8b
0x7fffe4730bd7: je 0x7fffe4730bf4
0x7fffe4730bdd: vpextrd $0x0,%xmm0,%r13d
0x7fffe4730be3: xor (%rdi,%r12,1),%r13d
0x7fffe4730be7: mov %r13d,(%rsi,%r12,1)
0x7fffe4730beb: vpsrldq $0x4,%xmm0,%xmm0
0x7fffe4730bf0: add $0x4,%r12
0x7fffe4730bf4: test $0x2,%r8b
0x7fffe4730bf8: je 0x7fffe4730c16
0x7fffe4730bfe: vpextrw $0x0,%xmm0,%r13d
0x7fffe4730c03: xor (%rdi,%r12,1),%r13w
0x7fffe4730c08: mov %r13w,(%rsi,%r12,1)
0x7fffe4730c0d: vpsrldq $0x2,%xmm0,%xmm0
0x7fffe4730c12: add $0x2,%r12d
0x7fffe4730c16: test $0x1,%r8b
0x7fffe4730c1a: je 0x7fffe4730c32
0x7fffe4730c20: vpextrb $0x0,%xmm0,%r13d
0x7fffe4730c26: xor (%rdi,%r12,1),%r13b
0x7fffe4730c2a: mov %r13b,(%rsi,%r12,1)
0x7fffe4730c2e: add $0x1,%r12d
```
------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2065515893 From cslucas at openjdk.org Fri Apr 19 00:40:19 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 19 Apr 2024 00:40:19 GMT
Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required Message-ID: The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. Tested on Linux x64 tiers1-3. ------------- Commit messages: - SR allocate needs to be of exact type. Changes: https://git.openjdk.org/jdk/pull/18851/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18851&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330247 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18851/head:pull/18851 PR: https://git.openjdk.org/jdk/pull/18851 From kvn at openjdk.org Fri Apr 19 01:02:58 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 01:02:58 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:35:16 GMT, Cesar Soares Lucas wrote: > The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. > > The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. > > Tested on Linux x64 tiers1-3. Good. Did you run the CTW test from the bug report? Is it possible to extract a simple reproducer from it and add it to this PR?
------------- PR Review: https://git.openjdk.org/jdk/pull/18851#pullrequestreview-2010308253 From fyang at openjdk.org Fri Apr 19 01:05:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 19 Apr 2024 01:05:02 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fei comments Updated changes LGTM. I didn't see any issue when running SPECjvm2008 sunflow overnight on my 64-core aarch64 server. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18477#pullrequestreview-2010312997 From fyang at openjdk.org Fri Apr 19 01:10:59 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 19 Apr 2024 01:10:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v9] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:20:21 GMT, Martin Doerr wrote: > FAO @bulasevich @TheRealMDoerr @RealFYang @RealLucy I've created [JDK-8330472](https://bugs.openjdk.org/browse/JDK-8330472) to port the changes here to arm/ppc/riscv/s390. Also, the changes in this PR have been made in such a way that they only affect architectures on which the intrinsic is implemented. Would you also be able to test the changes in this PR to make sure no regressions are introduced on these archs? FYI: This also tests good on the linux-riscv64 platform. Thanks for the ping!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2065567115 From jkarthikeyan at openjdk.org Fri Apr 19 03:02:59 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 19 Apr 2024 03:02:59 GMT Subject: RFR: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 13:59:42 GMT, Emanuel Peter wrote: >> This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. >> I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. >> >> Thoughts and reviews would be appreciated! > > Looks reasonable, thanks for the fix! Thanks for the review @eme64, and thanks for taking a look @dafedafe! I think the scope of the tests shouldn't be limited by this change. The vectorization IR test is the only test that takes the 1s as the right hand argument, and I think functionally it should behave the same, as the true/false ratio will still be roughly equal before and after. Before we were comparing `rand_int < rand_int`, which is true roughly 50% of the time. Comparing against 1 it should still be true roughly 50% of the time since `nextInt()` distributes ints across the whole domain.
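The "roughly 50%" claim can be checked empirically with a small standalone Java sketch. It is not part of `TestIfMinMax`; the class name `BranchRatioSketch`, the seed and the sample count are arbitrary choices made here for illustration only.

```java
import java.util.Random;

// Illustrative only: estimates how often each comparison is true.
final class BranchRatioSketch {
    // Fraction of samples where one random int is smaller than another random int.
    static double randVsRand(long seed, int samples) {
        Random r = new Random(seed);
        int taken = 0;
        for (int i = 0; i < samples; i++) {
            if (r.nextInt() < r.nextInt()) {
                taken++;
            }
        }
        return (double) taken / samples;
    }

    // Fraction of samples where a random int is smaller than the constant 1.
    static double randVsOne(long seed, int samples) {
        Random r = new Random(seed);
        int taken = 0;
        for (int i = 0; i < samples; i++) {
            if (r.nextInt() < 1) {
                taken++;
            }
        }
        return (double) taken / samples;
    }

    public static void main(String[] args) {
        System.out.println("rand < rand: " + randVsRand(42L, 1_000_000));
        System.out.println("rand < 1:    " + randVsOne(42L, 1_000_000));
    }
}
```

Both ratios should land close to 0.5, which is far above the `0.01` CMove threshold quoted in the PR description.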
------------- PR Comment: https://git.openjdk.org/jdk/pull/18734#issuecomment-2065663984 From jkarthikeyan at openjdk.org Fri Apr 19 03:04:00 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 19 Apr 2024 03:04:00 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 11:01:08 GMT, Emanuel Peter wrote: >> Thanks for the comments @chhagedorn and @eme64! I've pushed a commit that should address the points brought up in review, and renamed the function to `Type::equals`. > > @jaskarth this looks good. I am running testing again now. > > @merykitty do you have an opinion on this? You have done quite some work on types. Thanks for taking another look @eme64, and for the review @merykitty :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18533#issuecomment-2065664346 From cslucas at openjdk.org Fri Apr 19 03:07:58 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 19 Apr 2024 03:07:58 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:35:16 GMT, Cesar Soares Lucas wrote: > The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. > > The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. > > Tested on Linux x64 tiers1-3. Yes, I was able to reproduce the failure using CTW and a JAR file. I'll create a minimal test case and include it in this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18851#issuecomment-2065667237 From bulasevich at openjdk.org Fri Apr 19 05:34:59 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 19 Apr 2024 05:34:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 09:11:40 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op
>> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op
>> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op
>> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op
>> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op
>> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op
>> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op
>> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op
>> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op
>> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op
>> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op
>> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op
>> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op
>> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op
>> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op
>> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op
>> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Use vmIntrinsics instead of vmIntrinsicID > - Fix formatting src/hotspot/share/c1/c1_GraphBuilder.cpp line 4441: > 4439: > 4440: void GraphBuilder::append_alloc_array_copy(ciMethod* callee) { > 4441: { There are stylistic questions to this block
- why separate block?
- not necessary comment "Peek at receiver"
- misprint in "not primtive array" comment
- extra line for {
- excessive comment "// not primitive array"
- bailout message do not mention phi
- please use INLINE_BAILOUT macro
Comment about phi does not make things clear: not evident why receiver type can be nullptr and why it is phi. After all, why can't we check src->exact_type()->as_array_klass()->element_type() here? test/hotspot/jtreg/compiler/c1/TestNullArrayClone.java line 26: > 24: /* > 25: * @test > 26: * @bug 8302850 please add @summary is the purpose of the test to check that array clone throws NPE for null input and does not throw otherwise? Don't we want to check the contents of the copied data? Don't we want to check different sizes and array types? Is 1K iterations enough to compile the method?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1571823622 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1571824070 From bulasevich at openjdk.org Fri Apr 19 05:41:59 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 19 Apr 2024 05:41:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 09:11:40 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op
>> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op
>> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op
>> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op
>> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op
>> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op
>> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op
>> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op
>> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op
>> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op
>> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op
>> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op
>> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op
>> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op
>> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op
>> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op
>> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Use vmIntrinsics instead of vmIntrinsicID > - Fix formatting $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" ... TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 1234 tests do not seem to be enough for a low-level feature. I would also check :hotspot_gc :hotspot_serviceability :hotspot_runtime and jdk tier1-3 targets. I checked ARM32 with this change and found no regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2065797695 PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2065798196 From amitkumar at openjdk.org Fri Apr 19 06:02:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 Apr 2024 06:02:01 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 09:11:40 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op
>> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op
>> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op
>> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op
>> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op
>> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op
>> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op
>> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op
>> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op
>> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op
>> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op
>> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op
>> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op
>> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op
>> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op
>> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op
>> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ...
>> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Use vmIntrinsics instead of vmIntrinsicID > - Fix formatting Test results are clean on s390x as well.
command: make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR
jtreg:test/hotspot/jtreg:hotspot_compiler 1167 1167 0 0
==============================
TEST SUCCESS
command: make run-test-tier1
==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1 2153 2151 2 0 <<
>> jtreg:test/jdk:tier1 2359 2357 2 0 <<
jtreg:test/langtools:tier1 4477 4477 0 0
jtreg:test/jaxp:tier1 0 0 0 0
jtreg:test/lib-test:tier1 33 33 0 0
==============================
TEST FAILURE
The failures in `tier1` were not related to this PR. Let me know if more testing is required. CC:@reallucy ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2065817523 From chagedorn at openjdk.org Fri Apr 19 07:28:58 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Apr 2024 07:28:58 GMT Subject: RFR: 8323429: Missing C2 optimization for FP min/max when both inputs are same [v3] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 11:08:01 GMT, Galder Zamarreño wrote: >> Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. >> >> It includes an IR test to verify that the optimization gets applied.
The optimization applies not only to floating-point values, but also to longs and ints. The test includes tests for all of those. >> >> `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. >> >> I've run hotspot compiler tests successfully on x86_64. > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Use failOn instead of counts Yes, that looks unrelated and is most likely [JDK-8328066](https://bugs.openjdk.org/browse/JDK-8328066) which was fixed in the meantime. So, I think you can safely ignore this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18738#issuecomment-2065933716 From chagedorn at openjdk.org Fri Apr 19 07:29:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Apr 2024 07:29:57 GMT Subject: RFR: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates [v3] In-Reply-To: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> References: <2XSRXz3DbFYojfqv8WdcfC6poSegjRwkixny2vXajd0=.4daf896f-2140-40bc-8a80-4cb653e74972@github.com> Message-ID: On Thu, 18 Apr 2024 13:41:32 GMT, Christian Hagedorn wrote: >> This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. >> >> #### Background >> >> The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates.
Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update Thanks Vladimir for your review! I'll submit some more testing with the latest changes before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18723#issuecomment-2065937008 From galder at openjdk.org Fri Apr 19 07:49:00 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Apr 2024 07:49:00 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 05:59:34 GMT, Amit Kumar wrote: >> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use vmIntrinsics instead of vmIntrinsicID >> - Fix formatting > > Tests result is clean on s390x as well. > > > command: make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1167 1167 0 0 > ============================== > TEST SUCCESS > > > > command: make run-test-tier1 > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier1 2153 2151 2 0 << >>> jtreg:test/jdk:tier1 2359 2357 2 0 << > jtreg:test/langtools:tier1 4477 4477 0 0 > jtreg:test/jaxp:tier1 0 0 0 0 > jtreg:test/lib-test:tier1 33 33 0 0 > ============================== > TEST FAILURE > > > The failures in `tier1` were not related to this PR. Let me know if more testing is required. > CC:@reallucy Thanks for testing @offamitkumar @RealFYang
------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2065988991 From galder at openjdk.org Fri Apr 19 07:51:59 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Apr 2024 07:51:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 05:31:40 GMT, Boris Ulasevich wrote: > There are stylistic questions to this block ... > Comment about phi does not make things clear: not evident why receiver type can be nullptr and why it is phi. After all, why can't we check src->exact_type()->as_array_klass()->element_type() here? This block comes from @dean-long, I'll let him comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1571973470 From galder at openjdk.org Fri Apr 19 07:57:01 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 19 Apr 2024 07:57:01 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 09:32:48 GMT, Martin Doerr wrote: >> Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use vmIntrinsics instead of vmIntrinsicID >> - Fix formatting > > Thanks for cleaning this up! Tests have passed on PPC64.
^ Thanks @TheRealMDoerr too :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2066010598 From shade at openjdk.org Fri Apr 19 08:41:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Apr 2024 08:41:57 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) In-Reply-To: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: On Wed, 17 Apr 2024 18:44:57 GMT, Joshua Cao wrote: > The bug occurs when [Shenandoah optimizations reset post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. > > This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). > > The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. It's easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. > > Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR.
Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. > > Passing hotspot tier1 locally on a Linux machine. Good catch! src/hotspot/share/opto/compile.hpp line 321: > 319: > 320: bool _post_loop_opts_phase; // Loop opts are finished. > 321: bool _began_macro_expansion; // Macro expansion is started. I think this sounds better as the inverse, `bool _allow_macro_nodes; // Allow creating macro nodes`. Then we can also assert `_allow_macro_nodes` in `add_macro_node`? test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java line 44: > 42: public static void main(String[] args) { > 43: TestFramework.run(); > 44: TestFramework.runWithFlags("-XX:+UseShenandoahGC"); I don't think we add GC-specific testing here. For one, the test would fail for the builds that do not include Shenandoah. The common practice is to rely on test pipelines running the test suites with different GCs. Does `make test TEST=compiler/c2/irTests/TestIfMinMax.java TEST_VM_OPTS=-XX:+UseShenandoahGC` work? ------------- PR Review: https://git.openjdk.org/jdk/pull/18824#pullrequestreview-2010863520 PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572033108 PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572035101 From aph at openjdk.org Fri Apr 19 09:12:58 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Apr 2024 09:12:58 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: <5JH14RCnyb9IuxaWWlVmPUraT1gPbPpejyxVTzaF3wY=.f2af6949-233a-4df4-8d23-a644d642f814@github.com> On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611.
> > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available according to 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax. That looks right. I suspect that most of the awfulness of this could have been avoided if the intrinsic operated on fixed-size blocks of data rather than bytes, but it is what it is. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18849#pullrequestreview-2010929231 From aph at openjdk.org Fri Apr 19 09:28:58 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Apr 2024 09:28:58 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests.
> > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fei comments src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1786: > 1784: add(cache, cache, Array::base_offset_in_bytes()); > 1785: lea(cache, Address(cache, index)); > 1786: // Must prevent reordering of the following cp cache loads with bytecode load This is rather unclear. Where is the bytecode load to which this comment refers? src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3039: > 3037: // access constant pool cache entry > 3038: __ load_field_entry(c_rarg2, r0); > 3039: Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1572090941 PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1572093527 From epeter at openjdk.org Fri Apr 19 10:00:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Apr 2024 10:00:27 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal Message-ID: This is an enhancement for AutoVectorization. I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. **Solution Sketch: "canonicalize" the invar** - Extract all summands of the `invar`: make a list. - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. - Bypass `CastLL` and `CastII` - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. - Sort all extracted summands by node idx. - Add up all summands in new order. If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. 
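The canonicalization steps above can be sketched outside of C2 as follows. This is only an illustration in plain Java under stated assumptions, not the actual `VPointer` code: the `Node` class and the `leaf`, `add` and `sub` helpers are hypothetical stand-ins for C2 nodes, casts, conversions and shifts are left out, and a subtraction is modelled by carrying a sign with each summand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of an invariant expression; not HotSpot code.
final class InvarSketch {
    static final class Node {
        final int idx;           // leaf id, standing in for a C2 node idx; -1 for inner nodes
        final char op;           // '+' or '-' for inner nodes, 0 for leaves
        final Node left, right;
        Node(int idx) { this.idx = idx; this.op = 0; this.left = null; this.right = null; }
        Node(char op, Node left, Node right) { this.idx = -1; this.op = op; this.left = left; this.right = right; }
    }

    static Node leaf(int idx) { return new Node(idx); }
    static Node add(Node l, Node r) { return new Node('+', l, r); }
    static Node sub(Node l, Node r) { return new Node('-', l, r); }

    // Step 1: flatten the add/sub tree into signed summands {idx, sign}.
    static void collect(Node n, int sign, List<int[]> out) {
        if (n.op == 0) {
            out.add(new int[] {n.idx, sign});
            return;
        }
        collect(n.left, sign, out);
        collect(n.right, n.op == '-' ? -sign : sign, out);
    }

    // Steps 2 and 3: sort the summands by idx, then rebuild the sum in that order.
    // The rebuilt sum is rendered as text so that two results can be compared directly.
    static String canonicalize(Node invar) {
        List<int[]> summands = new ArrayList<>();
        collect(invar, 1, summands);
        summands.sort(Comparator.<int[]>comparingInt(s -> s[0]).thenComparingInt(s -> s[1]));
        StringBuilder sb = new StringBuilder();
        for (int[] s : summands) {
            sb.append(s[1] > 0 ? "+n" : "-n").append(s[0]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node a = leaf(1), b = leaf(2), c = leaf(3), d = leaf(4);
        Node invar1 = add(add(add(b, c), d), a);   // b + c + d + a
        Node invar2 = add(add(add(d, b), a), c);   // d + b + a + c
        System.out.println(canonicalize(invar1));
        System.out.println(canonicalize(invar2));
        System.out.println(canonicalize(invar1).equals(canonicalize(invar2)));
    }
}
```

In this model two expressions with the same signed summands always produce the same canonical form, whatever order the additions were written in.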
**Example** invar1 = b + c + d + a invar2 = d + b + a + c -> equivalent but not identical nodes Sort, and add up again: invar1 = a + b + c + d invar2 = a + b + c + d -> now the nodes are identical **Motivation: MemorySegment with invar** One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? This example did not vectorize, even though it should: https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. Why does this happen? After parsing, the graph looks like this: ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store.
load_adr = base + memory_segment_offset + CastLL(invar + iv) store_adr = base + memory_segment_offset + invar + iv And when we run SuperWord, the graph looks like this: ![image](https://github.com/openjdk/jdk/assets/32593061/c6b37919-39e9-419b-a23d-e480a39b3e51) The invar now has 3 summands: - `530 LoadL`: the offset that the MemorySegment has internally, to indicate its offset to the base of the beginning of the backing array. - `11 Para`: the input `invar` parameter from our test method. - `1460 Phi` pre-loop iv increment (value unknown at compile time) We can see that the addition of the summands is quite different for the load and store. An alternative solution would have been to add a corresponding `CastLL` for both the `load` and `store`, and hope that this means that the final address would look identical, though I don't know how difficult that would be. **Future Work** There are still a few relevant `MemorySegment` address patterns that are not properly recognized by `VPointer`. In some cases, the `RangeCheck` is not eliminated from the loop: [JDK-8327209](https://bugs.openjdk.org/browse/JDK-8327209) I also have seen a case where the `If` from the `VarHandleSegmentAsInts::offsetPlain` did not fold away. ![image](https://github.com/openjdk/jdk/assets/32593061/9c064631-30ce-4c5f-b938-f0b9b5afa7d4) The check looks like `((2271 AddL) + 4) & 3 == 0`. Maybe this could be fixed with IGVN optimizations, but I'm not sure. I am also thinking about refactoring `VPointer` completely. It has grown over time, and its pattern-matching is a bit nasty. ------------- Commit messages: - IR rules for test only on 64 bit - more tests, more comments, rm trace code - more int/long tests: where offsetPlain moves away - add long tests - verify cfg case - test: handle AlignVector - some int tests - allow LShift for scaling - better comments - allow add and sub in invar - ... 
and 5 more: https://git.openjdk.org/jdk/compare/6bc6392d...687611a0 Changes: https://git.openjdk.org/jdk/pull/18795/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330274 Stats: 1062 lines in 3 files changed: 1062 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18795/head:pull/18795 PR: https://git.openjdk.org/jdk/pull/18795 From epeter at openjdk.org Fri Apr 19 10:00:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Apr 2024 10:00:28 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 12:00:51 GMT, Emanuel Peter wrote: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. > > If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. 
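The canonicalization steps sketched above can be modelled outside of C2 with a few lines of Java. This is only an illustration under stated assumptions, not the code from the PR: `Expr`, `Leaf`, `Add`, `collectSummands` and `canonicalize` are hypothetical names, a leaf's `idx` stands in for the node idx, only additions are handled (no `SubL`, casts, `ConvI2L` or shifts), and "identical node" is approximated by comparing the printed form, where C2 would rely on value numbering instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InvarCanonicalizer {
    // Hypothetical stand-in for a C2 node: a leaf summand or an addition.
    static abstract class Expr {}

    static final class Leaf extends Expr {
        final int idx; // plays the role of the node idx used for sorting
        Leaf(int idx) { this.idx = idx; }
        @Override public String toString() { return "n" + idx; }
    }

    static final class Add extends Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // Step 1: walk through the additions and collect all leaf summands.
    static void collectSummands(Expr e, List<Leaf> out) {
        if (e instanceof Add) {
            Add add = (Add) e;
            collectSummands(add.left, out);
            collectSummands(add.right, out);
        } else {
            out.add((Leaf) e);
        }
    }

    // Steps 2 and 3: sort the summands by idx, then add them up in that order.
    static Expr canonicalize(Expr e) {
        List<Leaf> summands = new ArrayList<>();
        collectSummands(e, summands);
        summands.sort(Comparator.comparingInt((Leaf l) -> l.idx));
        Expr sum = summands.get(0);
        for (int i = 1; i < summands.size(); i++) {
            sum = new Add(sum, summands.get(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1), b = new Leaf(2), c = new Leaf(3), d = new Leaf(4);
        Expr invar1 = new Add(new Add(new Add(b, c), d), a); // b + c + d + a
        Expr invar2 = new Add(new Add(new Add(d, b), a), c); // d + b + a + c
        System.out.println(canonicalize(invar1)); // (((n1 + n2) + n3) + n4)
        System.out.println(canonicalize(invar2)); // (((n1 + n2) + n3) + n4)
    }
}
```

Both inputs end up with the same shape because the result depends only on the set of summands and their idx order, not on how the original sum was nested.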
> > **Example** > > > invar1 = b + c + d + a > invar2 = d + b + a + c > > -> equivalent but not identical nodes > > Sort, and add up again: > > invar1 = a + b + c + d > invar2 = a + b + c + d > > -> now the nodes are identical > > **Motivation: MemorySegment with invar** > > One might think that this is a big of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? > > This example did not vectorize, even though it should: > https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 > > Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. > > Upon inspection, I saw that the `invar` that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. > > The consequence: Only if we can prove that the two `invar` are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`, we don't know how the two addresses alias, and that prevents vectorization. > > Why does this happen? After parsing, the graph looks like this: > ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) > > We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store. > > load_adr = base + memory_segment_offs... src/hotspot/share/opto/vectorization.cpp line 101: > 99: tty->print(" lpt->_head %d", _cl->_idx); _cl->dump(); > 100: _lpt->dump_head(); > 101: _cl->dump_bfs(100, _cl_exit, "c-"); Note: Simply makes it easier to see what kind of CFG is between the loop head and end. Is it a `RangeCheck`, or some `If`? 
src/hotspot/share/opto/vectorization.cpp line 503: > 501: #ifdef ASSERT > 502: // We are changing the invar, and the debug info may no longer be accurate. > 503: if (new_invar != _invar) { _debug_invar = NodeSentinel; } Note: Roland had inserted this `_debug_invar` verification code a year ago. Putting `NodeSentinel` there basically just disables the verification; he already uses that elsewhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18795#discussion_r1568819915 PR Review Comment: https://git.openjdk.org/jdk/pull/18795#discussion_r1568821470 From fyang at openjdk.org Fri Apr 19 10:01:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 19 Apr 2024 10:01:07 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v11] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 17:06:39 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. >> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits: > > - Merge branch 'master' into storestore > - Merge branch 'master' into storestore > - Apply suggestions from code review > > some formatting suggestions from @shipilev > > Co-authored-by: Aleksey Shipilëv > - Guard everything by feature flag > - Revert "Statistics for barriers generated/eliminated" > > This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f. > - Make flag product diagnostic and guard string concat storestore by flag > - Statistics for barriers generated/eliminated > - global flag to turn on storestore barrier emission and membar acquires > IR tests > - Add micro benchmark courtesy of @shipilev > - More comprehensive IR tests based on @shipilev's suggestions > - ... and 10 more: https://git.openjdk.org/jdk/compare/235ba9a7...104d733d test/hotspot/jtreg/compiler/c2/irTests/ConstructorBarriers.java line 33: > 31: * @summary Test barriers emitted in constructors > 32: * @library /test/lib / > 33: * @requires os.arch=="aarch64" | os.arch=="x86_64" | os.arch=="amd64" Hi, Could you please enable this test for riscv64 as well? It passes on linux-riscv64 (fastdebug build). `@requires os.arch=="aarch64" | os.arch=="riscv64" | os.arch=="x86_64" | os.arch=="amd64"` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18505#discussion_r1572132454 From enikitin at openjdk.org Fri Apr 19 10:14:09 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 19 Apr 2024 10:14:09 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main Message-ID: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> The idea of the bug was to reduce the number of tests that use driver mode incorrectly. My investigation shows that the vast majority of such tests start other processes and are in fact more like wrappers around those processes. 
Most of the secondary processes are started using ProcessTools and therefore get the additional VM flags provided by the user. I found only one test that seems to use driver mode incorrectly; this PR fixes it. Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. ------------- Commit messages: - 8326742: Change compiler tests without additional VM flags from @run driver to @run main Changes: https://git.openjdk.org/jdk/pull/18854/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18854&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8326742 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18854.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18854/head:pull/18854 PR: https://git.openjdk.org/jdk/pull/18854 From enikitin at openjdk.org Fri Apr 19 10:15:21 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 19 Apr 2024 10:15:21 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess Message-ID: Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method. 
------------- Commit messages: - 8330621: Make 5 compiler tests use ProcessTools.executeProcess Changes: https://git.openjdk.org/jdk/pull/18856/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18856&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330621 Stats: 17 lines in 5 files changed: 5 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18856.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18856/head:pull/18856 PR: https://git.openjdk.org/jdk/pull/18856 From tholenstein at openjdk.org Fri Apr 19 10:17:25 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 19 Apr 2024 10:17:25 GMT Subject: RFR: 8330587: IGV: remove ControlFlowTopComponent Message-ID: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> The control flow window (very right, next to Bytecodes) implemented by ControlFlowTopComponent is no longer used with the availability of the new CFG view. Therefore ControlFlowTopComponent is removed ------------- Commit messages: - JDK-8330587: IGV: remove ControlFlowTopComponent Changes: https://git.openjdk.org/jdk/pull/18859/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18859&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330587 Stats: 1236 lines in 16 files changed: 0 ins; 1236 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18859.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18859/head:pull/18859 PR: https://git.openjdk.org/jdk/pull/18859 From chagedorn at openjdk.org Fri Apr 19 11:38:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Apr 2024 11:38:57 GMT Subject: RFR: 8330587: IGV: remove ControlFlowTopComponent In-Reply-To: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> References: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> Message-ID: On Fri, 19 Apr 2024 09:33:21 GMT, Tobias Holenstein wrote: > The control flow 
window (very right, next to Bytecodes) implemented by ControlFlowTopComponent is no longer used with the availability of the new CFG view. > > Therefore ControlFlowTopComponent is removed That's a reasonable choice - looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18859#pullrequestreview-2011202280 From chagedorn at openjdk.org Fri Apr 19 11:43:56 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Apr 2024 11:43:56 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess In-Reply-To: References: Message-ID: <2TWOpFjA6gXYCpkUg3FUz3k8YsnNbdZjVebHyty1GYU=.837ff374-5f76-42f9-942b-9484c0c7b814@github.com> On Fri, 19 Apr 2024 07:22:06 GMT, Evgeny Nikitin wrote: > Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. > > As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method. Looks good! test/hotspot/jtreg/compiler/profiling/spectrapredefineclass_classloaders/Launcher.java line 71: > 69: OutputAnalyzer output = ProcessTools.executeProcess(pb); > 70: output.shouldHaveExitValue(0); > 71: } catch (IOException ex) { Can probably also be updated as in Test7068051? Suggestion: } catch (Exception ex) { ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18856#pullrequestreview-2011208159 PR Review Comment: https://git.openjdk.org/jdk/pull/18856#discussion_r1572243237 From duke at openjdk.org Fri Apr 19 11:50:01 2024 From: duke at openjdk.org (Charles Connell) Date: Fri, 19 Apr 2024 11:50:01 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . Thank you for this fix! (I discovered the bug) I agree it would be very difficult to verify the lack of out-of-bounds memory access. However, after some tinkering myself, I also noticed that even obvious correctness problems in this "tail" area (like simply not copying bytes into the dest array) do not fail any existing unit tests. If it's not that hard, maybe that would be a good test to add. 
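A correctness check for the tail area discussed above could start from something like the following sketch. It is not taken from the PR or from the existing jtreg tests: the class name `AesCtrTailCheck` and the helper `ctr` are made up for illustration, and it only uses the standard `javax.crypto` API. Whether the one-shot path actually reaches the AVX-512 stub depends on the CPU and on JIT warm-up, so a real test would likely need extra iterations or VM flags; this only shows the comparison itself, for every length from 1 to 100 bytes (so most lengths are not a multiple of the 16-byte block size).

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesCtrTailCheck {
    // Runs AES/CTR over the input, feeding it to the cipher in pieces of at most 'chunk' bytes.
    static byte[] ctr(byte[] key, byte[] iv, byte[] input, int chunk) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int pos = 0; pos < input.length; pos += chunk) {
                int n = Math.min(chunk, input.length - pos);
                byte[] part = cipher.update(input, pos, n);
                if (part != null) {
                    out.write(part, 0, part.length);
                }
            }
            byte[] last = cipher.doFinal();
            out.write(last, 0, last.length);
            return out.toByteArray();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        for (int i = 0; i < 16; i++) {
            key[i] = (byte) i;
            iv[i] = (byte) (0xA0 + i);
        }
        // Reusing key and IV is fine for a consistency check, but never for real encryption.
        for (int len = 1; len <= 100; len++) {
            byte[] plain = new byte[len];
            for (int i = 0; i < len; i++) {
                plain[i] = (byte) (31 * i + len);
            }
            byte[] oneShot = ctr(key, iv, plain, len); // whole input in a single update
            byte[] byteWise = ctr(key, iv, plain, 1);  // one byte per update
            if (!Arrays.equals(oneShot, byteWise)) {
                throw new AssertionError("ciphertext mismatch at length " + len);
            }
            // In CTR mode, applying the same keystream again restores the plaintext.
            if (!Arrays.equals(plain, ctr(key, iv, oneShot, len))) {
                throw new AssertionError("round trip failed at length " + len);
            }
        }
        System.out.println("all tail lengths consistent");
    }
}
```

A bug such as not copying the tail bytes into the destination array would show up as a mismatch for some length that is not a multiple of 16, provided the one-shot call is served by the affected code path.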
------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2066402265 From stefank at openjdk.org Fri Apr 19 11:55:00 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Apr 2024 11:55:00 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 07:22:06 GMT, Evgeny Nikitin wrote: > Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. > > As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method. Looks good to me. Do you know that you also could skip explicitly creating ProcessBuilders and instead run: ProcessTools.executeProcess(jar.getCommand()); ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18856#pullrequestreview-2011226615 From jbhateja at openjdk.org Fri Apr 19 12:00:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 19 Apr 2024 12:00:00 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v2] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 00:47:29 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fix white space, add comments src/hotspot/cpu/x86/assembler_x86.cpp line 12925: > 12923: void Assembler::prefix_rex2(Address adr, bool is_map1) { > 12924: int bits = is_map1 ? REX2BIT_M0 : 0; > 12925: bits |= get_base_prefix_bits(adr.base()->encoding()); Suggestion: bits |= get_base_prefix_bits(adr.base()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1572247446 From jbhateja at openjdk.org Fri Apr 19 12:00:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 19 Apr 2024 12:00:00 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v2] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 11:44:35 GMT, Jatin Bhateja wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> fix white space, add comments > > src/hotspot/cpu/x86/assembler_x86.cpp line 12925: > >> 12923: void Assembler::prefix_rex2(Address adr, bool is_map1) { >> 12924: int bits = is_map1 ? REX2BIT_M0 : 0; >> 12925: bits |= get_base_prefix_bits(adr.base()->encoding()); > > Suggestion: > > bits |= get_base_prefix_bits(adr.base()); As per section 3.7.5 of the Intel SDM, (Index * Scale) + Displacement is a valid addressing mode. Thus we should set the bits corresponding to the extended base register encoding only if it is a valid register. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1572250460 From mli at openjdk.org Fri Apr 19 12:09:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Apr 2024 12:09:13 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v6] In-Reply-To: References: Message-ID: > Hi, > Can you have a review on this patch to add RoundVF/RoundDF intrinsics? > Thanks! > > ## Tests > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectRiscv64.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/floatingpoint/TestRound.java > > test/jdk/java/lang/Math/RoundTests.java Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Merge branch 'master' into round-F+D-v - Merge branch 'master' into round-F+D-v - restore round mode back to rne - Merge branch 'master' into round-F+D-v - fix minors - merge master - fix space - add tests - add test cases - v2: (src + 0.5) + rdn - ... 
and 4 more: https://git.openjdk.org/jdk/compare/177092b9...2b57205f ------------- Changes: https://git.openjdk.org/jdk/pull/17745/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17745&range=05 Stats: 242 lines in 7 files changed: 238 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17745.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17745/head:pull/17745 PR: https://git.openjdk.org/jdk/pull/17745 From mli at openjdk.org Fri Apr 19 12:12:01 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Apr 2024 12:12:01 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v3] In-Reply-To: References: <3wamGp9toFZEr7IO54NC4VOU8dAfpL2WJyWTSNv0m_s=.ebec8482-610a-4f92-9f42-5fe79b41dd23@github.com> Message-ID: On Thu, 18 Apr 2024 11:52:31 GMT, Hamlin Li wrote: >> tracked by https://bugs.openjdk.org/browse/JDK-8330094 > > I have merged master (including https://github.com/openjdk/jdk/pull/18785, https://github.com/openjdk/jdk/pull/18758), and rerun the tests. Merged master again to include https://github.com/openjdk/jdk/pull/18839, and rerun the tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17745#discussion_r1572272012 From roland at openjdk.org Fri Apr 19 12:59:16 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Apr 2024 12:59:16 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int [v2] In-Reply-To: References: Message-ID: <4sjqJ79xOh4Mt_SxaNxUfKDXRNredCyAFe4OGW8c60w=.6ecf3afb-0060-47aa-9d4c-d33f81eef18a@github.com> > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly casted to long. I now cast the stride to long > before `ABS()`. 
For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method. So > for a min int stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. I reordered statements > so the `ABS()` call happens after that test which will cause an early > return if the stride is min int. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18813/files - new: https://git.openjdk.org/jdk/pull/18813/files/ecd940f3..d876cbc9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18813&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18813&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18813.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18813/head:pull/18813 PR: https://git.openjdk.org/jdk/pull/18813 From roland at openjdk.org Fri Apr 19 12:59:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Apr 2024 12:59:17 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int [v2] In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 16:14:28 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> more > > src/hotspot/share/opto/loopnode.cpp line 2969: > >> 2967: int scaled_iters = (int)scaled_iters_long; >> 2968: if ((jlong)scaled_iters != scaled_iters_long) { >> 2969: // 
Remove outer loop and safepoint (too few iterations) > > Please put more extended comment here. What you have in PR description would be nice. I added a comment. Does it look good to you? > src/hotspot/share/opto/loopnode.cpp line 2973: > >> 2971: return; >> 2972: } >> 2973: int short_scaled_iters = LoopStripMiningIterShortLoop * ABS(stride); > > So stride is not MIN_INT here but the expression still can overflow. Should we use `jlong` for expression and `short_scaled_iters`? `iter_estimate` is `jlong`. Good catch. I updated the patch as suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18813#discussion_r1572326967 PR Review Comment: https://git.openjdk.org/jdk/pull/18813#discussion_r1572327289 From roland at openjdk.org Fri Apr 19 13:09:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Apr 2024 13:09:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v16] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. 
As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/a4ffc11e..f63bf543 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From jbhateja at openjdk.org Fri Apr 19 13:38:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 19 Apr 2024 13:38:01 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Tue, 16 Apr 2024 00:04:15 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). 
>> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add enter() and leave(); remove Windows-specific register stuff Hi @asgibbons Please add a new test / extend an existing test for SIGBUS violation testing test/hotspot/jtreg/runtime/Unsafe/InternalErrorTest.java src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2611: > 2609: // Propagate byte to full Register > 2610: __ movzbl(rScratch1, byteVal); > 2611: __ mov64(wide_value, 0x0101010101010101); Long constant should be suffixed by ULL. test/micro/org/openjdk/bench/java/lang/foreign/MemorySegmentZeroUnsafe.java line 1: > 1: package org.openjdk.bench.java.lang.foreign; Copyright header missing. ------------- PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-2011247585 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572370327 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572267154 From aph at openjdk.org Fri Apr 19 13:45:58 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Apr 2024 13:45:58 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 11:45:31 GMT, Charles Connell wrote: > Thank you for this fix! (I discovered the bug) I agree it would be very difficult to verify the lack of out-of-bounds memory access. However, after some tinkering myself, I also noticed that even obvious correctness problems in this "tail" area (like simply not copying bytes into the dest array) do not fail any existing unit tests. If it's not that hard, maybe that would a good test to add. 
It would not be impossible to write a gtest that maps some private memory then calls the AES-CTR stub directly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2066620374 From roland at openjdk.org Fri Apr 19 13:51:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Apr 2024 13:51:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: <4_H2XsBLZbEMyMSiuX3dM1Zvm-vDBYzQRWYtt1NrJ_A=.a2de5ad8-c284-48f2-b2cb-359f282fe438@github.com> On Thu, 18 Apr 2024 09:52:32 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 24: >> >>> 22: */ >>> 23: >>> 24: package compiler.c2.irTests; >> >> Christian and I have discussed this a while back: it would be nicer to put tests where they belong thematically. For example now it would be difficult to find all ScopedValue compiler tests, some are in the `irTests` directory, some elsewhere. Hence, I suggest you put them all under `compiler/scoped_value` or similar. > > Where are the already existing ScopedValue tests? There are no ScopedValue compiler tests that I know of. Moving the ScopedValue IR tests out of irTests looks good to me. Let me do that. >> test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 137: >> >>> 135: MyDouble sv1 = sv.get(); >>> 136: notInlined(); >>> 137: MyDouble sv2 = sv.get(); // Doesn't optimize out (load of sv cannot common) >> >> Is this a necessary constraint, or a limitation of the optimization? Please add a corresponding comment. That would be helpful if this test all of the sudden failed the IR rule, and one has to debug. > > If this was in a loop, the two `get` would be hoisted, and commoned, right? They wouldn't. `sv` is a field and c2 can't common the 2 field loads. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1572395412 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1572397085 From roland at openjdk.org Fri Apr 19 13:51:08 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Apr 2024 13:51:08 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v15] In-Reply-To: References: <3fcIOnZHYI7ebFLr6vUGnMCo7GDnQ-FTDNjKTeoXqNA=.99b678a0-d04c-4c0d-a269-d0fc41104bfc@github.com> Message-ID: On Thu, 18 Apr 2024 09:56:56 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'master' into JDK-8320649 >> - review >> - test fix >> - test fix >> - Merge branch 'master' into JDK-8320649 >> - whitespaces >> - review >> - Merge branch 'master' into JDK-8320649 >> - review >> - 32 bit build fix >> - ... and 12 more: https://git.openjdk.org/jdk/compare/bfff02ee...a4ffc11e > > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 185: > >> 183: @IR(counts = {IRNode.IF, "<= 4", IRNode.LOAD_P_OR_N, "<= 5" }) >> 184: public static void testFastPath5() { >> 185: Object unused = svObject.get(); // cannot be removed if result not used > > why? could there be some exception? please add comment why. Yes `get()` can throw. > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 236: > >> 234: @IR(counts = {IRNode.LOAD_D, "1" }) >> 235: public static double testFastPath7(boolean[] flags) { >> 236: double res = 0; > > Suggestion: > > double res = 0; > // hoisted here before the loop, and commoned. > > Would that be correct? Yes, it's correct > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 524: > >> 522: MyDouble sv1 = localSV.get(); >> 523: notInlined(); >> 524: MyDouble sv2 = localSV.get(); // should optimize out > > Why does this work now, and some other cases with `notInlined` in between do not work? 
Because the `sv` field is loaded in a local before the call. > test/hotspot/jtreg/compiler/scoped_value/TestScopedValueBadDominatorAfterExpansion.java line 30: > >> 28: * @summary SIGSEGV in PhaseIdealLoop::get_early_ctrl() >> 29: * @compile --enable-preview -source ${jdk.version} TestScopedValueBadDominatorAfterExpansion.java >> 30: * @run main/othervm --enable-preview -XX:-BackgroundCompilation -XX:-TieredCompilation -XX:+UseParallelGC TestScopedValueBadDominatorAfterExpansion > > Why is the parallel GC required here? Can you also have a run without flags, so that other GC's could be tried with this code? I don't remember the details (I wrote this test a couple months ago) but, most likely, the G1 write barrier got in the way. I'll add a line with no gc option. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1572400445 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1572405486 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1572407630 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1572393836 From enikitin at openjdk.org Fri Apr 19 14:42:35 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 19 Apr 2024 14:42:35 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess [v2] In-Reply-To: References: Message-ID: > Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. > > As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method. 
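For context, a minimal sketch of the plain `ProcessBuilder` pattern the description refers to, using only the standard library. It is not code from the affected tests: the class name, the `Result` record, and the choice of running `java -version` are assumptions made for illustration. With the raw API the caller has to redirect, drain, and check the child process by hand, which is the bookkeeping a shared test-library helper is meant to centralize.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class RawProcessSketch {
    /** Exit code plus combined stdout/stderr of a finished child process. */
    public record Result(int exitCode, String output) {}

    public static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        // Drain the output before waiting so the child cannot block on a full pipe.
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = p.waitFor();
        return new Result(exit, out);
    }

    public static void main(String[] args) throws Exception {
        // Launch the same JVM that is running this sketch (illustrative choice).
        String javaExe = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        Result r = run(List.of(javaExe, "-version"));
        System.out.println("exit=" + r.exitCode());
        System.out.print(r.output());
    }
}
```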
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Fix issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18856/files - new: https://git.openjdk.org/jdk/pull/18856/files/c4641d05..628037fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18856&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18856&range=00-01 Stats: 10 lines in 4 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18856.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18856/head:pull/18856 PR: https://git.openjdk.org/jdk/pull/18856 From enikitin at openjdk.org Fri Apr 19 14:42:35 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 19 Apr 2024 14:42:35 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess [v2] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 11:51:50 GMT, Stefan Karlsson wrote: > Do you know that you also could skip explicitly creating ProcessBuilders and instead run: Tried to be as unintrusive as possible :). But I agree, the fewer lines the better. Fixed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18856#issuecomment-2066723291 From enikitin at openjdk.org Fri Apr 19 14:42:35 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 19 Apr 2024 14:42:35 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess [v2] In-Reply-To: <2TWOpFjA6gXYCpkUg3FUz3k8YsnNbdZjVebHyty1GYU=.837ff374-5f76-42f9-942b-9484c0c7b814@github.com> References: <2TWOpFjA6gXYCpkUg3FUz3k8YsnNbdZjVebHyty1GYU=.837ff374-5f76-42f9-942b-9484c0c7b814@github.com> Message-ID: On Fri, 19 Apr 2024 11:40:17 GMT, Christian Hagedorn wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix issues > > test/hotspot/jtreg/compiler/profiling/spectrapredefineclass_classloaders/Launcher.java line 71: > >> 69: OutputAnalyzer output = ProcessTools.executeProcess(pb); >> 70: output.shouldHaveExitValue(0); >> 71: } catch (IOException ex) { > > Can probably also be updated as in Test7068051? > Suggestion: > > } catch (Exception ex) { Fixed in both `spectrapredefineclass` and `spectrapredefineclass_classloaders` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18856#discussion_r1572478331 From stefank at openjdk.org Fri Apr 19 14:45:59 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 19 Apr 2024 14:45:59 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess [v2] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 14:42:35 GMT, Evgeny Nikitin wrote: >> Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. >> >> As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method. 
> > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Fix issues Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18856#pullrequestreview-2011625590 From sgibbons at openjdk.org Fri Apr 19 14:58:05 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 14:58:05 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 13:25:33 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Add enter() and leave(); remove Windows-specific register stuff > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2611: > >> 2609: // Propagate byte to full Register >> 2610: __ movzbl(rScratch1, byteVal); >> 2611: __ mov64(wide_value, 0x0101010101010101); > > Long constant should be suffixed by ULL. Fixed. > test/micro/org/openjdk/bench/java/lang/foreign/MemorySegmentZeroUnsafe.java line 1: > >> 1: package org.openjdk.bench.java.lang.foreign; > > Copyright header missing. 
Added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572502532 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572502204 From matsaave at openjdk.org Fri Apr 19 15:53:57 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 19 Apr 2024 15:53:57 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 09:23:56 GMT, Andrew Haley wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fei comments > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1786: > >> 1784: add(cache, cache, Array::base_offset_in_bytes()); >> 1785: lea(cache, Address(cache, index)); >> 1786: // Must prevent reordering of the following cp cache loads with bytecode load > > This is rather unclear. Where is the bytecode load to which this comment refers? This comment was above all the other uses of membar and was just moved here for convenience. The "bytecode load" being referred to here is `InterpreterMacroAssembler::dispatch_next`. The bytecode loading could be scheduled before the cache entry is resolved. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1572580608 From duke at openjdk.org Fri Apr 19 15:58:57 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 15:58:57 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) In-Reply-To: References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: On Fri, 19 Apr 2024 08:38:45 GMT, Aleksey Shipilev wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). >> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. 
>> >> Passing hotspot tier1 locally on Linux machine. > > test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java line 44: > >> 42: public static void main(String[] args) { >> 43: TestFramework.run(); >> 44: TestFramework.runWithFlags("-XX:+UseShenandoahGC"); > > I don't think we add GC-specific testing here. For one, the test would fail for the builds that do not include Shenandoah. > > The common practice is to rely on test pipelines running the test suites with different GCs. Does `make test TEST=compiler/c2/irTests/TestIfMinMax.java TEST_VM_OPTS=-XX:+UseShenandoahGC` work? Yes, it works. Confirms it can reproduce the crash. Will remove that line. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572588331 From kvn at openjdk.org Fri Apr 19 16:17:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 16:17:57 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int [v2] In-Reply-To: <4sjqJ79xOh4Mt_SxaNxUfKDXRNredCyAFe4OGW8c60w=.6ecf3afb-0060-47aa-9d4c-d33f81eef18a@github.com> References: <4sjqJ79xOh4Mt_SxaNxUfKDXRNredCyAFe4OGW8c60w=.6ecf3afb-0060-47aa-9d4c-d33f81eef18a@github.com> Message-ID: On Fri, 19 Apr 2024 12:59:16 GMT, Roland Westrelin wrote: >> This fixes 3 calls to ABS with a min int argument. I think all of them >> are harmless: >> >> - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The >> check is for a stride of 1 or -1. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the >> computation of `scaled_iters_long`, the stride is passed to `ABS()` >> and then implicitly cast to long. I now cast the stride to long >> before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` >> overflows the int range for all values of `LoopStripMiningIter` >> except 0 or 1. Those values are handled early on in that method. So >> for a min int stride: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> is always true and the method returns early.
>> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the >> computation of `short_scaled_iters` also calls `ABS()` with the >> stride as argument. But the result of that computation is only used >> if the test for: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> doesn't cause an early return of the method. I reordered statements >> so the `ABS()` call happens after that test which will cause an early >> return if the stride is min int. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18813#pullrequestreview-2011861226 From sgibbons at openjdk.org Fri Apr 19 16:25:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 16:25:28 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v22] In-Reply-To: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: > This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. > > Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
> > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Address review comments; update copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18555/files - new: https://git.openjdk.org/jdk/pull/18555/files/7a1d67e5..dccf6b6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=20-21 Stats: 37 lines in 13 files changed: 23 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555 From duke at openjdk.org Fri Apr 19 16:32:56 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 16:32:56 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) In-Reply-To: References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: <9qACIwL6SlPimw0i6gwPIYNkKQEawj4moWjlvIL16as=.eace3b6a-f4dc-4616-9156-10a3d2761f11@github.com> On Fri, 19 Apr 2024 08:37:05 GMT, Aleksey Shipilev wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). 
>> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passing hotspot tier1 locally on Linux machine. > > src/hotspot/share/opto/compile.hpp line 321: > >> 319: >> 320: bool _post_loop_opts_phase; // Loop opts are finished. >> 321: bool _began_macro_expansion; // Macro expansion is started. > > I think this sounds better as inverse, `bool _allow_macro_nodes; // Allow creating macro nodes`. Then we can also assert `_allow_macro_nodes` in `add_macro_node`? Can't add the assert for free. Apparently macro nodes are added later in the compiler pipeline, Matcher in this case. 
Stack: [0x00007fb2c6d50000,0x00007fb2c6e51000], sp=0x00007fb2c6e4caa0, free space=1010k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x63bd9d] Compile::add_macro_node(Node*)+0x49 (compile.hpp:742) V [libjvm.so+0x124ccc5] Node::clone() const+0x15f (node.cpp:501) V [libjvm.so+0x119a93a] Matcher::xform(Node*, int)+0x3a2 (matcher.cpp:1150) V [libjvm.so+0x1195413] Matcher::match()+0xe8b (matcher.cpp:359) V [libjvm.so+0x9a9723] Compile::Code_Gen()+0x95 (compile.cpp:2947) V [libjvm.so+0x99fd8d] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1827 (compile.cpp:895) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572635814 From duke at openjdk.org Fri Apr 19 16:38:13 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 16:38:13 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v2] In-Reply-To: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: > The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. > > This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). 
> > The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. > > Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. > > Passing hotspot tier1 locally on Linux machine. Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Rename began_macro_expansion to allow_macro_nodes. Remove shenandoah flag from test. 
- Merge branch 'master' into shen - 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18824/files - new: https://git.openjdk.org/jdk/pull/18824/files/ebfaf34c..4cacffe9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=00-01 Stats: 37860 lines in 257 files changed: 15275 ins; 21623 del; 962 mod Patch: https://git.openjdk.org/jdk/pull/18824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18824/head:pull/18824 PR: https://git.openjdk.org/jdk/pull/18824 From duke at openjdk.org Fri Apr 19 16:47:08 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 16:47:08 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v3] In-Reply-To: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: > The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. > > This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). > > The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. 
Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. > > Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. > > Passing hotspot tier1 locally on Linux machine. Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Update comment for MinL/MaxL based on renaming of allow_macro_nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18824/files - new: https://git.openjdk.org/jdk/pull/18824/files/4cacffe9..88c3c1d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18824/head:pull/18824 PR: https://git.openjdk.org/jdk/pull/18824 From duke at openjdk.org Fri Apr 19 16:47:08 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 16:47:08 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v3] In-Reply-To: References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: On Fri, 19 Apr 2024 08:39:08 GMT, Aleksey Shipilev wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comment for MinL/MaxL based on renaming of allow_macro_nodes > > Good catch! 
@shipilev updated based on your comments and also updated the comment in `CMoveNode::Ideal_minmax` ------------- PR Comment: https://git.openjdk.org/jdk/pull/18824#issuecomment-2066925934 From shade at openjdk.org Fri Apr 19 16:47:08 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Apr 2024 16:47:08 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v3] In-Reply-To: <9qACIwL6SlPimw0i6gwPIYNkKQEawj4moWjlvIL16as=.eace3b6a-f4dc-4616-9156-10a3d2761f11@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> <9qACIwL6SlPimw0i6gwPIYNkKQEawj4moWjlvIL16as=.eace3b6a-f4dc-4616-9156-10a3d2761f11@github.com> Message-ID: On Fri, 19 Apr 2024 16:30:48 GMT, Joshua Cao wrote: >> src/hotspot/share/opto/compile.hpp line 321: >> >>> 319: >>> 320: bool _post_loop_opts_phase; // Loop opts are finished. >>> 321: bool _began_macro_expansion; // Macro expansion is started. >> >> I think this sounds better as inverse, `bool _allow_macro_nodes; // Allow creating macro nodes`. Then we can also assert `_allow_macro_nodes` in `add_macro_node`? > > Can't add the assert for free. Apparently macro nodes are added later in the compiler pipeline, Matcher in this case. > > > Stack: [0x00007fb2c6d50000,0x00007fb2c6e51000], sp=0x00007fb2c6e4caa0, free space=1010k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x63bd9d] Compile::add_macro_node(Node*)+0x49 (compile.hpp:742) > V [libjvm.so+0x124ccc5] Node::clone() const+0x15f (node.cpp:501) > V [libjvm.so+0x119a93a] Matcher::xform(Node*, int)+0x3a2 (matcher.cpp:1150) > V [libjvm.so+0x1195413] Matcher::match()+0xe8b (matcher.cpp:359) > V [libjvm.so+0x9a9723] Compile::Code_Gen()+0x95 (compile.cpp:2947) > V [libjvm.so+0x99fd8d] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1827 (compile.cpp:895) Eh? 
We do `Compile::add_macro_node` from `Code_Gen`, which means we add to `_macro_nodes` unnecessarily? Same for expensive nodes. That's unfortunate... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572648377 From shade at openjdk.org Fri Apr 19 16:54:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Apr 2024 16:54:59 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v3] In-Reply-To: References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: On Fri, 19 Apr 2024 16:47:08 GMT, Joshua Cao wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). >> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. 
Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passing hotspot tier1 locally on Linux machine. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update comment for MinL/MaxL based on renaming of allow_macro_nodes src/hotspot/share/opto/compile.hpp line 792: > 790: > 791: bool allow_macro_nodes() { return _allow_macro_nodes; } > 792: void dont_allow_macro_nodes() { _allow_macro_nodes = false; } `dont_allow_macro_nodes` is a confusing name here, especially given it is a setter in contrast to `allow_macro_nodes`. Let's call it `reset_allow_macro_nodes()`. src/hotspot/share/opto/macro.cpp line 2448: > 2446: // Returns true if a failure occurred. > 2447: bool PhaseMacroExpand::expand_macro_nodes() { > 2448: C->dont_allow_macro_nodes(); Leave a comment here: `// Do not allow new macro nodes once we started to expand` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572654763 PR Review Comment: https://git.openjdk.org/jdk/pull/18824#discussion_r1572655726 From duke at openjdk.org Fri Apr 19 16:56:28 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 16:56:28 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v12] In-Reply-To: References: Message-ID: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. 
> > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. > > I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). > > Passes hotspot tier1 locally on a Linux machine. > > ### Benchmarks > > Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>
> Baseline:
>
> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
> N = 25
> mean = 3309.611 ±(99.9%) 86.699 ms/op
>
> Histogram, ms/op:
> [3000.000, 3050.000) = 0
> [3050.000, 3100.000) = 4
> [3100.000, 3150.000) = 1
> [3150.000, 3200.000) = 0
> [3200.000, 3250.000) = 0
> [3250.000, 3300.000) = 0
> [3300.000, 3350.000) = 9
> [3350.000, 3400.000) = 6
> [3400.000, 3450.000) = 5
>
> Percentiles, ms/op:
> p(0.0000) = 3069.910 ms/op
> p(50.0000) = 3348.140 ms/op
> ...

Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:

Add riscv64 to test

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/18505/files
- new: https://git.openjdk.org/jdk/pull/18505/files/104d733d..f2e8cf64

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=11
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=18505&range=10-11

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/18505.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18505/head:pull/18505

PR: https://git.openjdk.org/jdk/pull/18505

From duke at openjdk.org Fri Apr 19 17:16:24 2024
From: duke at openjdk.org (Joshua Cao)
Date: Fri, 19 Apr 2024 17:16:24 GMT
Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v4]
In-Reply-To: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com>
References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com>
Message-ID: <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com>

> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro
expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. > > This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in the code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). > > The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. It's easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. > > Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to a non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. > > Passes hotspot tier1 locally on a Linux machine. Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Comment on not allowing macro nodes after we start expanding. Rename dont_allow_macro_nodes to reset_allow_macro_nodes. 
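For readers following the thread, the guard described above can be sketched roughly as follows. This is a minimal model, not the HotSpot code: only the accessor names `allow_macro_nodes()` and `reset_allow_macro_nodes()` and the suggested comment come from the review; `CompileSketch`, `NodeKind`, `idealize_cmove` and the standalone `expand_macro_nodes` function are hypothetical stand-ins.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-compilation state (not HotSpot's Compile class).
class CompileSketch {
  bool _allow_macro_nodes = true;

 public:
  bool allow_macro_nodes() const { return _allow_macro_nodes; }
  // One-way switch: once cleared, the flag is never set again.
  void reset_allow_macro_nodes() { _allow_macro_nodes = false; }
};

enum class NodeKind { CMoveL, MaxL };

// Hypothetical idealization step: form the macro node only while it can still
// be expanded later; otherwise keep the CMove, which the matcher understands.
NodeKind idealize_cmove(const CompileSketch& c, bool matches_max_pattern) {
  if (matches_max_pattern && c.allow_macro_nodes()) {
    return NodeKind::MaxL;
  }
  return NodeKind::CMoveL;
}

// Hypothetical expansion entry point.
void expand_macro_nodes(CompileSketch& c) {
  // Do not allow new macro nodes once we started to expand
  c.reset_allow_macro_nodes();
  // ... expansion of the existing macro nodes would follow here ...
}
```

The point of the one-way flag is that a later phase which re-runs idealization can no longer produce a node that has no matcher rule.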
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18824/files - new: https://git.openjdk.org/jdk/pull/18824/files/88c3c1d9..35cff12c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=02-03 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18824/head:pull/18824 PR: https://git.openjdk.org/jdk/pull/18824 From shade at openjdk.org Fri Apr 19 17:16:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Apr 2024 17:16:24 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v4] In-Reply-To: <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> Message-ID: On Fri, 19 Apr 2024 17:13:51 GMT, Joshua Cao wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). 
>> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. It's easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to a non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passes hotspot tier1 locally on a Linux machine. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Comment on not allowing macro nodes after we start expanding. Rename > dont_allow_macro_nodes to reset_allow_macro_nodes. I am good with this version, but of course someone from the compiler team needs to take a look. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18824#pullrequestreview-2011967012 From mbalao at openjdk.org Fri Apr 19 17:44:56 2024 From: mbalao at openjdk.org (Martin Balao) Date: Fri, 19 Apr 2024 17:44:56 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 11:45:31 GMT, Charles Connell wrote: >> We would like to propose a fix for 8330611. >> >> To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available according to 'len_reg'. This behavior replaces the upfront read of 16 bytes of input and the later discarding of spurious data. 
>> >> While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. >> >> A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. >> >> This work is in collaboration with @franferrax. > > Thank you for this fix! (I discovered the bug) I agree it would be very difficult to verify the lack of out-of-bounds memory access. However, after some tinkering myself, I also noticed that even obvious correctness problems in this "tail" area (like simply not copying bytes into the dest array) do not fail any existing unit tests. If it's not that hard, maybe that would be a good test to add. Thanks @charlesconnell for reporting this bug and @theRealAph for your review. As part of my initial verification, I compared values obtained from the intrinsic with values obtained from the interpreter and they were equal in all cases. These cases were for inputs whose sizes were a multiple of the block size minus 1 (so all bits of the tail had to be processed). To make sure that the testing was correct, I did error seeding (replaced a `movw` with a `movb` in the tail processing) and it failed as expected. I have now extended the error seeding strategy and verified that 3/4 tests under compiler/codegen/aes failed. I think that we have reasonable coverage. If there are no objections, I'm planning to integrate. 
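The tail handling discussed in this thread (reading 8, 4, 2 and 1 byte pieces according to the remaining length) can be illustrated with a small C++ model. This is only a sketch of the idea, not the stub code from the PR: the function name `load_tail`, the zero-filled 16-byte staging block, and the low-to-high piece order are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies the last `len` bytes (len < 16) of an input into a zeroed 16-byte
// block, touching only bytes that belong to the input. Each set bit of `len`
// selects one read of 8, 4, 2 or 1 bytes, so at most four reads are issued
// (for len == 15) instead of one unconditional 16-byte read.
void load_tail(const std::uint8_t* src, std::size_t len, std::uint8_t block[16]) {
  assert(len < 16);
  std::memset(block, 0, 16);
  std::size_t pos = 0;
  for (std::size_t piece = 8; piece >= 1; piece >>= 1) {
    if (len & piece) {
      std::memcpy(block + pos, src + pos, piece);
      pos += piece;
    }
  }
}
```

A tail of 15 bytes exercises all four pieces, which is why inputs of that shape are a natural choice for the verification described above.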
------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2067026317 From jvernee at openjdk.org Fri Apr 19 17:50:06 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 19 Apr 2024 17:50:06 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Tue, 16 Apr 2024 00:04:15 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Add enter() and leave(); remove Windows-specific register stuff src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4013: > 4011: // Initialize table for unsafe copy memeory check. > 4012: if (UnsafeMemoryAccess::_table == nullptr) { > 4013: UnsafeMemoryAccess::create_table(26); How did you arrive at a table size of 26? src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2603: > 2601: const Register wide_value = rax; > 2602: const Register rScratch1 = r10; > 2603: Maybe put an `assert_different_registers` here for the above registers, just to be sure. (I see you are avoiding the existing `rscratch1` already, because of a conflict with `c_rarg2`) src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2674: > 2672: // Parameter order is (ptr, byteVal, size) > 2673: __ xchgq(c_rarg1, c_rarg2); > 2674: __ pop(rbp); // Clear effect of enter() Why not just use `leave()` here? 
src/hotspot/share/opto/library_call.cpp line 4959: > 4957: if (stopped()) return true; > 4958: > 4959: if (StubRoutines::unsafe_setmemory() == nullptr) return false; I don't see why this check is needed here, since we already check whether the stub is there in `is_intrinsic_supported`. Note that `inline_unsafe_copyMemory` also doesn't have this check. I think it would be good to keep consistency between the two. src/hotspot/share/opto/runtime.cpp line 780: > 778: const TypeFunc* OptoRuntime::make_setmemory_Type() { > 779: // create input type (domain) > 780: int num_args = 4; This variable seems redundant. src/hotspot/share/opto/runtime.cpp line 786: > 784: fields[argp++] = TypePtr::NOTNULL; // dest > 785: fields[argp++] = TypeLong::LONG; // size > 786: fields[argp++] = Type::HALF; // size Since the size is a `size_t`, I don't think this is correct on 32-bit platforms. I think here we want `TypeX_X`, and then add the extra `HALF` only on 64-bit platforms. Similar to what we do in `make_arraycopy_Type`: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/runtime.cpp#L799-L802 (Note that you will also have to adjust `argcnt` for this) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572570842 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572578437 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572593795 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572556648 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572564382 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572562058 From jvernee at openjdk.org Fri Apr 19 17:50:01 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 19 Apr 2024 17:50:01 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v22] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: 
<1e63Ivvo2ZkyuP3U-RHrnZaUxv1PiKa2UnR5b2H9vpc=.290efaf8-6067-4e92-b7ae-932f6284b4cb@github.com>

On Fri, 19 Apr 2024 16:25:28 GMT, Scott Gibbons wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change.
>>
>> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes.
>>
>> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>>
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Address review comments; update copyright years

I'm not really qualified as a compiler code reviewer, but I've left some comments to try and help this along.

src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2523:

> 2521: // Number of (8*X)-byte chunks into rScratch1
> 2522: __ movq(tmp, size);
> 2523: __ shrq(tmp, 3);

`shr` [sets the zero flag][1], so I think you can just move the jump to after the shift and avoid a separate comparison?

```suggestion
// Number of (8*X)-byte chunks into rScratch1
__ movq(tmp, size);
__ shrq(tmp, 3);
__ jccb(Assembler::zero, L_Tail);
```

[1]: https://www.felixcloutier.com/x86/sal:sar:shl:shr#flags-affected

-------------

PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-2011751831
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572712233

From kvn at openjdk.org Fri Apr 19 17:53:57 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 19 Apr 2024 17:53:57 GMT
Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512)
In-Reply-To: References: Message-ID:

On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote:

> We would like to propose a fix for 8330611.
> > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available according to 'len_reg'. This behavior replaces the upfront read of 16 bytes of input and the later discarding of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax. @jatin-bhateja or @sviswa7 please review these changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2067038093 From kvn at openjdk.org Fri Apr 19 18:17:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 18:17:01 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: <6BzGvaMr42tgUlEeHinsh7jGrvjBMIuNFijfMWhOSI0=.c65b5638-e247-4b09-9b63-1bf377668947@github.com> On Fri, 19 Apr 2024 15:43:17 GMT, Jorn Vernee wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Add enter() and leave(); remove Windows-specific register stuff > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4013: > >> 4011: // Initialize table for unsafe copy memeory check. 
>> 4012: if (UnsafeMemoryAccess::_table == nullptr) { >> 4013: UnsafeMemoryAccess::create_table(26); > > How did you arrive at a table size of 26? This needs a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572744077 From kvn at openjdk.org Fri Apr 19 18:28:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 18:28:01 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v22] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 16:25:28 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments; update copyright years General comment/suggestion before I dive into the review: can we do the `UnsafeCopyMemory*` -> `UnsafeMemory*` renaming in a follow-up RFE? The renaming hides the real change. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8336: > 8334: // Initialize table for copy memory (arraycopy) check. > 8335: if (UnsafeMemoryAccess::_table == nullptr) { > 8336: UnsafeMemoryAccess::create_table(18); Needs a comment explaining the number 18. src/hotspot/share/utilities/copy.hpp line 303: > 301: inline static void shared_disjoint_words_atomic(const HeapWord* from, > 302: HeapWord* to, size_t count) { > 303: I don't think this justifies changing the file. 
------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-2012077574 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572750449 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572746349 From kvn at openjdk.org Fri Apr 19 18:40:57 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 18:40:57 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main In-Reply-To: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: On Fri, 19 Apr 2024 06:11:15 GMT, Evgeny Nikitin wrote: > The idea of the bug was to reduce the number of tests that use driver mode incorrectly. My investigation shows that the vast majority of such tests start other processes and are in fact wrappers around those processes. Most of the secondary processes are started using ProcessTools and therefore get the additional VM flags provided by the user. > > I found only one test that seems to use driver mode incorrectly; this PR fixes it. > Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18854#pullrequestreview-2012115122 From shade at openjdk.org Fri Apr 19 18:42:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 19 Apr 2024 18:42:57 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: <8NRY-k0RMK3tXf7TYkI9H1TVCO-PrbR0x4FMlqFypQg=.96717c57-bfb8-4c24-9b19-bb3d45d1fc8d@github.com> References: <8NRY-k0RMK3tXf7TYkI9H1TVCO-PrbR0x4FMlqFypQg=.96717c57-bfb8-4c24-9b19-bb3d45d1fc8d@github.com> Message-ID: On Wed, 17 Apr 2024 15:59:38 GMT, Aleksey Shipilev wrote: > All right, let me run tests with #18751 applied and see if we have any surprises. I ran Maven CTW, Fuzzer tests, and the rest of OpenJDK jtregs with my ABS-checking patch applied, and there are no surprises. Looks good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18813#issuecomment-2067104198 From aph at openjdk.org Fri Apr 19 19:01:57 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Apr 2024 19:01:57 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 15:51:47 GMT, Matias Saavedra Silva wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1786: >> >>> 1784: add(cache, cache, Array::base_offset_in_bytes()); >>> 1785: lea(cache, Address(cache, index)); >>> 1786: // Must prevent reordering of the following cp cache loads with bytecode load >> >> This is rather unclear. Where is the bytecode load to which this comment refers? > > This comment was above all the other uses of membar and was just moved here for convenience. The "bytecode load" being referred to here is `InterpreterMacroAssembler::dispatch_next`. The bytecode loading could be scheduled before the cache entry is resolved. The comment at the stop says "This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded." 
But this seems to be before the constant pool cache field entry is loaded. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1572793177 From aph at openjdk.org Fri Apr 19 19:04:56 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 19 Apr 2024 19:04:56 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 17:42:22 GMT, Martin Balao wrote: > I have now extended the error seeding strategy and verified how 3/4 tests under compiler/codegen/aes failed. I think that we have reasonable coverage. If there are no objections, I'm planning to integrate. Sounds good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2067133679 From sgibbons at openjdk.org Fri Apr 19 20:13:03 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 20:13:03 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v23] In-Reply-To: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: > This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. > > Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). 
> > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18555/files - new: https://git.openjdk.org/jdk/pull/18555/files/dccf6b6c..dd0094ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=21-22 Stats: 175 lines in 21 files changed: 4 ins; 5 del; 166 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555 From sgibbons at openjdk.org Fri Apr 19 20:13:05 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 20:13:05 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v22] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 18:25:17 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments; update copyright years > > General comment/suggestion before I dive into review. > Can we do renaming `UnsafeCopyMemory*` -> `UnsafeMemory*` in follow up RFE. This change hides the real change. @vnkozlov I un-did the name change and will submit a separate request for re-naming. Thanks. > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8336: > >> 8334: // Initialize table for copy memory (arraycopy) check. >> 8335: if (UnsafeMemoryAccess::_table == nullptr) { >> 8336: UnsafeMemoryAccess::create_table(18); > > Needs comment explaining 18 number Hmmm... There was no comment explaining the 8 number :-). I added 10 to the table size because I knew I was going to add 7 places where a mark was required. I left 3 for safety. 
The algorithm has since changed, so I changed this to: `UnsafeCopyMemory::create_table(8 + 4); // 8 for copyMemory; 4 for setMemory` I made a similar change to all other table creation numbers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067197605 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572840222 From sgibbons at openjdk.org Fri Apr 19 20:13:05 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 20:13:05 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: <6BzGvaMr42tgUlEeHinsh7jGrvjBMIuNFijfMWhOSI0=.c65b5638-e247-4b09-9b63-1bf377668947@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> <6BzGvaMr42tgUlEeHinsh7jGrvjBMIuNFijfMWhOSI0=.c65b5638-e247-4b09-9b63-1bf377668947@github.com> Message-ID: On Fri, 19 Apr 2024 18:14:05 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4013: >> >>> 4011: // Initialize table for unsafe copy memeory check. >>> 4012: if (UnsafeMemoryAccess::_table == nullptr) { >>> 4013: UnsafeMemoryAccess::create_table(26); >> >> How did you arrive at a table size of 26? > > This needs comment I added 10 to the table size because I knew I was going to add 7 places where a mark was required for setMemory. I left 3 for safety. The algorithm has since changed so that only 4 are needed, so I changed this to: `UnsafeCopyMemory::create_table(16 + 4); // 16 for copyMemory; 4 for setMemory` I made a similar change to all other table creation numbers. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572841521 From sgibbons at openjdk.org Fri Apr 19 20:13:06 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 20:13:06 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v22] In-Reply-To: <1e63Ivvo2ZkyuP3U-RHrnZaUxv1PiKa2UnR5b2H9vpc=.290efaf8-6067-4e92-b7ae-932f6284b4cb@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> <1e63Ivvo2ZkyuP3U-RHrnZaUxv1PiKa2UnR5b2H9vpc=.290efaf8-6067-4e92-b7ae-932f6284b4cb@github.com> Message-ID: On Fri, 19 Apr 2024 17:42:36 GMT, Jorn Vernee wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments; update copyright years > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2523: > >> 2521: // Number of (8*X)-byte chunks into rScratch1 >> 2522: __ movq(tmp, size); >> 2523: __ shrq(tmp, 3); > > `shr` [sets the zero flag][1], so I think you can just move the jump to after the shift and avoid a separate comparison > > ```suggestion > // Number of (8*X)-byte chunks into rScratch1 > __ movq(tmp, size); > __ shrq(tmp, 3); > __ jccb(Assembler::zero, L_Tail); > > > [1]: https://www.felixcloutier.com/x86/sal:sar:shl:shr#flags-affected Good catch. Done. 
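The reasoning behind the suggestion above is that the shifted value itself tells you whether any full 8-byte unit exists, so no separate compare is needed before branching to the tail. A small C++ model of that decision, under the assumption that the shifted count is in 8-byte units; `FillPlan` and `plan_fill` are hypothetical names, and this is not the stub code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of what the stub's prologue decides.
struct FillPlan {
  std::size_t chunks;      // number of full 8-byte units (size >> 3)
  std::size_t tail_bytes;  // leftover bytes (size & 7)
  bool go_to_tail;         // models jccb(Assembler::zero, L_Tail) being taken
};

FillPlan plan_fill(std::size_t size) {
  std::size_t chunks = size >> 3;        // models shrq(tmp, 3)
  bool zero_flag = (chunks == 0);        // shr by a nonzero count sets ZF from its result
  return FillPlan{chunks, size & 7, zero_flag};
}
```

For example, a size of 7 goes straight to the tail, while a size of 21 yields two full units and a 5-byte tail.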
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572794443 From sgibbons at openjdk.org Fri Apr 19 20:13:07 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 20:13:07 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 15:50:05 GMT, Jorn Vernee wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Add enter() and leave(); remove Windows-specific register stuff > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2603: > >> 2601: const Register wide_value = rax; >> 2602: const Register rScratch1 = r10; >> 2603: > > Maybe put an `assert_different_registers` here for the above registers, just to be sure. (I see you are avoiding the existing `rscratch1` already, because of a conflict with `c_rarg2`) Done. > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2674: > >> 2672: // Parameter order is (ptr, byteVal, size) >> 2673: __ xchgq(c_rarg1, c_rarg2); >> 2674: __ pop(rbp); // Clear effect of enter() > > Why not just use `leave()` here? No special reason. I've changed it since it seems to provide more clarity. > src/hotspot/share/opto/library_call.cpp line 4959: > >> 4957: if (stopped()) return true; >> 4958: >> 4959: if (StubRoutines::unsafe_setmemory() == nullptr) return false; > > I don't see why this check is needed here, since we already check whether the stub is there in `is_intrinsic_supported`. > > Note that `inline_unsafe_copyMemory` also doesn't have this check. I think it would be good to keep consistency between the two. Removed. > src/hotspot/share/opto/runtime.cpp line 780: > >> 778: const TypeFunc* OptoRuntime::make_setmemory_Type() { >> 779: // create input type (domain) >> 780: int num_args = 4; > > This variable seems redundant. It is. 
It is there due to copy/paste from the other 10 places that also have the same redundant variable declaration. I've removed it from here, but I think I'll be asked to submit a separate PR if I remove it from the other locations. Note that it's also redundant in `make_arraycopy_Type()`. > src/hotspot/share/opto/runtime.cpp line 786: > >> 784: fields[argp++] = TypePtr::NOTNULL; // dest >> 785: fields[argp++] = TypeLong::LONG; // size >> 786: fields[argp++] = Type::HALF; // size > > Since the size is a `size_t`, I don't think this is correct on 32-bit platforms. I think here we want `TypeX_X`, and then add the extra `HALF` only on 64-bit platforms. Similar to what we do in `make_arraycopy_Type`: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/runtime.cpp#L799-L802 > > (Note that you will also have to adjust `argcnt` for this) I don't understand this well enough to be confident in the change. Can you please verify that I've changed it properly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572797332 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572800059 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572804660 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572815040 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572823468 From sgibbons at openjdk.org Fri Apr 19 20:13:08 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 20:13:08 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: <2jYlnjJmp3oI89OC8iPF3UIFDeabAOAD51VhipzQDE8=.7d126e34-0c72-4e3e-8de4-957cd7f8dc8b@github.com> On Fri, 19 Apr 2024 18:16:33 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Add enter() 
and leave(); remove Windows-specific register stuff > > src/hotspot/share/utilities/copy.hpp line 303: > >> 301: inline static void shared_disjoint_words_atomic(const HeapWord* from, >> 302: HeapWord* to, size_t count) { >> 303: switch (count) { > > I don't think this justify to change the file. Reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572824249 From jvernee at openjdk.org Fri Apr 19 20:21:32 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 19 Apr 2024 20:21:32 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v21] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 19:18:13 GMT, Scott Gibbons wrote: >> src/hotspot/share/opto/runtime.cpp line 786: >> >>> 784: fields[argp++] = TypePtr::NOTNULL; // dest >>> 785: fields[argp++] = TypeLong::LONG; // size >>> 786: fields[argp++] = Type::HALF; // size >> >> Since the size is a `size_t`, I don't think this is correct on 32-bit platforms. I think here we want `TypeX_X`, and then add the extra `HALF` only on 64-bit platforms. Similar to what we do in `make_arraycopy_Type`: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/runtime.cpp#L799-L802 >> >> (Note that you will also have to adjust `argcnt` for this) > > I don't understand this well enough to be confident in the change. Can you please verify that I've changed it properly? Your latest version looks good to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572909435 From sviswanathan at openjdk.org Fri Apr 19 20:21:33 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 19 Apr 2024 20:21:33 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. 
> > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what 'len_reg' guarantees to be available. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 2618: > 2616: __ movdqu(Address(saved_encCounter_start, 0), xmm0); > 2617: // XOR encryted block cipher in xmm0 with PT to produce CT > 2618: __ evpxorq(xmm0, xmm0, Address(src_addr, pos, Address::times_1, 0), Assembler::AVX_128bit); This could alternatively be fixed by using a mask register with evpxorq. That would have a lower impact on performance. @smita-kamath can share the changes needed. 
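To illustrate the tail handling described in the quoted proposal, here is a minimal C++ sketch. It is not the stub's assembly and not taken from the patch; `load_tail` is a hypothetical name, and the largest-to-smallest ordering of the reads is an assumption. It only shows how a length below 16 can be covered with at most one 8-, 4-, 2- and 1-byte read each, without touching bytes past the end of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not HotSpot code: copy the final `len` (< 16) bytes of
// `src` into a zeroed 16-byte block using at most one 8-, 4-, 2- and 1-byte
// read each, so no byte at or beyond src + len is ever read.
inline void load_tail(const uint8_t* src, size_t len, uint8_t block[16]) {
  assert(len < 16);
  std::memset(block, 0, 16);
  size_t pos = 0;
  // Each set bit of len selects one batch size; pos tracks the bytes consumed.
  if (len & 8) { std::memcpy(block + pos, src + pos, 8); pos += 8; }
  if (len & 4) { std::memcpy(block + pos, src + pos, 4); pos += 4; }
  if (len & 2) { std::memcpy(block + pos, src + pos, 2); pos += 2; }
  if (len & 1) { block[pos] = src[pos]; }
}
```

In the worst case (a 15-byte tail) this performs four reads where the original code performed one 16-byte read, which matches the "3 extra memory reads" mentioned in the proposal.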
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18849#discussion_r1572908603 From dlong at openjdk.org Fri Apr 19 20:26:41 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Apr 2024 20:26:41 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 07:49:13 GMT, Galder Zamarreño wrote: >> src/hotspot/share/c1/c1_GraphBuilder.cpp line 4441: >> >>> 4439: >>> 4440: void GraphBuilder::append_alloc_array_copy(ciMethod* callee) { >>> 4441: { >> >> There are stylistic questions to this block >> - why separate block? >> - not necessary comment "Peek at receiver" >> - misprint in "not primtive array" comment >> - extra line for { >> - excessive comment "// not primitive array" >> - bailout message do not mention phi >> - please use INLINE_BAILOUT macro >> >> Comment about phi does not make things clear: not evident why receiver type can be nullptr and why it is phi. After all, why can't we check src->exact_type()->as_array_klass()->element_type() here? > >> There are stylistic questions to this block > ... >> Comment about phi does not make things clear: not evident why receiver type can be nullptr and why it is phi. After all, why can't we check src->exact_type()->as_array_klass()->element_type() here? > > This block comes from @dean-long, I'll let him comment. > why separate block? It's left over from when I had it wrapped with `if (UseNewCode)` as I was testing it. It can be removed. > not necessary comment "Peek at receiver" OK. > misprint in "not primtive array" comment Good catch. You can include the suggested fix using the github UI and get added as a contributor if you like. > extra line for { To me it makes it more readable, but if it goes against the style guide, feel free to change it. > excessive comment "// not primitive array" OK. 
> bailout message do not mention phi Yes, if we bailout due to `receiver_type == nullptr` then we should have a better bailout message. But you may be correct that it is impossible. This code was copied from an earlier version where it did seem possible. If we change it we'll need a new round of testing. > please use INLINE_BAILOUT macro We would need a version of INLINE_BAILOUT that doesn't return false. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1572912920 From coleenp at openjdk.org Fri Apr 19 20:39:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 Apr 2024 20:39:31 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: <40IlrOGD6Hp8Zx_BjCfvrnqOjxlwKvTaVGhDPz1ntGk=.e474fb29-75b8-40fd-97f6-9f436efd7cb9@github.com> On Fri, 19 Apr 2024 18:59:40 GMT, Andrew Haley wrote: >> This comment was above all the other uses of membar and was just moved here for convenience. The "bytecode load" being referred to here is `InterpreterMacroAssembler::dispatch_next`. The bytecode loading could be scheduled before the cache entry is resolved. > > The comment at the stop says "This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded." But this seems to be before the constant pool cache field entry is loaded. I think what this wants to say is that when you've loaded the pointer to the ResolvedFieldEntry in 'cache', previous memory operations have been synched before the fields in the resolved entry are read. I'm looking for a different word than synched. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1572928003 From svkamath at openjdk.org Fri Apr 19 20:56:28 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Fri, 19 Apr 2024 20:56:28 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what 'len_reg' guarantees to be available. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . Hi, I've attached the alternative fix here. Please let me know if you have any questions. Thank you. 
[alternative-fix-8330611.txt](https://github.com/openjdk/jdk/files/15045540/alternative-fix-8330611.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2067270677 From kvn at openjdk.org Fri Apr 19 21:11:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 21:11:00 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v23] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 20:13:03 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments This looks good. I only have question about long vs short jumps in stub's code. src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2550: > 2548: > 2549: // If zero, then we're done > 2550: __ jccb(Assembler::zero, L_exit); Code in `generate_unsafe_setmemory()` uses long jumps to `L_exit` but here you use short. Why? src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2638: > 2636: L_exit, _masm); > 2637: } > 2638: __ jmp(L_exit); Here is long jump to `L_exit` after `do_setmemory_atomic_loop()` call. Should this be also short jump? 
src/hotspot/share/opto/runtime.cpp line 785: > 783: fields[argp++] = TypePtr::NOTNULL; // dest > 784: fields[argp++] = TypeX_X; // size > 785: LP64_ONLY(fields[argp++] = Type::HALF); // size Nit: align `/` src/hotspot/share/utilities/copy.hpp line 2: > 1: /* > 2: * Copyright (c) 2003, 2024, Oracle and/or its affiliates. All rights reserved. You forgot to undo the year change in this file. ------------- PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-2012400269 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572947954 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572948693 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572955327 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572960023 From kvn at openjdk.org Fri Apr 19 21:11:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 21:11:00 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v23] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: <5qkCM1RfvInEvp3ipImOqWXV7Cdg97BUCApATuR2KnI=.30f00efc-d8cd-4abe-9107-bdfa84df9165@github.com> On Fri, 19 Apr 2024 20:54:32 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2638: > >> 2636: L_exit, _masm); >> 2637: } >> 2638: __ jmp(L_exit); > > Here is long jump to `L_exit` after `do_setmemory_atomic_loop()` call. Should this be also short jump? Do we have additional code in debug VM which increases the distance and requires a long jump? I don't see it. Usually it is something which calls `__ STOP()`. 
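As background for the short versus long jump question, the following is a small C++ sketch of the range check involved. It is an illustration only, not the HotSpot assembler's own logic; `fits_short_jump` and its parameters are hypothetical names. It relies on the x86 rule that a short jump carries a signed 8-bit displacement measured from the end of the jump instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a short x86 jump encodes its target as a signed 8-bit
// displacement relative to the address just past the jump instruction.
// Both arguments are byte offsets within the same code buffer.
inline bool fits_short_jump(int64_t jump_end_offset, int64_t target_offset) {
  int64_t disp = target_offset - jump_end_offset;
  return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

Any extra code emitted between the jump and its label (for example in debug builds) increases the displacement and can push a forward jump out of this range.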
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1572951726 From duke at openjdk.org Fri Apr 19 21:38:53 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 19 Apr 2024 21:38:53 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v3] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: bug fix in ::prefix_rex2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/3d62dce8..95ce7dfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri Apr 19 21:38:53 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 19 Apr 2024 21:38:53 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v2] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 11:47:37 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 12925: >> >>> 12923: void Assembler::prefix_rex2(Address adr, bool is_map1) { >>> 12924: int bits = is_map1 ? REX2BIT_M0 : 0; >>> 12925: bits |= get_base_prefix_bits(adr.base()->encoding()); >> >> Suggestion: >> >> bits |= get_base_prefix_bits(adr.base()); > > As per section 3.7.5 of Intel SDM, (Index * Scale) + Displacement is a valid addressing mode. Thus we should set the bits corresponding to extended base register encoding only if it's a valid register. Thank you @jatin-bhateja. I've made the change. 
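The point of the suggestion can be sketched in C++ as follows. This is a simplified model under stated assumptions, not the code in `assembler_x86.cpp`: `Addr`, `kNoBase`, `base_prefix_bits` and the two bit constants are hypothetical, and the bit values are placeholders rather than the real REX2 constants. It only shows that an address without a base register should contribute no base-related prefix bits.

```cpp
#include <cassert>

// Placeholder bit values for illustration only; the real constants live in
// HotSpot's x86 assembler and are not reproduced here.
constexpr int kBaseBit3 = 0x01;  // set when bit 3 of the base encoding is set
constexpr int kBaseBit4 = 0x10;  // set when bit 4 of the base encoding is set

// Hypothetical marker for "no base register", as in the
// (Index * Scale) + Displacement addressing mode.
constexpr int kNoBase = -1;

// Hypothetical stand-in for an address operand.
struct Addr {
  int base;  // register encoding 0..31, or kNoBase
};

inline int base_prefix_bits(const Addr& adr) {
  if (adr.base == kNoBase) return 0;  // no valid base: contribute nothing
  int bits = 0;
  if (adr.base & 8)  bits |= kBaseBit3;
  if (adr.base & 16) bits |= kBaseBit4;
  return bits;
}
```

Passing the register itself rather than its encoding lets the helper make the validity check in one place, which is what the suggested change appears to rely on.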
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1572991954 From sviswanathan at openjdk.org Fri Apr 19 21:43:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 19 Apr 2024 21:43:38 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v3] In-Reply-To: References: Message-ID: <32WcAB-99iB1qb1zU-pRlkq8xUDTUkZPTELfNNcHnOk=.c0cdc0c2-2540-4c51-bf22-0026e777edb7@github.com> On Fri, 19 Apr 2024 21:38:53 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > bug fix in ::prefix_rex2 src/hotspot/cpu/x86/assembler_x86.cpp line 12969: > 12967: void Assembler::prefix_rex2(Address adr, Register reg, bool byteinst, bool is_map1) { > 12968: int bits = is_map1 ? 
REX2BIT_M0 : 0; > 12969: bits |= get_base_prefix_bits(adr.base()->encoding()); This also needs fix to: bits |= get_base_prefix_bits(adr.base()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1572995583 From duke at openjdk.org Fri Apr 19 21:51:45 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 19 Apr 2024 21:51:45 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v3] In-Reply-To: <32WcAB-99iB1qb1zU-pRlkq8xUDTUkZPTELfNNcHnOk=.c0cdc0c2-2540-4c51-bf22-0026e777edb7@github.com> References: <32WcAB-99iB1qb1zU-pRlkq8xUDTUkZPTELfNNcHnOk=.c0cdc0c2-2540-4c51-bf22-0026e777edb7@github.com> Message-ID: On Fri, 19 Apr 2024 21:41:13 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> bug fix in ::prefix_rex2 > > src/hotspot/cpu/x86/assembler_x86.cpp line 12969: > >> 12967: void Assembler::prefix_rex2(Address adr, Register reg, bool byteinst, bool is_map1) { >> 12968: int bits = is_map1 ? REX2BIT_M0 : 0; >> 12969: bits |= get_base_prefix_bits(adr.base()->encoding()); > > This also needs fix to: > bits |= get_base_prefix_bits(adr.base()); Thank you @sviswa7, made the change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1572999483 From duke at openjdk.org Fri Apr 19 21:51:45 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 19 Apr 2024 21:51:45 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. 
> > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: bug fix in other ::prefix_rex2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/95ce7dfa..eb246fd7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From sgibbons at openjdk.org Fri Apr 19 22:08:52 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 22:08:52 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24] In-Reply-To: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: > This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. > > Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). 
> > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Long to short jmp; other cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18555/files - new: https://git.openjdk.org/jdk/pull/18555/files/dd0094ea..19616244 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=22-23 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555 From sgibbons at openjdk.org Fri Apr 19 22:08:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 22:08:53 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v23] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 20:53:31 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2550: > >> 2548: >> 2549: // If zero, then we're done >> 2550: __ jccb(Assembler::zero, L_exit); > > Code in `generate_unsafe_setmemory()` uses long jumps to `L_exit` but here you use short. Why? Ah - the original code (3 iterations ago) was about 10 bytes too long for a short jump. It's short enough now. Changed. > src/hotspot/share/opto/runtime.cpp line 785: > >> 783: fields[argp++] = TypePtr::NOTNULL; // dest >> 784: fields[argp++] = TypeX_X; // size >> 785: LP64_ONLY(fields[argp++] = Type::HALF); // size > > Nit: align `/` Done > src/hotspot/share/utilities/copy.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2003, 2024, Oracle and/or its affiliates. All rights reserved. 
> > You forgot to undo the year change in this file. Yup. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573006432 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573014982 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573015145 From sgibbons at openjdk.org Fri Apr 19 22:08:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 19 Apr 2024 22:08:53 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v23] In-Reply-To: <5qkCM1RfvInEvp3ipImOqWXV7Cdg97BUCApATuR2KnI=.30f00efc-d8cd-4abe-9107-bdfa84df9165@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> <5qkCM1RfvInEvp3ipImOqWXV7Cdg97BUCApATuR2KnI=.30f00efc-d8cd-4abe-9107-bdfa84df9165@github.com> Message-ID: On Fri, 19 Apr 2024 20:58:43 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2638: >> >>> 2636: L_exit, _masm); >>> 2637: } >>> 2638: __ jmp(L_exit); >> >> Here is long jump to `L_exit` after `do_setmemory_atomic_loop()` call. Should this be also short jump? > > Do we have additional code in debug VM which increases the distance and requires a long jump? I don't see it. Usually it is something which calls `__ STOP()`. The old code required a long jump due to the size of `do_setmemory_atomic_loop` but has since been refactored. The `jmp(Label)` code will generate a short jump provided the label has been defined and is in range. Otherwise a long jump is generated. 
Changed to `jmpb` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573012933 From kvn at openjdk.org Fri Apr 19 22:14:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 19 Apr 2024 22:14:31 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: <7KJIFS8Y1SqIbr847g66L6inpqMEyKXA6mIlrmrsG6o=.071b82ef-a248-41f4-a36c-e7e5ae28dacb@github.com> On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Long to short jmp; other cleanup Good. I will submit our testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067343940 From duke at openjdk.org Fri Apr 19 22:35:31 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 22:35:31 GMT Subject: RFR: 8032218: Emit single post-constructor barrier for chain of superclass constructors Message-ID: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Opening this PR on top of https://github.com/openjdk/jdk/pull/18505. This PR is only valid if we agree it is sufficient to use `StoreStore` barriers at the end of constructors instead of `Release` barriers. 
Currently on master, [C2 emits a Release barrier for each constructor call](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/parse1.cpp#L1019) in a chain of superclass constructor calls. After https://github.com/openjdk/jdk/pull/18505 is merged, it is the same except that the barrier is a `StoreStore`. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > // An InitializeNode collects and isolates object initialization after // an AllocateNode and before the next possible safepoint. As a // memory barrier (MemBarNode), it keeps critical stores from drifting // down past any safepoint or any publication of the allocation. All the writes that occur in the constructor will come before the `InitializeNode/StoreStore`. This PR subsumes https://github.com/openjdk/jdk/pull/18505. Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors. 
------------- Depends on: https://git.openjdk.org/jdk/pull/18505 Commit messages: - 8032218: Emit single post-constructor barrier for chain of superclass constructors Changes: https://git.openjdk.org/jdk/pull/18870/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18870&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8032218 Stats: 360 lines in 7 files changed: 240 ins; 117 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18870.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18870/head:pull/18870 PR: https://git.openjdk.org/jdk/pull/18870 From duke at openjdk.org Fri Apr 19 22:52:32 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Apr 2024 22:52:32 GMT Subject: RFR: 8032218: Emit single post-constructor barrier for chain of superclass constructors In-Reply-To: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> References: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Message-ID: On Fri, 19 Apr 2024 22:31:10 GMT, Joshua Cao wrote: > Opening this PR on top of https://github.com/openjdk/jdk/pull/18505. This PR is only valid if we agree it is sufficient to use `StoreStore` barriers at the end of constructors instead of `Release` barriers. > > Currently on master, [C2 emits a Release barrier for each constructor call](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/parse1.cpp#L1019) in a chain of superclass constructor calls. After https://github.com/openjdk/jdk/pull/18505 is merged, it is the same except that the barrier is a `StoreStore`. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. > > [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). 
This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > >> // An InitializeNode collects and isolates object initialization after > // an AllocateNode and before the next possible safepoint. As a > // memory barrier (MemBarNode), it keeps critical stores from drifting > // down past any safepoint or any publication of the allocation. > > All the writes that occur in the constructor will come before the `InitializeNode/StoreStore`. This PR removes the emitting of `StoreStore` barriers in `Parse::do_exits()`. This PR subsumes https://github.com/openjdk/jdk/pull/18505. > > Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors. I originally had a [different approach](https://github.com/caojoshua/jdk/commit/eb48719f75279721bcce48d00ed262ebbf3691e0) where we detect superclass constructors and do not emit barriers for them. We can tell if a constructor is a superclass constructor if: 1. the current and parent method have the same receiver 2. the current and parent method are both initializers I didn't complete it. It fails a case where the superclass constructor does not write a final, and the child constructor does write a final. The patch would not emit a barrier in that case because the outermost constructor did not write a final. The child `Parse` would need to propagate `wrote_final` and other `wrote_*` flags to the parent `Parse`. `Parse` instances don't have any information on each other, and the code would need a fair amount of restructuring to support it. Despite the bug, it passed all the test cases I created anyway. That's when I looked into the macro expansion for `InitializeNode` and thought that by itself was sufficient in emitting post-constructor barriers, arriving at this patch. 
Sharing this in case I have some wrong assumptions in this PR. We could revert to working on the initial approach. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18870#issuecomment-2067369248 From dlong at openjdk.org Fri Apr 19 22:59:28 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Apr 2024 22:59:28 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fei comments If I understand correctly, the order of writes must be: 1. ResolvedFieldEntry fields, except _get_code and _put_code 2. _get_code, _put_code 3. patch_bytecode(fast_bytecode) so the order of reads must be reversed. That's why there are load-acquires when reading _get_code and _put_code. After [3] is done, after dispatching to fast_bytecode, we need to do a LoadLoad between the already read fast bytecode [3] and the "cache" fields [1]. The LoadLoad is not for the load of the next bytecode that will be done in dispatch_next(). 
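The write/read ordering described above can be modelled in portable C++ roughly as follows. This is a sketch of steps [1] and [2] only, using `std::atomic` rather than HotSpot's own primitives; `Entry`, `resolve` and `try_read` are hypothetical names, with `offset` standing in for the ordinary entry fields and `code` for `_get_code`/`_put_code`. The patched-bytecode step [3] would need an analogous ordering between the bytecode load and the field loads, which is where the LoadLoad comes in.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a resolved entry: `offset` plays the role of the
// ordinary fields, `code` the role of _get_code/_put_code (0 = unresolved).
struct Entry {
  int offset;
  std::atomic<int> code;
  Entry() : offset(0), code(0) {}
};

// Writer: fill in the ordinary fields first, then publish the code with
// release semantics so the earlier store cannot be reordered after it.
inline void resolve(Entry& e, int offset, int code) {
  e.offset = offset;
  e.code.store(code, std::memory_order_release);
}

// Reader: load the code with acquire semantics; only when it is set may the
// ordinary fields be read, and they are then guaranteed to be visible.
inline bool try_read(const Entry& e, int& offset_out) {
  if (e.code.load(std::memory_order_acquire) == 0) return false;
  offset_out = e.offset;
  return true;
}
```

The reader performs its loads in the reverse order of the writer's stores, which is the property the load-acquires on `_get_code` and `_put_code` are there to provide.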
------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2067373161 From dlong at openjdk.org Fri Apr 19 23:31:27 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Apr 2024 23:31:27 GMT Subject: RFR: 8032218: Emit single post-constructor barrier for chain of superclass constructors In-Reply-To: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> References: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Message-ID: On Fri, 19 Apr 2024 22:31:10 GMT, Joshua Cao wrote: > Opening this PR on top of https://github.com/openjdk/jdk/pull/18505. This PR is only valid if we agree it is sufficient to use `StoreStore` barriers at the end of constructors instead of `Release` barriers. > > Currently on master, [C2 emits a Release barrier for each constructor call](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/parse1.cpp#L1019) in a chain of superclass constructor calls. After https://github.com/openjdk/jdk/pull/18505 is merged, it is the same except that the barrier is a `StoreStore`. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. > > [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > >> // An InitializeNode collects and isolates object initialization after > // an AllocateNode and before the next possible safepoint. As a > // memory barrier (MemBarNode), it keeps critical stores from drifting > // down past any safepoint or any publication of the allocation. 
>
> All the writes that occur in the constructor will come before the `InitializeNode/StoreStore`. This PR removes the emitting of `StoreStore` barriers in `Parse::do_exits()`. This PR subsumes https://github.com/openjdk/jdk/pull/18505.
>
> Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors.

Doesn't this only work if the allocation and the call to the ctor are compiled together? Where is the StoreStore added if we compile a method by itself? The allocation could be in the interpreter, but then the constructor call goes to compiled code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18870#issuecomment-2067390603

From kvn at openjdk.org Sat Apr 20 04:31:34 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Sat, 20 Apr 2024 04:31:34 GMT
Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
In-Reply-To:
References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com>
Message-ID: <0cg24YXFi4foGH_uKTY6JmABMhzjMH6gmH78iE0CC4w=.a52937a2-d728-4616-b158-a2a338cbb6f4@github.com>

On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons wrote:

>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change.
>>
>> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes.
>>
>> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`).
>>
>> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt)
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Long to short jmp; other cleanup

The `runtime/Unsafe/InternalErrorTest.java` test hits SIGBUS when run with `-Xcomp` (and the other flags in the test's @run command):

# SIGBUS (0xa) at pc=0x0000000119514760, pid=63021, tid=28163
#
# JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Problematic frame:
# v ~StubRoutines::jbyte_fill 0x0000000119514760

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067547078

From aph at openjdk.org Sat Apr 20 12:39:29 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 20 Apr 2024 12:39:29 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 19 Apr 2024 22:56:29 GMT, Dean Long wrote:

> If I understand correctly, the order of writes must be:
>
> 1. ResolvedFieldEntry fields, except _get_code and _put_code

So, release fence here?

> 2. _get_code, _put_code

and another here

> 3. patch_bytecode(fast_bytecode)
>
>
> so the order of reads must be reversed. That's why there are load-acquires when reading _get_code and _put_code. After [3] is done, after dispatching to fast_bytecode, we need to do a LoadLoad between the already read fast bytecode [3] and the "cache" fields [1]. The LoadLoad is not for the load of the next bytecode that will be done in dispatch_next().

So, I guess the loadload fence being inserted here is the one we need between [2] and [3].
------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2067660791 From jbhateja at openjdk.org Sat Apr 20 14:25:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 20 Apr 2024 14:25:33 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Long to short jmp; other cleanup src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530: > 2528: switch (type) { > 2529: case USM_SHORT: > 2530: __ movw(Address(dest, (2 * i)), wide_value); MOVW emits an extra Operand Size Override prefix byte compared to 32 and 64 bit stores, any specific reason for keeping same unroll factor for all the stores. src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539: > 2537: break; > 2538: } > 2539: } I understand we want to be as accurate as possible in filling the tail in an event of SIGBUS, but we are anyways creating a wide value for 8 packed bytes if destination segment was quadword aligned, aligned quadword stores are implicitly atomic on x86 targets, what's your thoughts on using a vector instruction based loop. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573297441
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573299069

From jbhateja at openjdk.org Sat Apr 20 18:20:54 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 20 Apr 2024 18:20:54 GMT
Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads [v2]
In-Reply-To:
References:
Message-ID:

> - The problem is related to bi-morphic inlining of ms.unsafeBase() in the presence of multiple receiver type profiles (OfFloat, OfByte), which results in the formation of an abstract type phi node at JVM state convergence points.
> - Due to this, for mismatched memory segment access, the inline expander is unable to infer the element type of the array wrapped within the memory segment, and this results in an assertion failure while computing the source lane count.
> - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with an unknown backing storage type, and the compiler can skip inserting the explicit reinterpretation IR after loading from or before storing to backing storage, which is mandatory for semantic correctness of big-endian memory layout.
> - For vector write access, this may prevent value forwarding, which may result in subsequent redundant vector loads from the same address, but preventing intrinsification failure will offset that cost.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

Review suggestions incorporated.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/18749/files - new: https://git.openjdk.org/jdk/pull/18749/files/493470c1..0c67e68a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18749&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18749&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18749.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18749/head:pull/18749 PR: https://git.openjdk.org/jdk/pull/18749 From sgibbons at openjdk.org Sat Apr 20 19:09:43 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 20 Apr 2024 19:09:43 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v25] In-Reply-To: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: <3Z1vY5KHl-D5I1VoQYb6w0B1QToR0cVOnOov_vfrAe0=.d7e4944e-b781-477a-862b-dc067fab9d13@github.com> > This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. > > Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). 
> > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fix UnsafeCopyMemoryMark scope issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18555/files - new: https://git.openjdk.org/jdk/pull/18555/files/19616244..c1290169 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555 From sgibbons at openjdk.org Sat Apr 20 19:09:44 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 20 Apr 2024 19:09:44 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Sat, 20 Apr 2024 14:14:59 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Long to short jmp; other cleanup > > src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530: > >> 2528: switch (type) { >> 2529: case USM_SHORT: >> 2530: __ movw(Address(dest, (2 * i)), wide_value); > > MOVW emits an extra Operand Size Override prefix byte compared to 32 and 64 bit stores, any specific reason for keeping same unroll factor for all the stores. My understanding is the spec requires the appropriate-sized write based on alignment and size. This is why there's no 128-bit or 256-bit store loops. 
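
To make the "appropriate-sized write" rule concrete, here is a minimal sketch. It assumes the wording of the `Unsafe.setMemory` Javadoc, which describes stores in `long` units when the effective address and length are both multiples of 8, and otherwise in `int` or `short` units for multiples of 4 or 2. `store_unit_bytes` is a hypothetical helper, not a function from the stub generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration: the store unit (in bytes) implied by
// the alignment of both the destination address and the fill length.
std::size_t store_unit_bytes(std::uintptr_t address, std::size_t length) {
  // Both values must be multiples of the unit, so test their bitwise OR.
  const std::uintptr_t combined = address | static_cast<std::uintptr_t>(length);
  if ((combined & 7) == 0) return 8;  // "long" units
  if ((combined & 3) == 0) return 4;  // "int" units
  if ((combined & 1) == 0) return 2;  // "short" units
  return 1;                           // byte units
}
```

Under this reading, an 8-byte-aligned destination with a length of 12 still falls back to 4-byte stores, which would explain why the stub has no 128-bit or 256-bit store loops.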
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539: > >> 2537: break; >> 2538: } >> 2539: } > > I understand we want to be as accurate as possible in filling the tail in an event of SIGBUS, but we are anyways creating a wide value for 8 packed bytes if destination segment was quadword aligned, aligned quadword stores are implicitly atomic on x86 targets, what's your thoughts on using a vector instruction based loop. I believe the spec is specific on the size of the store required given alignment and size. I want to honor that spec even though wider stores could be done in many cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573373720 PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573374108 From sgibbons at openjdk.org Sat Apr 20 19:09:44 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 20 Apr 2024 19:09:44 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 19 Apr 2024 22:08:52 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Long to short jmp; other cleanup The SIGBUS was due to improper scoping of the UnsafeCopyMemoryMark. 
The change is:

` {`
` // Add set memory mark to protect against unsafe accesses faulting`
`- UnsafeCopyMemoryMark(this, ((t == T_BYTE) && !aligned), true);`
`+ UnsafeCopyMemoryMark usmm(this, ((t == T_BYTE) && !aligned), true);`
` __ generate_fill(t, aligned, to, value, r11, rax, xmm0);`
` }`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067758164

From sgibbons at openjdk.org Sat Apr 20 19:14:32 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Sat, 20 Apr 2024 19:14:32 GMT
Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
In-Reply-To: <0cg24YXFi4foGH_uKTY6JmABMhzjMH6gmH78iE0CC4w=.a52937a2-d728-4616-b158-a2a338cbb6f4@github.com>
References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> <0cg24YXFi4foGH_uKTY6JmABMhzjMH6gmH78iE0CC4w=.a52937a2-d728-4616-b158-a2a338cbb6f4@github.com>
Message-ID:

On Sat, 20 Apr 2024 04:28:43 GMT, Vladimir Kozlov wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Long to short jmp; other cleanup
>
> `runtime/Unsafe/InternalErrorTest.java` test SIGBUS when run with `-Xcomp` (and other flags in test's @run command):
>
> # SIGBUS (0xa) at pc=0x0000000119514760, pid=63021, tid=28163
> #
> # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-04-19-2326152.vladimir.kozlov.jdkgit2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
> # Problematic frame:
> # v ~StubRoutines::jbyte_fill 0x0000000119514760

@vnkozlov Thanks for the feedback. Can you please start the testing again? I'd appreciate it.
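
For readers less familiar with the C++ rule behind the scoping fix quoted earlier in this thread (giving the `UnsafeCopyMemoryMark` object the name `usmm`): an unnamed temporary is destroyed at the end of its own statement, so the marked region closes before the fill code is generated. The sketch below shows the difference with a hypothetical `ScopedMark` guard that logs its lifetime; it is not the HotSpot class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard that records when it is constructed and destroyed.
class ScopedMark {
 public:
  ScopedMark(std::vector<std::string>& log, bool enabled)
      : log_(log), enabled_(enabled) {
    if (enabled_) log_.push_back("mark-begin");
  }
  ~ScopedMark() {
    if (enabled_) log_.push_back("mark-end");
  }
  ScopedMark(const ScopedMark&) = delete;
  ScopedMark& operator=(const ScopedMark&) = delete;

 private:
  std::vector<std::string>& log_;
  bool enabled_;
};

// Unnamed temporary: destroyed at the end of its own statement,
// so "fill" happens outside the marked region.
std::vector<std::string> fill_with_temporary() {
  std::vector<std::string> log;
  {
    ScopedMark(log, true);
    log.push_back("fill");
  }
  return log;
}

// Named object: lives until the closing brace, so "fill" is covered.
std::vector<std::string> fill_with_named_guard() {
  std::vector<std::string> log;
  {
    ScopedMark mark(log, true);
    log.push_back("fill");
  }
  return log;
}
```

The first function logs `mark-begin`, `mark-end`, `fill`, while the second logs `mark-begin`, `fill`, `mark-end`. The latter is the shape needed for a faulting access inside the fill to be attributed to the marked range.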
------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067759300 From kvn at openjdk.org Sat Apr 20 20:48:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 20 Apr 2024 20:48:31 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v25] In-Reply-To: <3Z1vY5KHl-D5I1VoQYb6w0B1QToR0cVOnOov_vfrAe0=.d7e4944e-b781-477a-862b-dc067fab9d13@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> <3Z1vY5KHl-D5I1VoQYb6w0B1QToR0cVOnOov_vfrAe0=.d7e4944e-b781-477a-862b-dc067fab9d13@github.com> Message-ID: On Sat, 20 Apr 2024 19:09:43 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix UnsafeCopyMemoryMark scope issue Before I do testing, please sync with mainline. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067777569 From sgibbons at openjdk.org Sat Apr 20 22:31:48 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 20 Apr 2024 22:31:48 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v26] In-Reply-To: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: > This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. 
See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. > > Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). > > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge branch 'openjdk:master' into setMemory - Fix UnsafeCopyMemoryMark scope issue - Long to short jmp; other cleanup - Review comments - Address review comments; update copyright years - Add enter() and leave(); remove Windows-specific register stuff - Fix memory mark after sync to upstream - Merge branch 'openjdk:master' into setMemory - Set memory test (#23) * Even more review comments * Re-write of atomic copy loops * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} * Only add a memory mark for byte unaligned fill * Remove MUSL_LIBC ifdef * Remove MUSL_LIBC ifdef - Set memory test (#22) * Even more review comments * Re-write of atomic copy loops * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} * Only add a memory mark for byte unaligned fill - ... 
and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500 ------------- Changes: https://git.openjdk.org/jdk/pull/18555/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18555&range=25 Stats: 507 lines in 36 files changed: 420 ins; 5 del; 82 mod Patch: https://git.openjdk.org/jdk/pull/18555.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18555/head:pull/18555 PR: https://git.openjdk.org/jdk/pull/18555 From sgibbons at openjdk.org Sat Apr 20 22:31:48 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sat, 20 Apr 2024 22:31:48 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v25] In-Reply-To: <3Z1vY5KHl-D5I1VoQYb6w0B1QToR0cVOnOov_vfrAe0=.d7e4944e-b781-477a-862b-dc067fab9d13@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> <3Z1vY5KHl-D5I1VoQYb6w0B1QToR0cVOnOov_vfrAe0=.d7e4944e-b781-477a-862b-dc067fab9d13@github.com> Message-ID: On Sat, 20 Apr 2024 19:09:43 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fix UnsafeCopyMemoryMark scope issue Merge done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2067803696 From jbhateja at openjdk.org Sun Apr 21 13:40:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 21 Apr 2024 13:40:00 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. 
[v18]
In-Reply-To:
References:
Message-ID: <1nYoblnuDaQmp__ljc9W0EjmTtPGXI7zxS0QwaRlpUM=.50798174-eb25-4c5e-879b-029f15e135e4@github.com>

> Hi All,
>
> This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features.
>
> Following is the summary of changes:-
>
> 1) Intrinsify sub-word gather using a hybrid algorithm that first partially unrolls a scalar loop to accumulate values from the gather indices into a quadword (64-bit) slice, then uses a vector permutation to place the slice into the appropriate vector lanes. This prevents code bloat and generates a compact JIT sequence. Coupled with the savings from avoiding the expensive array allocation in the existing Java implementation, this translates into significant performance gains of 1.5-10x with the included micro.
>
> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>
>
> 2) The patch was also compared against a modified Java fallback implementation that replaces the temporary array allocation with a zero-initialized vector and a scalar loop which inserts gathered values into the vector. But a vector insert operation in the higher vector lanes is a three-step process which first extracts the upper 128-bit vector lane, updates it with the gathered subword value, and then inserts the lane back into its original position. This makes inserts into higher-order lanes costly compared with the proposed solution. In addition, the generated JIT code for the modified fallback implementation was very bulky. This may impact inlining decisions in caller contexts.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits:

- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650
- Review resolutions.
- Review comment resolutions.
- Review comments resolutions
- Review comments resolutions.
- Review comments resolutions.
- Generalizing masked sub-gather support.
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 - Fix incorrect comment - ... and 6 more: https://git.openjdk.org/jdk/compare/6d569961...b24cc5cd ------------- Changes: https://git.openjdk.org/jdk/pull/16354/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=17 Stats: 1178 lines in 32 files changed: 1129 ins; 21 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/16354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354 PR: https://git.openjdk.org/jdk/pull/16354 From kvn at openjdk.org Sun Apr 21 16:45:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 21 Apr 2024 16:45:39 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v26] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Sat, 20 Apr 2024 22:31:48 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 37 commits: > > - Merge branch 'openjdk:master' into setMemory > - Fix UnsafeCopyMemoryMark scope issue > - Long to short jmp; other cleanup > - Review comments > - Address review comments; update copyright years > - Add enter() and leave(); remove Windows-specific register stuff > - Fix memory mark after sync to upstream > - Merge branch 'openjdk:master' into setMemory > - Set memory test (#23) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > > * Remove MUSL_LIBC ifdef > > * Remove MUSL_LIBC ifdef > - Set memory test (#22) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > - ... and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500 My testing passed. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-2013478795 From sgibbons at openjdk.org Sun Apr 21 21:01:38 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sun, 21 Apr 2024 21:01:38 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v26] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Sat, 20 Apr 2024 22:31:48 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). 
>> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: > > - Merge branch 'openjdk:master' into setMemory > - Fix UnsafeCopyMemoryMark scope issue > - Long to short jmp; other cleanup > - Review comments > - Address review comments; update copyright years > - Add enter() and leave(); remove Windows-specific register stuff > - Fix memory mark after sync to upstream > - Merge branch 'openjdk:master' into setMemory > - Set memory test (#23) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > > * Remove MUSL_LIBC ifdef > > * Remove MUSL_LIBC ifdef > - Set memory test (#22) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > - ... and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500 Thank you all for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2068196116 From jbhateja at openjdk.org Sun Apr 21 23:24:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 21 Apr 2024 23:24:41 GMT Subject: Integrated: 8318650: Optimized subword gather for x86 targets. In-Reply-To: References: Message-ID: <6zpY5qLpjjfNh62GcpLMHjB_big53dvZmPhwCLMRvCU=.409cf55f-08d6-4135-b7bc-9a544cc18eaa@github.com> On Wed, 25 Oct 2023 04:34:59 GMT, Jatin Bhateja wrote: > Hi All, > > This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features. 
>
> Following is the summary of changes:-
>
> 1) Intrinsify sub-word gather using a hybrid algorithm that first partially unrolls a scalar loop to accumulate values from the gather indices into a quadword (64-bit) slice, then uses a vector permutation to place the slice into the appropriate vector lanes. This prevents code bloat and generates a compact JIT sequence. Coupled with the savings from avoiding the expensive array allocation in the existing Java implementation, this translates into significant performance gains of 1.5-10x with the included micro.
>
> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>
>
> 2) The patch was also compared against a modified Java fallback implementation that replaces the temporary array allocation with a zero-initialized vector and a scalar loop which inserts gathered values into the vector. But a vector insert operation in the higher vector lanes is a three-step process which first extracts the upper 128-bit vector lane, updates it with the gathered subword value, and then inserts the lane back into its original position. This makes inserts into higher-order lanes costly compared with the proposed solution. In addition, the generated JIT code for the modified fallback implementation was very bulky. This may impact inlining decisions in caller contexts.
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

This pull request has now been integrated.

Changeset: 185e711b
Author: Jatin Bhateja
URL: https://git.openjdk.org/jdk/commit/185e711bfe4c4d013b56e867f85cfb4177b3a2cf
Stats: 1178 lines in 32 files changed: 1129 ins; 21 del; 28 mod

8318650: Optimized subword gather for x86 targets.
Reviewed-by: sviswanathan, epeter, psandoz ------------- PR: https://git.openjdk.org/jdk/pull/16354 From jbhateja at openjdk.org Sun Apr 21 23:27:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 21 Apr 2024 23:27:44 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v26] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Sat, 20 Apr 2024 22:31:48 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 37 commits: > > - Merge branch 'openjdk:master' into setMemory > - Fix UnsafeCopyMemoryMark scope issue > - Long to short jmp; other cleanup > - Review comments > - Address review comments; update copyright years > - Add enter() and leave(); remove Windows-specific register stuff > - Fix memory mark after sync to upstream > - Merge branch 'openjdk:master' into setMemory > - Set memory test (#23) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > > * Remove MUSL_LIBC ifdef > > * Remove MUSL_LIBC ifdef > - Set memory test (#22) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > - ... and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500 Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18555#pullrequestreview-2013564907 From sgibbons at openjdk.org Sun Apr 21 23:27:45 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Sun, 21 Apr 2024 23:27:45 GMT Subject: Integrated: 8329331: Intrinsify Unsafe::setMemory In-Reply-To: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Fri, 29 Mar 2024 22:32:06 GMT, Scott Gibbons wrote: > This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. > > Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. > > Tested with tier-1 (and full CI). 
I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). > > [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) This pull request has now been integrated. Changeset: bd67ac69 Author: Scott Gibbons Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/bd67ac69a234cd1096e534c7d4a45d88715884b4 Stats: 507 lines in 36 files changed: 420 ins; 5 del; 82 mod 8329331: Intrinsify Unsafe::setMemory Reviewed-by: sviswanathan, jbhateja, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18555 From fyang at openjdk.org Mon Apr 22 03:14:38 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 22 Apr 2024 03:14:38 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v2] In-Reply-To: References: Message-ID: On Wed, 21 Feb 2024 16:30:24 GMT, Hamlin Li wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix space > >> Hello Hamlin, I recall you had licheepi board. I would be nice if you can try to measure rvv performance gain with this https://github.com/syntacore/syntaj21/tree/rvv0.7.1 >> >> This PR showed it's not always easy to win perf just by using rvv - #17413 >> >> I understand it might not be possible, but would be nice to give it a try (I can share hsdis with support for 0.7.1 if needed) > > Had a internal discussion about your suggestion, seems 0.7.1 is not incompatible with 1.0/2.0, and for this simple intrinsic, we think a better path is to have it first, then re-visit it when we have real hardware to measure the performance later. @Hamlin-Li: Thanks for the quick update. Considering saving/restoring for FRM could be expensive, I do wonder if we could gather some performance numbers before we go. I see people are now testing on RVV-1.0 hardwares [1] and I am also trying to get one. Also from discussion on [2], I see there are also other approaches available there without flipping the FP rounding mode. 
But I am not sure if they make sense for our case or work better without actual testing. [1] https://github.com/openjdk/jdk/pull/18382#issuecomment-2045145255 [2] https://github.com/openjdk/jdk/pull/8204 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17745#issuecomment-2068404097 From chagedorn at openjdk.org Mon Apr 22 06:09:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 06:09:35 GMT Subject: Integrated: 8330004: Refactor cloning down code in Split If for Template Assertion Predicates In-Reply-To: References: Message-ID: On Wed, 10 Apr 2024 13:19:11 GMT, Christian Hagedorn wrote: > This is another patch split off https://github.com/openjdk/jdk/pull/16877. It refactors the "cloning down" code for Split If with Template Assertion Predicates. This mainly includes the replacement of `subgraph_has_opaque()` with a new class `TemplateAssertionPredicateExpressionNode`. More details can be found as PR comments. > > #### Background > > The cloning down code is required in Split If when trying to split any node up that belongs to a Template Assertion Predicate Expression (TAPE) (including the `OpaqueLoop*` nodes). We need to prevent that to avoid having any phi nodes in the TAPE which could result in failures when trying to later match and clone Template Assertion Predicates. Instead of cloning such a TAPE node up, we clone ("down") the entire TAPE. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 20546c1e Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/20546c1ea064daa8e2faa71142904ea2c62b3311 Stats: 320 lines in 7 files changed: 237 ins; 65 del; 18 mod 8330004: Refactor cloning down code in Split If for Template Assertion Predicates Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18723 From duke at openjdk.org Mon Apr 22 07:11:44 2024 From: duke at openjdk.org (Swati Sharma) Date: Mon, 22 Apr 2024 07:11:44 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case. [v2] In-Reply-To: References: Message-ID: > Hi All, > > Added a new jtreg test case for large arrayCopy disjoint case. > This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. > > Please review and provide your feedback. > > Thanks, > Swati > Intel Swati Sharma has updated the pull request incrementally with one additional commit since the last revision: 8326421: Resolved review comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17962/files - new: https://git.openjdk.org/jdk/pull/17962/files/436a17f1..fcdbf18a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17962&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17962&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17962.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17962/head:pull/17962 PR: https://git.openjdk.org/jdk/pull/17962 From dfenacci at openjdk.org Mon Apr 22 07:31:37 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 22 Apr 2024 07:31:37 GMT Subject: RFR: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. 
In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 01:38:28 GMT, Jasmine Karthikeyan wrote: > This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. > I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. > > Thoughts and reviews would be appreciated! Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18734#pullrequestreview-2013964089 From fyang at openjdk.org Mon Apr 22 07:34:28 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 22 Apr 2024 07:34:28 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v2] In-Reply-To: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> References: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> Message-ID: On Thu, 18 Apr 2024 13:40:30 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch for the intrinsic VectorLoadShuffle? >> >> BTW, without this intrinsic, some other Vector API operations do not work either (e.g. rearrange) on riscv.
>> >> Thanks >> >> ## Test >> test/jdk/jdk/incubator/vector/ >> test/hotspot/jtreg/compiler/vectorapi > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add comment Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/riscv_v.ad line 3581: > 3579: predicate(Matcher::vector_element_basic_type(n) == T_BYTE); > 3580: match(Set dst (VectorLoadShuffle dst)); > 3581: effect(TEMP_DEF dst); Seems no need to add a `TEMP_DEF` for `dst` here. src/hotspot/cpu/riscv/riscv_v.ad line 3586: > 3584: // For T_BYTE, no need to do anything > 3585: %} > 3586: ins_pipe(pipe_slow); I think `pipe_class_empty` is better since this emits nothing at all. src/hotspot/cpu/riscv/riscv_v.ad line 3602: > 3600: __ vsetvli_helper(bt, Matcher::vector_length(this)); > 3601: if (bt == T_SHORT) { > 3602: __ vsext_vf2(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg)); I prefer `vzext_vf2/4/8` which does zero extension for the source indexes. ------------- PR Review: https://git.openjdk.org/jdk/pull/18835#pullrequestreview-2013895647 PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1574229098 PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1574214923 PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1574261651 From thartmann at openjdk.org Mon Apr 22 07:43:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 07:43:33 GMT Subject: RFR: 8330621: Make 5 compiler tests use ProcessTools.executeProcess [v2] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 14:42:35 GMT, Evgeny Nikitin wrote: >> Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. >> >> As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. 
This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Fix issues Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18856#pullrequestreview-2013992245 From rcastanedalo at openjdk.org Mon Apr 22 07:43:28 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 22 Apr 2024 07:43:28 GMT Subject: RFR: 8330587: IGV: remove ControlFlowTopComponent In-Reply-To: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> References: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> Message-ID: On Fri, 19 Apr 2024 09:33:21 GMT, Tobias Holenstein wrote: > The control flow window (very right, next to Bytecodes) implemented by ControlFlowTopComponent is no longer used with the availability of the new CFG view. > > Therefore ControlFlowTopComponent is removed Looks good, thanks for cleaning it up! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18859#pullrequestreview-2013994514 From enikitin at openjdk.org Mon Apr 22 07:43:34 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Mon, 22 Apr 2024 07:43:34 GMT Subject: Integrated: 8330621: Make 5 compiler tests use ProcessTools.executeProcess In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 07:22:06 GMT, Evgeny Nikitin wrote: > Said tests use simple `new ProcessBuilder` and its `start` method to start secondary processes. > > As stated in [JDK-8174768](https://bugs.openjdk.org/browse/JDK-8174768), we try to have more information about started secondary processes and make the execution more controllable. This PR makes those tests use ProcessTools.executeProcess instead of using the `.start` method.
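To make the difference concrete, here is a minimal sketch of what "more information about started secondary processes" can look like. It deliberately uses only the plain JDK `ProcessBuilder` API and is not the JDK test library's `ProcessTools`; the class `ProcessCaptureSketch`, the record `Result`, and the helpers `runAndCapture` and `javaVersionCommand` are hypothetical names introduced for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class ProcessCaptureSketch {
    /** Exit code plus everything the child wrote to stdout and stderr. */
    record Result(int exitCode, String output) {}

    /**
     * Hypothetical helper (an assumption, not a JDK test-library API):
     * starts the command, captures its merged output, and waits for exit,
     * instead of a bare start() that discards this information.
     */
    static Result runAndCapture(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        try {
            Process p = pb.start();
            // Read everything before waitFor() so the child cannot block on a full pipe.
            String output = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            int exitCode = p.waitFor();
            return new Result(exitCode, output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child process", e);
        }
    }

    /** Command line that runs the current JDK's launcher with -version. */
    static List<String> javaVersionCommand() {
        String java = Path.of(System.getProperty("java.home"), "bin", "java").toString();
        return List.of(java, "-version");
    }

    public static void main(String[] args) {
        Result r = runAndCapture(javaVersionCommand());
        System.out.println("exit code: " + r.exitCode());
        System.out.print(r.output());
    }
}
```

With the exit code and the output in one place, a failing secondary process can be diagnosed from the test log; that is the kind of controllability the PR description refers to, although the real helper may do considerably more.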
This pull request has now been integrated. Changeset: 5394f57f Author: Evgeny Nikitin Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/5394f57f002c066021d811382a336253ae9f2014 Stats: 23 lines in 5 files changed: 5 ins; 4 del; 14 mod 8330621: Make 5 compiler tests use ProcessTools.executeProcess Reviewed-by: chagedorn, stefank, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18856 From rcastanedalo at openjdk.org Mon Apr 22 07:55:31 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 22 Apr 2024 07:55:31 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 12:00:51 GMT, Emanuel Peter wrote: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. > > If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same.
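The extract, sort, and re-add steps of the solution sketch can be illustrated with a toy model. This is only a sketch under stated assumptions, not the C2 implementation: `Node`, `Leaf`, `Add`, `sum`, `collect` and `canonicalize` are hypothetical names, only additions are modelled (no `Sub*`, `Cast*`, `ConvI2L` or shifts), and structural equality of Java records stands in for C2 producing the identical node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InvarCanonicalizationSketch {
    /** Toy expression node: a leaf with a unique idx, or the sum of two nodes. */
    interface Node {}
    record Leaf(int idx, String name) implements Node {}
    record Add(Node left, Node right) implements Node {}

    /** Builds a left-associated sum in exactly the given order. */
    static Node sum(Node... terms) {
        Node result = terms[0];
        for (int i = 1; i < terms.length; i++) {
            result = new Add(result, terms[i]);
        }
        return result;
    }

    /** Step 1: extract all summands by walking through the Add nodes. */
    static void collect(Node n, List<Leaf> out) {
        if (n instanceof Add add) {
            collect(add.left(), out);
            collect(add.right(), out);
        } else {
            out.add((Leaf) n);
        }
    }

    /** Steps 2 and 3: sort the summands by idx and add them up in that order. */
    static Node canonicalize(Node n) {
        List<Leaf> summands = new ArrayList<>();
        collect(n, summands);
        summands.sort(Comparator.comparingInt(Leaf::idx));
        return sum(summands.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, "a"), b = new Leaf(2, "b"), c = new Leaf(3, "c"), d = new Leaf(4, "d");
        Node invar1 = sum(b, c, d, a);
        Node invar2 = sum(d, b, a, c);
        System.out.println("same before: " + invar1.equals(invar2));
        System.out.println("same after:  " + canonicalize(invar1).equals(canonicalize(invar2)));
    }
}
```

Sorting by a stable per-node key is what makes the result independent of the original addition order; any total order on the summands would work equally well for this purpose.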
> > **Example** > > > invar1 = b + c + d + a > invar2 = d + b + a + c > > -> equivalent but not identical nodes > > Sort, and add up again: > > invar1 = a + b + c + d > invar2 = a + b + c + d > > -> now the nodes are identical > > **Motivation: MemorySegment with invar** > > One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? > > This example did not vectorize, even though it should: > https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 > > Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. > > Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. > > The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. > > Why does this happen? After parsing, the graph looks like this: > ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) > > We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store. > > load_adr = base + memory_segment_offs... Nice analysis, Emanuel! > An alternative solution would have been to add a corresponding CastLL for both the load and store, and hope that this means that the final address would look identical, though I don't know how difficult that would be.
I think it would be worth exploring this alternative before committing to the canonicalization solution proposed here. If we could get GVN to find the equivalence instead, that would be a more general solution in the sense that other transformations may also benefit from it. ------------- PR Review: https://git.openjdk.org/jdk/pull/18795#pullrequestreview-2014027360 From chagedorn at openjdk.org Mon Apr 22 08:02:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 08:02:57 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v3] In-Reply-To: References: Message-ID: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: Remaining renaming and small clean-ups ------------- Changes: https://git.openjdk.org/jdk/pull/16877/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=02 Stats: 75 lines in 4 files changed: 17 ins; 7 del; 51 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From chagedorn at openjdk.org Mon Apr 22 08:36:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 08:36:57 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v4] In-Reply-To: References: Message-ID: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix useful Parse Predicate marking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16877/files - new: https://git.openjdk.org/jdk/pull/16877/files/e7e52b61..f9f74276 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From chagedorn at openjdk.org Mon Apr 22 08:36:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 08:36:57 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v4] In-Reply-To: References: Message-ID: On Fri, 22 Dec 2023 17:26:10 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix useful Parse Predicate marking > > Ok, this is it for now. I think it is awesome how you are refactoring the code, and packing it into classes to break up large methods. @eme64 @rwestrel I've updated the PR after the integration of the PRs that I've split off from this one (see updated PR description). We are now only left with a few small refactorings which I propose with this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16877#issuecomment-2068807827 From chagedorn at openjdk.org Mon Apr 22 08:36:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 08:36:57 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v4] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 08:33:26 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix useful Parse Predicate marking src/hotspot/share/opto/loopnode.cpp line 4314: > 4312: > 4313: void PhaseIdealLoop::mark_useful_parse_predicates_for_loop(IdealLoopTree* loop) { > 4314: Node* entry = loop->_head->as_Loop()->skip_strip_mined()->in(LoopNode::EntryControl); Is now required due to changing `can_apply_loop_predication()` which is checked before calling this method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16877#discussion_r1574353790 From bulasevich at openjdk.org Mon Apr 22 08:52:35 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 22 Apr 2024 08:52:35 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 09:11:40 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ±
0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ± 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I ran the hotspot compiler tests successfully, limiting them to C1 compilation, on darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>...
> > Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision: > > - Use vmIntrinsics instead of vmIntrinsicID > - Fix formatting src/hotspot/share/c1/c1_GraphBuilder.cpp line 4452: > 4450: return; > 4451: } > 4452: } Suggestion: const int args_base = state()->stack_size() - callee->arg_size(); ciType* receiver_type = state()->stack_at(args_base)->exact_type(); if (receiver_type == nullptr) { inline_bailout("must have a receiver"); return; } if (!receiver_type->is_type_array_klass()) { inline_bailout("clone array not primitive"); return; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1574381077 From shade at openjdk.org Mon Apr 22 08:58:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Apr 2024 08:58:32 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 15:49:59 GMT, Roland Westrelin wrote: >> This fixes 3 calls to ABS with a min int argument. I think all of them >> are harmless: >> >> - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The >> check is for a stride of 1 or -1. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the >> computation of `scaled_iters_long`, the stride is passed to `ABS()` >> and then implicitly cast to long. I now cast the stride to long >> before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` >> overflows the int range for all values of `LoopStripMiningIter` >> except 0 or 1. Those values are handled early on in that method. So >> for a min int stride: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> is always true and the method returns early. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the >> computation of `short_scaled_iters` also calls `ABS()` with the >> stride as argument.
But the result of that computation is only used >> if the test for: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> doesn't cause an early return of the method. I reordered statements >> so the `ABS()` calls happens after that test which will cause an early >> return if the stride is min int. > > Thanks for reviewing this. > >> Have you tried running tests with #18751 applied? > > I only ran the particular test that you mentioned in the bug. @rwestrel, if you could integrate this, we can then go forward with #18751. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18813#issuecomment-2068855426 From thartmann at openjdk.org Mon Apr 22 09:07:33 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 09:07:33 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main In-Reply-To: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: On Fri, 19 Apr 2024 06:11:15 GMT, Evgeny Nikitin wrote: > The idea of the bug was to lower down the number of tests that use driver mode incorrectly. My investigation shows up that the vast majority of such tests start other processes and in fact are more of the wrappers around that processes. Most of the secondary processes are started using ProcessTools and therefore getting additional VM flags provided by user. > > I found only one test that seem to use driver mode incorrectly, this PR fixes it. > Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18854#pullrequestreview-2014202521 From thartmann at openjdk.org Mon Apr 22 09:17:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 09:17:30 GMT Subject: RFR: 8329258: TailCall should not use frame pointer register for jump target [v4] In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 13:23:46 GMT, Boris Ulasevich wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment adjustment > >> I think other platforms are affected as well but I don't have the hardware to test there. >> @bulasevich (ARM32), could you please have a look? > > Hi. I checked ARM32. R11 (FP) is a common register that is not dedicated solely to the frame pointer. And with the given test and patch I cannot reproduce the SIGSEGV on the ARM32 platform. So I think ARM32 is not affected. Thanks for checking, @bulasevich and @dean-long. The current patch should be sufficient then. Could I please get a second review? (@theRealAph?) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18716#issuecomment-2068894491 From thartmann at openjdk.org Mon Apr 22 09:25:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 09:25:28 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: <_cTw81FirOsbODzIudcgDUtX-J8yx6uD9WzVSFWjPjU=.fb11e748-2caa-4feb-a546-272b5b635389@github.com> On Mon, 15 Apr 2024 07:04:11 GMT, Roberto Castañeda Lozano wrote: >> This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones.
This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292). >> >> The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change: >> >> >> import java.lang.invoke.VarHandle; >> import java.lang.invoke.MethodHandles; >> >> public class Example { >> static class Outer { >> Object f; >> } >> >> static final VarHandle fVarHandle; >> static { >> MethodHandles.Lookup l = MethodHandles.lookup(); >> try { >> fVarHandle = l.findVarHandle(Outer.class, "f", Object.class); >> } catch (Exception e) { >> throw new Error(e); >> } >> } >> >> static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) { >> return fVarHandle.compareAndSet(o, oldVal, newVal); >> } >> >> public static void main(String[] args) { >> for (int i = 0; i < 10_000; i++) { >> Outer o = new Outer(); >> Object oldVal = new Object(); >> o.f = oldVal; >> Object newVal = new Object(); >> testCompareAndSwap(o, oldVal, newVal); >> } >> } >> } >> >> >> Before this changeset, issuing this command: >> >> >> $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP >> >> >> gives the following dump: >> >> >> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20) >> >> >> After this changeset, we get: >> >> >> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) 
!jvms: VarHandleRefer... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Add example Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18754#pullrequestreview-2014239622 From chagedorn at openjdk.org Mon Apr 22 09:26:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 09:26:28 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main In-Reply-To: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: <1RdDiF2OtiIazT9XvjQ1-6piyNXawEjEz1owMSOEUnA=.b6176d03-99e2-4d23-a16b-3b47dec91ab3@github.com> On Fri, 19 Apr 2024 06:11:15 GMT, Evgeny Nikitin wrote: > The idea of the bug was to reduce the number of tests that use driver mode incorrectly. My investigation shows that the vast majority of such tests start other processes and are in fact more like wrappers around those processes. Most of the secondary processes are started using ProcessTools and therefore get the additional VM flags provided by the user. > > I found only one test that seems to use driver mode incorrectly; this PR fixes it. > Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. test/hotspot/jtreg/compiler/ccp/TestShiftConvertAndNotification.java line 41: > 39: * @summary Test CCP notification for value update of AndL through LShiftI and > 40: * ConvI2L (no flags). > 41: * @run main/othervm compiler.ccp.TestShiftConvertAndNotification Can we just use `main` instead of `main/othervm`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18854#discussion_r1574430488 From rcastanedalo at openjdk.org Mon Apr 22 09:30:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 22 Apr 2024 09:30:30 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Mon, 15 Apr 2024 07:04:11 GMT, Roberto Castañeda Lozano wrote: >> This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
>> >> The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change: >> >> >> import java.lang.invoke.VarHandle; >> import java.lang.invoke.MethodHandles; >> >> public class Example { >> static class Outer { >> Object f; >> } >> >> static final VarHandle fVarHandle; >> static { >> MethodHandles.Lookup l = MethodHandles.lookup(); >> try { >> fVarHandle = l.findVarHandle(Outer.class, "f", Object.class); >> } catch (Exception e) { >> throw new Error(e); >> } >> } >> >> static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) { >> return fVarHandle.compareAndSet(o, oldVal, newVal); >> } >> >> public static void main(String[] args) { >> for (int i = 0; i < 10_000; i++) { >> Outer o = new Outer(); >> Object oldVal = new Object(); >> o.f = oldVal; >> Object newVal = new Object(); >> testCompareAndSwap(o, oldVal, newVal); >> } >> } >> } >> >> >> Before this changeset, issuing this command: >> >> >> $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP >> >> >> gives the following dump: >> >> >> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20) >> >> >> After this changeset, we get: >> >> >> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleRefer... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Add example Thanks for reviewing, Tobias! @vnkozlov: I added a simple IR test as requested, could I get a second review? Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/18754#issuecomment-2068919807 From thartmann at openjdk.org Mon Apr 22 09:30:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 09:30:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: <4VsAbkEXJ6UucDnbBsnti-FC59KtjkB5nutBU66mNKk=.a6d16ce2-c760-4f48-b7a7-e7d321b6b073@github.com> On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci wrote: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. 
Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 39: > 37: * @modules jdk.incubator.vector > 38: * > 39: * @run main compiler.vectorapi.VectorGatherMaskFoldingTest Suggestion: * @run driver compiler.vectorapi.VectorGatherMaskFoldingTest test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 151: > 149: > 150: @Test > 151: @Warmup(10000) Since all tests use the same warmup, I would suggest to set it once via `testFrameworkobject.setDefaultWarmup(10000)`, see https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1574432877 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1574435811 From thartmann at openjdk.org Mon Apr 22 10:49:57 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 10:49:57 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v4] In-Reply-To: 
<3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> Message-ID: <5GdKtMxnmsxXDm8hUO5xR7OtIZG5TTmz2wqpv4xcEhA=.62b46ad9-2387-418c-88ec-f124e7b099b8@github.com> On Fri, 19 Apr 2024 17:16:24 GMT, Joshua Cao wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). >> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passing hotspot tier1 locally on Linux machine. 
> > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Comment on not allowing macro nodes after we start expanding. Rename > dont_allow_macro_nodes to reset_allow_macro_nodes. Looks good to me. I submitted testing and will report back once it has passed. Please adjust the description of [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531) according to the new naming. > Somehow the CMove condition is converted to non-canonical >, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The MinL is never created and there is no crash. Is the problem that the condition is not canonicalized or that the CMoveNode is not processed by IGVN after canonicalization of the cmp? ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18824#pullrequestreview-2014424503 From stuefe at openjdk.org Mon Apr 22 10:55:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 22 Apr 2024 10:55:49 GMT Subject: RFR: 8330625: Compilation memory statistic: prevent tearing of the final report Message-ID: Somewhat trivial change to reduce the chance of tearing the final compilation cost history report. See JBS for details. --- The patch: - upon end of a compilation, we print the offending log line and account for the cost in the compilation cost history table. For the latter we lock over NMTCompilationCostHistory_lock. The patch swaps these two actions such that we print after pulling the lock. That greatly reduces, though it does not completely remove, the chance of printing log lines into the final report.
(I did not want to widen the scope of that lock to include the printout) - also moves the locking of NMTCompilationCostHistory_lock up to the start of the reporting function, so that printing the report header also happens under the lock ------------- Commit messages: - Update compilationMemoryStatistic.cpp - Update compilationMemoryStatistic.cpp - Start Changes: https://git.openjdk.org/jdk/pull/18866/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18866&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330625 Stats: 39 lines in 1 file changed: 12 ins; 13 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/18866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18866/head:pull/18866 PR: https://git.openjdk.org/jdk/pull/18866 From epeter at openjdk.org Mon Apr 22 11:01:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Apr 2024 11:01:27 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: References: Message-ID: <80ffdEZa8SFfrG5s61q9VHxfywR0W_TihbGqHAM9gnw=.1e7c3ef1-3ff6-4a73-8df5-894b717edbc6@github.com> On Mon, 22 Apr 2024 07:53:05 GMT, Roberto Castañeda Lozano wrote: >> This is an enhancement for AutoVectorization. >> >> I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). >> >> Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. >> >> **Solution Sketch: "canonicalize" the invar** >> >> - Extract all summands of the `invar`: make a list. >> - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. >> - Bypass `CastLL` and `CastII` >> - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. >> >> - Sort all extracted summands by node idx. >> - Add up all summands in new order.
>> >> If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. >> >> **Example** >> >> >> invar1 = b + c + d + a >> invar2 = d + b + a + c >> >> -> equivalent but not identical nodes >> >> Sort, and add up again: >> >> invar1 = a + b + c + d >> invar2 = a + b + c + d >> >> -> now the nodes are identical >> >> **Motivation: MemorySegment with invar** >> >> One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? >> >> This example did not vectorize, even though it should: >> https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 >> >> Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. >> >> Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. >> >> The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop-carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. >> >> Why does this happen? After parsing, the graph looks like this: >> ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) >> >> We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. So... > > Nice analysis, Emanuel!
> >> An alternative solution would have been to add a corresponding CastLL for both the load and store, and hope that this means that the final address would look identical, though I don't know how difficult that would be. > > I think it would be worth exploring this alternative before committing to the canonicalization solution proposed here. If we could get GVN to find the equivalence instead, that would be a more general solution in the sense that other transformations may also benefit from it. @robcasloz I discussed it a little with @chhagedorn. We think it would be tricky to get such an IGVN optimization right, and possibly it would not work for all cases. Basically, we would have some value `v` with a direct use `use1` and an indirect use `use2` via some `CastLL`.

      v
    +-+-------+
    |         |
    use1    CastLL (with dependency, and constrained type)
              |
            use2

The `CastLL` has an `If` dependency, for some `RangeCheck` for example, and therefore it also constrains the type of `v`. To get `use1` and `use2` to have the same input, we would have to either: - Remove the `CastLL`: not a good idea, this would lose type info and in some cases the lost pinning would lead some nodes further down to float up. - Make `use1` use the `CastLL` instead of `v` directly. But then we would need to prove that all uses of `use1` are under the same dependency as the `CastLL`, and I think that would be tricky and error-prone. And probably we could not make it work in all cases. I would really like `SuperWord` to be robust to `CastII/LL`, and this optimization here does that. It also allows us to parse more cases, e.g. `a + b + c` and `b + c + a`. I suspect that this could happen to a Java user, that they accidentally swap the addition order of an index (though maybe rare). @robcasloz What do you think?
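For readers following the thread, the canonicalization idea from the PR description (collect the summands, sort them by index, add them up again) can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: `Node`, `Leaf`, `Add`, `collectSummands` and `canonicalize` are hypothetical names, not HotSpot code, and the sketch handles plain additions only (no `Sub`, `Cast`, `ConvI2L` or shift nodes).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class InvarCanonicalizeSketch {
    // Hypothetical stand-ins for C2 nodes: a leaf summand with a node index, or an addition.
    interface Node {}
    record Leaf(int idx, String name) implements Node {}
    record Add(Node left, Node right) implements Node {}

    // Flatten a tree of additions into its list of leaf summands.
    static void collectSummands(Node n, List<Leaf> out) {
        if (n instanceof Add add) {
            collectSummands(add.left(), out);
            collectSummands(add.right(), out);
        } else {
            out.add((Leaf) n);
        }
    }

    // Sort the summands by index and add them up again, left to right.
    static Node canonicalize(Node invar) {
        List<Leaf> summands = new ArrayList<>();
        collectSummands(invar, summands);
        summands.sort(Comparator.comparingInt(Leaf::idx));
        Node sum = summands.get(0);
        for (int i = 1; i < summands.size(); i++) {
            sum = new Add(sum, summands.get(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, "a"), b = new Leaf(2, "b"), c = new Leaf(3, "c"), d = new Leaf(4, "d");
        Node invar1 = new Add(new Add(new Add(b, c), d), a); // b + c + d + a
        Node invar2 = new Add(new Add(new Add(d, b), a), c); // d + b + a + c
        System.out.println(invar1.equals(invar2));                             // false
        System.out.println(canonicalize(invar1).equals(canonicalize(invar2))); // true
    }
}
```

Records compare structurally, which here plays the role that node identity after value numbering plays in the compiler: two sums over the same summands only compare equal once both have been rebuilt in the same order.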
------------- PR Comment: https://git.openjdk.org/jdk/pull/18795#issuecomment-2069098559 From epeter at openjdk.org Mon Apr 22 11:23:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Apr 2024 11:23:30 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v4] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 08:36:57 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix useful Parse Predicate marking I only have one question about that assert. Otherwise it looks like a reasonable cleanup :) src/hotspot/share/opto/loopPredicate.cpp line 110: > 108: ProjNode* uncommon_proj = parse_predicate->proj_out(false); > 109: Node* uct_region = uncommon_proj->unique_ctrl_out(); > 110: assert(uct_region->is_Region() || uct_region->is_Call(), "must be a region or call uct"); Did you want to remove this assert? Or is it elsewhere now? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16877#pullrequestreview-2014479433 PR Review Comment: https://git.openjdk.org/jdk/pull/16877#discussion_r1574582588 From chagedorn at openjdk.org Mon Apr 22 11:29:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 11:29:32 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v4] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 08:36:57 GMT, Christian Hagedorn wrote: >> **Update: April 22** >> >> After splitting off and integrating the following PRs from this PR: >> https://github.com/openjdk/jdk/pull/18080 >> https://github.com/openjdk/jdk/pull/18293 >> https://github.com/openjdk/jdk/pull/18628 >> https://github.com/openjdk/jdk/pull/18723 >> >> we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). >> >>
>>
>> >> _------------ Original PR description --------------_ >> >> This patch is intended for JDK 23. >> >> While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. >> >> The patch applies the following cleanup changes: >> - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: >> - `clone()`: Clone without modification >> - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. >> - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. >> >> This refactoring could be extracted from the complete fix. >> - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. >> - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: >> - Renaming >> - Extracting code to separate classes/methods >> - Adding comments >> - Some small refactoring including: >> - Removi... 
> > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Fix useful Parse Predicate marking Thanks Emanuel for your review and your good feedback leading to split off and integrate the other PRs :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/16877#pullrequestreview-2014496791 From chagedorn at openjdk.org Mon Apr 22 11:29:33 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Apr 2024 11:29:33 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v4] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:16:34 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix useful Parse Predicate marking > > src/hotspot/share/opto/loopPredicate.cpp line 110: > >> 108: ProjNode* uncommon_proj = parse_predicate->proj_out(false); >> 109: Node* uct_region = uncommon_proj->unique_ctrl_out(); >> 110: assert(uct_region->is_Region() || uct_region->is_Call(), "must be a region or call uct"); > > Did you want to remove this assert? Or is it elsewhere now? It's already covered in `ParsePredicate::uncommon_trap()`: https://github.com/openjdk/jdk/blob/3e185c70feef3febf75c58a5d4d394a4b772105f/src/hotspot/share/opto/ifnode.cpp#L2149-L2154 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16877#discussion_r1574594167 From thartmann at openjdk.org Mon Apr 22 11:47:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 11:47:32 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 14:14:57 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. 
The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either a >> out of bounds array load or a division by zero happen. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by a >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. >> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check`CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. 
So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division)nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8324517 > - test and fix Great job coming up with these tests, Roland! Did you check if the other usages of `_range_check_dependency` via `CastIINode::has_range_check` are still needed? Seems to me as if at least the checks in `PhaseIdealLoop::match_fill_loop` can be removed. src/hotspot/share/opto/compile.cpp line 3471: > 3469: remove_range_check_cast(n->as_CastII()); > 3470: } > 3471: break; Indentation is off. src/hotspot/share/opto/compile.cpp line 3896: > 3894: // Range check CastII nodes feed into an address computation subgraph. Remove them to let that subgraph float freely. 
> 3895: // For memory access or integer divisions nodes that depend on the cast, record the dependency on the cast's control > 3896: // as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminated a Suggestion: // as a precedence edge, so they can't float above the cast in case that cast's narrowed type helped eliminate a src/hotspot/share/opto/compile.cpp line 3906: > 3904: for (DUIterator_Fast imax, i = m->fast_outs(imax); i < imax; i++) { > 3905: Node* use = m->fast_out(i); > 3906: if (use->is_Mem() || use->Opcode() == Op_DivI || use->Opcode() == Op_DivL) { `Op_ModI` and `Op_ModL` are missing here. And isn't this too strong in cases where we can prove that the operand is non-zero? Could you re-use `PhaseIterGVN::no_dependent_zero_check`? Please also add corresponding tests. Looking at `PhaseIterGVN::no_dependent_zero_check`, I noticed that `UDiv[I/L]Node` and `UMod[I/L]Node` are not handled, but I think they should be. I think this was missed when these nodes were added by [JDK-8282221](https://bugs.openjdk.org/browse/JDK-8282221). One can probably extend @chhagedorn's test from [JDK-8259227](https://bugs.openjdk.org/browse/JDK-8259227) to trigger the same issue. ------------- Changes requested by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18377#pullrequestreview-2014468911 PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1574576215 PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1574577525 PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1574616943 From rcastanedalo at openjdk.org Mon Apr 22 12:31:28 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 22 Apr 2024 12:31:28 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: References: Message-ID: <-LNkhYWrGIb3yTDFBA5GSMjSLSI2cQhEeYQkk11OrZY=.c108c65e-2fa9-419f-b7ff-74571415dfe7@github.com> On Tue, 16 Apr 2024 12:00:51 GMT, Emanuel Peter wrote: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. > > If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. 
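As a rough illustration of the solution sketch above, and not the actual C2 change, the toy Java model below flattens a nested sum into its summands, sorts them by a node index, and adds them up again in that order. The `Expr` class and the `leaf`, `add`, `collect` and `canonicalize` helpers are hypothetical names invented for this sketch; the real code works on C2 `Node*` graphs and also has to handle subtraction, casts, `ConvI2L` and shifts, all of which are left out here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InvarCanonicalizer {
    // Hypothetical stand-in for a C2 node: a leaf with a node index, or an addition of two inputs.
    static final class Expr {
        final String name;
        final int idx;
        final Expr left;
        final Expr right;

        Expr(String name, int idx, Expr left, Expr right) {
            this.name = name;
            this.idx = idx;
            this.left = left;
            this.right = right;
        }

        boolean isAdd() {
            return left != null;
        }

        @Override
        public String toString() {
            return isAdd() ? "(" + left + " + " + right + ")" : name;
        }
    }

    static Expr leaf(String name, int idx) {
        return new Expr(name, idx, null, null);
    }

    static Expr add(Expr left, Expr right) {
        return new Expr(null, -1, left, right);
    }

    // Step 1: extract all summands by walking through nested additions.
    static void collect(Expr e, List<Expr> out) {
        if (e.isAdd()) {
            collect(e.left, out);
            collect(e.right, out);
        } else {
            out.add(e);
        }
    }

    // Steps 2 and 3: sort the summands by node index, then add them up in the new order.
    static Expr canonicalize(Expr e) {
        List<Expr> summands = new ArrayList<>();
        collect(e, summands);
        summands.sort(Comparator.comparingInt((Expr s) -> s.idx));
        Expr sum = summands.get(0);
        for (int i = 1; i < summands.size(); i++) {
            sum = add(sum, summands.get(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        Expr a = leaf("a", 1), b = leaf("b", 2), c = leaf("c", 3), d = leaf("d", 4);
        Expr invar1 = add(add(add(b, c), d), a); // b + c + d + a
        Expr invar2 = add(add(add(d, b), a), c); // d + b + a + c
        System.out.println(invar1 + " vs " + invar2);
        System.out.println(canonicalize(invar1) + " vs " + canonicalize(invar2));
    }
}
```

In this model the two inputs print differently before canonicalization and identically afterwards, which is the property the sketch relies on: same summands, same order, same resulting shape.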
> > **Example** > > > invar1 = b + c + d + a > invar2 = d + b + a + c > > -> equivalent but not identical nodes > > Sort, and add up again: > > invar1 = a + b + c + d > invar2 = a + b + c + d > > -> now the nodes are identical > > **Motivation: MemorySegment with invar** > > One might think that this is a big of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? > > This example did not vectorize, even though it should: > https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 > > Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. > > Upon inspection, I saw that the `invar` that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. > > The consequence: Only if we can prove that the two `invar` are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`, we don't know how the two addresses alias, and that prevents vectorization. > > Why does this happen? After parsing, the graph looks like this: > ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) > > We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store. > > load_adr = base + memory_segment_offs... Fair enough, I agree that, even if there was a solution for this specific case, we would probably encounter other cases where GVN would not be powerful enough to detect the equivalence. I still wonder though what could be causing the `CastLL(invar + iv)` vs. 
`invar + iv` divergence in your example and whether anything could be done to get rid of it. Maybe worth filing a RFE for further investigation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18795#issuecomment-2069273106 From thartmann at openjdk.org Mon Apr 22 12:44:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Apr 2024 12:44:36 GMT Subject: RFR: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 14:14:57 GMT, Roland Westrelin wrote: >> Range check `CastII` nodes are removed once loop opts are over. The >> test case for this change includes 3 cases where elimination of a >> range check `CastII` causes a crash in compiled code because either a >> out of bounds array load or a division by zero happen. >> >> In `test1`: >> >> - the range checks for the `array[otherArray.length]` loads constant >> fold: `otherArray.length` is a `CastII` of i at the `otherArray` >> allocation. `i` is less than 9. The `CastII` at the allocation >> narrows the type down further to `[0-9]`. >> >> - the `array[otherArray.length]` loads are control dependent on the >> unrelated: >> >> >> if (flag == 0) { >> >> >> test. There's an identical dominating test which replaces that one. As >> a consequence, the `array[otherArray.length]` loads become control >> dependent on the dominating test. >> >> - The `CastII` nodes at the `otherArray` allocations are replaced by a >> dominating range check `CastII` nodes for: >> >> >> newArray[i] = 42; >> >> >> - After loop opts, the range check `CastII` nodes are removed and the >> 2 `array[otherArray.length]` loads common at the first: >> >> >> if (flag == 0) { >> >> >> test before the: >> >> >> float[] otherArray = new float[i]; >> >> >> and >> >> >> newArray[i] = 42; >> >> >> that guarantee `i` is positive. 
>> >> - `test1` is called with `i = -1`, the array load proceeds with an out >> of bounds index and the crash occurs. >> >> >> `test2` and `test3` are mostly identical except for the check that's >> eliminated (a null divisor check) and the instruction that causes a >> fault (an integer division). >> >> The fix I propose is to not eliminate range check `CastII` nodes after >> loop opts. When range check`CastII` nodes were introduced, performance >> was observed to regress. Removing them after loop opts was found to >> preserve both correctness and performance. Today, the performance >> regression still exists when `CastII` nodes are left in. So I propose >> we keep them until the end of optimizations (so the 2 array loads >> above don't lose a dependency and wrongly common) but remove them at >> the end of all optimizations. >> >> In the case of the array loads, they are dependent on a range check >> for another array through a range check `CastII` and we must not lose >> that dependency otherwise the array loads could float above the range >> check at gcm time. I propose we deal with that problem the way it's >> handled for `CastPP` nodes: add the dependency to the load (or >> division)nodes ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8324517 > - test and fix test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 37: > 35: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation > 36: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined > 37: * -XX:+StressIGVN -XX:StressSeed=94546681 TestArrayAccessAboveRCAfterRCCastIIEliminated `Error: VM option 'StressIGVN' is diagnostic and must be enabled via -XX:+UnlockDiagnosticVMOptions.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18377#discussion_r1574690677 From sgibbons at openjdk.org Mon Apr 22 14:16:06 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 14:16:06 GMT Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v16] In-Reply-To: References: Message-ID: <6UnFG26aCrqCe5egk5hKsogxeOBNNdUuHfGveP82n_4=.7be96c8b-a2b9-4bdd-90f1-65ff60b5ae7f@github.com> > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
> Benchmark                                Score      Latest     Ratio
> StringIndexOf.advancedWithMediumSub      343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1      1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2      55.828     110.541    1.980027943x
> StringIndexOf.constantPattern            9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess      4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess    3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess     3.76       3.761      1.000265957x
> StringIndexOf.success                    9.186      9.713      1.057369911x
> StringIndexOf.successBig                 14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String     6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char       5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String     6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char       5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String    7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char      16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String    7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char      9901.671   9782.245   0.987938803x

Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 48 commits:

 - Merge branch 'openjdk:master' into indexof
 - Remove infinite loop (used for debugging)
 - Merge branch 'openjdk:master' into indexof
 - Cleaned up, ready for review
 - Pre-cleanup code
 - Add JMH. Add 16-byte compares to arrays_equals
 - Better method for mask creation
 - Merge branch 'openjdk:master' into indexof
 - Most cleanup done.
 - Remove header dependency
 - ...
and 38 more: https://git.openjdk.org/jdk/compare/3e65d90b...8e0ce70a ------------- Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=15 Stats: 4903 lines in 19 files changed: 4549 ins; 241 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From avoitylov at openjdk.org Mon Apr 22 14:25:53 2024 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 22 Apr 2024 14:25:53 GMT Subject: RFR: 8330805: ARM32 build is broken after JDK-8139457 Message-ID: The JDK-8139457 patch changes the header_size argument of C1_MacroAssembler::allocate_array, the input value now means offset in bytes. The ARM32 allocate_array implementation is fixed accordingly. Testing: jtreg hotspot, jtreg jdk tier1-3 ------------- Commit messages: - JDK-8330805 Changes: https://git.openjdk.org/jdk/pull/18890/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18890&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330805 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18890/head:pull/18890 PR: https://git.openjdk.org/jdk/pull/18890 From avoitylov at openjdk.org Mon Apr 22 14:26:58 2024 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Mon, 22 Apr 2024 14:26:58 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 Message-ID: TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of IR_Assembler::osr_entry(). Testing: jtreg hotspot, jtreg jdk tier1-3. 
------------- Commit messages: - JDK-8330806 Changes: https://git.openjdk.org/jdk/pull/18891/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18891&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330806 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18891.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18891/head:pull/18891 PR: https://git.openjdk.org/jdk/pull/18891 From jkarthikeyan at openjdk.org Mon Apr 22 14:34:31 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 22 Apr 2024 14:34:31 GMT Subject: RFR: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 01:38:28 GMT, Jasmine Karthikeyan wrote: > This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. > I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. > > Thoughts and reviews would be appreciated! 
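The arithmetic in the description above can be checked with a small sketch. It assumes 512 profiled executions of the branch (inferred from the quoted `1/512`) and models the CMove decision as a plain comparison against the quoted `0.01` threshold; the class and method names are hypothetical, and the real C2 heuristic considers more than this single ratio.

```java
public class CMoveProbability {
    // Threshold quoted in the description; treating it as a simple lower bound is an assumption.
    static final double CMOVE_THRESHOLD = 0.01;

    // Fraction of executions in which the branch was taken.
    static double takenProbability(int taken, int total) {
        return (double) taken / total;
    }

    // True if the branch is taken often enough to stay above the threshold.
    static boolean aboveThreshold(int taken, int total) {
        return takenProbability(taken, total) >= CMOVE_THRESHOLD;
    }

    public static void main(String[] args) {
        int total = 512; // assumed number of profiled executions

        // Unlucky seed: the branch is taken only once.
        System.out.println(takenProbability(1, total) + " above threshold: " + aboveThreshold(1, total));

        // With 50 guaranteed successes the ratio stays well above the threshold.
        System.out.println(takenProbability(50, total) + " above threshold: " + aboveThreshold(50, total));
    }
}
```

Under these assumptions a single taken branch gives roughly 0.002, below the threshold, while 50 guaranteed successes give roughly 0.098, which is why seeding the array with fixed values makes the IR check independent of the random seed.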
Thanks again for the reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18734#issuecomment-2069673354 From mli at openjdk.org Mon Apr 22 14:45:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Apr 2024 14:45:45 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch for instrinsic VectorLoadShuffle? > > BTW, without this intrinsic, some other vector api operation does not work as well (e.g. rearrange) on riscv. > > Thanks > > ## Test > test/jdk/jdk/incubator/vector/ > test/hotspot/jtreg/compiler/vectorapi Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - refine code - Merge branch 'master' into vector-load-shuffle - add comment - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18835/files - new: https://git.openjdk.org/jdk/pull/18835/files/b1400b33..a8226686 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18835&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18835&range=01-02 Stats: 50852 lines in 595 files changed: 23073 ins; 24231 del; 3548 mod Patch: https://git.openjdk.org/jdk/pull/18835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18835/head:pull/18835 PR: https://git.openjdk.org/jdk/pull/18835 From mli at openjdk.org Mon Apr 22 14:45:45 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Apr 2024 14:45:45 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v2] In-Reply-To: References: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> Message-ID: On Mon, 22 Apr 2024 07:05:17 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment > > 
src/hotspot/cpu/riscv/riscv_v.ad line 3581: > >> 3579: predicate(Matcher::vector_element_basic_type(n) == T_BYTE); >> 3580: match(Set dst (VectorLoadShuffle dst)); >> 3581: effect(TEMP_DEF dst); > > Seems no need to add a `TEMP_DEF` for `dst` here. fixed, thanks for catching. > src/hotspot/cpu/riscv/riscv_v.ad line 3586: > >> 3584: // For T_BYTE, no need to do anything >> 3585: %} >> 3586: ins_pipe(pipe_slow); > > I think `pipe_class_empty` is better since this emits nothing at all. fixed, thanks for catching. > src/hotspot/cpu/riscv/riscv_v.ad line 3602: > >> 3600: __ vsetvli_helper(bt, Matcher::vector_length(this)); >> 3601: if (bt == T_SHORT) { >> 3602: __ vsext_vf2(as_VectorRegister($dst$$reg), as_VectorRegister($src$$reg)); > > I prefer `vzext_vf2/4/8` which does zero extension for the source indexes of the vector shuffle. done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1574880243 PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1574880227 PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1574880469 From mli at openjdk.org Mon Apr 22 14:51:31 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Apr 2024 14:51:31 GMT Subject: RFR: 8321010: RISC-V: C2 RoundVF [v2] In-Reply-To: References: Message-ID: On Wed, 21 Feb 2024 16:30:24 GMT, Hamlin Li wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix space > >> Hello Hamlin, I recall you had licheepi board. 
I would be nice if you can try to measure rvv performance gain with this https://github.com/syntacore/syntaj21/tree/rvv0.7.1 >> >> This PR showed it's not always easy to win perf just by using rvv - #17413 >> >> I understand it might not be possible, but would be nice to give it a try (I can share hsdis with support for 0.7.1 if needed) > > Had a internal discussion about your suggestion, seems 0.7.1 is not incompatible with 1.0/2.0, and for this simple intrinsic, we think a better path is to have it first, then re-visit it when we have real hardware to measure the performance later. > @Hamlin-Li: Thanks for the quick update. Considering that saving/restoring for FRM could be expensive, I do wonder if we could gather some performance numbers before we go. I see people are now testing on RVV-1.0 hardwares [1] and I am also trying to get one (AFAIK, more powerful RVV-1.0 hardwares are also coming later this year, SG2044, SG2380, etc.). Also from discussion on [2], I see there are also other approaches available there without flipping the FP rounding mode. But I am not sure if they make sense for our case or work better without actual testing. > > [1] [#18382 (comment)](https://github.com/openjdk/jdk/pull/18382#issuecomment-2045145255) [2] #8204 Thanks for the information, let me do some investigation on the solution in https://github.com/openjdk/jdk/pull/8204. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17745#issuecomment-2069741101 From mli at openjdk.org Mon Apr 22 15:07:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Apr 2024 15:07:30 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v11] In-Reply-To: References: Message-ID: On Tue, 9 Apr 2024 08:43:30 GMT, Hamlin Li wrote: >> HI, >> Can you have a look at this patch adding some tests for Math.round instrinsics? >> Thanks! 
>> >> ### FYI: >> During the development of RoundVF/RoundF, we faced the issues which were only spotted by running test exhaustively against 32/64 bits range of int/long. >> It's helpful to add these exhaustive tests in jdk for future possible usage, rather than build it everytime when needed. >> Of course, we need to put it in `manual` mode, so it's not run when `-automatic` jtreg option is specified which I guess is the mode CI used, please correct me if I'm assume incorrectly. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > make methods static Seems we have no different opinions on test of Double? Maybe it's good to push Double test first if you don't object? @eme64 @theRealAph ------------- PR Comment: https://git.openjdk.org/jdk/pull/17753#issuecomment-2069802619 From shade at openjdk.org Mon Apr 22 15:08:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Apr 2024 15:08:28 GMT Subject: RFR: 8330805: ARM32 build is broken after JDK-8139457 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:19:47 GMT, Aleksei Voitylov wrote: > The JDK-8139457 patch changes the header_size argument of C1_MacroAssembler::allocate_array, the input value now means offset in bytes. The ARM32 allocate_array implementation is fixed accordingly. > > Testing: jtreg hotspot, jtreg jdk tier1-3 Looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18890#pullrequestreview-2015042346 From bkilambi at openjdk.org Mon Apr 22 15:16:32 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 22 Apr 2024 15:16:32 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v7] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 15:47:30 GMT, Bhavana Kilambi wrote: >> Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. 
Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. >> >> To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. >> >> With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. >> >> [AArch64] >> On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2]. >> >> This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. >> >> No effects on other platforms. >> >> [Performance] >> FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). >> >> ADDLanes >> >> Benchmark Before After Unit >> FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms >> >> >> Final code is as below: >> >> Before: >> ` fadda z17.s, p7/m, z17.s, z16.s >> ` >> After: >> >> faddp v17.4s, v21.4s, v21.4s >> faddp s18, v17.2s >> fadd s18, s18, s19 >> >> >> >> >> [Test] >> Full jtreg passed on AArch64 and x86. 
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 >> [2] https://bugs.openjdk.org/browse/JDK-8275275 >> [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Adjust format for the backend rules changed in previous commit I see that this test -` "jdk/java/util/HashMap/WhiteBoxResizeTest.java"` seems to have failed on an x86 machine. I have tested it on my local x86 machine and the test passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18034#issuecomment-2069834291 From aph at openjdk.org Mon Apr 22 15:53:31 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Apr 2024 15:53:31 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> Message-ID: <9l5M9DRG3AT-3aGSAr7VE6jm3WCpmY9DkQCoDg5026c=.cdca309d-6bf0-4879-9b83-3379ec7e5799@github.com> On Thu, 11 Apr 2024 10:52:55 GMT, Emanuel Peter wrote: >>> The only issue I have for an exhaustive test for the 32-bit range is that it takes too long time to be an automatic test, and we don't want to add an manual test as it seldomly runs. >> >> But it takes a long time because you use the @Run attribute, surely. If you ran that test just as a test, without the IR framework, it'd be fine. > > @theRealAph out of office, so don't have much time to think this through. 
But maybe we want both, a slower IR test which ensures we have the desired IR (with random input values), and also a non-IR test that is faster and checks the correct results more exhaustively? Totally. Then, at least for float, you've got it all. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1574992865 From sgibbons at openjdk.org Mon Apr 22 16:27:48 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 16:27:48 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 Message-ID: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. ------------- Commit messages: - Add conditional jump aliases; move arrays_equals; add instructions Changes: https://git.openjdk.org/jdk/pull/18893/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330844 Stats: 654 lines in 6 files changed: 439 ins; 215 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18893/head:pull/18893 PR: https://git.openjdk.org/jdk/pull/18893 From sgibbons at openjdk.org Mon Apr 22 16:27:48 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 16:27:48 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 16:20:39 GMT, Scott Gibbons wrote: > Adding infrastructure for JDK-8320448. 
Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. This is a precursor for [JDK-8320448](https://bugs.openjdk.org/browse/JDK-8320448), essentially adding infrastructure requirements for that algorithm. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2070116475 From kvn at openjdk.org Mon Apr 22 16:59:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 16:59:35 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header Message-ID: Currently PcDescCache (32 bytes: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. Tested tier1-4,stress,xcomp and performance. ------------- Commit messages: - 8330181: Move PcDesc cache from nmethod header Changes: https://git.openjdk.org/jdk/pull/18895/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18895&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330181 Stats: 114 lines in 2 files changed: 41 ins; 32 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/18895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18895/head:pull/18895 PR: https://git.openjdk.org/jdk/pull/18895 From duke at openjdk.org Mon Apr 22 17:04:29 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 22 Apr 2024 17:04:29 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v4] In-Reply-To: <5GdKtMxnmsxXDm8hUO5xR7OtIZG5TTmz2wqpv4xcEhA=.62b46ad9-2387-418c-88ec-f124e7b099b8@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> <5GdKtMxnmsxXDm8hUO5xR7OtIZG5TTmz2wqpv4xcEhA=.62b46ad9-2387-418c-88ec-f124e7b099b8@github.com> 
Message-ID: On Mon, 22 Apr 2024 10:46:48 GMT, Tobias Hartmann wrote: > Is the problem that the condition is not canonicalized or that the CMoveNode is not process by IGVN after canonicalization of the cmp? The `CMoveNode` is processed, but its input `Bool` and `Cmp` are never processed. Maybe we need to transform the `CMove`'s inputs in https://github.com/openjdk/jdk/blob/0b9350e8b619bc556f36652cde6f73211be5b85b/src/hotspot/share/opto/loopopts.cpp#L842. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18824#issuecomment-2070250380 From kvn at openjdk.org Mon Apr 22 17:17:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 17:17:30 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 16:54:40 GMT, Vladimir Kozlov wrote: > Currently PcDescCache (32 bytes: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. src/hotspot/share/code/nmethod.cpp line 351: > 349: } > 350: > 351: void PcDescCache::init_to(PcDesc* initial_pc_desc) { I renamed method to `init_to()` because it is used only once for initialization. src/hotspot/share/code/nmethod.cpp line 364: > 362: PcDesc* PcDescCache::find_pc_desc(int pc_offset, bool approximate) { > 363: NOT_PRODUCT(++pc_nmethod_stats.pc_desc_queries); > 364: NOT_PRODUCT(if (approximate) ++pc_nmethod_stats.pc_desc_approx); Moved to ` PcDescContainer::find_pc_desc()` to have correct statistics because that method does initial check for last cached data. 
src/hotspot/share/code/nmethod.cpp line 399: > 397: > 398: void PcDescCache::add_pc_desc(PcDesc* pc_desc) { > 399: MACOS_AARCH64_ONLY(ThreadWXEnable wx(WXWrite, Thread::current());) No need any more because cache in C heap now instead of `CodeCache`. src/hotspot/share/code/nmethod.cpp line 1453: > 1451: > 1452: // Create cache after PcDesc data is copied - it will be used to initialize cache > 1453: _pc_desc_container = new PcDescContainer(scopes_pcs_begin()); It was a bug from early HotSpot days. The assert in `PcDescCache::init_to()` checks that initializing data is sentinel but before PcDesc data is copied values in allocated space could be anything. I don't understand how we don't hit this assert before in mainline. We always hit it on Aarch64 in `leyden/premain` repo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575095093 PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575099082 PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575100409 PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575106074 From jbhateja at openjdk.org Mon Apr 22 17:22:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 22 Apr 2024 17:22:44 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads [v3] In-Reply-To: References: Message-ID: > - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results into formation of an abstract type phi node at JVM state convergence points. > - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results into an assertion failure while computing the source lane count. 
> - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout. > - For vector write access, this may prevent value forwarding, which may result into subsequent redundant vector loads from same address, but preventing intensification failure will offset that cost. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding some comments for clarity. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18749/files - new: https://git.openjdk.org/jdk/pull/18749/files/0c67e68a..23af7af6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18749&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18749&range=01-02 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18749.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18749/head:pull/18749 PR: https://git.openjdk.org/jdk/pull/18749 From vladimir.kozlov at oracle.com Mon Apr 22 17:35:15 2024 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 22 Apr 2024 10:35:15 -0700 Subject: Enable compiler memory limits by default? In-Reply-To: References: Message-ID: Hi Thomas, I like option 1). I think 1Gb is reasonable starting value - I know that C2 may ease eat 512Kb of memory. But before we set exact value we need to test it in all our tiers. I don't want to create a lot of failures which we will not have time to fix fast. It should be rare case as you stated. We also need to decide how we fix/avoid such failure: 1. 
Recompile with some optimizations off (I assume we can tell which optimization triggers big memory consumption and safely bailout from compilation)
2. Recompile with some inlining off
3. Mark method not compilable by corresponding compiler
...

Please file RFE and PR. We will help with testing.

Thanks,
Vladimir K

On 4/12/24 12:30 AM, Thomas Stüfe wrote:
> Hi,
>
> Issues like https://bugs.openjdk.org/browse/JDK-8330103 show that compiler memory
> consumption can be an issue.
>
> Since https://bugs.openjdk.org/browse/JDK-8318016, we have an optional
> per-compilation memory limit. If we reach that limit, one of two things
> (configurable) happens: we either assert or abort the compilation.
>
> These memory limits build on the compiler memory statistic added with
> https://bugs.openjdk.org/browse/JDK-8317683. Enabling
> memory limits also enables memory statistics.
>
> Some ideas:
>
> 1) We could enable a reasonable memory limit per default for debug
> builds. Preferably combined with the assert option. That way, we run all
> tests on a debug VM with memory limits enabled. If there are
> pathological compilations during testing, we will notice them.
>
> (I don't know if we would notice them today; even if testers let JVMs
> run with outside ulimits, these limits are typically very high to allow
> for the total expected memory consumption of the test JVM).
>
> Such a memory limit could be set at whatever we feel is pathological,
> e.g., several hundred MB. Even set at 1GB, we would hopefully see cases
> like 8318016 in our tests.
>
> 2) If we don't want (1), we could at least enable memory statistics by
> default for debug builds and print it out to hs-err files.
>
> 3) We could also enable memory limits in release builds and bail out of
> the compilations. A small cost is involved, probably negligible: on
> Arena enlargement, we increase several thread local counters.
> Unfortunately, there is a small risk, too, in that bailout paths in C2
> may be broken, leading to follow-up errors. We fixed them all, I think,
> but there is a remaining risk. OTOH, using up excessive amounts of
> memory is also not optimal.
>
> What do you think? Would this make sense? If (1) makes sense to you,
> what limit would be reasonable?
>
> Cheers, Thomas

From coleenp at openjdk.org Mon Apr 22 17:40:30 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 22 Apr 2024 17:40:30 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote:

>> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fei comments

There are two threads though:

t1:
1. write resolved fields
2. release_store get_code/put_code
3. patch bytecode to fastpath (should this be a release_store?)

but t2:
1. reads patched_bytecode
2. goes fastpath
3. loads pointer to cpCache entry for Resolved fields assuming they've been written in order
4. gets a resolved field

So the patch puts the LoadLoad membar between 2 and 3 because t2 is the thread that's loading the information. Would a LoadLoad barrier executed by t1 help?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2070375362

From kvn at openjdk.org Mon Apr 22 17:55:29 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 22 Apr 2024 17:55:29 GMT
Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case.
[v2] In-Reply-To: <-8n85gXA1M50-rWMOn9dl7aD3pvV7p8CqPkyzx1hVIg=.da123f14-3701-4f8e-9e16-8d5ebf561e86@github.com> References: <_0V4aLv23eyNBgwgzFThGCfXPQw6jTZa2me6ZnF6I_g=.83cd138c-fc3c-4b01-9ccd-10ff7f4bf5d7@github.com> <-8n85gXA1M50-rWMOn9dl7aD3pvV7p8CqPkyzx1hVIg=.da123f14-3701-4f8e-9e16-8d5ebf561e86@github.com> Message-ID: On Tue, 2 Apr 2024 12:37:17 GMT, Swati Sharma wrote: >> test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyDisjointLarge.java line 29: >> >>> 27: /** >>> 28: * @test >>> 29: * @bug 8310159 >> >> Suggestion: >> >> * @bug 8326421 >> >> Was there a reason for the other bug number? I think usually we use the bug number of the issue where the test is added. I might be wrong. > > @eme64 This test is to cover the functionality correctness of the issue JDK-8310159, If you suggest this should cover a general scenario I can add the bug number of the issue itself i.e 8326421. Please let me know. I agree with @swati-sha - it should be 8310159. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17962#discussion_r1575158354 From sviswanathan at openjdk.org Mon Apr 22 17:59:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 22 Apr 2024 17:59:28 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 17:22:44 GMT, Jatin Bhateja wrote: >> - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results into formation of an abstract type phi node at JVM state convergence points. >> - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results into an assertion failure while computing the source lane count. 
>> - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout. >> - For vector write access, this may prevent value forwarding, which may result into subsequent redundant vector loads from same address, but preventing intensification failure will offset that cost. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding some comments for clarity. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18749#pullrequestreview-2015470623 From kvn at openjdk.org Mon Apr 22 18:00:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 18:00:30 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Mon, 15 Apr 2024 07:04:11 GMT, Roberto Casta?eda Lozano wrote: >> This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. 
An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
>>
>> The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change:
>>
>>
>> import java.lang.invoke.VarHandle;
>> import java.lang.invoke.MethodHandles;
>>
>> public class Example {
>>     static class Outer {
>>         Object f;
>>     }
>>
>>     static final VarHandle fVarHandle;
>>     static {
>>         MethodHandles.Lookup l = MethodHandles.lookup();
>>         try {
>>             fVarHandle = l.findVarHandle(Outer.class, "f", Object.class);
>>         } catch (Exception e) {
>>             throw new Error(e);
>>         }
>>     }
>>
>>     static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) {
>>         return fVarHandle.compareAndSet(o, oldVal, newVal);
>>     }
>>
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 10_000; i++) {
>>             Outer o = new Outer();
>>             Object oldVal = new Object();
>>             o.f = oldVal;
>>             Object newVal = new Object();
>>             testCompareAndSwap(o, oldVal, newVal);
>>         }
>>     }
>> }
>>
>>
>> Before this changeset, issuing this command:
>>
>>
>> $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP
>>
>>
>> gives the following dump:
>>
>>
>> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20)
>>
>>
>> After this changeset, we get:
>>
>>
>> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleRefer...
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add example

I don't see GHA testing for this repo. Did you enable it?

test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/GCBarrierIRExample.java line 31:

> 29:
> 30: /**
> 31:  * @test

Missing `@bug`

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18754#issuecomment-2070442852
PR Review Comment: https://git.openjdk.org/jdk/pull/18754#discussion_r1575164212

From sviswanathan at openjdk.org Mon Apr 22 18:02:28 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 22 Apr 2024 18:02:28 GMT
Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 22 Apr 2024 17:22:44 GMT, Jatin Bhateja wrote:

>> - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results into formation of an abstract type phi node at JVM state convergence points.
>> - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results into an assertion failure while computing the source lane count.
>> - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout.
>> - For vector write access, this may prevent value forwarding, which may result into subsequent redundant vector loads from same address, but preventing intensification failure will offset that cost.
>>
>> Kindly review and share your feedback.
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding some comments for clarity. It will be good to get another review as well. /Reviewers 2 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18749#issuecomment-2070455124 From kvn at openjdk.org Mon Apr 22 18:03:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 18:03:28 GMT Subject: RFR: 8330625: Compilation memory statistic: prevent tearing of the final report In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 13:50:02 GMT, Thomas Stuefe wrote: > Somewhat trivial change to reduce the chance of tearing the final compilation cost history report. See JBS for details. > > --- > > The patch: > - upon end of a compilation, we print the the offending log line and account the cost in the compilation cost history table. For the latter we lock over NMTCompilationCostHistory_lock. The patch swaps these two actions such that we print after pulling the lock. That greatly reduces, albeit not completely removes, the chance of printing log lines into the final report. (I did not want to widen the scope of that lock to include the printout) > - also moves the locking of NMTCompilationCostHistory_lock up to the start of the reporting function to include printing the report header into the locking Good. What kind of testing you do for changes in this code? ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18866#pullrequestreview-2015477337 From kvn at openjdk.org Mon Apr 22 18:11:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 18:11:30 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory In-Reply-To: References: Message-ID: <_c23YQF9M64MmzaQW4lw3fGc850YkbpVbu1tiXFrA1k=.524aa4b4-3d3d-4891-9a7d-654689ad4f75@github.com> On Mon, 22 Apr 2024 13:48:41 GMT, Scott Gibbons wrote: > Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1524: > 1522: // UnsafeMemoryAccess page error: continue after ucm > 1523: bool add_entry = !is_oop && (!aligned || sizeof(jlong) == size); > 1524: UnsafeMemoryAccessMark ucmm(this, add_entry, true); May be rename `ucmm` and other related locals too to avoid confusion. Word `ucm` in comments too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18889#discussion_r1575178012 From sgibbons at openjdk.org Mon Apr 22 18:23:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 18:23:40 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v2] In-Reply-To: References: Message-ID: > Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Address review comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18889/files - new: https://git.openjdk.org/jdk/pull/18889/files/281a1da9..aaa3d416 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18889&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18889&range=00-01 Stats: 67 lines in 6 files changed: 0 ins; 0 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/18889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18889/head:pull/18889 PR: https://git.openjdk.org/jdk/pull/18889 From sgibbons at openjdk.org Mon Apr 22 18:23:40 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 18:23:40 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v2] In-Reply-To: <_c23YQF9M64MmzaQW4lw3fGc850YkbpVbu1tiXFrA1k=.524aa4b4-3d3d-4891-9a7d-654689ad4f75@github.com> References: <_c23YQF9M64MmzaQW4lw3fGc850YkbpVbu1tiXFrA1k=.524aa4b4-3d3d-4891-9a7d-654689ad4f75@github.com> Message-ID: On Mon, 22 Apr 2024 18:09:14 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comment > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1524: > >> 1522: // UnsafeMemoryAccess page error: continue after ucm >> 1523: bool add_entry = !is_oop && (!aligned || sizeof(jlong) == size); >> 1524: UnsafeMemoryAccessMark ucmm(this, add_entry, true); > > May be rename `ucmm` and other related locals too to avoid confusion. Word `ucm` in comments too. Done. Comment says `unsafe access` instead of `ucm` and `umam` instead of `ucmm`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18889#discussion_r1575191505 From kvn at openjdk.org Mon Apr 22 18:37:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 18:37:31 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 16:20:39 GMT, Scott Gibbons wrote: > making arrays_equals accessible from stubs I am not sure I understand why you need to move it. Your changes for JDK-8320448 shows that new code is used only by C2. You can move your new code in stubGenerator_x86_64.cpp into the part under `#ifdef COMPILER2`. And code in `stubGenerator_x86_64_string.cpp` could be put under this `#ifdef` too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2070578582 From sviswanathan at openjdk.org Mon Apr 22 18:37:34 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 22 Apr 2024 18:37:34 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:23:40 GMT, Scott Gibbons wrote: >> Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 1627: > 1625: { > 1626: // Add set memory mark to protect against unsafe accesses faulting > 1627: UnsafeMemoryAccessMark usmm(this, ((t == T_BYTE) && !aligned), true); usmm -> umam to be consistent. 
src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2631: > 2629: { > 2630: Label L_wordsTail, L_wordsLoop, L_wordsTailLoop; > 2631: UnsafeMemoryAccessMark usmm(this, true, true); usmm -> umam src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2645: > 2643: { > 2644: Label L_qwordLoop, L_qwordsTail, L_qwordsTailLoop; > 2645: UnsafeMemoryAccessMark usmm(this, true, true); usmm -> umam src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2662: > 2660: { > 2661: Label L_dwordLoop, L_dwordsTail, L_dwordsTailLoop; > 2662: UnsafeMemoryAccessMark usmm(this, true, true); usmm -> umam ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18889#discussion_r1575205497 PR Review Comment: https://git.openjdk.org/jdk/pull/18889#discussion_r1575205741 PR Review Comment: https://git.openjdk.org/jdk/pull/18889#discussion_r1575205908 PR Review Comment: https://git.openjdk.org/jdk/pull/18889#discussion_r1575205988 From sgibbons at openjdk.org Mon Apr 22 18:41:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 18:41:53 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v3] In-Reply-To: References: Message-ID: > Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Missed a couple ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18889/files - new: https://git.openjdk.org/jdk/pull/18889/files/aaa3d416..e8b86eee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18889&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18889&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18889.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18889/head:pull/18889 PR: https://git.openjdk.org/jdk/pull/18889 From sgibbons at openjdk.org Mon Apr 22 18:41:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 18:41:53 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:23:40 GMT, Scott Gibbons wrote: >> Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Address review comment `usmm` => `umam` for consistency. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18889#issuecomment-2070593309 From kvn at openjdk.org Mon Apr 22 19:18:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 19:18:29 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:41:53 GMT, Scott Gibbons wrote: >> Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Missed a couple Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18889#pullrequestreview-2015617576 From sgibbons at openjdk.org Mon Apr 22 19:18:29 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 19:18:29 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:41:53 GMT, Scott Gibbons wrote: >> Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Missed a couple Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18889#issuecomment-2070719119 From sviswanathan at openjdk.org Mon Apr 22 19:57:31 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 22 Apr 2024 19:57:31 GMT Subject: RFR: 8330821: Rename UnsafeCopyMemory [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:41:53 GMT, Scott Gibbons wrote: >> Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Missed a couple Looks good. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18889#pullrequestreview-2015690658 From sgibbons at openjdk.org Mon Apr 22 20:47:52 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 20:47:52 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: > Adding infrastructure for JDK-8320448. 
Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Undo move of arrays_equals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18893/files - new: https://git.openjdk.org/jdk/pull/18893/files/74d47302..0b95b3af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=00-01 Stats: 564 lines in 4 files changed: 282 ins; 282 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18893/head:pull/18893 PR: https://git.openjdk.org/jdk/pull/18893 From sgibbons at openjdk.org Mon Apr 22 20:51:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 20:51:28 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 20:47:52 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Undo move of arrays_equals Adding the `#ifdef COMPILER2` in `stubGenerator_x86_64_string.cpp` allows for good compilation for JDK-8320448, so I can undo the move. Thanks for spotting that. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2070926706

From sgibbons at openjdk.org Mon Apr 22 21:00:45 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Mon, 22 Apr 2024 21:00:45 GMT
Subject: RFR: 8320448: Accelerate IndexOf using AVX2 [v17]
In-Reply-To:
References:
Message-ID: <05mD2dSduIgyzdnDqUHlh6CEqjWDkJ3wa_XK58tJy4Y=.a1d85af3-17e1-471f-a665-66d0693fda25@github.com>

> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>
>
> Benchmark                                   Score      Latest
> StringIndexOf.advancedWithMediumSub       343.573     317.934   0.925375393x
> StringIndexOf.advancedWithShortSub1      1039.081    1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2        55.828     110.541   1.980027943x
> StringIndexOf.constantPattern               9.361      11.906   1.271872663x
> StringIndexOf.searchCharLongSuccess         4.216       4.218   1.000474383x
> StringIndexOf.searchCharMediumSuccess       3.133       3.216   1.02649218x
> StringIndexOf.searchCharShortSuccess        3.76        3.761   1.000265957x
> StringIndexOf.success                       9.186       9.713   1.057369911x
> StringIndexOf.successBig                   14.341      46.343   3.231504079x
> StringIndexOfChar.latin1_AVX2_String     6220.918   12154.52    1.953814533x
> StringIndexOfChar.latin1_AVX2_char       5503.556    5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String     6978.854    6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char       5657.499    5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String    7132.541    6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char     16013.389   16162.437   1.009307711x
> StringIndexOfChar.latin1_mixed_String    7386.123   14771.622   1.999915517x
> StringIndexOfChar.latin1_mixed_char      9901.671    9782.245   0.987938803

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:

  Move arrays_equals back to c2_MacroAssembler

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16753/files
  - new:
https://git.openjdk.org/jdk/pull/16753/files/8e0ce70a..1d141fde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=15-16 Stats: 576 lines in 5 files changed: 288 ins; 282 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From kvn at openjdk.org Mon Apr 22 21:37:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 21:37:29 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 20:47:52 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Undo move of arrays_equals Can you also remove changes in `arrays_equals` from this PR? It is fine to have them in JDK-8320448 changes. 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4503: > 4501: > 4502: assert((!expand_ary2) || ((expand_ary2) && (UseAVX == 2)), > 4503: "Expansion only implemented for AVX2"); BTW, the check in assert could be simplified: `(!expand_ary2 || UseAVX == 2)` ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2070990618 PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1575383066 From sgibbons at openjdk.org Mon Apr 22 21:45:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 21:45:28 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: <1aaBuJuA_GcJXid7SX-6ZJzw-KJQS-_yB5xMKcHawYQ=.9c4fa086-94d9-48e0-90b6-6b3401bc4f2b@github.com> On Mon, 22 Apr 2024 20:47:52 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Undo move of arrays_equals A large part of this PR was to lessen the burden of reviewing JDK-8320448 changes. Am I hearing you say that this approach is not desired? The other PR is a big review and I was hoping to piecemeal some non-core algorithm changes in to make the review easier. It is, of course, trivial to revert the change to arrays_equals. Please let me know your final decision. Thanks. 
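On the `assert` in `c2_MacroAssembler_x86.cpp` discussed in the review comment above: by the absorption law, `!a || (a && b)` has the same truth table as `!a || b`, so the two spellings differ only in readability. A minimal sketch that checks this exhaustively is shown here; `original_check`, `simplified_check` and the plain `bool`/`int` parameters are hypothetical stand-ins for `expand_ary2` and `UseAVX`, not HotSpot code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the values used in the assert under review:
// `expand_ary2` is the boolean parameter, `use_avx` models the UseAVX level.

// The condition as written in the patch.
bool original_check(bool expand_ary2, int use_avx) {
    return (!expand_ary2) || ((expand_ary2) && (use_avx == 2));
}

// The shorter form suggested in the review.
bool simplified_check(bool expand_ary2, int use_avx) {
    return !expand_ary2 || use_avx == 2;
}

// Compare both forms for both flag values and a few AVX levels.
void verify_equivalence() {
    for (int flag = 0; flag <= 1; ++flag) {
        for (int use_avx = 0; use_avx <= 3; ++use_avx) {
            assert(original_check(flag != 0, use_avx) ==
                   simplified_check(flag != 0, use_avx));
        }
    }
}
```

Calling `verify_equivalence()` in a build with assertions enabled passes silently, which is consistent with either form being acceptable from a correctness standpoint; which one states the intent more clearly remains a matter of taste.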
------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2071001776 From sgibbons at openjdk.org Mon Apr 22 21:45:29 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 21:45:29 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: <8JGuV7337PZtod10hBOBzADjhnGvHDEka7_L2a_KUio=.a7ddf05b-6a4b-4178-9d3f-33d060872f28@github.com> On Mon, 22 Apr 2024 21:35:05 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Undo move of arrays_equals > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4503: > >> 4501: >> 4502: assert((!expand_ary2) || ((expand_ary2) && (UseAVX == 2)), >> 4503: "Expansion only implemented for AVX2"); > > BTW, the check in assert could be simplified: `(!expand_ary2 || UseAVX == 2)` I thought this would make the intent explicitly clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1575388290 From kvn at openjdk.org Mon Apr 22 21:55:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 21:55:28 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: <1aaBuJuA_GcJXid7SX-6ZJzw-KJQS-_yB5xMKcHawYQ=.9c4fa086-94d9-48e0-90b6-6b3401bc4f2b@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> <1aaBuJuA_GcJXid7SX-6ZJzw-KJQS-_yB5xMKcHawYQ=.9c4fa086-94d9-48e0-90b6-6b3401bc4f2b@github.com> Message-ID: On Mon, 22 Apr 2024 21:42:27 GMT, Scott Gibbons wrote: > A large part of this PR was to lessen the burden of reviewing JDK-8320448 changes. Am I hearing you say that this approach is not desired? 
The other PR is a big review and I was hoping to piecemeal some non-core algorithm changes in to make the review easier. > > It is, of course, trivial to revert the change to arrays_equals. Please let me know your final decision. Thanks. I am for splitting big PRs if possible. And you are not limited in how many self-contained sub-PRs you can create. But each PR should address one issue for easy review and testing. I think this PR should address what is in its title: aliases for jump instructions and adding missing cmp/jump instructions (which is related). Any changes to unrelated code, like arrays_equals, do not belong here. They could go in a separate sub-PR or even a follow-up PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2071014632 From sgibbons at openjdk.org Mon Apr 22 22:10:56 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 22:10:56 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: > Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions.
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Revert changes to arrays_equals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18893/files - new: https://git.openjdk.org/jdk/pull/18893/files/0b95b3af..f7d7f7de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=01-02 Stats: 90 lines in 2 files changed: 0 ins; 67 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/18893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18893/head:pull/18893 PR: https://git.openjdk.org/jdk/pull/18893 From sgibbons at openjdk.org Mon Apr 22 22:10:56 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 22:10:56 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v2] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 20:47:52 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Undo move of arrays_equals OK. arrays_equals changes reverted. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2071033103 From duke at openjdk.org Mon Apr 22 22:20:28 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 22 Apr 2024 22:20:28 GMT Subject: RFR: 8032218: Emit single post-constructor barrier for chain of superclass constructors In-Reply-To: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> References: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Message-ID: On Fri, 19 Apr 2024 22:31:10 GMT, Joshua Cao wrote: > Opening this PR on top of https://github.com/openjdk/jdk/pull/18505. This PR is only valid if we agree it is sufficient to use `StoreStore` barriers at the end of constructors instead of `Release` barriers. > > Currently on master, [C2 emits a Release barrier for each constructor call](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/parse1.cpp#L1019) in a chain of superclass constructor calls. After https://github.com/openjdk/jdk/pull/18505 is merged, it is the same except that the barrier is a `StoreStore`. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. > > [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > >> // An InitializeNode collects and isolates object initialization after > // an AllocateNode and before the next possible safepoint. As a > // memory barrier (MemBarNode), it keeps critical stores from drifting > // down past any safepoint or any publication of the allocation. 
> > All the writes that occur in the constructor will come before the `InitializeNode/StoreStore`. This PR removes the emitting of `StoreStore` barriers in `Parse::do_exits()`. This PR subsumes https://github.com/openjdk/jdk/pull/18505. > > Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors. > Doesn't this only work if the allocation and call to ctor are compiled together? Where is the StoreStore added if we compile a method by itself? The allocation could be in the interpreter, but when calling the ctor we call compiled code. Yes, I think you're right. We should probably still emit the barrier when it is the outermost method. Unfortunately, I don't think we can directly IR test on a constructor. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18870#issuecomment-2071044716 From kvn at openjdk.org Mon Apr 22 22:33:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 22 Apr 2024 22:33:28 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 22:10:56 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Revert changes to arrays_equals Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18893#pullrequestreview-2015899888 From sgibbons at openjdk.org Mon Apr 22 22:33:28 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 22:33:28 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 22:10:56 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; making arrays_equals accessible from stubs; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Revert changes to arrays_equals Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2071059232 From sgibbons at openjdk.org Mon Apr 22 22:57:32 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 22 Apr 2024 22:57:32 GMT Subject: Integrated: 8330821: Rename UnsafeCopyMemory In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 13:48:41 GMT, Scott Gibbons wrote: > Renaming UnsafeCopyMemory to UnsafeMemoryAccess since this class is now being used for Unsafe::setMemory. This is a pure rename only. This pull request has now been integrated. 
Changeset: 58ad399d Author: Scott Gibbons Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/58ad399d196bf2dd701df451004b7815b0820675 Stats: 159 lines in 18 files changed: 0 ins; 0 del; 159 mod 8330821: Rename UnsafeCopyMemory Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/18889 From dlong at openjdk.org Tue Apr 23 01:19:29 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 23 Apr 2024 01:19:29 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 16:54:40 GMT, Vladimir Kozlov wrote: > Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. src/hotspot/share/code/nmethod.cpp line 2738: > 2736: // which is typically called in a signal handler > 2737: _pc_desc_cache.add_pc_desc(upper); > 2738: } The special case for ASGCT, along with ThreadWXEnable changes here: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L484 https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L1755 https://github.com/openjdk/jdk/blob/master/src/hotspot/os/posix/signals_posix.cpp#L617 look like they will no longer be needed. I suggest filing a follow-up RFE for an ASGCT expert to take. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575522079 From fyang at openjdk.org Tue Apr 23 01:51:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 23 Apr 2024 01:51:37 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:45:45 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch for intrinsic VectorLoadShuffle? >> >> BTW, without this intrinsic, some other vector api operations do not work either (e.g. rearrange) on riscv. >> >> Thanks >> >> ## Test >> test/jdk/jdk/incubator/vector/ >> test/hotspot/jtreg/compiler/vectorapi > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - refine code > - Merge branch 'master' into vector-load-shuffle > - add comment > - Initial commit Thanks for the quick update! One more question remains. Otherwise looks good. src/hotspot/cpu/riscv/riscv_v.ad line 81: > 79: case Op_VectorLoadShuffle: > 80: case Op_VectorRearrange: > 81: // vlen >= 4 is required, because min vector size for byte is 4 on riscv, I was not aware of such a constraint before. Is this a constraint at the ISA level or is it a performance consideration? I didn't find where it is mentioned in the RVV spec. ------------- PR Review: https://git.openjdk.org/jdk/pull/18835#pullrequestreview-2016090731 PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1575541543 From dlong at openjdk.org Tue Apr 23 02:57:27 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 23 Apr 2024 02:57:27 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 17:37:48 GMT, Coleen Phillimore wrote: > There are two threads though: > t1: 1.
write resolved fields > 2. release_store get_code/put_code > 3. patch bytecode to fastpath (should this be a release_store?) Yes, I think you need a release_store for 3, because 2 happened in the same thread. If 2 and 3 happened in different threads, then maybe it could be avoided, but that seems risky. > but t2: 1. reads patched_bytecode > 2. goes fastpath > 3. loads pointer to cpCache entry for Resolved fields assuming they've been written in order > 4. gets a resolved field > > So the patch puts the LoadLoad membar between 2 and 3 because t2 is the thread that's loading the information. Would a LoadLoad barrier executed by t1 help? I don't see how t1 doing a LoadLoad helps. The LoadLoad needs to go anywhere between 1 and 4. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2071313420 From dlong at openjdk.org Tue Apr 23 02:57:28 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 23 Apr 2024 02:57:28 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com> On Sat, 20 Apr 2024 12:37:01 GMT, Andrew Haley wrote: > So, I guess the loadload fence being inserted here is the one we need between [2] and [3]. The way I would say it is we need a LoadLoad between [3] and [2] or between [3] and [1]. The code assumes that if it is a fast bytecode, then it can read [1] without checking [2] again.
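The ordering argument above can be sketched with standard C++ atomics. This is only an analogy, not HotSpot code: `Entry`, `bytecode_is_fast`, `writer_t1` and `reader_t2` are hypothetical names, and an acquire fence stands in for the LoadLoad barrier.

```cpp
#include <atomic>

// Hypothetical stand-in for a resolved-field entry in the cpCache.
struct Entry {
  int resolved_field = 0;
};

Entry entry;
std::atomic<bool> bytecode_is_fast{false};  // models the patched bytecode

// t1: fill in the entry, then publish the fast bytecode.
void writer_t1(int value) {
  entry.resolved_field = value;                             // write resolved fields
  bytecode_is_fast.store(true, std::memory_order_release);  // patch bytecode (release)
}

// t2: take the fast path only if the patched bytecode is visible.
int reader_t2(int slow_path_value) {
  if (!bytecode_is_fast.load(std::memory_order_relaxed)) {
    return slow_path_value;  // not patched yet: slow path
  }
  // Plays the role of the LoadLoad barrier: the load of the entry below
  // cannot be reordered before the load of the bytecode above.
  std::atomic_thread_fence(std::memory_order_acquire);
  return entry.resolved_field;
}
```

With `writer_t1` and `reader_t2` running on different threads, the reader either takes the slow path or observes the value written before the release store. The fence sits on the reading side, which matches the point that a LoadLoad executed by t1 would not help.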
------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2071315360 From kvn at openjdk.org Tue Apr 23 03:57:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 03:57:29 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: <0_kbfe4cREdOXu8XdkBfFxl3TYo0e1RXU1v5a56w6NY=.71933e02-0418-4cf1-9ccb-7c03cec8e714@github.com> On Tue, 23 Apr 2024 01:16:40 GMT, Dean Long wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. > > src/hotspot/share/code/nmethod.cpp line 2738: > >> 2736: // which is typically called in a signal handler >> 2737: _pc_desc_cache.add_pc_desc(upper); >> 2738: } > > The special case for ASGCT, along with ThreadWXEnable changes here: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L484 > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L1755 > https://github.com/openjdk/jdk/blob/master/src/hotspot/os/posix/signals_posix.cpp#L617 > look like they will no longer be needed. I suggest filing a follow-up RFE for an ASGCT expert to take. Thank you, @dean-long, for pointing this out. [JDK-8302736](https://bugs.openjdk.org/browse/JDK-8302736) added 2 ThreadWXEnable in sharedRuntime.cpp And later [JDK-8316392](https://bugs.openjdk.org/browse/JDK-8316392) added it to `PcDescCache::add_pc_desc()` What a mess ...
We can now safely remove (after testing) all ThreadWXEnable instances that guard calls to `add_pc_desc()`. And file a separate RFE for runtime to look at `!Thread::current_in_asgct()`. Is this what you are suggesting? I will do it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575607477 From rcastanedalo at openjdk.org Tue Apr 23 03:57:46 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 23 Apr 2024 03:57:46 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v3] In-Reply-To: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: > This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
>
> The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change:
>
>
> import java.lang.invoke.VarHandle;
> import java.lang.invoke.MethodHandles;
>
> public class Example {
>     static class Outer {
>         Object f;
>     }
>
>     static final VarHandle fVarHandle;
>     static {
>         MethodHandles.Lookup l = MethodHandles.lookup();
>         try {
>             fVarHandle = l.findVarHandle(Outer.class, "f", Object.class);
>         } catch (Exception e) {
>             throw new Error(e);
>         }
>     }
>
>     static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) {
>         return fVarHandle.compareAndSet(o, oldVal, newVal);
>     }
>
>     public static void main(String[] args) {
>         for (int i = 0; i < 10_000; i++) {
>             Outer o = new Outer();
>             Object oldVal = new Object();
>             o.f = oldVal;
>             Object newVal = new Object();
>             testCompareAndSwap(o, oldVal, newVal);
>         }
>     }
> }
>
>
> Before this changeset, issuing this command:
>
>
> $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP
>
>
> gives the following dump:
>
>
> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20)
>
>
> After this changeset, we get:
>
>
> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Add @bug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18754/files - new: https://git.openjdk.org/jdk/pull/18754/files/409a1ef4..a842cb68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18754&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18754&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18754/head:pull/18754 PR: https://git.openjdk.org/jdk/pull/18754 From kvn at openjdk.org Tue Apr 23 04:01:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 04:01:28 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: <0_kbfe4cREdOXu8XdkBfFxl3TYo0e1RXU1v5a56w6NY=.71933e02-0418-4cf1-9ccb-7c03cec8e714@github.com> References: <0_kbfe4cREdOXu8XdkBfFxl3TYo0e1RXU1v5a56w6NY=.71933e02-0418-4cf1-9ccb-7c03cec8e714@github.com> Message-ID: On Tue, 23 Apr 2024 03:54:24 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/code/nmethod.cpp line 2738: >> >>> 2736: // which is typically called in a signal handler >>> 2737: _pc_desc_cache.add_pc_desc(upper); >>> 2738: } >> >> The special case for ASGCT, along with ThreadWXEnable changes here: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L484 >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L1755 >> https://github.com/openjdk/jdk/blob/master/src/hotspot/os/posix/signals_posix.cpp#L617 >> look like they will no longer be needed. I suggest filing a follow-up RFE for an ASGCT expert to take. > > Thank you, @dean-long, for pointing this out.
> > [JDK-8302736](https://bugs.openjdk.org/browse/JDK-8302736) added 2 ThreadWXEnable in sharedRuntime.cpp > And later [JDK-8316392](https://bugs.openjdk.org/browse/JDK-8316392) added it to `PcDescCache::add_pc_desc()` > What a mess ... > > We can now safely remove (after testing) all ThreadWXEnable instances that guard calls to `add_pc_desc()`. > And file a separate RFE for runtime to look at `!Thread::current_in_asgct()`. > > Is this what you are suggesting? I will do it. ASGCT code was added by [JDK-8304725](https://github.com/openjdk/jdk/commit/d8af7a6014055295355a1242db6c2872299c6398) before [JDK-8316392](https://bugs.openjdk.org/browse/JDK-8316392) added ThreadWXEnable to PcDescCache::add_pc_desc(). As you suggested, I will file a follow-up RFE to remove that code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1575609815 From kvn at openjdk.org Tue Apr 23 04:12:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 04:12:28 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: <3kJOb6xJKLp8-_uGfzbRJBlytOzN-0WKvHQTl7Wzcx0=.40b3695d-9cb0-4469-88d0-523394d51dd7@github.com> On Mon, 22 Apr 2024 16:54:40 GMT, Vladimir Kozlov wrote: > Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. Looks like I have to test up to tier8 where [JDK-8316392](https://bugs.openjdk.org/browse/JDK-8316392) failed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2071370693 From rcastanedalo at openjdk.org Tue Apr 23 04:12:32 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 23 Apr 2024 04:12:32 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Mon, 22 Apr 2024 17:57:59 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example > > test/hotspot/jtreg/testlibrary_tests/ir_framework/examples/GCBarrierIRExample.java line 31: > >> 29: >> 30: /** >> 31: * @test > > Missing `@bug` Fixed, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18754#discussion_r1575614509 From kvn at openjdk.org Tue Apr 23 04:21:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 04:21:33 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v3] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Tue, 23 Apr 2024 03:57:46 GMT, Roberto Castañeda Lozano wrote: >> This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`.
An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
>>
>> The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change:
>>
>>
>> import java.lang.invoke.VarHandle;
>> import java.lang.invoke.MethodHandles;
>>
>> public class Example {
>>     static class Outer {
>>         Object f;
>>     }
>>
>>     static final VarHandle fVarHandle;
>>     static {
>>         MethodHandles.Lookup l = MethodHandles.lookup();
>>         try {
>>             fVarHandle = l.findVarHandle(Outer.class, "f", Object.class);
>>         } catch (Exception e) {
>>             throw new Error(e);
>>         }
>>     }
>>
>>     static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) {
>>         return fVarHandle.compareAndSet(o, oldVal, newVal);
>>     }
>>
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 10_000; i++) {
>>             Outer o = new Outer();
>>             Object oldVal = new Object();
>>             o.f = oldVal;
>>             Object newVal = new Object();
>>             testCompareAndSwap(o, oldVal, newVal);
>>         }
>>     }
>> }
>>
>>
>> Before this changeset, issuing this command:
>>
>>
>> $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP
>>
>>
>> gives the following dump:
>>
>>
>> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20)
>>
>>
>> After this changeset, we get:
>>
>>
>> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleRefer...
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18754#pullrequestreview-2016201953 From rcastanedalo at openjdk.org Tue Apr 23 04:21:34 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 23 Apr 2024 04:21:34 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v2] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Mon, 22 Apr 2024 17:56:45 GMT, Vladimir Kozlov wrote: > I don't see GHA testing for this repo. Did you enabled it? Interesting. GHA testing is enabled for the repo (see https://github.com/robcasloz/jdk/actions), but it seems to run only for some branches, see e.g. my recent PR #18108 from the same repo. I will investigate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18754#issuecomment-2071375783 From rcastanedalo at openjdk.org Tue Apr 23 04:21:34 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 23 Apr 2024 04:21:34 GMT Subject: RFR: 8330153: C2: dump barrier information for all Mach nodes [v3] In-Reply-To: References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Tue, 23 Apr 2024 03:57:46 GMT, Roberto Casta?eda Lozano wrote: >> This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. 
An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
>>
>> The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change:
>>
>>
>> import java.lang.invoke.VarHandle;
>> import java.lang.invoke.MethodHandles;
>>
>> public class Example {
>>     static class Outer {
>>         Object f;
>>     }
>>
>>     static final VarHandle fVarHandle;
>>     static {
>>         MethodHandles.Lookup l = MethodHandles.lookup();
>>         try {
>>             fVarHandle = l.findVarHandle(Outer.class, "f", Object.class);
>>         } catch (Exception e) {
>>             throw new Error(e);
>>         }
>>     }
>>
>>     static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) {
>>         return fVarHandle.compareAndSet(o, oldVal, newVal);
>>     }
>>
>>     public static void main(String[] args) {
>>         for (int i = 0; i < 10_000; i++) {
>>             Outer o = new Outer();
>>             Object oldVal = new Object();
>>             o.f = oldVal;
>>             Object newVal = new Object();
>>             testCompareAndSwap(o, oldVal, newVal);
>>         }
>>     }
>> }
>>
>>
>> Before this changeset, issuing this command:
>>
>>
>> $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP
>>
>>
>> gives the following dump:
>>
>>
>> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20)
>>
>>
>> After this changeset, we get:
>>
>>
>> R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleRefer...
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Add @bug Thanks for reviewing, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18754#issuecomment-2071376378 From rcastanedalo at openjdk.org Tue Apr 23 04:21:35 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 23 Apr 2024 04:21:35 GMT Subject: Integrated: 8330153: C2: dump barrier information for all Mach nodes In-Reply-To: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> References: <6e-pwE8Qf8nK1tcLgjQ6I8KLnev4m09qwOM5dybR8YQ=.3443af92-a005-42e5-afff-fee80a7af32f@github.com> Message-ID: On Fri, 12 Apr 2024 10:30:17 GMT, Roberto Castañeda Lozano wrote: > This debug-only changeset ensures that GC-specific barrier information is dumped (via [`BarrierSetC2::dump_barrier_data()`](https://github.com/openjdk/jdk/blob/aebfd53e9d19f5939c81fa1a2fc75716c3355900/src/hotspot/share/gc/shared/c2/barrierSetC2.hpp#L311-L313)) for all C2 Mach nodes, not just `MachType` ones. This makes it possible to e.g. write IR tests that verify barrier properties of `CompareAndSwap`/`WeakCompareAndSwap` Mach implementations, which do not inherit from `MachTypeNode`. An example of such a test can be found [here](https://github.com/robcasloz/jdk/blob/e9b3c2a4cb5dd80d85af8320e559acea5920b2ff/test/hotspot/jtreg/compiler/gcbarriers/TestG1BarrierGeneration.java#L283-L292).
> > The [following program](https://bugs.openjdk.org/secure/attachment/108921/Example.java) illustrates the effect of the change: > > > import java.lang.invoke.VarHandle; > import java.lang.invoke.MethodHandles; > > public class Example { > static class Outer { > Object f; > } > > static final VarHandle fVarHandle; > static { > MethodHandles.Lookup l = MethodHandles.lookup(); > try { > fVarHandle = l.findVarHandle(Outer.class, "f", Object.class); > } catch (Exception e) { > throw new Error(e); > } > } > > static boolean testCompareAndSwap(Outer o, Object oldVal, Object newVal) { > return fVarHandle.compareAndSet(o, oldVal, newVal); > } > > public static void main(String[] args) { > for (int i = 0; i < 10_000; i++) { > Outer o = new Outer(); > Object oldVal = new Object(); > o.f = oldVal; > Object newVal = new Object(); > testCompareAndSwap(o, oldVal, newVal); > } > } > } > > > Before this changeset, issuing this command: > > > $ java -Xbatch -XX:+UseZGC -XX:+ZGenerational -XX:CompileOnly=Example::testCompareAndSwap -XX:CompileCommand=PrintIdealPhase,Example::testCompareAndSwap,FINAL_CODE Example.java | grep zCompareAndSwapP > > > gives the following dump: > > > R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68) Example::testCompareAndSwap @ bci:6 (line 20) > > > After this changeset, we get: > > > R10 37 zCompareAndSwapP === 28 35 38 54 22 41 [[ 40 42 36 27 56 25 ]] barrier(strong ) !jvms: VarHandleReferences$FieldInstanceReadWrite::compareAndSet @ bci:44 (line 180) VarHandleGuards::guard_LLL_Z @ bci:50 (line 68... This pull request has now been integrated. 
Changeset: 57ebd045 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/57ebd045eae8ef1bdb5ec96d5eb11d252e08e6bb Stats: 92 lines in 3 files changed: 87 ins; 5 del; 0 mod 8330153: C2: dump barrier information for all Mach nodes Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18754 From stuefe at openjdk.org Tue Apr 23 05:26:53 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 Apr 2024 05:26:53 GMT Subject: RFR: 8330625: Compilation memory statistic: prevent tearing of the final report [v2] In-Reply-To: References: Message-ID: > Somewhat trivial change to reduce the chance of tearing the final compilation cost history report. See JBS for details. > > --- > > The patch: > - upon end of a compilation, we print the offending log line and account the cost in the compilation cost history table. For the latter we lock over NMTCompilationCostHistory_lock. The patch swaps these two actions such that we print after pulling the lock. That greatly reduces, though it does not completely remove, the chance of printing log lines into the final report.
(I did not want to widen the scope of that lock to include the printout) > - also moves the locking of NMTCompilationCostHistory_lock up to the start of the reporting function to include printing the report header into the locking Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: print newlines around report ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18866/files - new: https://git.openjdk.org/jdk/pull/18866/files/92049dda..58bf813f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18866&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18866&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18866.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18866/head:pull/18866 PR: https://git.openjdk.org/jdk/pull/18866 From stuefe at openjdk.org Tue Apr 23 05:35:28 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 Apr 2024 05:35:28 GMT Subject: RFR: 8330625: Compilation memory statistic: prevent tearing of the final report [v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 18:01:15 GMT, Vladimir Kozlov wrote: > Good. > Thank you, Vladimir > What kind of testing you do for changes in this code? Beside the GHAs, I added report printing at random intervals (so, not at shutdown) and saw with my patch a much reduced rate of tearing. The remaining tears come from the fact that the log line https://github.com/openjdk/jdk/blob/58bf813f3d60fe7bde734df68c64dbf12be01250/src/hotspot/share/compiler/compilationMemoryStatistic.cpp#L434 is printed outside the log scope still, so if report printing commences exactly at this point ( https://github.com/openjdk/jdk/blob/58bf813f3d60fe7bde734df68c64dbf12be01250/src/hotspot/share/compiler/compilationMemoryStatistic.cpp#L430), we still see the log line in middle of the report. 
I don't want to move the compiler log line into lock scope however since I don't want compiler threads to wait for each other after compilation. I also did not want to overcomplicate the patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18866#issuecomment-2071443141 From kvn at openjdk.org Tue Apr 23 05:37:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 05:37:33 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 16:54:40 GMT, Vladimir Kozlov wrote: > Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. It seems I can't remove second ThreadWXEnable in sharedRuntime.cpp because there is call site patching there. I will move ThreadWXEnable near it and update comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2071445866 From rcastanedalo at openjdk.org Tue Apr 23 06:38:47 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 23 Apr 2024 06:38:47 GMT Subject: RFR: 8330862: GCBarrierIRExample fails when a different GC is selected via the command line Message-ID: The example IR framework test introduced by [JDK-8330153](https://bugs.openjdk.org/browse/JDK-8330153) runs on (generational) ZGC only (`TestFramework.runWithFlags("-XX:+UseZGC", "-XX:+ZGenerational")`), but allows the jtreg user to select a conflicting GC externally, which causes the VM to fail. This changeset restricts the test so that it only runs if either ZGC or no GC is selected externally. 
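For readers unfamiliar with how jtreg tests are restricted to a GC, a guard of this kind is usually expressed as an `@requires` tag in the test header. The fragment below is only a sketch: it assumes the jtreg property `vm.gc.ZGenerational` is the one used, which is not stated in this message, and the remaining header tags are left elided.

```java
/*
 * @test
 * @requires vm.gc.ZGenerational   // assumption: property name not confirmed by this message
 * ... remaining tags unchanged ...
 */
```

With a guard like this, jtreg skips the test when a conflicting GC is passed on the command line instead of letting the VM fail on contradictory GC flags.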
**Testing:** tier1-3, ongoing (windows-x64, linux-x64, linux-aarch64, and macosx-x64) ------------- Commit messages: - Enable test only when using generational ZGC Changes: https://git.openjdk.org/jdk/pull/18906/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18906&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330862 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18906.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18906/head:pull/18906 PR: https://git.openjdk.org/jdk/pull/18906 From thartmann at openjdk.org Tue Apr 23 06:38:48 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 Apr 2024 06:38:48 GMT Subject: RFR: 8330862: GCBarrierIRExample fails when a different GC is selected via the command line In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 06:29:57 GMT, Roberto Casta?eda Lozano wrote: > The example IR framework test introduced by [JDK-8330153](https://bugs.openjdk.org/browse/JDK-8330153) runs on (generational) ZGC only (`TestFramework.runWithFlags("-XX:+UseZGC", "-XX:+ZGenerational")`), but allows the jtreg user to select a conflicting GC externally, which causes the VM to fail. This changeset restricts the test so that it only runs if either ZGC or no GC is selected externally. > > **Testing:** tier1-3, ongoing (windows-x64, linux-x64, linux-aarch64, and macosx-x64) Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18906#pullrequestreview-2016349327 From dholmes at openjdk.org Tue Apr 23 06:38:48 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 23 Apr 2024 06:38:48 GMT Subject: RFR: 8330862: GCBarrierIRExample fails when a different GC is selected via the command line In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 06:29:57 GMT, Roberto Casta?eda Lozano wrote: > The example IR framework test introduced by [JDK-8330153](https://bugs.openjdk.org/browse/JDK-8330153) runs on (generational) ZGC only (`TestFramework.runWithFlags("-XX:+UseZGC", "-XX:+ZGenerational")`), but allows the jtreg user to select a conflicting GC externally, which causes the VM to fail. This changeset restricts the test so that it only runs if either ZGC or no GC is selected externally. > > **Testing:** tier1-3, ongoing (windows-x64, linux-x64, linux-aarch64, and macosx-x64) LGTM. Thanks for the quick fix. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18906#pullrequestreview-2016351256 From stefank at openjdk.org Tue Apr 23 06:41:27 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 23 Apr 2024 06:41:27 GMT Subject: RFR: 8330862: GCBarrierIRExample fails when a different GC is selected via the command line In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 06:29:57 GMT, Roberto Casta?eda Lozano wrote: > The example IR framework test introduced by [JDK-8330153](https://bugs.openjdk.org/browse/JDK-8330153) runs on (generational) ZGC only (`TestFramework.runWithFlags("-XX:+UseZGC", "-XX:+ZGenerational")`), but allows the jtreg user to select a conflicting GC externally, which causes the VM to fail. This changeset restricts the test so that it only runs if either ZGC or no GC is selected externally. > > **Testing:** tier1-3, ongoing (windows-x64, linux-x64, linux-aarch64, and macosx-x64) Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18906#pullrequestreview-2016358077 From rcastanedalo at openjdk.org Tue Apr 23 06:41:27 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 23 Apr 2024 06:41:27 GMT Subject: RFR: 8330862: GCBarrierIRExample fails when a different GC is selected via the command line In-Reply-To: References: Message-ID: <0a_xOv0GVQWEYV5JYvqlCqnaYKnbrY4VsWkGSBvOIIU=.6c3a2ff4-cee0-4702-9030-705294623726@github.com> On Tue, 23 Apr 2024 06:29:57 GMT, Roberto Casta?eda Lozano wrote: > The example IR framework test introduced by [JDK-8330153](https://bugs.openjdk.org/browse/JDK-8330153) runs on (generational) ZGC only (`TestFramework.runWithFlags("-XX:+UseZGC", "-XX:+ZGenerational")`), but allows the jtreg user to select a conflicting GC externally, which causes the VM to fail. This changeset restricts the test so that it only runs if either ZGC or no GC is selected externally. > > **Testing:** tier1-3, ongoing (windows-x64, linux-x64, linux-aarch64, and macosx-x64) Thanks for reviewing, Tobias, David, and Stefan! I will integrate as soon as testing is done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18906#issuecomment-2071515574 From rcastanedalo at openjdk.org Tue Apr 23 06:55:33 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 23 Apr 2024 06:55:33 GMT Subject: Integrated: 8330862: GCBarrierIRExample fails when a different GC is selected via the command line In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 06:29:57 GMT, Roberto Casta?eda Lozano wrote: > The example IR framework test introduced by [JDK-8330153](https://bugs.openjdk.org/browse/JDK-8330153) runs on (generational) ZGC only (`TestFramework.runWithFlags("-XX:+UseZGC", "-XX:+ZGenerational")`), but allows the jtreg user to select a conflicting GC externally, which causes the VM to fail. 
This changeset restricts the test so that it only runs if either ZGC or no GC is selected externally. > > **Testing:** tier1-3 (windows-x64, linux-x64, linux-aarch64, and macosx-x64) This pull request has now been integrated. Changeset: 574ba140 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/574ba1400e015bf579190828fbdf0618eed48bdf Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8330862: GCBarrierIRExample fails when a different GC is selected via the command line Reviewed-by: thartmann, dholmes, stefank ------------- PR: https://git.openjdk.org/jdk/pull/18906 From mli at openjdk.org Tue Apr 23 07:26:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Apr 2024 07:26:36 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v3] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 01:46:33 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - refine code >> - Merge branch 'master' into vector-load-shuffle >> - add comment >> - Initial commit > > src/hotspot/cpu/riscv/riscv_v.ad line 81: > >> 79: case Op_VectorLoadShuffle: >> 80: case Op_VectorRearrange: >> 81: // vlen >= 4 is required, because min vector size for byte is 4 on riscv, > > I was not aware of such a constraint before. Is this a constraint at the ISA level or is a performance consideration? I didn't find where it is mentioned in the RVV spec. No, it's an existing constraint in jdk itself, please check the code in `Matcher::min_vector_size`, I think it's partially for performance consideration. It's the same as aarch64 and x86.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18835#discussion_r1575773427 From fyang at openjdk.org Tue Apr 23 07:34:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 23 Apr 2024 07:34:35 GMT Subject: RFR: 8317720: RISC-V: Implement Adler32 intrinsic [v6] In-Reply-To: References: Message-ID: <3juhaO3iNbSdakSMDzcjgpARY7O4XMCe1pZMsxBFsis=.d8747866-c563-4fcd-af1d-62383857cbd8@github.com> On Thu, 18 Apr 2024 08:39:35 GMT, ArsenyBochkarev wrote: >> Hello everyone! Please review this ~non-vectorized~ implementation of `_updateBytesAdler32` intrinsic. Reference implementation for AArch64 can be found [here](https://github.com/openjdk/jdk9/blob/master/hotspot/src/cpu/aarch64/vm/stubGenerator_aarch64.cpp#L3281). >> >> ### Correctness checks >> >> Test `test/hotspot/jtreg/compiler/intrinsics/zip/TestAdler32.java` is ok. All tier1 also passed. >> >> ### Performance results on T-Head board >> >> Enabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> | Adler32.TestAdler32.testAdler32Update | 64 | thrpt | 25 | 5522.693 | 23.387 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 128 | thrpt | 25 | 3430.761 | 9.210 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 256 | thrpt | 25 | 1962.888 | 5.323 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 512 | thrpt | 25 | 1050.938 | 0.144 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 1024 | thrpt | 25 | 549.227 | 0.375 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 2048 | thrpt | 25 | 280.829 | 0.170 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 5012 | thrpt | 25 | 116.333 | 0.057 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 8192 | thrpt | 25 | 71.392 | 0.060 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 16384 | thrpt | 25 | 35.784 | 0.019 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 
32768 | thrpt | 25 | 17.924 | 0.010 | ops/ms | >> | Adler32.TestAdler32.testAdler32Update | 65536 | thrpt | 25 | 8.940 | 0.003 | ops/ms | >> >> Disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | ------------------------------------- | ----------- | ------ | --------- | ------ | --------- | ---------- | >> |Adler32.TestAdler32.testAdler32Update|64|thrpt|25|655.633|5.845|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|128|thrpt|25|587.418|10.062|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|256|thrpt|25|546.675|11.598|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|512|thrpt|25|432.328|11.517|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|1024|thrpt|25|311.771|4.238|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|2048|thrpt|25|202.648|2.486|ops/ms| >> |Adler32.TestAdler32.testAdler32Update|5012|thrpt|... > > ArsenyBochkarev has updated the pull request incrementally with 12 additional commits since the last revision: > > - Use mv instead of li > - Prettify function > - Remove unnecessary zeroing of vtemp1, vtemp2 > - Remove unnecessary zeroing of v4, ..., v27 > - Remove unnecessary assert > - Move similar unroll code to a function > - Fix comment > - Dispose of unnecessary arguments in accum function > - Accelerate vectorization > - Use two vredsum instead of vadd + vwredsum > - Make use of more vector registers > - Dispose of most of vsetivli instructions > - Prettify loop remainder > - ... and 2 more: https://git.openjdk.org/jdk/compare/8a74349c...3cf649c9 src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5090: > 5088: > 5089: __ vsetivli(temp0, 16, Assembler::e8, Assembler::m1); > 5090: for (int i = 0; i < unroll_factor; i++) Does it make sense to limit the vector length to 16 bytes and do loop unrolling here? I think the aarch64 version of `generate_updateBytesAdler32_accum` has this constraint because it uses NEON, which only has 128-bit vector registers.
But for RVV, we can combine several vector registers into a register group (LMUL greater than 1). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18382#discussion_r1575783492 From epeter at openjdk.org Tue Apr 23 07:42:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 07:42:59 GMT Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop Message-ID: Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop. **Example where we get the "bad dominance"** This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment). The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop. The address is parsed into its components by `VPointer`: `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]` We note that this is an access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` still tries to find a "base" address `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`. The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513 CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test).
![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`: ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. **Why does this happen?** Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but also the pre-loop. The pinning after pre-loop and before main-loop via `1513 CastLL` seems to be quite rare. The `CastLL` comes from the long `checkIndex` for the Unsafe / MemorySegment load, somehow in combination with the constant-size array / constant range of the main-loop. Another realization: the `adr` is basically an addition `raw_base + offset`. I would have expected the `offset` to be an invariant and end up in the `invar`. But the `VPointer` parsing seems to only parse through the `AddP`, and stop at the `CastX2P`, and take this as the `adr`, even though we could parse further through the `1590 AddL`, and separate the `11 Parm = long offset` and the `602 LoadL = raw_base`. **Solution** For now, and for the simplicity of backports, I simply check that the `adr` is not just `is_loop_member` (of the main-loop), but that it is `invariant` (of the main-loop and the pre-loop). We also already do this check for invariants `invar`, and the base/adr is essentially another invariant.
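The `new_limit = old_limit + adjustment` arithmetic described above can be illustrated with a small Java sketch. This is not the C2 implementation: the class and method names are hypothetical, and it assumes a power-of-two alignment width `aw`, a positive `scale` that divides `aw`, and an address `adr` that is itself a multiple of `scale`.

```java
public class PreLoopAlignSketch {
    /**
     * Hypothetical helper: number of extra pre-loop iterations needed so that
     * adr + scale * iterations is a multiple of aw.
     * Assumes aw is a power of two, scale > 0 divides aw, and adr % scale == 0.
     */
    public static long alignmentAdjustment(long adr, int scale, int aw) {
        // The modulo is done as a bitwise AND, which works because aw is a power of two.
        long misalignment = adr & (aw - 1);
        // Bytes remaining until the next aw boundary (0 if already aligned).
        long bytesToBoundary = (aw - misalignment) & (aw - 1);
        return bytesToBoundary / scale;
    }

    public static void main(String[] args) {
        long adr = 0x1004L;     // made-up address, 4 bytes past a 32-byte boundary
        long oldLimit = 3;      // made-up pre-loop limit
        long adjustment = alignmentAdjustment(adr, 4, 32);
        long newLimit = oldLimit + adjustment;
        // Prints: adjustment = 7, new_limit = 10
        System.out.println("adjustment = " + adjustment + ", new_limit = " + newLimit);
    }
}
```

The point of the sketch is that `adr` is an input to the new limit, and the limit is consumed by the pre-loop exit check. Any value feeding it must therefore be available before the pre-loop, which a node pinned between the pre-loop and the main-loop is not.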
That solution means that in our regression tests, we would mark the problematic `VPointer`s as not valid, and they will not be vectorized. This case is very rare (after all, we have never hit this bad-dominance assert before), and therefore there should not be any relevant performance regression. In a **future RFE**, I can then look into improving the `VPointer` parsing. The `VPointer` code is already very convoluted, and adding much more functionality will be difficult to manage. I want to completely refactor `VPointer` in a few months anyway. With improved parsing, this regression test could vectorize again. ------------- Commit messages: - fix and more test runs - ms test - 8330819 Changes: https://git.openjdk.org/jdk/pull/18892/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18892&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330819 Stats: 163 lines in 3 files changed: 162 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18892/head:pull/18892 PR: https://git.openjdk.org/jdk/pull/18892 From fyang at openjdk.org Tue Apr 23 07:51:33 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 23 Apr 2024 07:51:33 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:45:45 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review the patch for instrinsic VectorLoadShuffle? >> >> BTW, without this intrinsic, some other vector api operation does not work as well (e.g. rearrange) on riscv. >> >> Thanks >> >> ## Test >> test/jdk/jdk/incubator/vector/ >> test/hotspot/jtreg/compiler/vectorapi > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - refine code > - Merge branch 'master' into vector-load-shuffle > - add comment > - Initial commit Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18835#pullrequestreview-2016509770 From thartmann at openjdk.org Tue Apr 23 08:09:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 23 Apr 2024 08:09:30 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v4] In-Reply-To: <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> Message-ID: On Fri, 19 Apr 2024 17:16:24 GMT, Joshua Cao wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). >> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. 
>> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passing hotspot tier1 locally on Linux machine. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Comment on not allowing macro nodes after we start expanding. Rename > dont_allow_macro_nodes to reset_allow_macro_nodes. All tests passed. > The CMoveNode is processed, but its input Bool and Cmp are never processed. Maybe we need to transform the CMove's inputs in Maybe verify if a `_igvn._worklist.push(...)` helps. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18824#issuecomment-2071681459 From epeter at openjdk.org Tue Apr 23 08:10:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 08:10:38 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 15:37:13 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> improve RC comment for Vladimir > > New comment is good now. Thanks! @vnkozlov so you are approving of the current state of the code? Just asking because you have not explicitly re-approved the code ; @rwestrel @TobiHartmann Would you mind re-reviewing? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2071681950 From galder at openjdk.org Tue Apr 23 08:41:35 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 23 Apr 2024 08:41:35 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 05:38:34 GMT, Boris Ulasevich wrote: > ``` > $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > ... > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 > ``` > > 1234 tests does not seem to be enough for a low-level feature. I would also check :hotspot_gc :hotspot_serviceability :hotspot_runtime and jdk tier1-3 targets. Fair point. I went ahead and run the following tests on linux/x64: $ make test TEST="hotspot_compiler hotspot_gc hotspot_serviceability hotspot_runtime tier1 tier2 tier3" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1 -Xcomp" And got some failures: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg:hotspot_compiler 1247 1247 0 0 jtreg:test/hotspot/jtreg:hotspot_gc 512 512 0 0 jtreg:test/hotspot/jtreg:hotspot_serviceability 340 340 0 0 jtreg:test/hotspot/jtreg:hotspot_runtime 793 793 0 0 jtreg:test/hotspot/jtreg:tier1 2050 2050 0 0 >> jtreg:test/jdk:tier1 2369 2368 1 0 << jtreg:test/langtools:tier1 4477 4477 0 0 jtreg:test/jaxp:tier1 0 0 0 0 jtreg:test/lib-test:tier1 33 33 0 0 jtreg:test/hotspot/jtreg:tier2 649 649 0 0 >> jtreg:test/jdk:tier2 4100 4098 1 1 << >> jtreg:test/langtools:tier2 11 10 1 0 << jtreg:test/jaxp:tier2 515 515 0 0 jtreg:test/hotspot/jtreg:tier3 245 245 0 0 >> jtreg:test/jdk:tier3 1413 1406 4 3 << jtreg:test/langtools:tier3 0 0 0 0 jtreg:test/jaxp:tier3 0 0 0 0 ============================== TEST FAILURE On closer inspection, the tests that failed are: 
jtreg:test/jdk/jdk/jfr/event/compiler/TestCompilerCompile.java jtreg:test/jdk/java/security/AccessController/DoPrivAccompliceTest.java jtreg:test/langtools/jdk/jshell/FailOverDirectExecutionControlTest.java jtreg:test/jdk/java/rmi/server/RemoteServer/AddrInUse.java jtreg:test/jdk/java/rmi/server/RMISocketFactory/useSocketFactory/unicast/TCPEndpointReadBug.java jtreg:test/jdk/jdk/jfr/event/gc/stacktrace/TestMetaspaceParallelGCAllocationPendingStackTrace.java jtreg:test/jdk/jdk/jfr/event/gc/stacktrace/TestMetaspaceSerialGCAllocationPendingStackTrace.java I tried with the commit before my changes (https://github.com/openjdk/jdk/commit/eebcc218) and the tests also fail there when run with `JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1 -Xcomp"` ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2071741244 From epeter at openjdk.org Tue Apr 23 08:52:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 08:52:46 GMT Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v2] In-Reply-To: References: Message-ID: > Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop. > > **Example where we get the "bad dominance"** > > This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment). > > The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop. 
> > The address is parsed into its components by `VPointer`: > `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]` > > We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` tries still to find a "base" adress `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`. > > The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test). > > ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) > > During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`: > > ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) > > You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. > > **Why does this happen?** > > Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but also the pre-loop. The pinning after pre-loop and befor... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: test updates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18892/files - new: https://git.openjdk.org/jdk/pull/18892/files/9be1e090..d5e51590 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18892&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18892&range=00-01 Stats: 136 lines in 2 files changed: 45 ins; 89 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18892/head:pull/18892 PR: https://git.openjdk.org/jdk/pull/18892 From galder at openjdk.org Tue Apr 23 09:04:32 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 23 Apr 2024 09:04:32 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 05:32:10 GMT, Boris Ulasevich wrote: > please add @summary +1 > is the purpose of the test to check that array clone throws NPE for null input and does not throw otherwise? I added this test to verify that null values for the array where handled correctly when the array clone call had been C1 compiled. I added this test at the time because I discovered a bug in the implementation and I had not seen any existing tests fail. > Don't we want to check the contents of the copied data? > Don't we want to check different sizes and array types? Verifying the contents sounds good. Different sizes and types sounds good as well, but what sizes would you choose? For the types, I would limit it to primitive types since the intrinsic is only implemented for those. > Is 1K iterations enough to compile the method? That seemed to be enough in my case to trigger the issue, what number should I use instead? I found `Tier3CompileThreshold` that is 2000, so maybe increase it to that just in case? 
Is there a way to verify from the test that the method has been C1 compiled before going ahead and invoking the method with null?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1575906297

From galder at openjdk.org Tue Apr 23 09:14:59 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 23 Apr 2024 09:14:59 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v11]
In-Reply-To:
References:
Message-ID:

> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures.
>
> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
>
> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                       TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler   1234  1234     0      0
>
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts?
>
> Thanks @rwestrel for his help shaping this up :)

Galder Zamarreño has updated the pull request incrementally with two additional commits since the last revision:

 - Update src/hotspot/share/c1/c1_GraphBuilder.cpp

   Co-authored-by: Boris <42576543+bulasevich at users.noreply.github.com>
 - Added test summary

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17667/files
  - new: https://git.openjdk.org/jdk/pull/17667/files/2d8854d0..595d1e99

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=09-10

  Stats: 13 lines in 2 files changed: 1 ins; 2 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/17667.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667

PR: https://git.openjdk.org/jdk/pull/17667

From galder at openjdk.org Tue Apr 23 09:14:59 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Tue, 23 Apr 2024 09:14:59 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10]
In-Reply-To:
References:
Message-ID:

On Tue, 23 Apr 2024 09:01:57 GMT, Galder Zamarreño wrote:

>> test/hotspot/jtreg/compiler/c1/TestNullArrayClone.java line 26:
>>
>>> 24: /*
>>> 25: * @test
>>> 26: * @bug
8302850 >> >> please add @summary >> is the purpose of the test to check that array clone throws NPE for null input and does not throw otherwise? >> Don't we want to check the contents of the copied data? >> Don't we want to check different sizes and array types? >> Is 1K iterations enough to compile the method? > >> please add @summary > > +1 > >> is the purpose of the test to check that array clone throws NPE for null input and does not throw otherwise? > > I added this test to verify that null values for the array where handled correctly when the array clone call had been C1 compiled. I added this test at the time because I discovered a bug in the implementation and I had not seen any existing tests fail. > >> Don't we want to check the contents of the copied data? >> Don't we want to check different sizes and array types? > > Verifying the contents sounds good. > > Different sizes and types sounds good as well, but what sizes would you choose? For the types, I would limit it to primitive types since the intrinsic is only implemented for those. > >> Is 1K iterations enough to compile the method? > > That seemed to be enough in my case to trigger the issue, what number should I use instead? I found `Tier3CompileThreshold` that is 2000, so maybe increase it to that just in case? Is there a way to verify from the test that the method has been C1 compiled before going ahead and invoking the method with null? Added brief test summary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1575916707 From mli at openjdk.org Tue Apr 23 09:46:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Apr 2024 09:46:08 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v12] In-Reply-To: References: Message-ID: > HI, > Can you have a look at this patch adding some tests for Math.round instrinsics? > Thanks! 
> > ### FYI: > During the development of RoundVF/RoundF, we faced the issues which were only spotted by running test exhaustively against 32/64 bits range of int/long. > It's helpful to add these exhaustive tests in jdk for future possible usage, rather than build it everytime when needed. > Of course, we need to put it in `manual` mode, so it's not run when `-automatic` jtreg option is specified which I guess is the mode CI used, please correct me if I'm assume incorrectly. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Add vectorized and scalar version Float tests checking full 32 bits range ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17753/files - new: https://git.openjdk.org/jdk/pull/17753/files/ec51c774..02d7600f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=10-11 Stats: 243 lines in 4 files changed: 241 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17753/head:pull/17753 PR: https://git.openjdk.org/jdk/pull/17753 From mli at openjdk.org Tue Apr 23 09:46:08 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Apr 2024 09:46:08 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v9] In-Reply-To: <9l5M9DRG3AT-3aGSAr7VE6jm3WCpmY9DkQCoDg5026c=.cdca309d-6bf0-4879-9b83-3379ec7e5799@github.com> References: <8NTtFb2VzNSiEVMTzHz0An84ZlpYqYTK0n7gMZyfZOE=.71abbac7-9284-4231-b4ca-b15c4c407424@github.com> <28zrJcUXq46dwhe4M7Z98i62jqMjRAR9Cq7M8uY50R8=.66828c14-04b2-4c07-a309-760dff5e20e5@github.com> <8uhnDq5Ed45GTULdU3IKvFG3ssKPm2EYoEE 5p8Gflqo=.ddffb8e2-9429-4252-89cf-6f9586d30eed@github.com> <9l5M9DRG3AT-3aGSAr7VE6jm3WCpmY9DkQCoDg5026c=.cdca309d-6bf0-4879-9b83-3379ec7e5799@github.com> Message-ID: On Mon, 22 Apr 2024 15:51:03 GMT, Andrew Haley wrote: >> @theRealAph out of office, so don't have much time to 
think this through. But maybe we want both, a slower IR test which ensures we have the desired IR (with random input values), and also a non-IR test that is faster and checks the correct results more exhaustively? > > Totally. Then, at least for float, you've got it all. Thanks. I've added new tests which check full 32 bits range for Float. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1575964785 From epeter at openjdk.org Tue Apr 23 09:52:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 09:52:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci wrote: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. 
Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. src/hotspot/share/opto/memnode.cpp line 2830: > 2828: val->in(MemNode::Memory )->eqv_uncast(mem) && > 2829: val->as_Load()->store_Opcode() == Opcode()) { > 2830: // Handle StoreVector with offsets and masks Also: the indendation is not right: it should only be 2 spaces from the `if`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1575974443 From epeter at openjdk.org Tue Apr 23 10:31:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 10:31:31 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 12:20:34 GMT, Damon Fenacci wrote: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. 
> > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. I've randomly thought about this change again (while trying to sleep yesterday...). 
And started worrying about this kind of case:

import jdk.incubator.vector.*;

public class Test {
    private static final VectorSpecies<Integer> I_SPECIES = IntVector.SPECIES_MAX;

    public static void main(String[] args) {
        // create mask
        boolean[] intMask = new boolean[I_SPECIES.length()];
        for (int i = 0; i < intMask.length; i++) {
            intMask[i] = (i % 2 == 0);
        }
        int[] intArray1 = new int[I_SPECIES.length()];
        int[] intArray2 = new int[I_SPECIES.length()];
        int[] intArray3 = new int[I_SPECIES.length()];
        for (int i = 0; i < intArray1.length; i++) {
            intArray1[i] = i;
            intArray2[i] = -2 * i;
            intArray3[i] = 0;
        }
        for (int i = 0; i < 10_000; i++) {
            test1(intMask, intArray1, intArray2, intArray3);
        }
        for (int i = 0; i < intArray1.length; i++) {
            System.out.println("i: " + i + " " + intArray1[i] + " " + intArray2[i] + " " + intArray3[i]);
        }
    }

    static void test1(boolean[] intMask, int[] intArray1, int[] intArray2, int[] intArray3) {
        VectorMask<Integer> intVectorMask = VectorMask.fromArray(I_SPECIES, intMask, 0);
        // Load values: 0 1 2 3 4 5 ...
        IntVector a = IntVector.fromArray(I_SPECIES, intArray1, 0);
        // Store, but only every second value:
        // Store:         0  x  2  x  4   x ...
        // Already there: 0 -2 -4 -6 -8 -10 ...
        // Result:        0 -2  2 -6  4 -10 ...
        a.intoArray(intArray2, 0, intVectorMask);
        // Load, but we cannot just take the value from vector a, since we mask it, and where the mask is
        // off, we must have zero.
        // Load:          0  0  2  0  4   0 ...
        IntVector b = IntVector.fromArray(I_SPECIES, intArray2, 0, intVectorMask);
        b.intoArray(intArray3, 0);
    }
}

This is not a bug yet, I think, but it could be if you do the change I suggested with `store_Opcode()`.
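To make the lane-level concern in the example above concrete, here is a minimal scalar sketch in plain Java that does not use the Vector API. It assumes the semantics described in the comments of `test1`: a masked store leaves masked-off lanes untouched, and a masked load yields zero in masked-off lanes. The class and helper names (`MaskedLookThroughSketch`, `maskedStore`, `maskedLoad`) are hypothetical and exist only for illustration; this is not how C2 models these nodes.

```java
import java.util.Arrays;

public class MaskedLookThroughSketch {
    // Scalar model (assumption) of a masked vector store:
    // only lanes with mask[i] == true are written, the rest keep their old contents.
    static void maskedStore(int[] dst, int[] value, boolean[] mask) {
        for (int i = 0; i < value.length; i++) {
            if (mask[i]) {
                dst[i] = value[i];
            }
        }
    }

    // Scalar model (assumption) of a masked vector load:
    // lanes with mask[i] == false are zero in the result.
    static int[] maskedLoad(int[] src, boolean[] mask) {
        int[] result = new int[mask.length];
        for (int i = 0; i < mask.length; i++) {
            result[i] = mask[i] ? src[i] : 0;
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, false, true, false};
        int[] a = {0, 1, 2, 3, 4, 5};            // store input, not yet masked
        int[] memory = {0, -2, -4, -6, -8, -10}; // contents before the store

        maskedStore(memory, a, mask);
        int[] b = maskedLoad(memory, mask);

        System.out.println("memory after store: " + Arrays.toString(memory));
        System.out.println("masked load:        " + Arrays.toString(b));
        // The loaded value differs from the store input in every masked-off lane
        // where the input is non-zero, so forwarding 'a' to the load would be wrong.
        System.out.println("load equals store input: " + Arrays.equals(a, b));
    }
}
```

In this model the masked load only matches the store input after the mask has been applied to that input, which is the blend/select step a look-through would have to add.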
Currently, if we have:

2035 StoreVectorMasked === 934 7 2031 1979 1956 |893 [[ 2034 2041 2066 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[0] *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact[0] *, idx=7; !jvms: IntVector::intoArray0Template @ bci:50 (line 3556) IntMaxVector::intoArray0 @ bci:9 (line 911) IntVector::intoArray @ bci:53 (line 3269) Test::test1 @ bci:26 (line 34)
2041 LoadVectorMasked === 934 2035 2031 1956 |893 [[ 1767 2066 1750 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[0] *, idx=7; mismatched #vectorz[16]:{int} !jvms: IntVector::fromArray0Template @ bci:53 (line 3455) IntMaxVector::fromArray0 @ bci:11 (line 874) IntVector::fromArray @ bci:32 (line 2986) Test::test1 @ bci:36 (line 35)

The load does not succeed in `MemNode::can_see_stored_value`, because the load's `store_Opcode() == Op_StoreVector`, and not `Op_StoreVectorMasked`.

But if you were to implement `LoadVectorMasked::store_Opcode() const { return Op_StoreVectorMasked; }`, then you have to be careful: A masked load does not necessarily return the same as the masked store's input value. That input value is not yet masked, but the loaded value needs to be masked.

But it seems to me that you can actually never have a successful `MemNode::can_see_stored_value` case for masked operations, with your current code. It would always fail the `store_Opcode() == st->Opcode()` check. And for now that gives the correct result, but it is still a bit strange that we don't override the `store_Opcode` for the masked/offset vector stores.

I don't know which way you want to go now. I see these options:
- Keep disallowing masked load/store "look-throughs".
  - Do that by having the "incorrect" `store_Opcode` as now. The downside is that the "offset only" case does not manage to do the look-through, even though that would be correct.
  - OR: have the correct `store_Opcode`, which allows the look-through for the "offset only" case. But then explicitly check for the masked cases, and disallow those.
- Implement a special look-through, where you apply the mask with some blend/select/masked operation on the store input value, which simulates the masked load (i.e. you need to put zeros where the mask is off).

Not sure if this is all very clear, feel free to ask.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2071956396

From aph at openjdk.org Tue Apr 23 10:35:31 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 23 Apr 2024 10:35:31 GMT
Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512)
In-Reply-To:
References:
Message-ID:

On Fri, 19 Apr 2024 20:54:03 GMT, Smita Kamath wrote:

> Hi, I've attached the alternative fix here. Please let me know if you have any questions. Thank you.
> [alternative-fix-8330611.txt](https://github.com/openjdk/jdk/files/15045540/alternative-fix-8330611.txt)

That looks nice, and is a good stylistic match for the rest of the code.
The comment should be here, though:

@@ -2614,8 +2615,11 @@ void StubGenerator::aesctr_encrypt(Register src_addr, Register dest_addr, Regist
   __ bind(EXTRACT_TAILBYTES);
   // Save encrypted counter value in xmm0 for next invocation, before XOR operation
   __ movdqu(Address(saved_encCounter_start, 0), xmm0);
   // XOR encryted block cipher in xmm0 with PT to produce CT
+  __ mov64(tail, -1L);
+  __ bzhiq(tail, tail, len_reg);
+  __ kmovql(k1, tail);
-  __ evpxorq(xmm0, xmm0, Address(src_addr, pos, Address::times_1, 0), Assembler::AVX_128bit);
+  __ evpxorq(xmm0, k1, xmm0, Address(src_addr, pos, Address::times_1, 0), true, Assembler::AVX_128bit);
   // extract up to 15 bytes of CT from xmm0 as specified by length register
   __ testptr(len_reg, 8);
   __ jcc(Assembler::zero, EXTRACT_TAIL_4BYTES);

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2071963021

From shade at openjdk.org Tue Apr 23 11:14:27 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 23 Apr 2024 11:14:27 GMT
Subject: RFR: 8330805: ARM32 build is broken after JDK-8139457
In-Reply-To:
References:
Message-ID:

On Mon, 22 Apr 2024 14:19:47 GMT, Aleksei Voitylov wrote:

> The JDK-8139457 patch changes the header_size argument of C1_MacroAssembler::allocate_array, the input value now means offset in bytes. The ARM32 allocate_array implementation is fixed accordingly.
>
> Testing: jtreg hotspot, jtreg jdk tier1-3

TBH, this also looks trivial, so I think it can already go in.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18890#issuecomment-2072027957

From coleenp at openjdk.org Tue Apr 23 11:29:30 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 23 Apr 2024 11:29:30 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote:

>> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems.
This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fei comments

Dean, can you change your numbers to say which thread? In the code someone has written that you need a LoadLoad between t2's [t2-3] and [t2-4] because t2 assumes that it's a fast path, so writes to the ResolvedFieldEntry [t1-2] are complete. The missing LoadLoad there was the cause of the crash anyway.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2072051989

From rcastanedalo at openjdk.org Tue Apr 23 11:34:32 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 23 Apr 2024 11:34:32 GMT
Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 20 Mar 2024 03:55:33 GMT, Joshua Zhu wrote:

>> Currently ZGC C2 load barrier stub saves the whole live register regardless of what size of register is live on aarch64.
>> Considering the size of SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits,
>> even the use of a floating point may cause the maximum 2048 bits stack occupied.
>> Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in ZGC C2 load barrier stub.
>>
>> In a floating point case on 2048 bits SVE machine, the following ZLoadBarrierStubC2
>>
>>
>> ......
>> 0x0000ffff684cfad8: stp x15, x18, [sp, #80]
>> 0x0000ffff684cfadc: sub sp, sp, #0x100
>> 0x0000ffff684cfae0: str z16, [sp]
>> 0x0000ffff684cfae4: add x1, x13, #0x10
>> 0x0000ffff684cfae8: mov x0, x16
>> ;; 0xFFFF803F5414
>> 0x0000ffff684cfaec: mov x8, #0x5414 // #21524
>> 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16
>> 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32
>> 0x0000ffff684cfaf8: blr x8
>> 0x0000ffff684cfafc: mov x16, x0
>> 0x0000ffff684cfb00: ldr z16, [sp]
>> 0x0000ffff684cfb04: add sp, sp, #0x100
>> 0x0000ffff684cfb08: ptrue p7.b
>> 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16]
>> ......
>>
>>
>> could be optimized into:
>>
>>
>> ......
>> 0x0000ffff684cfa50: stp x15, x18, [sp, #80]
>> 0x0000ffff684cfa54: str d16, [sp, #-16]! // extra 8 bytes to align 16 bytes in push_fp()
>> 0x0000ffff684cfa58: add x1, x13, #0x10
>> 0x0000ffff684cfa5c: mov x0, x16
>> ;; 0xFFFF7FA942A8
>> 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064
>> 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16
>> 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32
>> 0x0000ffff684cfa6c: blr x8
>> 0x0000ffff684cfa70: mov x16, x0
>> 0x0000ffff684cfa74: ldr d16, [sp], #16
>> 0x0000ffff684cfa78: ptrue p7.b
>> 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16]
>> ......
>>
>>
>> Besides the above benefit, when we know what size of register is live,
>> we could remove the unnecessary caller save in ZGC C2 load barrier stub when we meet C-ABI SOE fp registers.
>>
>> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced.
>
> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add more output for easy debugging once the jtreg test case fails

Looks good.
I also tested the changeset on Oracle's internal CI (ZGC tests within tiers 1-7, on Neon machines) with an additional patch (https://github.com/openjdk/jdk/commit/963def0415830bc5979c5bb6064a566b1c8040dd) that forces ZGC read barriers to always take the slow path and clears all vector registers upon the slow path's runtime call. Testing succeeded. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17977#pullrequestreview-2016978929 From shade at openjdk.org Tue Apr 23 12:09:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Apr 2024 12:09:29 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v12] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 16:56:28 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled.
>>
>> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>>
>> Passes hotspot tier1 locally on a Linux machine.
>>
>> ### Benchmarks
>>
>> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>>
>> Baseline:
>>
>> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>>   N = 25
>>   mean = 3309.611 ±(99.9%) 86.699 ms/op
>>
>> Histogram, ms/op:
>>   [3000.000, 3050.000) = 0
>>   [3050.000, 3100.000) = 4
>>   [3100.000, 3150.000) = 1
>>   [3150.000, 3200.000) = 0
>>   [3200.000, 3250.000) = 0
>>   [3250.000, 3300.000) = 0
>>   [3300.000, 3350.000) = 9
>>   [3350.000, 3400.000) = 6
>>   [3400.000, 3450.000) = 5
>>
>> Percentiles, ms/op:
>>   p(0...
>
> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add riscv64 to test

All right, checking, are we still good here? We need someone else to ack as well. Maybe @theRealAph would be interested to sanity-check this? @vnkozlov, would you like to run this through Oracle testing?
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2072130825

From bulasevich at openjdk.org Tue Apr 23 13:32:34 2024
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 23 Apr 2024 13:32:34 GMT
Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10]
In-Reply-To:
References:
Message-ID:

On Tue, 23 Apr 2024 09:09:02 GMT, Galder Zamarreño wrote:

>>> please add @summary
>>
>> +1
>>
>>> is the purpose of the test to check that array clone throws NPE for null input and does not throw otherwise?
>>
>> I added this test to verify that null values for the array were handled correctly when the array clone call had been C1 compiled. I added this test at the time because I discovered a bug in the implementation and I had not seen any existing tests fail.
>>
>>> Don't we want to check the contents of the copied data?
>>> Don't we want to check different sizes and array types?
>>
>> Verifying the contents sounds good.
>>
>> Different sizes and types sounds good as well, but what sizes would you choose? For the types, I would limit it to primitive types since the intrinsic is only implemented for those.
>>
>>> Is 1K iterations enough to compile the method?
>>
>> That seemed to be enough in my case to trigger the issue, what number should I use instead? I found `Tier3CompileThreshold` that is 2000, so maybe increase it to that just in case? Is there a way to verify from the test that the method has been C1 compiled before going ahead and invoking the method with null?
>
> Added brief test summary.

Thank you!

> what sizes would you choose? For the types, I would limit it to primitive types

Yes, checking the primitive types is fine. Let it be int[], long[], and byte[]. For size I would pick an odd number.

static final int ITER = 2000; // ~ Tier3CompileThreshold
static final int ARRAY_SIZE = 999;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1576268353

From chagedorn at openjdk.org Tue Apr 23 13:32:37 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 23 Apr 2024 13:32:37 GMT
Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 23 Apr 2024 08:52:46 GMT, Emanuel Peter wrote:

>> Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop.
>>
>> **Example where we get the "bad dominance"**
>>
>> This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment).
>>
>> The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop.
>>
>> The address is parsed into its components by `VPointer`:
>> `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]`
>>
>> We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` still tries to find a "base" address `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`.
>> >> The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test). >> >> ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) >> >> During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`: >> >> ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) >> >> You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. >> >> **Why does this happen?** >> >> Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but als... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > test updates That looks good to me. > In a future RFE, I can then look into improving the VPointer parsing. The VPointer code is already very convoluted, and adding much more functionality will be difficult to manage. I want to completely refactor VPointer in a few months anyway. With improved parsing, this regression test could vectorize again. Sounds good! Maybe you can link this bug to this RFE (if there is already one) to not forget about updating this test to check if it indeed vectorizes. 
test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegmentMainLoopAlignment.java line 32: > 30: * @modules java.base/jdk.internal.misc > 31: * @modules java.base/jdk.internal.util > 32: * @library /test/lib / Since you don't use anything from the test libs, I think you can remove this line. Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18892#pullrequestreview-2017223726 PR Review Comment: https://git.openjdk.org/jdk/pull/18892#discussion_r1576235069 From bulasevich at openjdk.org Tue Apr 23 13:40:34 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 23 Apr 2024 13:40:34 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 13:29:51 GMT, Boris Ulasevich wrote: >> Added brief test summary. > > Thank you! > >> what sizes would you choose? For the types, I would limit it to primitive types > > Yes, checking the primitive types is fine. Let it be int[], long[], and byte[]. For size I would pick an odd. > > > static final int ITER = 2000; // ~ Tier3CompileThreshold > static final int ARRAY_SIZE = 999; If you like, you can inspect the output of the -XX:+PrintLIR option to see if C1 applies arraycopy as expected. 
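For illustration only, the test shape discussed in this thread (odd array size, `int[]`/`long[]`/`byte[]`, content check, NPE on null) could look roughly like the sketch below. It is not the test from the PR: the class name `ArrayCloneSketch` and all helper method names are hypothetical, and running 2000 iterations is only assumed to be enough to reach C1, nothing here verifies that compilation actually happened.

```java
import java.util.Arrays;

// Hypothetical sketch, not the jtreg test from the PR.
public class ArrayCloneSketch {
    static final int ITER = 2000;       // ~ Tier3CompileThreshold
    static final int ARRAY_SIZE = 999;  // odd size, as suggested in the review

    // One small method per primitive type so each clone call site
    // can be compiled on its own.
    static int[] cloneInts(int[] a) { return a.clone(); }
    static long[] cloneLongs(long[] a) { return a.clone(); }
    static byte[] cloneBytes(byte[] a) { return a.clone(); }

    // Clones each array ITER times and checks that every copy is a
    // distinct object with identical contents.
    static boolean clonesMatch() {
        int[] ints = new int[ARRAY_SIZE];
        long[] longs = new long[ARRAY_SIZE];
        byte[] bytes = new byte[ARRAY_SIZE];
        for (int i = 0; i < ARRAY_SIZE; i++) {
            ints[i] = i;
            longs[i] = 3L * i;
            bytes[i] = (byte) i;
        }
        for (int i = 0; i < ITER; i++) {
            int[] ci = cloneInts(ints);
            long[] cl = cloneLongs(longs);
            byte[] cb = cloneBytes(bytes);
            if (ci == ints || !Arrays.equals(ci, ints)) return false;
            if (cl == longs || !Arrays.equals(cl, longs)) return false;
            if (cb == bytes || !Arrays.equals(cb, bytes)) return false;
        }
        return true;
    }

    // After the warm-up above, a null input must still throw NPE.
    static boolean nullThrowsNpe() {
        try {
            cloneInts(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        if (!clonesMatch()) throw new AssertionError("clone contents differ");
        if (!nullThrowsNpe()) throw new AssertionError("expected NPE for null");
    }
}
```

The null check runs after the content loop on purpose, so that the clone call site has already been exercised many times before the null input is tried.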
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1576280836 From chagedorn at openjdk.org Tue Apr 23 13:48:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 Apr 2024 13:48:57 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v5] In-Reply-To: References: Message-ID: > **Update: April 22** > > After splitting off and integrating the following PRs from this PR: > https://github.com/openjdk/jdk/pull/18080 > https://github.com/openjdk/jdk/pull/18293 > https://github.com/openjdk/jdk/pull/18628 > https://github.com/openjdk/jdk/pull/18723 > > we are only left with a few renaming and clean-ups from this PR. Directly merging the master branch in was quite hard. I therefore reverted all commits to get back to a clean master and then applied all remaining code changes manually (required a force push). > >
>
> > _------------ Original PR description --------------_ > > This patch is intended for JDK 23. > > While preparing the patch for the full fix for Assertion Predicates [JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981), I still noticed that some changes are not required for the actual fix and could be split off and reviewed separately in this PR. > > The patch applies the following cleanup changes: > - The complete fix had to add slightly different cloning cases in `PhaseIdealLoop::create_bool_from_template_assertion_predicate()` which already has quite some logic to switch between different cases. Additionally, the algorithm in the method itself was already hard to understand and difficult to adapt. I therefore re-implemented it in a separate class `CloneTemplateAssertionPredicateBool` together with some helper classes like `DFSNodeStack`. To use it, I've added a `TemplateAssertionPredicateBool` class that offers three cloning possibilities: > - `clone()`: Clone without modification > - `clone_and_replace_opaque_loop_nodes()`: Clone and replace the `OpaqueLoop*Nodes` with a new init and stride node. > - `clone_and_replace_init()`: Special case of `clone_and_replace_opaque_loop_nodes()` which only replaces `OpaqueLoopInitNode` and clones `OpaqueLoopStrideNode`. > > This refactoring could be extracted from the complete fix. > - The Split If code to detect (`subgraph_has_opaque()`) and clone Template Assertion Predicate Bools was extracted to a separate class `CloneTemplateAssertionPredicateBoolDown` and uses the new `TemplateAssertionPredicateBool` class to do the actual cloning. > - In the process of coding the complete fix, I've refactored the Loop Unswitching code quite a bit. This change could also be extracted into a separate RFE. Changes include: > - Renaming > - Extracting code to separate classes/methods > - Adding comments > - Some small refactoring including: > - Removing unused parameters > - Renaming variables/parameters/methods > > Th... 
Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Fix useful Template Assertion Predicate marking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16877/files - new: https://git.openjdk.org/jdk/pull/16877/files/f9f74276..2b646e15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16877&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16877.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16877/head:pull/16877 PR: https://git.openjdk.org/jdk/pull/16877 From chagedorn at openjdk.org Tue Apr 23 13:48:57 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 Apr 2024 13:48:57 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v5] In-Reply-To: References: Message-ID: On Fri, 22 Dec 2023 15:02:02 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix useful Template Assertion Predicate marking > > src/hotspot/share/opto/loopnode.cpp line 4353: > >> 4351: // Initialize Dominators. >> 4352: // Checked in clone_loop_predicate() during beautify_loops(). >> 4353: _idom_size = 0; > > looks like a bug-fix? Is now required due to changing `can_apply_loop_predication()` which is checked before calling this method. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16877#discussion_r1576293173 From epeter at openjdk.org Tue Apr 23 14:29:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 14:29:05 GMT Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v3] In-Reply-To: References: Message-ID: > Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop. > > **Example where we get the "bad dominance"** > > This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment). > > The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop. > > The address is parsed into its components by `VPointer`: > `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]` > > We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` tries still to find a "base" adress `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`. > > The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test). 
> > ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) > > During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`: > > ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) > > You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. > > **Why does this happen?** > > Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but also the pre-loop. The pinning after pre-loop and befor... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rm lib from test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18892/files - new: https://git.openjdk.org/jdk/pull/18892/files/d5e51590..ffdafa80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18892&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18892&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18892.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18892/head:pull/18892 PR: https://git.openjdk.org/jdk/pull/18892 From epeter at openjdk.org Tue Apr 23 14:38:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 14:38:29 GMT Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v3] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 14:29:05 GMT, Emanuel Peter wrote: >> Summary: the address `adr` of the vector 
we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop. >> >> **Example where we get the "bad dominance"** >> >> This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment). >> >> The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop. >> >> The address is parsed into its components by `VPointer`: >> `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]` >> >> We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` tries still to find a "base" adress `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`. >> >> The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test). >> >> ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) >> >> During `SuperWord`, we want to align the main-loop vectors. 
We do this by adjusting the pre-loop limit `1439 Opaque1`: >> >> ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) >> >> You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. >> >> **Why does this happen?** >> >> Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but als... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm lib from test [JDK-8330991](https://bugs.openjdk.org/browse/JDK-8330991) C2 SuperWord: refactor VPointer Here the RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18892#issuecomment-2072499141 From kvn at openjdk.org Tue Apr 23 14:40:49 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 14:40:49 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call Message-ID: In Leyden testing CI we start hitting assert: # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 # assert(i < _max) failed: oob: i=2, _max=2 Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) 
[JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. The fix is to add missing checks for If, Bool and Cmp nodes. The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. ------------- Commit messages: - 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call Changes: https://git.openjdk.org/jdk/pull/18916/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18916&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330853 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18916/head:pull/18916 PR: https://git.openjdk.org/jdk/pull/18916 From szaldana at openjdk.org Tue Apr 23 14:44:43 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 23 Apr 2024 14:44:43 GMT Subject: RFR: 8327240: Remove unused Tier2CompileThreshold/Tier2BackEdgeThreshold product flags Message-ID: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> Hi all, This PR removes the unused options ```Tier2CompileThreshold``` and ```Tier2BackEdgeThreshold```. Testing: - [x] Verified unrecognized option error is reported after removing options. 
Thanks, Sonia ------------- Commit messages: - 8327240: Remove unused Tier2CompileThreshold/Tier2BackEdgeThreshold product flags Changes: https://git.openjdk.org/jdk/pull/18904/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18904&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327240 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18904.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18904/head:pull/18904 PR: https://git.openjdk.org/jdk/pull/18904 From epeter at openjdk.org Tue Apr 23 14:58:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 14:58:27 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: <-LNkhYWrGIb3yTDFBA5GSMjSLSI2cQhEeYQkk11OrZY=.c108c65e-2fa9-419f-b7ff-74571415dfe7@github.com> References: <-LNkhYWrGIb3yTDFBA5GSMjSLSI2cQhEeYQkk11OrZY=.c108c65e-2fa9-419f-b7ff-74571415dfe7@github.com> Message-ID: <1sErSyhhGq2PGSzZR3oKjj50hKfOJDA8UxOABMdd2n0=.7d9d378a-ec12-4c7a-a6e0-3db12b88462a@github.com> On Mon, 22 Apr 2024 12:29:18 GMT, Roberto Castañeda Lozano wrote: >> This is an enhancement for AutoVectorization. >> >> I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). >> >> Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. >> >> **Solution Sketch: "canonicalize" the invar** >> >> - Extract all summands of the `invar`: make a list. >> - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. >> - Bypass `CastLL` and `CastII` >> - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. >> >> - Sort all extracted summands by node idx. >> - Add up all summands in new order. 
>> >> If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. >> >> **Example** >> >> >> invar1 = b + c + d + a >> invar2 = d + b + a + c >> >> -> equivalent but not identical nodes >> >> Sort, and add up again: >> >> invar1 = a + b + c + d >> invar2 = a + b + c + d >> >> -> now the nodes are identical >> >> **Motivation: MemorySegment with invar** >> >> One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? >> >> This example did not vectorize, even though it should: >> https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 >> >> Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. >> >> Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. >> >> The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. >> >> Why does this happen? After parsing, the graph looks like this: >> ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) >> >> We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. So... > > Fair enough, I agree that, even if there was a solution for this specific case, we would probably encounter other cases where GVN would not be powerful enough to detect the equivalence. 
I still wonder though what could be causing the `CastLL(invar + iv)` vs. `invar + iv` divergence in your example and whether anything could be done to get rid of it. Maybe worth filing an RFE for further investigation. @robcasloz are you intending to review, or was that just a drive-by comment/question? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18795#issuecomment-2072578666 From mli at openjdk.org Tue Apr 23 15:06:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Apr 2024 15:06:36 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI Message-ID: Hi, Can you help to review the patch? The motivation is to implement `MulAddVS2VI`. But to enable `MulAddVS2VI`, `MulAddS2I` is a prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have a specific muladd instruction on riscv. So, this patch implements both `MulAddVS2VI` and `MulAddS2I`. Thanks ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/18919/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18919&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321008 Stats: 73 lines in 4 files changed: 72 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18919/head:pull/18919 PR: https://git.openjdk.org/jdk/pull/18919 From aph at openjdk.org Tue Apr 23 15:34:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 Apr 2024 15:34:29 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com> References: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com> Message-ID: On Tue, 23 Apr 2024 02:55:14 GMT, Dean Long wrote: >>> If I understand correctly, the order of writes must be: >>> >>> 1. 
ResolvedFieldEntry fields, except _get_code and _put_code >> >> So, release fence here? >> >>> 2. _get_code, _put_code >> >> and another here >> >>> 3. patch_bytecode(fast_bytecode) >>> >>> >>> so the order of reads must be reversed. That's why there are load-acquires when reading _get_code and _put_code. After [3] is done, after dispatching to fast_bytecode, we need to do a LoadLoad between the already read fast bytecode [3] and the "cache" fields [1]. The LoadLoad is not for the load of the next bytecode that will be done in dispatch_next(). >> >> So, I guess the loadload fence being inserted here is the one we need between [2] and [3]. > >> So, I guess the loadload fence being inserted here is the one we need between [2] and [3]. > > The way I would say it is we need a LoadLoad between [3] and [2] or between [3] and [1]. The code assumes that if it is a fast bytecode, then it can read [1] without checking [2] again. My confusion is because @dean-long said _If I understand correctly, the order of writes must be: 1. ResolvedFieldEntry fields, except _get_code and _put_code 2. _get_code, _put_code 3. patch_bytecode(fast_bytecode)_ Therefore, if that ordering must be maintained, we'll need two store fences. And on the reading side, we'll need two load fences. If that total order is more than is necessary, OK. 
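To make the "two store fences, two load fences" reading concrete, here is a minimal Java sketch of the full [1] -> [2] -> [3] ordering. It is an analogy only, not HotSpot code: the class `PublishOrderSketch` and its fields are hypothetical stand-ins for the resolved entry, `_get_code` and the patched bytecode, and plain fields with `VarHandle` fences are used purely for brevity.

```java
import java.lang.invoke.VarHandle;

// Hypothetical analogy; the real code lives in the template interpreter.
public class PublishOrderSketch {
    int fieldOffset;  // [1] stands in for the ResolvedFieldEntry payload
    int getCode;      // [2] stands in for _get_code
    int bytecode;     // [3] stands in for the patched fast bytecode

    // Writer side: [1], fence, [2], fence, [3].
    void publish(int offset) {
        fieldOffset = offset;         // [1]
        VarHandle.storeStoreFence();  // keep [1] ordered before [2]
        getCode = 1;                  // [2]
        VarHandle.storeStoreFence();  // keep [2] ordered before [3]
        bytecode = 1;                 // [3]
    }

    // Reader side: the same steps in reverse, [3], fence, [2], fence, [1].
    // Returns -1 when the entry does not look published yet.
    int readIfPatched() {
        if (bytecode == 0) {          // [3] not patched yet
            return -1;
        }
        VarHandle.loadLoadFence();    // order the read of [3] before [2]
        int code = getCode;           // [2]
        VarHandle.loadLoadFence();    // order the read of [2] before [1]
        return code != 0 ? fieldOffset : -1;  // [1]
    }
}
```

If, as suggested earlier in the thread, a reader that has already seen the fast bytecode may read [1] without re-checking [2], the middle step of `readIfPatched` and one of its fences would be unnecessary in this analogy; the sketch shows the stricter total order.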
------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2072723032 From rcastanedalo at openjdk.org Tue Apr 23 15:47:29 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 23 Apr 2024 15:47:29 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: <-LNkhYWrGIb3yTDFBA5GSMjSLSI2cQhEeYQkk11OrZY=.c108c65e-2fa9-419f-b7ff-74571415dfe7@github.com> References: <-LNkhYWrGIb3yTDFBA5GSMjSLSI2cQhEeYQkk11OrZY=.c108c65e-2fa9-419f-b7ff-74571415dfe7@github.com> Message-ID: On Mon, 22 Apr 2024 12:29:18 GMT, Roberto Castañeda Lozano wrote: >> This is an enhancement for AutoVectorization. >> >> I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). >> >> Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. >> >> **Solution Sketch: "canonicalize" the invar** >> >> - Extract all summands of the `invar`: make a list. >> - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. >> - Bypass `CastLL` and `CastII` >> - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. >> >> - Sort all extracted summands by node idx. >> - Add up all summands in new order. >> >> If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. 
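The canonicalization steps quoted above can be sketched on a toy expression tree as follows. This is a simplified illustration under stated assumptions, not the C2 implementation: `Node` here is a hypothetical stand-in with just an `idx` and two optional inputs, only additions are handled, and the `Sub*`, `Cast*`, `ConvI2L` and shift cases from the list are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model; real C2 nodes are far richer.
public class InvarCanonicalizeSketch {
    // A leaf has no inputs; an "add" node has a left and a right input.
    static final class Node {
        final int idx;
        final Node left;
        final Node right;

        Node(int idx) { this(idx, null, null); }

        Node(int idx, Node left, Node right) {
            this.idx = idx;
            this.left = left;
            this.right = right;
        }

        boolean isAdd() { return left != null; }
    }

    // Step 1: walk through the additions and collect the leaf summands.
    static void collect(Node n, List<Node> out) {
        if (n.isAdd()) {
            collect(n.left, out);
            collect(n.right, out);
        } else {
            out.add(n);
        }
    }

    // Step 2: sort the summands by node idx. The sorted idx list plays the
    // role of the re-added sum in this toy model.
    static List<Integer> canonical(Node invar) {
        List<Node> summands = new ArrayList<>();
        collect(invar, summands);
        summands.sort(Comparator.comparingInt(s -> s.idx));
        List<Integer> ids = new ArrayList<>();
        for (Node s : summands) {
            ids.add(s.idx);
        }
        return ids;
    }

    // Two invars are treated as equivalent when their canonical forms match.
    static boolean equivalent(Node a, Node b) {
        return canonical(a).equals(canonical(b));
    }
}
```

Sorting by `idx` is what makes the result independent of the original addition order: any two sums over the same leaves end up with the same sorted list.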
>> >> **Example** >> >> >> invar1 = b + c + d + a >> invar2 = d + b + a + c >> >> -> equivalent but not identical nodes >> >> Sort, and add up again: >> >> invar1 = a + b + c + d >> invar2 = a + b + c + d >> >> -> now the nodes are identical >> >> **Motivation: MemorySegment with invar** >> >> One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? >> >> This example did not vectorize, even though it should: >> https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 >> >> Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. >> >> Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. >> >> The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. >> >> Why does this happen? After parsing, the graph looks like this: >> ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) >> >> We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. So... > > Fair enough, I agree that, even if there was a solution for this specific case, we would probably encounter other cases where GVN would not be powerful enough to detect the equivalence. I still wonder though what could be causing the `CastLL(invar + iv)` vs. `invar + iv` divergence in your example and whether anything could be done to get rid of it. 
Maybe worth filing a RFE for further investigation. > @robcasloz are you intending to review, or was that just a drive-by comment/question? I'm happy to review this, just give me a few days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18795#issuecomment-2072773080 From kvn at openjdk.org Tue Apr 23 15:53:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 15:53:28 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 14:36:36 GMT, Vladimir Kozlov wrote: > In Leyden testing CI we start hitting assert: > > > # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 > # assert(i < _max) failed: oob: i=2, _max=2 > > Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) > V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) > V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) > > > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) > which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. > > The fix is to add missing checks for If, Bool and Cmp nodes. > > The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. 
linux-x86 failed to upload results but tests passed: 2024-04-23T15:24:09.6443002Z TEST TOTAL PASS FAIL ERROR 2024-04-23T15:24:09.6443822Z jtreg:test/hotspot/jtreg:tier1_gc 312 312 0 0 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2072793761 From chagedorn at openjdk.org Tue Apr 23 15:56:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 23 Apr 2024 15:56:27 GMT Subject: RFR: 8327240: Remove unused Tier2CompileThreshold/Tier2BackEdgeThreshold product flags In-Reply-To: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> References: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> Message-ID: On Mon, 22 Apr 2024 20:23:49 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused options ```Tier2CompileThreshold``` and ```Tier2BackEdgeThreshold```. > > Testing: > - [x] Verified unrecognized option error is reported after removing options. > > Thanks, > Sonia Since these are product flags, you need a CSR and then first obsolete the flag (I think deprecation can be skipped since the flag is unused). You can have a look at the following PR which also aimed to remove an unused product flag recently by first obsoleting it: https://github.com/openjdk/jdk/pull/18648 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18904#issuecomment-2072801562 From epeter at openjdk.org Tue Apr 23 16:11:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 16:11:31 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 22:10:56 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. 
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Revert changes to arrays_equals Looks reasonable. Can you apply the indentation issue, please? src/hotspot/cpu/x86/macroAssembler_x86.hpp line 992: > 990: // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } > 991: > 992: // Short versions of the above Suggestion: // * No condition for this * void ALWAYSINLINE jcxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } // Short versions of the above src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1024: > 1022: void ALWAYSINLINE jpo_b(Label& L) { jccb(Assembler::noParity, L); } > 1023: // * No condition for this * void ALWAYSINLINE jcxz_b(Label& L) { jccb(Assembler::cxz, L); } > 1024: // * No condition for this * void ALWAYSINLINE jecxz_b(Label& L) { jccb(Assembler::cxz, L); } Suggestion: // * No condition for this * void ALWAYSINLINE jcxz_b(Label& L) { jccb(Assembler::cxz, L); } // * No condition for this * void ALWAYSINLINE jecxz_b(Label& L) { jccb(Assembler::cxz, L); } ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18893#pullrequestreview-2017688168 PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576516938 PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576517298 From epeter at openjdk.org Tue Apr 23 16:11:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 16:11:31 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: <296gr70D3-VHUuwQSXcoRpK9jeNArpSYmJEN6u5rc8Y=.06167515-96dd-45dc-8bb2-ae26c7e967e9@github.com> On Tue, 23 Apr 2024 16:03:42 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert changes to arrays_equals > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 992: > >> 990: // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } >> 991: >> 992: // Short versions of the above > > Suggestion: > > // * No condition for this * void ALWAYSINLINE jcxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } > // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } > > // Short versions of the above Everywhere else it is indented, so it would be nice if this kept the style ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576523367 From kvn at openjdk.org Tue Apr 23 16:14:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 16:14:38 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v2] In-Reply-To: References: Message-ID: > In Leyden testing CI we start hitting assert: > > > # Internal Error 
(/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 > # assert(i < _max) failed: oob: i=2, _max=2 > > Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) > V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) > V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) > > > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) > which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. > > The fix is to add missing checks for If, Bool and Cmp nodes. > > The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. 
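The shape check described above can be illustrated outside HotSpot. What follows is a minimal sketch under stated assumptions, not the actual patch: `Op`, `Node` and `cmp_of_if` are hypothetical stand-ins for the C2 classes, and using input index 1 for both the If condition and the Bool's compare input is an assumption about the C2 convention. It only shows the idea of verifying each link of the If -> Bool -> Cmp chain before dereferencing the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for C2 node kinds (hypothetical, not the real HotSpot opcodes).
enum class Op { If, Bool, CmpP, Opaque4, Other };

struct Node {
  Op op;
  std::vector<const Node*> inputs;  // inputs[0] plays the role of the control input

  // Bounds-checked accessor: returns nullptr instead of asserting when i is out of range.
  const Node* in(std::size_t i) const {
    return i < inputs.size() ? inputs[i] : nullptr;
  }
};

// Returns the Cmp node only when the chain is exactly If -> Bool -> Cmp.
// Any other shape (for example If -> Opaque4 -> Bool) yields nullptr,
// so the caller can skip the reduction instead of reading a missing input.
const Node* cmp_of_if(const Node* n) {
  if (n == nullptr || n->op != Op::If) return nullptr;
  const Node* b = n->in(1);
  if (b == nullptr || b->op != Op::Bool) return nullptr;
  const Node* c = b->in(1);
  if (c == nullptr || c->op != Op::CmpP) return nullptr;
  return c;
}
```

In this model a caller would only proceed with the compare-based reduction when `cmp_of_if` returns a non-null node; an extra node between If and Bool simply makes the check fail.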
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Remove redundant opcode check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18916/files - new: https://git.openjdk.org/jdk/pull/18916/files/943fda52..6c19152f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18916&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18916&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18916/head:pull/18916 PR: https://git.openjdk.org/jdk/pull/18916 From sgibbons at openjdk.org Tue Apr 23 16:18:42 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 16:18:42 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v4] In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: > Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Comment indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18893/files - new: https://git.openjdk.org/jdk/pull/18893/files/f7d7f7de..fe5f3060 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18893/head:pull/18893 PR: https://git.openjdk.org/jdk/pull/18893 From epeter at openjdk.org Tue Apr 23 16:18:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 16:18:42 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v4] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Tue, 23 Apr 2024 16:15:48 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Comment indentation Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18893#pullrequestreview-2017712371 From epeter at openjdk.org Tue Apr 23 16:18:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 16:18:42 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 22:10:56 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. 
> > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Revert changes to arrays_equals Thanks for the update. I can sponsor as soon as you attempt integration again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2072847673 From sgibbons at openjdk.org Tue Apr 23 16:18:42 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 16:18:42 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v3] In-Reply-To: <296gr70D3-VHUuwQSXcoRpK9jeNArpSYmJEN6u5rc8Y=.06167515-96dd-45dc-8bb2-ae26c7e967e9@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> <296gr70D3-VHUuwQSXcoRpK9jeNArpSYmJEN6u5rc8Y=.06167515-96dd-45dc-8bb2-ae26c7e967e9@github.com> Message-ID: <4o4Twie-xnr15IYktMLGyAv8CC6Gp-A0hrwQO8Xozy0=.e58aafe0-bc28-456e-a8de-5973b3a10388@github.com> On Tue, 23 Apr 2024 16:08:25 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 992: >> >>> 990: // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } >>> 991: >>> 992: // Short versions of the above >> >> Suggestion: >> >> // * No condition for this * void ALWAYSINLINE jcxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } >> // * No condition for this * void ALWAYSINLINE jecxz(Label& L, bool maybe_short = true) { jcc(Assembler::cxz, L, maybe_short); } >> >> // Short versions of the above > > Everywhere else it is indented, so it would be nice if this kept the style Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576529956 From sgibbons at openjdk.org Tue Apr 23 16:18:42 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 16:18:42 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms 
for x86 [v3] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Tue, 23 Apr 2024 16:03:55 GMT, Emanuel Peter wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert changes to arrays_equals > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1024: > >> 1022: void ALWAYSINLINE jpo_b(Label& L) { jccb(Assembler::noParity, L); } >> 1023: // * No condition for this * void ALWAYSINLINE jcxz_b(Label& L) { jccb(Assembler::cxz, L); } >> 1024: // * No condition for this * void ALWAYSINLINE jecxz_b(Label& L) { jccb(Assembler::cxz, L); } > > Suggestion: > > // * No condition for this * void ALWAYSINLINE jcxz_b(Label& L) { jccb(Assembler::cxz, L); } > // * No condition for this * void ALWAYSINLINE jecxz_b(Label& L) { jccb(Assembler::cxz, L); } Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576529780 From kvn at openjdk.org Tue Apr 23 16:19:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 16:19:33 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v2] In-Reply-To: References: Message-ID: <3o4Znojp0GLQrK3XaAdb0Dym0GRmMd1xFGTqL8FryLw=.95e9c231-551b-4fee-8851-e2c698546f6f@github.com> On Tue, 23 Apr 2024 16:14:38 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V 
[libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578)
>>
>>
>> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569)
>> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes.
>>
>> The fix is to add missing checks for If, Bool and Cmp nodes.
>>
>> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden.
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove redundant opcode check

I removed the redundant Cmp node opcode check because the caller now performs it at both call sites.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2072849225

From sgibbons at openjdk.org Tue Apr 23 16:56:31 2024
From: sgibbons at openjdk.org (Scott Gibbons)
Date: Tue, 23 Apr 2024 16:56:31 GMT
Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v4]
In-Reply-To: 
References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com>
Message-ID: 

On Tue, 23 Apr 2024 16:18:42 GMT, Scott Gibbons wrote:

>> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions.
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Comment indentation

Thank you.
I'm waiting on @sviswa7 review before integrating again ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2072920449 From epeter at openjdk.org Tue Apr 23 16:56:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Apr 2024 16:56:33 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. If I understand right, this depends on the start-address of the array? If you created a huge number of arrays, how likely would it be to get such an overflow? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2072920135 From kvn at openjdk.org Tue Apr 23 16:56:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 16:56:33 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v12] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 16:56:28 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled.
>>
>> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256).
>>
>> Passes hotspot tier1 locally on a Linux machine.
>>
>> ### Benchmarks
>>
>> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance.
>>
>> Baseline:
>>
>> Result "org.renaissance.jdk.streams.JmhParMnemonics.run":
>> N = 25
>> mean = 3309.611 ±(99.9%) 86.699 ms/op
>>
>> Histogram, ms/op:
>> [3000.000, 3050.000) = 0
>> [3050.000, 3100.000) = 4
>> [3100.000, 3150.000) = 1
>> [3150.000, 3200.000) = 0
>> [3200.000, 3250.000) = 0
>> [3250.000, 3300.000) = 0
>> [3300.000, 3350.000) = 9
>> [3350.000, 3400.000) = 6
>> [3400.000, 3450.000) = 5
>>
>> Percentiles, ms/op:
>> p(0...
>
> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:
>
> Add riscv64 to test

Good. @TobiHartmann submitted tier1-3,stress,xcomp testing for version v11 yesterday. Testing passed - no new failures.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18505#pullrequestreview-2017789026 PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2072917179 From avoitylov at openjdk.org Tue Apr 23 17:38:29 2024 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 23 Apr 2024 17:38:29 GMT Subject: RFR: 8330805: ARM32 build is broken after JDK-8139457 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:19:47 GMT, Aleksei Voitylov wrote: > The JDK-8139457 patch changes the header_size argument of C1_MacroAssembler::allocate_array, the input value now means offset in bytes. The ARM32 allocate_array implementation is fixed accordingly. > > Testing: jtreg hotspot, jtreg jdk tier1-3 Thanks Aleksey! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18890#issuecomment-2073003615 From avoitylov at openjdk.org Tue Apr 23 18:14:32 2024 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 23 Apr 2024 18:14:32 GMT Subject: Integrated: 8330805: ARM32 build is broken after JDK-8139457 In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 14:19:47 GMT, Aleksei Voitylov wrote: > The JDK-8139457 patch changes the header_size argument of C1_MacroAssembler::allocate_array, the input value now means offset in bytes. The ARM32 allocate_array implementation is fixed accordingly. > > Testing: jtreg hotspot, jtreg jdk tier1-3 This pull request has now been integrated. 
Changeset: 88a5dcea Author: Aleksei Voitylov Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/88a5dcead21f50e367f8ad77197e6ffdb98cbb20 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8330805: ARM32 build is broken after JDK-8139457 Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/18890 From sviswanathan at openjdk.org Tue Apr 23 18:31:31 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 23 Apr 2024 18:31:31 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v4] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Tue, 23 Apr 2024 16:18:42 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Comment indentation src/hotspot/cpu/x86/assembler_x86.cpp line 1835: > 1833: prefix(dst, reg); > 1834: emit_int8((unsigned char)0x39); > 1835: emit_operand(reg, dst, 1); This should be emit_operand(reg, dst, 0); src/hotspot/cpu/x86/assembler_x86.cpp line 4459: > 4457: } > 4458: > 4459: void Assembler::vpcmpeqb(XMMRegister dst, XMMRegister src1, Address src2, int vector_len) { InstructionMark missing in this instruction as well. src/hotspot/cpu/x86/assembler_x86.cpp line 4576: > 4574: // In this context, the dst vector contains the components that are equal, non equal components are zeroed in dst > 4575: void Assembler::vpcmpeqw(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { > 4576: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : VM_Version::supports_avx2(), ""); InstructionMark missing in this instruction which takes Address as operand? 
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3573:

> 3571: }
> 3572:
> 3573: void MacroAssembler::vpcmpeqb(XMMRegister dst, XMMRegister src1, Address src2, int vector_len) {

The assert is missing here:
assert(((dst->encoding() < 16 && src1->encoding() < 16) || VM_Version::supports_avx512vlbw()),"XMM register should be 0-15");

src/hotspot/cpu/x86/macroAssembler_x86.hpp line 961:

> 959: void ALWAYSINLINE jo(Label& L, bool maybe_short = true) { jcc(Assembler::overflow, L, maybe_short); }
> 960: void ALWAYSINLINE jno(Label& L, bool maybe_short = true) { jcc(Assembler::noOverflow, L, maybe_short); }
> 961: void ALWAYSINLINE js(Label& L, bool maybe_short = true) { jcc(Assembler::positive, L, maybe_short); }

Isn't js -> jump if sign flag is set -> Assembler::negative?
Correspondingly jns, js_b, jns_b should also be corrected.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576695214
PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576714792
PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576711681
PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576666734
PR Review Comment: https://git.openjdk.org/jdk/pull/18893#discussion_r1576673354

From kvn at openjdk.org Tue Apr 23 18:56:29 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 23 Apr 2024 18:56:29 GMT
Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v2]
In-Reply-To: 
References: 
Message-ID: <8RdAeZIFAJ7-cg9zqN0clPKHtYVKrQIpJP3-XWh3lp4=.c05d42d4-3b73-4a15-bf5a-82b45d46f668@github.com>

On Tue, 23 Apr 2024 16:14:38 GMT, Vladimir Kozlov wrote:

>> In Leyden testing CI we start hitting assert:
>>
>>
>> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030
>> # assert(i < _max) failed: oob: i=2, _max=2
>>
>> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k
>> Native frames: (J=compiled Java code,
j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant opcode check @JohnTortugo, please look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2073196499 From sgibbons at openjdk.org Tue Apr 23 19:03:59 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 19:03:59 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v5] In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: > Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. 
Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18893/files - new: https://git.openjdk.org/jdk/pull/18893/files/fe5f3060..f4db9a1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18893&range=03-04 Stats: 8 lines in 3 files changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18893.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18893/head:pull/18893 PR: https://git.openjdk.org/jdk/pull/18893 From sgibbons at openjdk.org Tue Apr 23 19:03:59 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 19:03:59 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v4] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Tue, 23 Apr 2024 16:18:42 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Comment indentation @sviswa7 Thanks for the good catches. Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2073207547 From smonteith at openjdk.org Tue Apr 23 19:51:31 2024 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 23 Apr 2024 19:51:31 GMT Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers [v4] In-Reply-To: References: Message-ID: On Wed, 20 Mar 2024 03:55:33 GMT, Joshua Zhu wrote: >> Currently ZGC C2 load barrier stub saves the whole live register regardless of what size of register is live on aarch64. 
>> Considering the size of SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits, >> even the use of a floating point may cause the maximum 2048 bits stack occupied. >> Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in ZGC C2 load barrier stub. >> >> In a floating point case on 2048 bits SVE machine, the following ZLoadBarrierStubC2 >> >> >> ...... >> 0x0000ffff684cfad8: stp x15, x18, [sp, #80] >> 0x0000ffff684cfadc: sub sp, sp, #0x100 >> 0x0000ffff684cfae0: str z16, [sp] >> 0x0000ffff684cfae4: add x1, x13, #0x10 >> 0x0000ffff684cfae8: mov x0, x16 >> ;; 0xFFFF803F5414 >> 0x0000ffff684cfaec: mov x8, #0x5414 // #21524 >> 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16 >> 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfaf8: blr x8 >> 0x0000ffff684cfafc: mov x16, x0 >> 0x0000ffff684cfb00: ldr z16, [sp] >> 0x0000ffff684cfb04: add sp, sp, #0x100 >> 0x0000ffff684cfb08: ptrue p7.b >> 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16] >> ...... >> >> >> could be optimized into: >> >> >> ...... >> 0x0000ffff684cfa50: stp x15, x18, [sp, #80] >> 0x0000ffff684cfa54: str d16, [sp, #-16]! // extra 8 bytes to align 16 bytes in push_fp() >> 0x0000ffff684cfa58: add x1, x13, #0x10 >> 0x0000ffff684cfa5c: mov x0, x16 >> ;; 0xFFFF7FA942A8 >> 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064 >> 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16 >> 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32 >> 0x0000ffff684cfa6c: blr x8 >> 0x0000ffff684cfa70: mov x16, x0 >> 0x0000ffff684cfa74: ldr d16, [sp], #16 >> 0x0000ffff684cfa78: ptrue p7.b >> 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16] >> ...... >> >> >> Besides the above benefit, when we know what size of register is live, >> we could remove the unnecessary caller save in ZGC C2 load barrier stub when we meet C-ABI SOE fp registers. >> >> Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced. 
>
> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision:
>
> Add more output for easy debugging once the jtreg test case fails

Hello - I have no other comments - looks good.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17977#issuecomment-2073301839

From dlong at openjdk.org Tue Apr 23 19:55:30 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 23 Apr 2024 19:55:30 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote:

>> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
> Fei comments

I don't think t2-3 does any loads that need to be ordered, so the LoadLoad could be done earlier, anywhere between t2-1 and t2-4.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2073305883

From dlong at openjdk.org Tue Apr 23 19:55:31 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 23 Apr 2024 19:55:31 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4]
In-Reply-To: 
References: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com>
Message-ID: 

On Tue, 23 Apr 2024 15:32:19 GMT, Andrew Haley wrote:

> And on the reading side, we'll need two load fences. If that total order is more than is necessary, OK.

On the read side, I don't think we read _get_code or _put_code for the fast bytecode path, so that's why there is only one barrier needed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2073318233 From kvn at openjdk.org Tue Apr 23 20:21:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 20:21:27 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: <0_kbfe4cREdOXu8XdkBfFxl3TYo0e1RXU1v5a56w6NY=.71933e02-0418-4cf1-9ccb-7c03cec8e714@github.com> Message-ID: On Tue, 23 Apr 2024 03:59:10 GMT, Vladimir Kozlov wrote: >> Thank you, @dean-long, for pointing this. >> >> [JDK-8302736](https://bugs.openjdk.org/browse/JDK-8302736) added 2 ThreadWXEnable in sharedRuntime.cpp >> And later [JDK-8316392](https://bugs.openjdk.org/browse/JDK-8316392) added it to `PcDescCache::add_pc_desc()` >> What a mess ... >> >> We can now safely remove (after testing) all ThreadWXEnable which guards calls to `add_pc_desc()` >> And file separate RFE for runtime to look on `!Thread::current_in_asgct()` >> >> Is this what you are suggesting? I will do it. > > ASGCT code was added by [JDK-8304725](https://github.com/openjdk/jdk/commit/d8af7a6014055295355a1242db6c2872299c6398) before [JDK-8316392](https://bugs.openjdk.org/browse/JDK-8316392) added ThreadWXEnable to PcDescCache::add_pc_desc(). > > As you suggested I will file follow up RFE to remove that code. 
RFE: https://bugs.openjdk.org/browse/JDK-8331012 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18895#discussion_r1576846664 From sviswanathan at openjdk.org Tue Apr 23 20:22:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 23 Apr 2024 20:22:30 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v5] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: <3Fe5my9OAHDHhdUaKfM-0jSO6UbNvxu9p7hVCyLJLtc=.e246103e-769e-46fb-a20c-937338f6017f@github.com> On Tue, 23 Apr 2024 19:03:59 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18893#pullrequestreview-2018229472 From sgibbons at openjdk.org Tue Apr 23 20:22:30 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 20:22:30 GMT Subject: RFR: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 [v5] In-Reply-To: References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Tue, 23 Apr 2024 19:03:59 GMT, Scott Gibbons wrote: >> Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18893#issuecomment-2073375198 From mbalao at openjdk.org Tue Apr 23 20:25:28 2024 From: mbalao at openjdk.org (Martin Balao) Date: Tue, 23 Apr 2024 20:25:28 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what it is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case ?to the same cache line probably?, the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . The proposed alternative does not look good to us. The `k1` mask has 8-bytes granularity: each bit of the mask represents 64 bits of the `xmm0` register in this case. Thus, it is not possible to avoid an out of bounds read for all scenarios that we intend to cover. We verified this with memory watchpoints ?hit upon read? and looking at the `xmm0` register value after the `xor` operation, for an execution in which the tail had 15 bytes. What follows is a simplified execution that shows the behavior for `k1` masks of 0x1 and 0x2. 
k1 == 0x1: (gdb) x/2i $pc => 0x7fffe4730bc6: vpxorq (%rdi,%r12,1),%xmm0,%xmm0{%k1} 0x7fffe4730bcd: test $0x8,%r8b (gdb) print/x $xmm0 $21 = { v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 }, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0 } (gdb) print/x $k1 $22 = 0x1 (gdb) x/16xb 0x45f33ddc8 0x45f33ddc8: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x45f33ddd0: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0xbd (gdb) si 0x00007fffe4730bcd in ?? () (gdb) print/x $xmm0 $23 = { v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x8080, 0x8080, 0x8080, 0x8080, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x80808080, 0x80808080, 0x0, 0x0}, v2_int64 = {0x8080808080808080, 0x0}, uint128 = 0x8080808080808080 } A mask of 0x1 permitted the write of the lower 64 bits of the `xmm0` register. This corresponds to the first 8 bytes in memory (little endian). k1 == 0x2: (gdb) x/2i $pc => 0x7fffe4730bc6: vpxorq (%rdi,%r12,1),%xmm0,%xmm0{%k1} 0x7fffe4730bcd: test $0x8,%r8b (gdb) print/x $xmm0 $18 = { v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0 }, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v2_int64 = {0x0, 0x0}, uint128 = 0x0 } (gdb) print/x $k1 $19 = 0x2 (gdb) x/16xb 0x45f33ddc8 0x45f33ddc8: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x45f33ddd0: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0xbd (gdb) si 0x00007fffe4730bcd in ?? 
() (gdb) print/x $xmm0 $20 = { v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v2_double = {0x0, 0x0}, v16_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0xbd}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x8080, 0x8080, 0x8080, 0xbd80}, v4_int32 = {0x0, 0x0, 0x80808080, 0xbd808080}, v2_int64 = {0x0, 0xbd80808080808080}, uint128 = 0xbd808080808080800000000000000000 } A mask of 0x2 permitted the write of the higher 64 bits of the `xmm0` register. This corresponds to the second 8 bytes in memory (little endian). Note: we are not sure if the microarchitecture reads the first 8 bytes and if this might trigger a segmentation fault (all we know is that they are not written to the register), but in any case it is useless for our tail processing purposes because there aren't gaps in the input. Finally, @franferrax found an error in the `VPXORQ` pseudo-code documentation of the [Intel® 64 and IA-32 Architectures Software Developer's Manual, Vol. 2B 4-527](https://software.intel.com/en-us/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4): the operation when the mask applies should be `DEST[i+63:i] := 0` instead of `DEST[63:0] := 0`. Otherwise, the mask would apply to the lower bits irrespective of its value. This observation applies to both merging and zeroing masks. If there are no further objections, our intention is to integrate the original fix. What do you think, @theRealAph ?
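To make the granularity argument in the message above concrete, here is a small model of a merge-masked 128-bit XOR with 64-bit elements. It is only an illustration: the class and method names are made up for this sketch, it is not HotSpot code, and it says nothing about which bytes the hardware actually touches in memory. The input values are the ones shown in the gdb session (an all-zero `xmm0` and the 16 bytes at `0x45f33ddc8`).

```java
// Illustrative model only (hypothetical names, not HotSpot or Intel code):
// a 128-bit value is held as two little-endian 64-bit elements.
final class MaskedXorSketch {

    /**
     * Models merge-masking at qword granularity: element j of the result is
     * dest[j] ^ src[j] when bit j of mask is set, otherwise dest[j] is kept.
     */
    static long[] maskedXorQ(long[] dest, long[] src, int mask) {
        long[] out = dest.clone();
        for (int j = 0; j < 2; j++) {
            if ((mask & (1 << j)) != 0) {
                out[j] = dest[j] ^ src[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] xmm0 = {0L, 0L};
        // The 16 bytes from the gdb dump, read as two little-endian qwords.
        long[] mem = {0x8080808080808080L, 0xbd80808080808080L};
        for (int k1 = 1; k1 <= 2; k1++) {
            long[] r = maskedXorQ(xmm0, mem, k1);
            System.out.printf("k1=0x%x -> {0x%x, 0x%x}%n", k1, r[0], r[1]);
        }
    }
}
```

Because each mask bit selects a whole 8-byte element, no mask value in this model consumes exactly 15 bytes: the mask can take the low 8 bytes, the high 8 bytes, both, or neither.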
------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2073379174 From coleenp at openjdk.org Tue Apr 23 20:43:31 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 Apr 2024 20:43:31 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 17:19:30 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fei comments Yes, the fast path assumes that put_code and get_code are already set since we're in the fast path. For this patch, t2 (the reader) has the LoadLoad membar between t2-3 and t2-4. I assume it doesn't matter where, but there were a few places where we loaded the pointer to ResolvedFieldEntry that had the LoadLoad membar. The bug occurred because one was missing. That's why I thought it should be moved to inside of load_field_entry, so that all readers would have the membar. There were also some fast-path jvmti cases where the membar was missing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2073407449 From kvn at openjdk.org Tue Apr 23 21:13:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 21:13:28 GMT Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v3] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 14:29:05 GMT, Emanuel Peter wrote: >> Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop.
When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop. >> >> **Example where we get the "bad dominance"** >> >> This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment). >> >> The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop. >> >> The address is parsed into its components by `VPointer`: >> `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]` >> >> We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` still tries to find a "base" address `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`. >> >> The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513 CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test). >> >> ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) >> >> During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`: >> >> ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) >> >> You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning.
In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. >> >> **Why does this happen?** >> >> Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but als... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > rm lib from test Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18892#pullrequestreview-2018307381 From kvn at openjdk.org Tue Apr 23 21:16:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 21:16:29 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR In-Reply-To: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 19 Apr 2024 13:12:21 GMT, Thomas Stuefe wrote: > We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). > > Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. > > --- > > This patch adds "arena usage" to CompilationEvent. We now see in JMC how costly a compilation has been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). > > ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18864#pullrequestreview-2018311795 From kvn at openjdk.org Tue Apr 23 21:31:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 23 Apr 2024 21:31:29 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads [v3] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 17:22:44 GMT, Jatin Bhateja wrote: >> - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results into formation of an abstract type phi node at JVM state convergence points. >> - Due to this, for mismatched memory segment access, inline expander is unable to infer the element type of array wrapped within the memory segment and this results into an assertion failure while computing the source lane count. >> - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with unknown backing storage type and compiler can skip inserting explicit reinterpretation IR after loading from or before storing to backing storage which is mandatory for semantic correctness of big-endian memory layout. >> - For vector write access, this may prevent value forwarding, which may result into subsequent redundant vector loads from same address, but preventing intrinsification failure will offset that cost. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding some comments for clarity. Looks good. It would be interesting to look at splitting intrinsic through Phi so that we can generate vector on each branch. In separate RFE. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18749#pullrequestreview-2018339663 From sgibbons at openjdk.org Tue Apr 23 23:38:33 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 23 Apr 2024 23:38:33 GMT Subject: Integrated: 8330844: Add aliases for conditional jumps and additional instruction forms for x86 In-Reply-To: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> References: <-wAKj3RvMqUO3iphA6bA34ilTcM9LkZACKco20ppkE0=.a5d31aa7-9423-477e-9a90-749018d2a12d@github.com> Message-ID: On Mon, 22 Apr 2024 16:20:39 GMT, Scott Gibbons wrote: > Adding infrastructure for JDK-8320448. Aliasing conditional jump instructions; adding some x86 instructions. This pull request has now been integrated. Changeset: 7a895552 Author: Scott Gibbons Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/7a895552c8eb9ae19f8d6eb8c35a0393445305fa Stats: 160 lines in 4 files changed: 160 ins; 0 del; 0 mod 8330844: Add aliases for conditional jumps and additional instruction forms for x86 Reviewed-by: kvn, epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/18893 From sviswanathan at openjdk.org Tue Apr 23 23:58:29 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 23 Apr 2024 23:58:29 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what it is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. 
> > While we add 3 extra instructions + 3 extra memory reads in the worst case (to the same cache line probably), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 2184: > 2182: const Register rounds = rax; > 2183: const Register pos = r12; > 2184: const Register tail = r13; Better to use tail = r15 here. It looks to me that using tail as r13 will cause problems on Windows platform. used_addr is set as r13 in generate_counterMode_VectorAESCrypt() (line 398) for Windows platform and is needed at line 2655 so there is a conflict if we overwrite r13 as tail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18849#discussion_r1577032982 From sviswanathan at openjdk.org Wed Apr 24 00:03:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 24 Apr 2024 00:03:28 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: <7hZf_T6BA5UQikGCA_h90Z29En7aduunz0iqEf3_ND0=.7770faa2-f9c1-4f97-b080-c66dade56767@github.com> On Tue, 23 Apr 2024 20:22:59 GMT, Martin Balao wrote: >> We would like to propose a fix for 8330611. >> >> To avoid an out of bounds memory read when the input's size is not multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what it is guaranteed to be available by 'len_reg'.
This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. >> >> While we add 3 extra instructions + 3 extra memory reads in the worst case (to the same cache line probably), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not multiple of the block size. >> >> A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. >> >> This work is in collaboration with @franferrax . > > The proposed alternative does not look good to us. The `k1` mask has 8-byte granularity: each bit of the mask represents 64 bits of the `xmm0` register in this case. Thus, it is not possible to avoid an out of bounds read for all scenarios that we intend to cover. We verified this with memory watchpoints (hit upon read) and looking at the `xmm0` register value after the `xor` operation, for an execution in which the tail had 15 bytes. What follows is a simplified execution that shows the behavior for `k1` masks of 0x1 and 0x2. > > k1 == 0x1: > > (gdb) x/2i $pc > => 0x7fffe4730bc6: vpxorq (%rdi,%r12,1),%xmm0,%xmm0{%k1} > 0x7fffe4730bcd: test $0x8,%r8b > (gdb) print/x $xmm0 > $21 = { > v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v4_float = {0x0, 0x0, 0x0, 0x0}, > v2_double = {0x0, 0x0}, > v16_int8 = {0x0 }, > v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v4_int32 = {0x0, 0x0, 0x0, 0x0}, > v2_int64 = {0x0, 0x0}, > uint128 = 0x0 > } > (gdb) print/x $k1 > $22 = 0x1 > (gdb) x/16xb 0x45f33ddc8 > 0x45f33ddc8: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 > 0x45f33ddd0: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0xbd > (gdb) si > 0x00007fffe4730bcd in ??
() > (gdb) print/x $xmm0 > $23 = { > v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v4_float = {0x0, 0x0, 0x0, 0x0}, > v2_double = {0x0, 0x0}, > v16_int8 = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v8_int16 = {0x8080, 0x8080, 0x8080, 0x8080, 0x0, 0x0, 0x0, 0x0}, > v4_int32 = {0x80808080, 0x80808080, 0x0, 0x0}, > v2_int64 = {0x8080808080808080, 0x0}, > uint128 = 0x8080808080808080 > } > > > A mask of 0x1 permitted the write of the lower 64 bits of the `xmm0` register. This corresponds to the first 8 bytes in memory (little endian). > > k1 == 0x2: > > (gdb) x/2i $pc > => 0x7fffe4730bc6: vpxorq (%rdi,%r12,1),%xmm0,%xmm0{%k1} > 0x7fffe4730bcd: test $0x8,%r8b > (gdb) print/x $xmm0 > $18 = { > v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v4_float = {0x0, 0x0, 0x0, 0x0}, > v2_double = {0x0, 0x0}, > v16_int8 = {0x0 }, > v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v4_int32 = {0x0, 0x0, 0x0, 0x0}, > v2_int64 = {0x0, 0x0}, > uint128 = 0x0 > } > (gdb) print/x $k1 > $19 = 0x2 > (gdb) x/16xb 0x45f33ddc8 > 0x45f33ddc8: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0x80 > 0x45f33ddd0: 0x80 0x80 0x80 0x80 0x80 0x80 0x80 0xbd > (gdb) si > 0x00007fffe4730bcd in ?? () > (gdb) print/x $xmm0 > $20 = { > v8_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, > v4_float = {0x0, 0x0,... @martinuy You are right that evpxorq has an 8-byte granularity for the mask and so cannot be used for tail processing. I have one comment on your original PR, please take a look.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2073680623 From kvn at openjdk.org Wed Apr 24 00:06:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 00:06:34 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 07:05:29 GMT, Emanuel Peter wrote: >> This is a feature requested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve RC comment for Vladimir Marked as reviewed by kvn (Reviewer). 
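As a rough illustration of the "variable that was split (using shifts)" case from the quoted summary above, the sketch below writes a `long` into a `byte[]` with eight consecutive byte stores and compares the result with a single little-endian 8-byte store. The class and method names are invented for this example, and it is not taken from `TestMergeStores.java`; whether C2 actually merges a given shape depends on the conditions described in the PR (same opcode, same control or only a RangeCheck in between, and so on).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical example class, written for illustration only.
final class MergeStoresSketch {

    /** Eight consecutive byte stores of a long that was split with shifts. */
    static void storeLongBytewise(byte[] a, int off, long v) {
        a[off + 0] = (byte) (v >> 0);
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
        a[off + 4] = (byte) (v >> 32);
        a[off + 5] = (byte) (v >> 40);
        a[off + 6] = (byte) (v >> 48);
        a[off + 7] = (byte) (v >> 56);
    }

    /** Reference: one 8-byte little-endian store with the same effect. */
    static void storeLongMerged(byte[] a, int off, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(off, v);
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        storeLongBytewise(a, 3, 0x0123456789abcdefL);
        storeLongMerged(b, 3, 0x0123456789abcdefL);
        System.out.println("same bytes: " + Arrays.equals(a, b));
    }
}
```

The point of the optimization discussed in the thread is that code written in the first style would not need `Unsafe` or `ByteArrayLittleEndian` to get the effect of the second.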
------------- PR Review: https://git.openjdk.org/jdk/pull/16245#pullrequestreview-2018525767 From mbalao at openjdk.org Wed Apr 24 00:21:40 2024 From: mbalao at openjdk.org (Martin Balao) Date: Wed, 24 Apr 2024 00:21:40 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: References: Message-ID: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what it is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (to the same cache line probably), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . Martin Balao has updated the pull request incrementally with one additional commit since the last revision: Avoid register conflict in Windows.
Co-authored-by: Francisco Ferrari Bihurriet Co-authored-by: Martin Balao ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18849/files - new: https://git.openjdk.org/jdk/pull/18849/files/455f7062..cbb7bf70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18849&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18849&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18849/head:pull/18849 PR: https://git.openjdk.org/jdk/pull/18849 From mbalao at openjdk.org Wed Apr 24 00:21:40 2024 From: mbalao at openjdk.org (Martin Balao) Date: Wed, 24 Apr 2024 00:21:40 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 23:55:24 GMT, Sandhya Viswanathan wrote: >> Martin Balao has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid register conflict in Windows. >> >> Co-authored-by: Francisco Ferrari Bihurriet >> Co-authored-by: Martin Balao > > src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp line 2184: > >> 2182: const Register rounds = rax; >> 2183: const Register pos = r12; >> 2184: const Register tail = r13; > > Better to use tail = r15 here. It looks to me that using tail as r13 will cause problems on Windows platform. used_addr is set as r13 in generate_counterMode_VectorAESCrypt() (line 398) for Windows platform and is needed at line 2655 so there is a conflict if we overwrite r13 as tail. Good point. I'll change to `r15` then. 
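The tail handling that this register is used for can be sketched in plain Java. This is a minimal model under stated assumptions, not the stub generator code: the names `TailXorSketch` and `xorTail` are hypothetical, the keystream block is passed as a plain `byte[]` rather than living in `xmm0`, and each batch is modelled byte by byte where the real stub would use one load of that width. It follows the description in the PR summary: the tail (fewer than 16 bytes) is consumed in 8-, 4-, 2- and 1-byte batches selected by the bits of the remaining length, so no byte beyond the input is read.

```java
// Hypothetical illustration of the 8/4/2/1-byte tail scheme; not HotSpot code.
final class TailXorSketch {

    /**
     * XORs the last len (0..15) bytes of the input with the keystream block,
     * using batches of 8, 4, 2 and 1 bytes chosen from the bits of len.
     * Never touches in[inOff + len] or anything beyond it.
     */
    static void xorTail(byte[] in, int inOff, byte[] out, int outOff,
                        byte[] keystream, int len) {
        if (len < 0 || len >= 16) {
            throw new IllegalArgumentException("tail length must be in 0..15");
        }
        int pos = 0; // bytes of the tail already processed
        for (int batch = 8; batch >= 1; batch >>= 1) {
            if ((len & batch) != 0) {
                // One batch; a real implementation would use a single
                // load, xor and store of this width.
                for (int i = 0; i < batch; i++) {
                    out[outOff + pos + i] =
                        (byte) (in[inOff + pos + i] ^ keystream[pos + i]);
                }
                pos += batch;
            }
        }
    }

    public static void main(String[] args) {
        byte[] keystream = new byte[16];
        for (int i = 0; i < 16; i++) keystream[i] = (byte) (31 * i + 7);
        byte[] in = new byte[15]; // exactly 15 bytes: reading a 16th would throw
        for (int i = 0; i < in.length; i++) in[i] = (byte) (i + 100);
        byte[] out = new byte[15];
        xorTail(in, 0, out, 0, keystream, in.length);
        System.out.println("processed " + in.length + " tail bytes");
    }
}
```

Since 15 = 8 + 4 + 2 + 1, a 15-byte tail takes all four batches, which corresponds to the worst case of extra reads mentioned in the PR description.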
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18849#discussion_r1577043157 From mbalao at openjdk.org Wed Apr 24 00:39:28 2024 From: mbalao at openjdk.org (Martin Balao) Date: Wed, 24 Apr 2024 00:39:28 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> References: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> Message-ID: On Wed, 24 Apr 2024 00:21:40 GMT, Martin Balao wrote: >> We would like to propose a fix for 8330611. >> >> To avoid an out of bounds memory read when the input's size is not multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what it is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. >> >> While we add 3 extra instructions + 3 extra memory reads in the worst case (to the same cache line probably), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not multiple of the block size. >> >> A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. >> >> This work is in collaboration with @franferrax . > > Martin Balao has updated the pull request incrementally with one additional commit since the last revision: > > Avoid register conflict in Windows. > > Co-authored-by: Francisco Ferrari Bihurriet > Co-authored-by: Martin Balao We changed the register used for the tail to `r15`, which avoids the conflict on Windows.
Code generated: 0x7fffe4730bb2: test $0x8,%r8b 0x7fffe4730bb6: je 0x7fffe4730bd3 0x7fffe4730bbc: vpextrq $0x0,%xmm0,%r15 0x7fffe4730bc2: xor (%rdi,%r12,1),%r15 0x7fffe4730bc6: mov %r15,(%rsi,%r12,1) 0x7fffe4730bca: vpsrldq $0x8,%xmm0,%xmm0 0x7fffe4730bcf: add $0x8,%r12d ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2073719278 From dlong at openjdk.org Wed Apr 24 01:27:27 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 24 Apr 2024 01:27:27 GMT Subject: RFR: 8327240: Remove unused Tier2CompileThreshold/Tier2BackEdgeThreshold product flags In-Reply-To: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> References: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> Message-ID: On Mon, 22 Apr 2024 20:23:49 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused options ```Tier2CompileThreshold``` and ```Tier2BackEdgeThreshold```. > > Testing: > - [x] Verified unrecognized option error is reported after removing options. > > Thanks, > Sonia Shouldn't we first ask the question: why aren't they used? Maybe there is a bug, and in fact they should be used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18904#issuecomment-2073804200 From dlong at openjdk.org Wed Apr 24 01:40:27 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 24 Apr 2024 01:40:27 GMT Subject: RFR: 8032218: Emit single post-constructor barrier for chain of superclass constructors In-Reply-To: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> References: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Message-ID: On Fri, 19 Apr 2024 22:31:10 GMT, Joshua Cao wrote: > Opening this PR on top of https://github.com/openjdk/jdk/pull/18505. 
This PR is only valid if we agree it is sufficient to use `StoreStore` barriers at the end of constructors instead of `Release` barriers. > > Currently on master, [C2 emits a Release barrier for each constructor call](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/parse1.cpp#L1019) in a chain of superclass constructor calls. After https://github.com/openjdk/jdk/pull/18505 is merged, it is the same except that the barrier is a `StoreStore`. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. > > [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > >> // An InitializeNode collects and isolates object initialization after > // an AllocateNode and before the next possible safepoint. As a > // memory barrier (MemBarNode), it keeps critical stores from drifting > // down past any safepoint or any publication of the allocation. > > All the writes that occur in the constructor will come before the `InitializeNode/StoreStore`. This PR removes the emitting of `StoreStore` barriers in `Parse::do_exits()`. This PR subsumes https://github.com/openjdk/jdk/pull/18505. > > Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors. If the subclass ctor inlines the super ctor, it might be safe to omit the barrier from the super ctor. However, the current approach seems to remove barriers that are needed when the `<init>` is called from the interpreter or another compiled method.
------------- Changes requested by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18870#pullrequestreview-2018617323 From jbhateja at openjdk.org Wed Apr 24 02:21:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 24 Apr 2024 02:21:33 GMT Subject: RFR: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads [v3] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 21:29:07 GMT, Vladimir Kozlov wrote: > Looks good. > > It would be interesting to look at splitting intrinsic through Phi so that we can generate vector on each branch. In separate RFE. Thanks @vnkozlov , vector intrinsics expect to receive absolute lane and element type in order to associate concrete ideal types with generated vector IR. Splitting an intrinsic by pushing it backwards across a phi may lead to multiple inline expansions of the intrinsic, given that multiple arguments (element type, lane count, mask, etc.) may be phi nodes. The drawback could be code bloat, which may have other side effects, since these Phi nodes are generated based on profiles. I agree with you, it will be interesting to study its effects in a separate RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18749#issuecomment-2073874365 From jbhateja at openjdk.org Wed Apr 24 02:21:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 24 Apr 2024 02:21:34 GMT Subject: Integrated: 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads In-Reply-To: References: Message-ID: On Fri, 12 Apr 2024 03:57:10 GMT, Jatin Bhateja wrote: > - Problem is related to bi-morphic inlining of ms.unsafeBase() in presence of multiple receiver type profiles (OfFloat, OfByte) which results into formation of an abstract type phi node at JVM state convergence points.
> - Due to this, for mismatched memory segment access, the inline expander is unable to infer the element type of the array wrapped within the memory segment, and this results in an assertion failure while computing the source lane count. > - For non-masked mismatched memory segment vector read/write accesses, intrinsification can continue with an unknown backing storage type, and the compiler can skip inserting the explicit reinterpretation IR after loading from or before storing to backing storage, which is mandatory for semantic correctness of big-endian memory layout. > - For vector write access, this may prevent value forwarding, which may result in subsequent redundant vector loads from the same address, but preventing intrinsification failure will offset that cost. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 80b381e9 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/80b381e91bb649e440321a440ce641a54f89dfb4 Stats: 80 lines in 2 files changed: 79 ins; 0 del; 1 mod 8329555: Crash in intrinsifying heap-based MemorySegment Vector store/loads Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18749 From jzhu at openjdk.org Wed Apr 24 03:44:30 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Wed, 24 Apr 2024 03:44:30 GMT Subject: RFR: 8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers [v4] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 19:49:12 GMT, Stuart Monteith wrote: >> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add more output for easy debugging once the jtreg test case fails > > Hello - I have no other comments - looks good. Thank you a lot for the reviews!
@stooart-mon @fisk @robcasloz ------------- PR Comment: https://git.openjdk.org/jdk/pull/17977#issuecomment-2073964078 From jkarthikeyan at openjdk.org Wed Apr 24 05:25:34 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 24 Apr 2024 05:25:34 GMT Subject: Integrated: 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 01:38:28 GMT, Jasmine Karthikeyan wrote: > This patch fixes an issue with the `TestIfMinMax` IR test where specific random seeds cause the branch probability to be so low that the branch cannot CMove, causing the IR check to fail for min/max reductions. For example, if the first value returned by `Random#nextInt` was `int_min` then the branch will be only taken once, which works out to `1/512 => ~0.002`. This value is smaller than the CMove threshold `0.01`, so it cannot CMove. > I've added sequential values from 1 to 50 and -1 to -50 before inserting random values in the array, so there will be a guaranteed 50 successes and 50 failures for each run. I've also replaced the multiplication by two with an opaque multiplication by one to prevent the randomly generated numbers from becoming larger, like what the test `MinMaxRed_Int` does. > > Thoughts and reviews would be appreciated! This pull request has now been integrated. Changeset: 438e6431 Author: Jasmine Karthikeyan Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/438e64310dc186d05de758103df1ea5504dcf33e Stats: 49 lines in 1 file changed: 29 ins; 0 del; 20 mod 8329531: compiler/c2/irTests/TestIfMinMax.java fails with IRViolationException: There were one or multiple IR rule failures. 
Reviewed-by: epeter, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/18734 From thartmann at openjdk.org Wed Apr 24 05:28:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Apr 2024 05:28:27 GMT Subject: RFR: 8330625: Compilation memory statistic: prevent tearing of the final report [v2] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 05:26:53 GMT, Thomas Stuefe wrote: >> Somewhat trivial change to reduce the chance of tearing the final compilation cost history report. See JBS for details. >> >> --- >> >> The patch: >> - upon end of a compilation, we print the the offending log line and account the cost in the compilation cost history table. For the latter we lock over NMTCompilationCostHistory_lock. The patch swaps these two actions such that we print after pulling the lock. That greatly reduces, albeit not completely removes, the chance of printing log lines into the final report. (I did not want to widen the scope of that lock to include the printout) >> - also moves the locking of NMTCompilationCostHistory_lock up to the start of the reporting function to include printing the report header into the locking > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > print newlines around report Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18866#pullrequestreview-2018855555 From thartmann at openjdk.org Wed Apr 24 05:45:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Apr 2024 05:45:38 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 07:05:29 GMT, Emanuel Peter wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. 
one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. 
We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve RC comment for Vladimir Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16245#pullrequestreview-2018891005 From thartmann at openjdk.org Wed Apr 24 05:45:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Apr 2024 05:45:39 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v11] In-Reply-To: References: Message-ID: On Fri, 23 Feb 2024 07:50:03 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> make it happen in post_loop_opts > > src/hotspot/share/opto/compile.cpp line 933: > >> 931: _failure_reason(nullptr), >> 932: _first_failure_details(nullptr), >> 933: _for_post_loop_igvn(comp_arena(), 8, 0, nullptr), > > Probably need to do this in a separate RFE. Looks like a few data-structures get allocated over `comp_arena` in one constructor and `ResourceArea` in the other. Is this still relevant? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1577294041 From jzhu at openjdk.org Wed Apr 24 05:47:36 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Wed, 24 Apr 2024 05:47:36 GMT Subject: Integrated: 8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers In-Reply-To: References: Message-ID: On Fri, 23 Feb 2024 08:11:24 GMT, Joshua Zhu wrote: > Currently ZGC C2 load barrier stub saves the whole live register regardless of what size of register is live on aarch64. > Considering the size of SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits, > even the use of a floating point may cause the maximum 2048 bits stack occupied. > Hence I would like to introduce this change on aarch64: take the length of live registers into consideration in ZGC C2 load barrier stub. > > In a floating point case on 2048 bits SVE machine, the following ZLoadBarrierStubC2 > > > ...... > 0x0000ffff684cfad8: stp x15, x18, [sp, #80] > 0x0000ffff684cfadc: sub sp, sp, #0x100 > 0x0000ffff684cfae0: str z16, [sp] > 0x0000ffff684cfae4: add x1, x13, #0x10 > 0x0000ffff684cfae8: mov x0, x16 > ;; 0xFFFF803F5414 > 0x0000ffff684cfaec: mov x8, #0x5414 // #21524 > 0x0000ffff684cfaf0: movk x8, #0x803f, lsl #16 > 0x0000ffff684cfaf4: movk x8, #0xffff, lsl #32 > 0x0000ffff684cfaf8: blr x8 > 0x0000ffff684cfafc: mov x16, x0 > 0x0000ffff684cfb00: ldr z16, [sp] > 0x0000ffff684cfb04: add sp, sp, #0x100 > 0x0000ffff684cfb08: ptrue p7.b > 0x0000ffff684cfb0c: ldp x4, x5, [sp, #16] > ...... > > > could be optimized into: > > > ...... > 0x0000ffff684cfa50: stp x15, x18, [sp, #80] > 0x0000ffff684cfa54: str d16, [sp, #-16]! 
// extra 8 bytes to align 16 bytes in push_fp() > 0x0000ffff684cfa58: add x1, x13, #0x10 > 0x0000ffff684cfa5c: mov x0, x16 > ;; 0xFFFF7FA942A8 > 0x0000ffff684cfa60: mov x8, #0x42a8 // #17064 > 0x0000ffff684cfa64: movk x8, #0x7fa9, lsl #16 > 0x0000ffff684cfa68: movk x8, #0xffff, lsl #32 > 0x0000ffff684cfa6c: blr x8 > 0x0000ffff684cfa70: mov x16, x0 > 0x0000ffff684cfa74: ldr d16, [sp], #16 > 0x0000ffff684cfa78: ptrue p7.b > 0x0000ffff684cfa7c: ldp x4, x5, [sp, #16] > ...... > > > Besides the above benefit, when we know what size of register is live, > we could remove the unnecessary caller save in ZGC C2 load barrier stub when we meet C-ABI SOE fp registers. > > Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced. This pull request has now been integrated. Changeset: 5c383860 Author: Joshua Zhu Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/5c3838605d48d7f2db981c5e821c08d84856c53c Stats: 710 lines in 7 files changed: 645 ins; 8 del; 57 mod 8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers Reviewed-by: eosterlund, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/17977 From forax at univ-mlv.fr Wed Apr 24 06:37:42 2024 From: forax at univ-mlv.fr (Remi Forax) Date: Wed, 24 Apr 2024 08:37:42 +0200 (CEST) Subject: Weird performance behavior involving VarHandles Message-ID: <144453100.12179950.1713940662054.JavaMail.zimbra@univ-eiffel.fr> Hello, i'm trying to build an API on top of the foreign memory API and i've found a performance difference i'm not able to explain. 
I'm using a guardWithTest to try to provide a simple way to access a VarHandle on a MemoryLayout without having to declare each VarHandle by hand, so instead of

  private static final StructLayout LAYOUT = MemoryLayout.structLayout(
      ValueLayout.JAVA_INT.withName("x"),
      ValueLayout.JAVA_INT.withName("y")
  );

  private static final VarHandle HANDLE_X =
      LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("x"));
  private static final VarHandle HANDLE_Y =
      LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("y"));

I want something like

  private static final MethodHandle MH = guardWithTest(
      TEST.bindTo("x"),
      dropArguments(constant(VarHandle.class, HANDLE_X), 0, String.class),
      guardWithTest(
          TEST.bindTo("y"),
          dropArguments(constant(VarHandle.class, HANDLE_Y), 0, String.class),
          BOOM
      ));

(TEST does an == on the strings and BOOM throws an exception)

which if called with "x" returns the VarHandle for "x" and if called with "y" returns the VarHandle for "y".

Now if I try to benchmark the performance with JMH,

  private final MemorySegment segment = Arena.ofAuto().allocate(LAYOUT);

  @Benchmark
  public int control() {
    var x = (int) HANDLE_X.get(segment, 0L);
    var y = (int) HANDLE_Y.get(segment, 0L);
    return x + y;
  }

  @Benchmark
  public int gwt2_methodhandle() throws Throwable {
    var x = (int) ((VarHandle) MH.invokeExact("x")).get(segment, 0L);
    var y = (int) ((VarHandle) MH.invokeExact("y")).get(segment, 0L);
    return x + y;
  }

I get

  Benchmark                               Mode  Cnt  Score   Error  Units
  ReproducerBenchmarks.control            avgt    5  1.250 ± 0.024  ns/op
  ReproducerBenchmarks.gwt2_methodhandle  avgt    5  1.852 ± 0.024  ns/op

and I don't understand why there is a difference in performance because for c2, the strings "x" and "y" are constant so the corresponding VarHandles should be constant thus optimized the same way.
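For readers who want to try the construction above outside the benchmark, here is a minimal self-contained sketch of how `TEST`, `BOOM` and `MH` could be wired together. It is an illustration under stated assumptions, not the code from the linked repository: the class `GuardWithTestSketch`, the helper methods `sameString`, `boom`, `handleFor` and `sum`, and the `Point` class are hypothetical names, and the VarHandles are taken from plain fields rather than from a `MemoryLayout` so that the sketch does not depend on the foreign memory API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;

final class GuardWithTestSketch {
    // Hypothetical stand-in for the struct layout: two int fields.
    static final class Point {
        int x;
        int y;
    }

    private static final VarHandle HANDLE_X;
    private static final VarHandle HANDLE_Y;
    private static final MethodHandle MH; // type: (String)VarHandle

    // Assumed shape of TEST: identity comparison of the bound and the actual string.
    static boolean sameString(String expected, String actual) {
        return expected == actual;
    }

    // Assumed shape of BOOM: same type as the targets, but always throws.
    static VarHandle boom(String name) {
        throw new IllegalArgumentException("unknown field: " + name);
    }

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            HANDLE_X = lookup.findVarHandle(Point.class, "x", int.class);
            HANDLE_Y = lookup.findVarHandle(Point.class, "y", int.class);
            MethodHandle test = lookup.findStatic(GuardWithTestSketch.class, "sameString",
                    MethodType.methodType(boolean.class, String.class, String.class));
            MethodHandle boom = lookup.findStatic(GuardWithTestSketch.class, "boom",
                    MethodType.methodType(VarHandle.class, String.class));
            MH = MethodHandles.guardWithTest(
                    test.bindTo("x"),
                    MethodHandles.dropArguments(
                            MethodHandles.constant(VarHandle.class, HANDLE_X), 0, String.class),
                    MethodHandles.guardWithTest(
                            test.bindTo("y"),
                            MethodHandles.dropArguments(
                                    MethodHandles.constant(VarHandle.class, HANDLE_Y), 0, String.class),
                            boom));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Wraps invokeExact so callers do not have to declare "throws Throwable".
    static VarHandle handleFor(String name) {
        try {
            return (VarHandle) MH.invokeExact(name);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Reads both fields through the handles selected by the guardWithTest chain.
    static int sum(int x, int y) {
        Point p = new Point();
        p.x = x;
        p.y = y;
        int gotX = (int) handleFor("x").get(p);
        int gotY = (int) handleFor("y").get(p);
        return gotX + gotY;
    }

    public static void main(String[] args) {
        System.out.println(sum(3, 4));
    }
}
```

Because the test is an identity comparison, the chain only matches interned strings such as the literals `"x"` and `"y"`; a `new String("x")` would fall through to the throwing fallback.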
The full benchmark is available here: https://raw.githubusercontent.com/forax/memory-mapper/master/src/main/java/com/github/forax/memorymapper/bench/ReproducerBenchmarks.java regards, Rémi From epeter at openjdk.org Wed Apr 24 06:46:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 06:46:39 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 15:37:13 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> improve RC comment for Vladimir > > New comment is good now. Thanks! Thanks @vnkozlov @TobiHartmann for the reviews! Thanks @rwestrel for the helpful comments earlier on. Thanks @cl4es @RogerRiggs for bringing up the idea for such an optimization, and cheering me on with it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2074188596 From epeter at openjdk.org Wed Apr 24 06:46:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 06:46:40 GMT Subject: Integrated: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants.
We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... This pull request has now been integrated. 
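As an illustration of the "variable that was split (using shifts)" case in the description above, the following sketch writes a `long` into a `byte[]` with eight consecutive byte stores and checks that the result matches a single little-endian 8-byte write. The class and method names are hypothetical, and the sketch only shows the source-level store pattern; whether C2 actually merges these stores depends on the JDK build and the platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class MergeStoresSketch {
    // Eight adjacent byte stores, each taking one shifted slice of the same long.
    // This is the shape of code that is a candidate for merging into one wider store.
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Reference result: one little-endian 8-byte write through ByteBuffer.
    static byte[] viaByteBuffer(long v) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array();
    }

    // True if the byte-by-byte stores produce the same bytes as the single wide write.
    static boolean sameBytes(long v) {
        byte[] a = new byte[8];
        storeLongLE(a, 0, v);
        return Arrays.equals(a, viaByteBuffer(v));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes(0x0102030405060708L));
    }
}
```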
Changeset: 3ccb64c0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3ccb64c0216c72008578b904d0e7e5bba5e11134 Stats: 2652 lines in 9 files changed: 2648 ins; 0 del; 4 mod 8318446: C2: optimize stores into primitive arrays by combining values into larger store Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/16245 From chagedorn at openjdk.org Wed Apr 24 07:08:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 24 Apr 2024 07:08:29 GMT Subject: RFR: 8327240: Remove unused Tier2CompileThreshold/Tier2BackEdgeThreshold product flags In-Reply-To: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> References: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> Message-ID: On Mon, 22 Apr 2024 20:23:49 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused options ```Tier2CompileThreshold``` and ```Tier2BackEdgeThreshold```. > > Testing: > - [x] Verified unrecognized option error is reported after removing options. > > Thanks, > Sonia I had a quick look at the git history. The only use of `Tier2CompileThreshold` that has been around since initial load was removed with https://github.com/openjdk/jdk/pull/888. For `Tier2BackEdgeThreshold`, the only use was removed when tiered compilation was introduced with [JDK-6953144](https://bugs.openjdk.org/browse/JDK-6953144). There is, however, a single test that uses `Tier2BackEdgeThreshold` which should also be updated if these flags are going to be removed: https://github.com/openjdk/jdk/blob/20546c1ea064daa8e2faa71142904ea2c62b3311/test/hotspot/jtreg/vmTestbase/jit/t/t105/t105.java#L34 This suggests that it's fine to remove these. But maybe @veresov can comment on this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18904#issuecomment-2074223780 From galder at openjdk.org Wed Apr 24 07:14:59 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 24 Apr 2024 07:14:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v12] In-Reply-To: References: Message-ID: > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this.
> > I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. > > > $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" > ... > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 > > > One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? > > Thanks @rwestrel for his help shaping this up :) Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: Expanded testing in TestNullArrayClone * Added byte[] and long[] tests. * Verified that the cloned array has the same contents. * Increase number of iterations reach tier 3 threshold. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/595d1e99..f1f6edd0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=10-11 Stats: 83 lines in 1 file changed: 71 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From galder at openjdk.org Wed Apr 24 07:14:59 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 24 Apr 2024 07:14:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v10] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 13:37:49 GMT, Boris Ulasevich wrote: >> Thank you! >> >>> what sizes would you choose? For the types, I would limit it to primitive types >> >> Yes, checking the primitive types is fine. Let it be int[], long[], and byte[]. 
For size I would pick an odd. >> >> >> static final int ITER = 2000; // ~ Tier3CompileThreshold >> static final int ARRAY_SIZE = 999; > > If you like, you can inspect the output of the -XX:+PrintLIR option to see if C1 applies arraycopy as expected. Expanded the tests in this class. I verified with `PrintLIR` that the methods are being C1 compiled. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1577391774 From tholenstein at openjdk.org Wed Apr 24 07:27:40 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 24 Apr 2024 07:27:40 GMT Subject: RFR: 8330587: IGV: remove ControlFlowTopComponent In-Reply-To: References: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> Message-ID: On Mon, 22 Apr 2024 07:41:00 GMT, Roberto Casta?eda Lozano wrote: >> The control flow window (very right, next to Bytecodes) implemented by ControlFlowTopComponent is no longer used with the availability of the new CFG view. >> >> Therefore ControlFlowTopComponent is removed > > Looks good, thanks for cleaning it up! Thanks @robcasloz and @chhagedorn for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18859#issuecomment-2074255569 From tholenstein at openjdk.org Wed Apr 24 07:27:41 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 24 Apr 2024 07:27:41 GMT Subject: Integrated: 8330587: IGV: remove ControlFlowTopComponent In-Reply-To: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> References: <-gO01SKkzrpPVDfWHkykHlwCqsriTMuOrAEBlq9vdPY=.6322dfc7-7e15-459e-85ff-9be9dd88b231@github.com> Message-ID: <0PX9suzsGhHyZQrMVAgS37X6jG49FtDw34gmRDlbavc=.fa324c93-eecb-4141-8b48-b7ec977676b9@github.com> On Fri, 19 Apr 2024 09:33:21 GMT, Tobias Holenstein wrote: > The control flow window (very right, next to Bytecodes) implemented by ControlFlowTopComponent is no longer used with the availability of the new CFG view. 
> > Therefore ControlFlowTopComponent is removed This pull request has now been integrated. Changeset: 165ba87e Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/165ba87e5732c382b3e97315e959dd5e32cf2984 Stats: 1236 lines in 16 files changed: 0 ins; 1236 del; 0 mod 8330587: IGV: remove ControlFlowTopComponent Reviewed-by: chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/18859 From bulasevich at openjdk.org Wed Apr 24 08:08:37 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 24 Apr 2024 08:08:37 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v12] In-Reply-To: References: Message-ID: <8byYEt1oliqKzBLbPc58r16lbcBh9QPvBS2biE4t19I=.1597c5bf-c697-486e-8715-c9afade4ff63@github.com> On Wed, 24 Apr 2024 07:14:59 GMT, Galder Zamarre?o wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ? 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ? 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ? 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ? 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ? 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ? 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ? 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ? 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ? 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ? 
0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ? 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ? 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ? 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ? 0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ? 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ? 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarre?o has updated the pull request incrementally with one additional commit since the last revision: > > Expanded testing in TestNullArrayClone > > * Added byte[] and long[] tests. > * Verified that the cloned array has the same contents. > * Increase number of iterations reach tier 3 threshold. LGTM. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17667#issuecomment-2074333748 From galder at openjdk.org Wed Apr 24 08:23:37 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Wed, 24 Apr 2024 08:23:37 GMT Subject: Integrated: 8323429: Missing C2 optimization for FP min/max when both inputs are same In-Reply-To: References: Message-ID: On Thu, 11 Apr 2024 12:10:38 GMT, Galder Zamarre?o wrote: > Added C2 identity optimization for min/max calls, whereby if both inputs are the same, either is returned. 
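The identity described here can be checked at the Java level. The sketch below is only an illustration of the property that the optimization relies on (it does not exercise C2 itself), and the class and method names are hypothetical: for any input, including NaN, signed zeros and infinities, `Math.min(x, x)` and `Math.max(x, x)` have the same value as `x`.

```java
final class MinMaxIdentitySketch {
    // Compare via floatToIntBits so that NaN compares equal to NaN
    // and -0.0f is distinguished from 0.0f.
    static boolean holdsFor(float x) {
        int bits = Float.floatToIntBits(x);
        return Float.floatToIntBits(Math.max(x, x)) == bits
            && Float.floatToIntBits(Math.min(x, x)) == bits;
    }

    static boolean holdsFor(double x) {
        long bits = Double.doubleToLongBits(x);
        return Double.doubleToLongBits(Math.max(x, x)) == bits
            && Double.doubleToLongBits(Math.min(x, x)) == bits;
    }

    // Runs the check over the floating-point edge cases plus int and long inputs.
    static boolean holdsForEdgeCases() {
        float[] floats = {0.0f, -0.0f, 1.5f, Float.NaN,
                          Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        double[] doubles = {0.0, -0.0, 1.5, Double.NaN,
                            Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (float f : floats) {
            if (!holdsFor(f)) return false;
        }
        for (double d : doubles) {
            if (!holdsFor(d)) return false;
        }
        int i = Integer.MIN_VALUE;
        long l = Long.MAX_VALUE;
        return Math.max(i, i) == i && Math.min(i, i) == i
            && Math.max(l, l) == l && Math.min(l, l) == l;
    }

    public static void main(String[] args) {
        System.out.println(holdsForEdgeCases());
    }
}
```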
> > It includes an IR test to verify that the optimization gets applied. The optimization applies not only to floating points, but also long and ints. The test includes tests for all of those. > > `BasicDoubleOpTest.vectorMax_8322090` has also been adjusted to match expectations after implementing the optimization. > > I've run hotspot compiler tests successfully on x86_64. This pull request has now been integrated. Changeset: c439c8c7 Author: Galder Zamarre?o Committer: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/c439c8c73cf07966e3517ecbaf14d79dcbaeabb3 Stats: 189 lines in 5 files changed: 187 ins; 0 del; 2 mod 8323429: Missing C2 optimization for FP min/max when both inputs are same Reviewed-by: roland, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18738 From epeter at openjdk.org Wed Apr 24 08:48:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 08:48:29 GMT Subject: RFR: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop [v3] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 21:10:54 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> rm lib from test > > Good. Thanks @vnkozlov @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18892#issuecomment-2074418178 From epeter at openjdk.org Wed Apr 24 08:48:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 08:48:31 GMT Subject: Integrated: 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 15:53:42 GMT, Emanuel Peter wrote: > Summary: the address `adr` of the vector we want to align the main-loop for has a `CastLL` after the pre-loop and before the main-loop. 
When we use this address to adjust the pre-loop limit, we create a use before the `CastLL`, which leads to a "bad dominance" assert. Solution: make sure that all such base addresses `adr` are not just invariant in the main-loop, but also are invariant of/before the pre-loop. > > **Example where we get the "bad dominance"** > > This code shape comes from the attached regression tests (no matter if with Unsafe or MemorySegment). > > The loop is PreMainPost-ed and the main-loop unrolled. `1326 CountedLoop` is the pre-loop, and `1657 CountedLoop` is the main-loop, which contains the `1648 LoadI`. During `SuperWord`, we take this load's address to align the main-loop. > > The address is parsed into its components by `VPointer`: > `VPointer[mem: 1648 LoadI, base: 1, adr: 1669, base[ 1] + offset( 0) + invar( 0) + scale( 4) * iv]` > > We note that this is the access to native memory via Unsafe / MemorySegment, and so there is no array pointer base, and the `base = 1 TOP`. `VPointer` tries still to find a "base" adress `adr` by parsing the very left-most input to the chain of `AddP`s. Here, there is only a single `1711 AddP`, and the left input is `adr = 1669 CastX2P`. The right side of the `AddP` is also parsed, and determined to be `4 * iv`. > > The problematic part: `1669 CastX2P` is "pinned" down below the pre-loop by the `1513CastLL` that is applied to `11 Parm` (= `long offset` parameter in the test). > > ![image](https://github.com/openjdk/jdk/assets/32593061/d5579226-797c-489e-8aa1-0c906ca59755) > > During `SuperWord`, we want to align the main-loop vectors. We do this by adjusting the pre-loop limit `1439 Opaque1`: > > ![image](https://github.com/openjdk/jdk/assets/32593061/23bafa67-1438-4057-88a7-fb72e8b06c5c) > > You can see the dark-green IR nodes, which compute the `new_limit = old_limit + adjustment`, where the adjustment is a modulo `1734 AndI` of the address of the `1738 LoadVector` for which we are aligning. 
In this computation we are also using the `adr` of our `VPointer`, which depends on the `1513 CastLL` which is pinned below the pre-loop. Thus, we are using a node inside the pre-loop that is pinned after the pre-loop. Hence the "bad dominance" assert. > > **Why does this happen?** > > Usually, the `base` and/or `adr` of a `VPointer` are invariant not just of the main-loop but also the pre-loop. The pinning after pre-loop and befor... This pull request has now been integrated. Changeset: e681e9b4 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/e681e9b4d78c57d031e08e11dfa6250d1f52f5f5 Stats: 118 lines in 2 files changed: 117 ins; 0 del; 1 mod 8330819: C2 SuperWord: bad dominance after pre-loop limit adjustment with base that has CastLL after pre-loop Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18892 From epeter at openjdk.org Wed Apr 24 09:02:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 09:02:49 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal [v2] In-Reply-To: References: Message-ID: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. 
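
A minimal sketch of the solution steps listed above, written as plain Java rather than C2 code. The `Leaf`, `Add`, `Sub` and `Summand` types and the `canonicalize` helper are hypothetical stand-ins for IR nodes, `idx` stands in for the node index, and the cast/convert/shift handling from the list is left out; none of this is the HotSpot implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InvarCanonicalizeSketch {
    // Toy expression tree. "idx" stands in for a C2 node index (assumption).
    interface Expr {}
    record Leaf(int idx, String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Sub(Expr left, Expr right) implements Expr {}

    // A leaf together with the sign it carries in the flattened sum.
    record Summand(Leaf leaf, int sign) {}

    // Walk through Add/Sub nodes and collect the signed leaves.
    static void extract(Expr e, int sign, List<Summand> out) {
        if (e instanceof Leaf) {
            out.add(new Summand((Leaf) e, sign));
        } else if (e instanceof Add) {
            Add a = (Add) e;
            extract(a.left(), sign, out);
            extract(a.right(), sign, out);
        } else if (e instanceof Sub) {
            Sub s = (Sub) e;
            extract(s.left(), sign, out);
            extract(s.right(), -sign, out); // right side of a subtraction flips sign
        } else {
            throw new IllegalArgumentException("unsupported node: " + e);
        }
    }

    // Sort the summands by idx and write them out in that order. Two invars
    // built from the same summands yield the same canonical string.
    static String canonicalize(Expr invar) {
        List<Summand> summands = new ArrayList<>();
        extract(invar, 1, summands);
        summands.sort(Comparator.comparingInt((Summand s) -> s.leaf().idx()));
        StringBuilder sb = new StringBuilder();
        for (Summand s : summands) {
            sb.append(s.sign() > 0 ? '+' : '-').append(s.leaf().name());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, "a"), b = new Leaf(2, "b"), c = new Leaf(3, "c"), d = new Leaf(4, "d");
        Expr invar1 = new Add(new Add(new Add(b, c), d), a); // b + c + d + a
        Expr invar2 = new Add(new Add(new Add(d, b), a), c); // d + b + a + c
        System.out.println(canonicalize(invar1)); // +a+b+c+d
        System.out.println(canonicalize(invar2)); // +a+b+c+d
    }
}
```

The sketch compares strings only to keep it short; in a compiler the re-added sum would presumably be a fresh node chain that value numbering can then recognize as identical.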
> > If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. > > **Example** > > > invar1 = b + c + d + a > invar2 = d + b + a + c > > -> equivalent but not identical nodes > > Sort, and add up again: > > invar1 = a + b + c + d > invar2 = a + b + c + d > > -> now the nodes are identical > > **Motivation: MemorySegment with invar** > > One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? > > This example did not vectorize, even though it should: > https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 > > Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. > > Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. > > The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. > > Why does this happen? After parsing, the graph looks like this: > ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) > > We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store. > > load_adr = base + memory_segment_offs... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into JDK-8330274-invar-sum-equality - IR rules for test only on 64 bit - more tests, more comments, rm trace code - more int/long tests: where offsetPlain moves away - add long tests - verify cfg case - test: handle AlignVector - some int tests - allow LShift for scaling - better comments - ... and 6 more: https://git.openjdk.org/jdk/compare/c9d8cb01...fdfd7ca2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18795/files - new: https://git.openjdk.org/jdk/pull/18795/files/687611a0..fdfd7ca2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=00-01 Stats: 76302 lines in 1200 files changed: 38704 ins; 30576 del; 7022 mod Patch: https://git.openjdk.org/jdk/pull/18795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18795/head:pull/18795 PR: https://git.openjdk.org/jdk/pull/18795 From dfenacci at openjdk.org Wed Apr 24 09:07:48 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 24 Apr 2024 09:07:48 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v2] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead.
> While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. 
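
To make the cause described above concrete, here is a small scalar model of a gather load in plain Java. It deliberately avoids the incubating Vector API, and the `gather` and `sameLoad` helpers are hypothetical names used only for illustration; they are not the C2 `Identity` code. The point is only that two loads from the same base with different offsets produce different lanes, so an identity check that ignores the offsets would fold loads that are not equal.

```java
import java.util.Arrays;

public class GatherIdentitySketch {
    // Scalar model of a gather load: lane i reads storage[base + offsets[i]].
    static long[] gather(long[] storage, int base, int[] offsets) {
        long[] lanes = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            lanes[i] = storage[base + offsets[i]];
        }
        return lanes;
    }

    // Hypothetical identity check: two gather loads may only be treated as the
    // same load if the base AND the offsets match. Comparing the base alone
    // corresponds to the faulty folding described in the issue.
    static boolean sameLoad(int base1, int[] offsets1, int base2, int[] offsets2) {
        return base1 == base2 && Arrays.equals(offsets1, offsets2);
    }

    public static void main(String[] args) {
        long[] storage = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] offsetsA = {0, 1, 2, 3};
        int[] offsetsB = {4, 5, 6, 7};
        System.out.println(Arrays.toString(gather(storage, 0, offsetsA))); // [10, 11, 12, 13]
        System.out.println(Arrays.toString(gather(storage, 0, offsetsB))); // [14, 15, 16, 17]
        System.out.println(sameLoad(0, offsetsA, 0, offsetsB));            // false
    }
}
```

A masked load could be modeled the same way by adding a `boolean[]` mask to both helpers; the sketch leaves that out for brevity.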
Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/e331b5c5..2c83876d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From thartmann at openjdk.org Wed Apr 24 09:08:43 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Apr 2024 09:08:43 GMT Subject: RFR: 8329331: Intrinsify Unsafe::setMemory [v26] In-Reply-To: References: <5bNiITzJzFEdC6ARozUJBF2NCQaCLdHe_QwKIkcgwfU=.b87cab09-81b8-43f3-bf7a-e2b641881f9c@github.com> Message-ID: On Sat, 20 Apr 2024 22:31:48 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around this change. >> >> Overall, making this an intrinsic improves overall performance of `Unsafe::setMemory` by up to 4x for all buffer sizes. >> >> Tested with tier-1 (and full CI). I've added a table of the before and after numbers for the JMH I ran (`MemorySegmentZeroUnsafe`). >> >> [setMemoryBM.txt](https://github.com/openjdk/jdk/files/14808974/setMemoryBM.txt) > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 37 commits: > > - Merge branch 'openjdk:master' into setMemory > - Fix UnsafeCopyMemoryMark scope issue > - Long to short jmp; other cleanup > - Review comments > - Address review comments; update copyright years > - Add enter() and leave(); remove Windows-specific register stuff > - Fix memory mark after sync to upstream > - Merge branch 'openjdk:master' into setMemory > - Set memory test (#23) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > > * Remove MUSL_LIBC ifdef > > * Remove MUSL_LIBC ifdef > - Set memory test (#22) > > * Even more review comments > > * Re-write of atomic copy loops > > * Change name of UnsafeCopyMemory{,Mark} to UnsafeMemory{Access,Mark} > > * Only add a memory mark for byte unaligned fill > - ... and 27 more: https://git.openjdk.org/jdk/compare/6d569961...1122b500 This introduced a regression, see [JDK-8331033](https://bugs.openjdk.org/browse/JDK-8331033). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18555#issuecomment-2074459781 From epeter at openjdk.org Wed Apr 24 09:11:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 09:11:33 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal [v2] In-Reply-To: References: Message-ID: <77-emkh68Hx14QXlJC2wMB5SlF7f5YNlB2U7w3HgIcU=.abb5e36d-cfbc-4aa1-a727-144ccf58eeb2@github.com> On Wed, 24 Apr 2024 09:02:49 GMT, Emanuel Peter wrote: >> This is an enhancement for AutoVectorization. >> >> I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). >> >> Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. 
>> >> **Solution Sketch: "canonicalize" the invar** >> >> - Extract all summands of the `invar`: make a list. >> - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. >> - Bypass `CastLL` and `CastII` >> - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. >> >> - Sort all extracted summands by node idx. >> - Add up all summands in new order. >> >> If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. >> >> **Example** >> >> >> invar1 = b + c + d + a >> invar2 = d + b + a + c >> >> -> equivalent but not identical nodes >> >> Sort, and add up again: >> >> invar1 = a + b + c + d >> invar2 = a + b + c + d >> >> -> now the nodes are identical >> >> **Motivation: MemorySegment with invar** >> >> One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? >> >> This example did not vectorize, even though it should: >> https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 >> >> Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. >> >> Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. >> >> The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. >> >> Why does this happen?
After parsing, the graph looks like this: >> ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) >> >> We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. So... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8330274-invar-sum-equality > - IR rules for test only on 64 bit > - more tests, more comments, rm trace code > - more int/long tests: where offsetPlain moves away > - add long tests > - verify cfg case > - test: handle AlignVector > - some int tests > - allow LShift for scaling > - better comments > - ... and 6 more: https://git.openjdk.org/jdk/compare/a5cd6d8c...fdfd7ca2 src/hotspot/share/opto/vectorization.hpp line 726: > 724: NONCOPYABLE(VPointer); > 725: > 726: Node* convI2L(Node* n); TODO: remove ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18795#discussion_r1577557220 From enikitin at openjdk.org Wed Apr 24 09:33:54 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 24 Apr 2024 09:33:54 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main [v2] In-Reply-To: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: > The idea of the bug was to lower the number of tests that use driver mode incorrectly. My investigation shows that the vast majority of such tests start other processes and are in fact wrappers around those processes. Most of the secondary processes are started using ProcessTools and therefore get the additional VM flags provided by the user.
> > I found only one test that seems to use driver mode incorrectly; this PR fixes it. > Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: ccp/TestShiftConvertAndNotification.java: change run mode from main/othervm to main ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18854/files - new: https://git.openjdk.org/jdk/pull/18854/files/9a5d6d90..6f017899 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18854&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18854&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18854.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18854/head:pull/18854 PR: https://git.openjdk.org/jdk/pull/18854 From enikitin at openjdk.org Wed Apr 24 09:33:54 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Wed, 24 Apr 2024 09:33:54 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main [v2] In-Reply-To: <1RdDiF2OtiIazT9XvjQ1-6piyNXawEjEz1owMSOEUnA=.b6176d03-99e2-4d23-a16b-3b47dec91ab3@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> <1RdDiF2OtiIazT9XvjQ1-6piyNXawEjEz1owMSOEUnA=.b6176d03-99e2-4d23-a16b-3b47dec91ab3@github.com> Message-ID: On Mon, 22 Apr 2024 09:23:26 GMT, Christian Hagedorn wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> ccp/TestShiftConvertAndNotification.java: change run mode from main/othervm to main > > test/hotspot/jtreg/compiler/ccp/TestShiftConvertAndNotification.java line 41: > >> 39: * @summary Test CCP notification for value update of AndL through LShiftI and >> 40: * ConvI2L (no flags).
>> 41: * @run main/othervm compiler.ccp.TestShiftConvertAndNotification > > Can we just use `main` instead of `main/othervm`? Let's try (adjusted). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18854#discussion_r1577588492 From dfenacci at openjdk.org Wed Apr 24 10:23:57 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 24 Apr 2024 10:23:57 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v3] In-Reply-To: References: Message-ID: <3Ek18rDpzQE69LUJWtFKCKn7JErHWfhNhAkP4mw4T7I=.31f56f25-92aa-45d1-80ed-6c8eb3956f5f@github.com> > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0)` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`.
Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge branch 'master' into JDK-8325520 - Update test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Emanuel Peter - JDK-8325520: fix load gather mask avx condition - JDK-8325520: fix tests for small species - JDK-8325520: add store tests - JDK-8325520: fix copyright notices - JDK-8325520: remove trailing whitespaces - JDK-8325520: use IR framework in tests - JDK-8325520: handle same offsets/masks in store identity - ... 
and 12 more: https://git.openjdk.org/jdk/compare/174d6265...b7e5fe02 ------------- Changes: https://git.openjdk.org/jdk/pull/18347/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=02 Stats: 1009 lines in 5 files changed: 1006 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From chagedorn at openjdk.org Wed Apr 24 10:40:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 24 Apr 2024 10:40:38 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: References: Message-ID: <6yGJwtJYdyIbVW18O-TaxffcNlQHi0KxVwgrA7NNllk=.94102491-8fab-4b3b-a988-975ad0cb589b@github.com> On Tue, 16 Apr 2024 02:21:24 GMT, Jasmine Karthikeyan wrote: >> Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. >> >> I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. >> >> Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Rename to Type::equals, changes from code review Looks good, thanks for the updates! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18533#pullrequestreview-2019510029 From epeter at openjdk.org Wed Apr 24 10:50:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 10:50:54 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal [v3] In-Reply-To: References: Message-ID: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. > > If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. > > **Example** > > > invar1 = b + c + d + a > invar2 = d + b + a + c > > -> equivalent but not identical nodes > > Sort, and add up again: > > invar1 = a + b + c + d > invar2 = a + b + c + d > > -> now the nodes are identical > > **Motivation: MemorySegment with invar** > > One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands?
> > This example did not vectorize, even though it should: > https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 > > Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. > > Upon inspection, I saw that the `invar` that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. > > The consequence: Only if we can prove that the two `invar` are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`, we don't know how the two addresses alias, and that prevents vectorization. > > Why does this happen? After parsing, the graph looks like this: > ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) > > We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store. > > load_adr = base + memory_segment_offs... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge branch 'master' into JDK-8330274-invar-sum-equality - Merge branch 'master' into JDK-8330274-invar-sum-equality - IR rules for test only on 64 bit - more tests, more comments, rm trace code - more int/long tests: where offsetPlain moves away - add long tests - verify cfg case - test: handle AlignVector - some int tests - allow LShift for scaling - ... 
and 7 more: https://git.openjdk.org/jdk/compare/a361d943...50706c5f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18795/files - new: https://git.openjdk.org/jdk/pull/18795/files/fdfd7ca2..50706c5f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=01-02 Stats: 139 lines in 11 files changed: 21 ins; 101 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/18795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18795/head:pull/18795 PR: https://git.openjdk.org/jdk/pull/18795 From shade at openjdk.org Wed Apr 24 14:31:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Apr 2024 14:31:32 GMT Subject: RFR: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit [v12] In-Reply-To: References: Message-ID: <7btJB4N2oN6ALpnodMMqAteDjLaQZi4iMoq2_z6Zw8c=.5109a438-6e95-4103-803d-adc41744e026@github.com> On Fri, 19 Apr 2024 16:56:28 GMT, Joshua Cao wrote: >> The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. >> >> This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. >> >> I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. 
The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barrier's input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers at the end of the constructor, where the barrier directly takes in an `Allocate` without an in-between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. >> >> I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). >> >> Passes hotspot tier1 locally on a Linux machine. >> >> ### Benchmarks >> >> Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. >> >> Baseline: >> >> Result "org.renaissance.jdk.streams.JmhParMnemonics.run": >> N = 25 >> mean = 3309.611 ±(99.9%) 86.699 ms/op >> >> Histogram, ms/op: >> [3000.000, 3050.000) = 0 >> [3050.000, 3100.000) = 4 >> [3100.000, 3150.000) = 1 >> [3150.000, 3200.000) = 0 >> [3200.000, 3250.000) = 0 >> [3250.000, 3300.000) = 0 >> [3300.000, 3350.000) = 9 >> [3350.000, 3400.000) = 6 >> [3400.000, 3450.000) = 5 >> >> Percentiles, ms/op: >> p(0... > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Add riscv64 to test Our internal testing completes fine as well, including aggressive compiler testing with jcstress, CTW runs and Fuzzers. Looks like we are ready to go.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18505#issuecomment-2075087569 From epeter at openjdk.org Wed Apr 24 15:03:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 15:03:52 GMT Subject: RFR: 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 Message-ID: I just pushed [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446), and @jatin-bhateja just pushed a new test with [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555). Jatin created a test case, where we have an `array pointer`, but its element type is `bottom`, because it merges multiple different arrays. I attached such an example in the regression test: https://github.com/openjdk/jdk/blob/52877b8d5c0c24c256611ceb9cef1d4fb0c40f68/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L1275-L1297 The issue is that I assume that array ptr always have a native type. But they can be bottom type, and the BasicType is then T_ILLEGAL, and trying to get the size in bytes leads to the assert. **Solution:** just check if the BasicTypes are `is_java_primitive(bt)`. 
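
The guard described in the solution above can be illustrated with a toy model in Java. This is only a sketch under stated assumptions: the `BasicType` enum, `isJavaPrimitive`, `sizeInBytes` and `elementSize` below are hypothetical stand-ins written for this example and do not reproduce the HotSpot definitions or the actual MergeStores code. The idea shown is to check that the type is a primitive before asking for its size, instead of letting the size query fail on an illegal type.

```java
import java.util.OptionalInt;

public class ElementSizeGuardSketch {
    // Toy stand-in for a basic-type enum; not the HotSpot definition.
    enum BasicType {
        T_BYTE(1), T_SHORT(2), T_INT(4), T_LONG(8), T_ILLEGAL(-1);

        private final int bytes;

        BasicType(int bytes) { this.bytes = bytes; }

        boolean isJavaPrimitive() { return this != T_ILLEGAL; }

        // Asking for the size of a non-primitive type is treated as a bug,
        // similar in spirit to the assert described above.
        int sizeInBytes() {
            if (!isJavaPrimitive()) {
                throw new IllegalStateException("unexpected basic type: " + this);
            }
            return bytes;
        }
    }

    // Guarded query: check the type first and report "unknown" otherwise,
    // e.g. for an array pointer whose element type is bottom.
    static OptionalInt elementSize(BasicType bt) {
        return bt.isJavaPrimitive() ? OptionalInt.of(bt.sizeInBytes()) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(elementSize(BasicType.T_INT).isPresent());     // true
        System.out.println(elementSize(BasicType.T_ILLEGAL).isPresent()); // false
    }
}
```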
------------- Commit messages: - 8331054 Changes: https://git.openjdk.org/jdk/pull/18935/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18935&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331054 Stats: 68 lines in 2 files changed: 61 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18935/head:pull/18935 PR: https://git.openjdk.org/jdk/pull/18935 From mbalao at openjdk.org Wed Apr 24 15:05:30 2024 From: mbalao at openjdk.org (Martin Balao) Date: Wed, 24 Apr 2024 15:05:30 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int [v2] In-Reply-To: <4sjqJ79xOh4Mt_SxaNxUfKDXRNredCyAFe4OGW8c60w=.6ecf3afb-0060-47aa-9d4c-d33f81eef18a@github.com> References: <4sjqJ79xOh4Mt_SxaNxUfKDXRNredCyAFe4OGW8c60w=.6ecf3afb-0060-47aa-9d4c-d33f81eef18a@github.com> Message-ID: On Fri, 19 Apr 2024 12:59:16 GMT, Roland Westrelin wrote: >> This fixes 3 calls to ABS with a min int argument. I think all of them >> are harmless: >> >> - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The >> check is for a stride of 1 or -1. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the >> computation of `scaled_iters_long`, the stride is passed to `ABS()` >> and then implicitly cast to long. I now cast the stride to long >> before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` >> overflows the int range for all values of `LoopStripMiningIter` >> except 0 or 1. Those values are handled early on in that method. So >> for a min int stride: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> is always true and the method returns early. >> >> - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the >> computation of `short_scaled_iters` also calls `ABS()` with the >> stride as argument.
But the result of that computation is only used >> if the test for: >> ``` >> (jlong)scaled_iters != scaled_iters_long >> ``` >> doesn't cause an early return of the method. I reordered statements >> so the `ABS()` calls happens after that test which will cause an early >> return if the stride is min int. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > more Looks good to me. ------------- Marked as reviewed by mbalao (Committer). PR Review: https://git.openjdk.org/jdk/pull/18813#pullrequestreview-2020142576 From epeter at openjdk.org Wed Apr 24 15:15:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 15:15:40 GMT Subject: RFR: 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 [v2] In-Reply-To: References: Message-ID: <_L7t63O53gxI1qeNEuGhQHzmdMajHd7WNQmrTPCyr7Y=.5083f937-fb22-48c7-af4d-dd26cbb80575@github.com> > I just pushed [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446), and @jatin-bhateja just pushed a new test with [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555). > > Note: this is definitely due to the MergeStores logic from [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446). > With [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555), Jatin only added a test that triggers the bug. > > Jatin created a test case, where we have an `array pointer`, but its element type is `bottom`, because it merges multiple different arrays. I attached such an example in the regression test: > > https://github.com/openjdk/jdk/blob/52877b8d5c0c24c256611ceb9cef1d4fb0c40f68/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L1275-L1297 > > The issue is that I assume that array ptr always have a native type. But they can be bottom type, and the BasicType is then T_ILLEGAL, and trying to get the size in bytes leads to the assert. > > **Solution:** just check if the BasicTypes are `is_java_primitive(bt)`. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18935/files - new: https://git.openjdk.org/jdk/pull/18935/files/52877b8d..ff75d165 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18935&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18935&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/18935.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18935/head:pull/18935 PR: https://git.openjdk.org/jdk/pull/18935 From thartmann at openjdk.org Wed Apr 24 15:15:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Apr 2024 15:15:40 GMT Subject: RFR: 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 [v2] In-Reply-To: <_L7t63O53gxI1qeNEuGhQHzmdMajHd7WNQmrTPCyr7Y=.5083f937-fb22-48c7-af4d-dd26cbb80575@github.com> References: <_L7t63O53gxI1qeNEuGhQHzmdMajHd7WNQmrTPCyr7Y=.5083f937-fb22-48c7-af4d-dd26cbb80575@github.com> Message-ID: <_L6DUAPNe1cRfFe4Rjct3eJyWfsBLcFPBVu-qg21E5g=.012eea2e-e997-4a6d-8fe8-df91e2cf1288@github.com> On Wed, 24 Apr 2024 15:13:13 GMT, Emanuel Peter wrote: >> I just pushed [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446), and @jatin-bhateja just pushed a new test with [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555). >> >> Note: this is definitely due to the MergeStores logic from [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446). >> With [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555), Jatin only added a test that triggers the bug. >> >> Jatin created a test case, where we have an `array pointer`, but its element type is `bottom`, because it merges multiple different arrays. 
I attached such an example in the regression test: >> >> https://github.com/openjdk/jdk/blob/52877b8d5c0c24c256611ceb9cef1d4fb0c40f68/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L1275-L1297 >> >> The issue is that I assume that array ptr always have a native type. But they can be bottom type, and the BasicType is then T_ILLEGAL, and trying to get the size in bytes leads to the assert. >> >> **Solution:** just check if the BasicTypes are `is_java_primitive(bt)`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18935#pullrequestreview-2020168087 From kvn at openjdk.org Wed Apr 24 15:19:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 15:19:30 GMT Subject: RFR: 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 [v2] In-Reply-To: <_L7t63O53gxI1qeNEuGhQHzmdMajHd7WNQmrTPCyr7Y=.5083f937-fb22-48c7-af4d-dd26cbb80575@github.com> References: <_L7t63O53gxI1qeNEuGhQHzmdMajHd7WNQmrTPCyr7Y=.5083f937-fb22-48c7-af4d-dd26cbb80575@github.com> Message-ID: On Wed, 24 Apr 2024 15:15:40 GMT, Emanuel Peter wrote: >> I just pushed [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446), and @jatin-bhateja just pushed a new test with [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555). >> >> Note: this is definitely due to the MergeStores logic from [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446). >> With [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555), Jatin only added a test that triggers the bug. >> >> Jatin created a test case, where we have an `array pointer`, but its element type is `bottom`, because it merges multiple different arrays. 
I attached such an example in the regression test: >> >> https://github.com/openjdk/jdk/blob/52877b8d5c0c24c256611ceb9cef1d4fb0c40f68/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L1275-L1297 >> >> The issue is that I assume that array ptr always have a native type. But they can be bottom type, and the BasicType is then T_ILLEGAL, and trying to get the size in bytes leads to the assert. >> >> **Solution:** just check if the BasicTypes are `is_java_primitive(bt)`. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18935#pullrequestreview-2020177458 From kvn at openjdk.org Wed Apr 24 15:33:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 15:33:42 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: > Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Remove unneeded ThreadWXEnable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18895/files - new: https://git.openjdk.org/jdk/pull/18895/files/88617209..cd1453b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18895&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18895&range=00-01 Stats: 6 lines in 2 files changed: 1 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18895/head:pull/18895 PR: https://git.openjdk.org/jdk/pull/18895 From kvn at openjdk.org Wed Apr 24 15:33:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 15:33:42 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 16:54:40 GMT, Vladimir Kozlov wrote: > Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. I removed two ThreadWXEnable pointed by Dean and changed comment for third. The update passed extensive tier1-8 testing. @dean-long, please look on update. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2075221722 PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2075223952 From duke at openjdk.org Wed Apr 24 17:09:35 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 24 Apr 2024 17:09:35 GMT Subject: Integrated: 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit In-Reply-To: References: Message-ID: On Wed, 27 Mar 2024 05:58:34 GMT, Joshua Cao wrote: > The [JSR 133 cookbook](https://gee.cs.oswego.edu/dl/jmm/cookbook.html) has long recommended using a `StoreStore` barrier at the end of constructors that write to final fields. `StoreStore` barriers are much cheaper on arm machines as shown in benchmarks in this issue as well as https://bugs.openjdk.org/browse/JDK-8324186. > > This change does not improve the case for constructors for objects with volatile fields because [MemBarRelease is emitted for volatile stores](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/gc/shared/c2/barrierSetC2.cpp#L211). This is demonstrated in test case `classWithVolatile`, where this patch does not impact the IR. > > I had to modify some code around escape analysis to make sure there are no regressions in eliminating allocations and `StoreStore`'s. The [current handling of StoreStore's in escape analysis](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/escape.cpp#L2590) makes the assumption that the barriers input is a `Proj` to an `Allocate` ([example](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/library_call.cpp#L1553)). This is contrary to the barriers in the end of the constructor where there the barrier directly takes in an `Allocate` without an in between `Proj`. I opted to instead eliminate `StoreStore`s in GVN, exactly how `MemBarRelease` is handled. 
> > I had to add [checks for StoreStore in macro.cpp](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/src/hotspot/share/opto/macro.cpp#L636), or else we fail some [cases for reducing allocation merges](https://github.com/openjdk/jdk/blob/8fc9097b3720314ef7efaf1f3ac31898c8d6ca19/test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java#L1233-L1256). > > Passes hotspot tier1 locally on a Linux machine. > > ### Benchmarks > > Running Renaissance ParMnemonics on an Amazon Graviton (arm) instance. > > Baseline: > > Result "org.renaissance.jdk.streams.JmhParMnemonics.run": > N = 25 > mean = 3309.611 ±(99.9%) 86.699 ms/op > > Histogram, ms/op: > [3000.000, 3050.000) = 0 > [3050.000, 3100.000) = 4 > [3100.000, 3150.000) = 1 > [3150.000, 3200.000) = 0 > [3200.000, 3250.000) = 0 > [3250.000, 3300.000) = 0 > [3300.000, 3350.000) = 9 > [3350.000, 3400.000) = 6 > [3400.000, 3450.000) = 5 > > Percentiles, ms/op: > p(0.0000) = 3069.910 ms/op > p(50.0000) = 3348.140 ms/op > ... This pull request has now been integrated. Changeset: 1d061707 Author: Joshua Cao Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/1d06170758bd76a0ea32e5bb7d4a017e829ae710 Stats: 610 lines in 9 files changed: 605 ins; 0 del; 5 mod 8300148: Consider using a StoreStore barrier instead of Release barrier on ctor exit Reviewed-by: shade, kvn, dlong ------------- PR: https://git.openjdk.org/jdk/pull/18505 From duke at openjdk.org Wed Apr 24 17:14:51 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 24 Apr 2024 17:14:51 GMT Subject: RFR: 8032218: Emit single post-constructor barrier for chain of superclass constructors [v2] In-Reply-To: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> References: <9Gvg3HswFVpqRnVZQ4HLf6WJKcM1_St_iVTC0GKhMgk=.b1081e57-b73e-43ee-ab1b-d8d6b0bf9cc5@github.com> Message-ID: > Opening this PR on top of https://github.com/openjdk/jdk/pull/18505.
This PR is only valid if we agree it is sufficient to use `StoreStore` barriers at the end of constructors instead of `Release` barriers. > > Currently on master, [C2 emits a Release barrier for each constructor call](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/parse1.cpp#L1019) in a chain of superclass constructor calls. After https://github.com/openjdk/jdk/pull/18505 is merged, it is the same except that the barrier is a `StoreStore`. It is unnecessary. We only need to emit a single barrier for each object allocation / each pair of `Allocation/InitializeNode`. > > [Macro expansion emits a trailing StoreStore after an InitializeNode](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/macro.cpp#L1610-L1628). This `StoreStore` is sufficient as the post-constructor barrier. From the [InitializeNode definition](https://github.com/openjdk/jdk/blob/32946e1882e9b22c983cbba3c6bda3cc7295946a/src/hotspot/share/opto/memnode.cpp#L3639-L3642): > >> // An InitializeNode collects and isolates object initialization after > // an AllocateNode and before the next possible safepoint. As a > // memory barrier (MemBarNode), it keeps critical stores from drifting > // down past any safepoint or any publication of the allocation. > > All the writes that occur in the constructor will come before the `InitializeNode/StoreStore`. This PR removes the emitting of `StoreStore` barriers in `Parse::do_exits()`. This PR subsumes https://github.com/openjdk/jdk/pull/18505. > > Passes hotspot tier1 locally on x86 linux machine. New tests make sure that there is a single `StoreStore` for chained constructors. Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: - 8032218: Emit single post-constructor barrier for chain of superclass constructors - Add riscv64 to test - Merge branch 'master' into storestore - Merge branch 'master' into storestore - Apply suggestions from code review some formatting suggestions from @shipilev Co-authored-by: Aleksey Shipilëv - Guard everything by feature flag - Revert "Statistics for barriers generated/eliminated" This reverts commit 33d23635048afd3a1b40ae91e6fadf577742fa4f. - Make flag product diagnostic and guard string concat storestore by flag - Statistics for barriers generated/eliminated - global flag to turn on storestore barrier emission and membar acquires IR tests - ... and 12 more: https://git.openjdk.org/jdk/compare/235ba9a7...717fe65b ------------- Changes: https://git.openjdk.org/jdk/pull/18870/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18870&range=01 Stats: 967 lines in 13 files changed: 844 ins; 116 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/18870.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18870/head:pull/18870 PR: https://git.openjdk.org/jdk/pull/18870 From iveresov at openjdk.org Wed Apr 24 17:49:28 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 24 Apr 2024 17:49:28 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v2] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 16:14:38 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*)
const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant opcode check src/hotspot/share/opto/escape.cpp line 576: > 574: if ((opc == Op_CmpP || opc == Op_CmpN) && !can_reduce_cmp(n, iff_cmp)) { > 575: NOT_PRODUCT(if (TraceReduceAllocationMerges) tty->print_cr("Can NOT reduce Phi %d on invocation %d. CastPP %d doesn't have simple control.", n->_idx, _invocation, use->_idx);) > 576: NOT_PRODUCT(n->dump(5);) It's a preexisting problem but do you want to call `dump()` unconditionally? Shouldn't it be under `if (TraceReduceAllocationMerges)` ? 
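The review point above, keeping the node dump under the same tracing flag as the message, can be illustrated with a small sketch. This is not the code in `escape.cpp`: `SketchFlags`, `SketchNode` and `sketch_report_cannot_reduce` are hypothetical names, and a plain output stream stands in for `tty`.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a diagnostic flag such as TraceReduceAllocationMerges.
struct SketchFlags {
  bool trace_reduce_allocation_merges;
};

// Hypothetical node type with a dump routine.
struct SketchNode {
  int idx;
  void dump(std::ostream& out) const { out << "node " << idx << "\n"; }
};

// Both the message and the dump sit inside one flag check, so nothing is
// printed unless tracing was requested.
inline void sketch_report_cannot_reduce(const SketchFlags& flags,
                                        const SketchNode& n,
                                        std::ostream& out) {
  if (flags.trace_reduce_allocation_merges) {
    out << "Can NOT reduce Phi " << n.idx << "\n";
    n.dump(out);
  }
}
```

With the flag off the stream stays empty; with it on, the message is followed by the dump.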
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18916#discussion_r1578291680 From kvn at openjdk.org Wed Apr 24 18:14:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 18:14:28 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 17:46:47 GMT, Igor Veresov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove redundant opcode check > > src/hotspot/share/opto/escape.cpp line 576: > >> 574: if ((opc == Op_CmpP || opc == Op_CmpN) && !can_reduce_cmp(n, iff_cmp)) { >> 575: NOT_PRODUCT(if (TraceReduceAllocationMerges) tty->print_cr("Can NOT reduce Phi %d on invocation %d. CastPP %d doesn't have simple control.", n->_idx, _invocation, use->_idx);) >> 576: NOT_PRODUCT(n->dump(5);) > > It's a preexisting problem but do you want to call `dump()` unconditionally? Shouldn't it be under `if (TraceReduceAllocationMerges)` ? Thank you, @veresov Yes, it should be under the flag. Thank you for spotting it. I will check other places in EA too.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18916#discussion_r1578322472 From kvn at openjdk.org Wed Apr 24 18:31:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 18:31:41 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: > In Leyden testing CI we start hitting assert: > > > # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 > # assert(i < _max) failed: oob: i=2, _max=2 > > Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) > V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) > V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) > > > [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) > which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. > > The fix is to add missing checks for If, Bool and Cmp nodes. > > The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. > Running our regular tier1-3,stress,xcomp too. 
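The failure mode described above, assuming an If -> Bool -> Cmp chain and indexing into a node that turns out to be something else, can be sketched with a toy graph. This is an illustration only and not the actual patch: `ToyOp`, `ToyNode` and `toy_cmp_of_if` are hypothetical, and the input layout (data input at index 1) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes; Opaque models an extra node sitting between If and Bool.
enum class ToyOp { If, Bool, CmpP, Opaque, Other };

struct ToyNode {
  ToyOp op;
  std::vector<const ToyNode*> inputs;  // inputs[1] is the data input in this sketch

  // Bounds-checked input access: returns nullptr instead of asserting.
  const ToyNode* in(std::size_t i) const {
    return i < inputs.size() ? inputs[i] : nullptr;
  }
};

// Returns the Cmp feeding an If through a Bool, or nullptr when the expected
// If -> Bool -> Cmp shape is not present. Each step checks the opcode before
// looking at the next input.
inline const ToyNode* toy_cmp_of_if(const ToyNode* iff) {
  if (iff == nullptr || iff->op != ToyOp::If) return nullptr;
  const ToyNode* bol = iff->in(1);
  if (bol == nullptr || bol->op != ToyOp::Bool) return nullptr;
  const ToyNode* cmp = bol->in(1);
  if (cmp == nullptr || cmp->op != ToyOp::CmpP) return nullptr;
  return cmp;
}
```

A caller would only go on to the comparison-specific analysis when the result is non-null, and otherwise treat the Phi as not reducible.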
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Put nodes dump under flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18916/files - new: https://git.openjdk.org/jdk/pull/18916/files/6c19152f..acbfa8ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18916&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18916&range=01-02 Stats: 12 lines in 1 file changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18916/head:pull/18916 PR: https://git.openjdk.org/jdk/pull/18916 From iveresov at openjdk.org Wed Apr 24 18:31:41 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 24 Apr 2024 18:31:41 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: <8rFAX3S2h34wuE1eBJrXL5-bKkHv0ARk-hao2F1nPNg=.d50328f6-d616-4f32-a214-fd82c6d95f2a@github.com> On Wed, 24 Apr 2024 18:28:54 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> 
which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Put nodes dump under flag Looks good ------------- Marked as reviewed by iveresov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18916#pullrequestreview-2020632690 From kvn at openjdk.org Wed Apr 24 18:31:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 18:31:42 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v2] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 16:14:38 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. 
>> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove redundant opcode check I found only one other place where `dump()` was not under flag. Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2075569825 From sviswanathan at openjdk.org Wed Apr 24 18:32:31 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 24 Apr 2024 18:32:31 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> References: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> Message-ID: On Wed, 24 Apr 2024 00:21:40 GMT, Martin Balao wrote: >> We would like to propose a fix for 8330611. >> >> To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. >> >> While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. >> >> A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe.
No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. >> >> This work is in collaboration with @franferrax . > > Martin Balao has updated the pull request incrementally with one additional commit since the last revision: > > Avoid register conflict in Windows. > > Co-authored-by: Francisco Ferrari Bihurriet > Co-authored-by: Martin Balao Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18849#pullrequestreview-2020636260 From kvn at openjdk.org Wed Apr 24 18:34:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 18:34:29 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded ThreadWXEnable GHA failure is again failing upload results when tests passed: 2024-04-24T16:27:18.1181315Z TEST TOTAL PASS FAIL ERROR 2024-04-24T16:27:18.1182310Z jtreg:test/hotspot/jtreg:tier1_gc 366 366 0 0 2024-04-24T16:28:14.1398632Z ##[error]Failed to CreateArtifact: Failed to make request after 5 attempts: Request timeout: /twirp/github.actions.results.api.v1.ArtifactService/CreateArtifact ------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2075578454 From kvn at openjdk.org Wed Apr 24 18:41:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 18:41:30 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 18:31:41 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. 
>> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Put nodes dump under flag Thank you, Igor ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2075592035 From kvn at openjdk.org Wed Apr 24 19:06:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 19:06:31 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded ThreadWXEnable Someone from runtime group, please look on changes because they touch runtime code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2075641831 From dlong at openjdk.org Wed Apr 24 19:06:30 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 24 Apr 2024 19:06:30 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 18:31:41 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Put nodes dump under flag Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18916#pullrequestreview-2020700715 From epeter at openjdk.org Wed Apr 24 19:09:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 19:09:34 GMT Subject: RFR: 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 [v2] In-Reply-To: <_L6DUAPNe1cRfFe4Rjct3eJyWfsBLcFPBVu-qg21E5g=.012eea2e-e997-4a6d-8fe8-df91e2cf1288@github.com> References: <_L7t63O53gxI1qeNEuGhQHzmdMajHd7WNQmrTPCyr7Y=.5083f937-fb22-48c7-af4d-dd26cbb80575@github.com> <_L6DUAPNe1cRfFe4Rjct3eJyWfsBLcFPBVu-qg21E5g=.012eea2e-e997-4a6d-8fe8-df91e2cf1288@github.com> Message-ID: On Wed, 24 Apr 2024 15:12:39 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > Looks good to me. Thanks for the reviews @TobiHartmann @vnkozlov ! This is a relatively straight-forward fix. To reduce noise on the stress-job, I'm integrating a little early. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18935#issuecomment-2075645909 From epeter at openjdk.org Wed Apr 24 19:09:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 19:09:34 GMT Subject: Integrated: 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 14:53:49 GMT, Emanuel Peter wrote: > I just pushed [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446), and @jatin-bhateja just pushed a new test with [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555). > > Note: this is definitely due to the MergeStores logic from [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446). > With [JDK-8329555](https://bugs.openjdk.org/browse/JDK-8329555), Jatin only added a test that triggers the bug. 
> > Jatin created a test case, where we have an `array pointer`, but its element type is `bottom`, because it merges multiple different arrays. I attached such an example in the regression test: > > https://github.com/openjdk/jdk/blob/52877b8d5c0c24c256611ceb9cef1d4fb0c40f68/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L1275-L1297 > > The issue is that I assume that array ptr always have a native type. But they can be bottom type, and the BasicType is then T_ILLEGAL, and trying to get the size in bytes leads to the assert. > > **Solution:** just check if the BasicTypes are `is_java_primitive(bt)`. This pull request has now been integrated. Changeset: ea3909ac Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/ea3909acd117cab97c6c0b496f98f9a4a3a22be4 Stats: 68 lines in 2 files changed: 61 ins; 0 del; 7 mod 8331054: C2 MergeStores: assert failed: unexpected basic type after JDK-8318446 and JDK-8329555 Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/18935 From kvn at openjdk.org Wed Apr 24 19:20:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 19:20:28 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 18:31:41 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 
(escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Put nodes dump under flag Thank you, Dean, for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2075666389 From epeter at openjdk.org Wed Apr 24 19:38:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Apr 2024 19:38:49 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal [v4] In-Reply-To: References: Message-ID: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. 
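The three steps above (extract the summands, sort them by node idx, add them up again) can be sketched in a few lines of Java. This is only a toy model under stated assumptions: `Expr`, `Leaf`, `Add`, `collectSummands` and `canonicalize` are hypothetical names invented for illustration and are not HotSpot classes or the code from the PR. Record equality stands in for C2's node identity, and `SubL`/`SubI`, the casts, `ConvI2L` and the shifts are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class InvarCanonicalizer {
    // Hypothetical stand-ins for C2 nodes; these are not HotSpot classes.
    interface Expr {}
    record Leaf(int idx, String name) implements Expr {}  // a summand, tagged with its node idx
    record Add(Expr left, Expr right) implements Expr {}  // models AddI / AddL

    // Step 1: walk through nested additions and collect all summands into a list.
    static void collectSummands(Expr e, List<Leaf> out) {
        if (e instanceof Add add) {
            collectSummands(add.left(), out);
            collectSummands(add.right(), out);
        } else {
            out.add((Leaf) e);
        }
    }

    // Steps 2 and 3: sort the summands by idx, then add them up again in that order.
    static Expr canonicalize(Expr invar) {
        List<Leaf> summands = new ArrayList<>();
        collectSummands(invar, summands);
        summands.sort(Comparator.comparingInt(Leaf::idx));
        Expr sum = summands.get(0);
        for (int i = 1; i < summands.size(); i++) {
            sum = new Add(sum, summands.get(i));
        }
        return sum;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(1, "a"), b = new Leaf(2, "b"), c = new Leaf(3, "c"), d = new Leaf(4, "d");
        Expr invar1 = new Add(new Add(new Add(b, c), d), a); // b + c + d + a
        Expr invar2 = new Add(new Add(new Add(d, b), a), c); // d + b + a + c
        System.out.println(invar1.equals(invar2));                             // prints false
        System.out.println(canonicalize(invar1).equals(canonicalize(invar2))); // prints true
    }
}
```

In this model the two sums only compare equal after both have been rebuilt in idx order, which is the property the sketch is meant to show.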
> > If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same. > > **Example** > > > invar1 = b + c + d + a > invar2 = d + b + a + c > > -> equivalent but not identical nodes > > Sort, and add up again: > > invar1 = a + b + c + d > invar2 = a + b + c + d > > -> now the nodes are identical > > **Motivation: MemorySegment with invar** > > One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands? > > This example did not vectorize, even though it should: > https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441 > > Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration. > > Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different. > > The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop-carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization. > > Why does this happen? After parsing, the graph looks like this: > ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f) > > We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store. > > load_adr = base + memory_segment_offs... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge branch 'master' into JDK-8330274-invar-sum-equality - Merge branch 'master' into JDK-8330274-invar-sum-equality - Merge branch 'master' into JDK-8330274-invar-sum-equality - IR rules for test only on 64 bit - more tests, more comments, rm trace code - more int/long tests: where offsetPlain moves away - add long tests - verify cfg case - test: handle AlignVector - some int tests - ... and 8 more: https://git.openjdk.org/jdk/compare/3ca33f3f...b7e66999 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18795/files - new: https://git.openjdk.org/jdk/pull/18795/files/50706c5f..b7e66999 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18795&range=02-03 Stats: 1528 lines in 69 files changed: 1266 ins; 136 del; 126 mod Patch: https://git.openjdk.org/jdk/pull/18795.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18795/head:pull/18795 PR: https://git.openjdk.org/jdk/pull/18795 From dlong at openjdk.org Wed Apr 24 20:05:31 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 24 Apr 2024 20:05:31 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded ThreadWXEnable Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18895#pullrequestreview-2020839010 From szaldana at openjdk.org Wed Apr 24 20:07:55 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Apr 2024 20:07:55 GMT Subject: RFR: 8327240: Obsolete Tier2CompileThreshold/Tier2BackEdgeThreshold product flags [v2] In-Reply-To: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> References: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> Message-ID: <3D-6KPaZVvisIZDOSyxSmtq8a7FIM-WPLnXpdS8VJhg=.8651e3f3-51a5-4f7d-9deb-3a862bf15fd6@github.com> > Hi all, > > This PR removes the unused options ```Tier2CompileThreshold``` and ```Tier2BackEdgeThreshold```. > > Testing: > - [x] Verified unrecognized option error is reported after removing options. 
> > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Deleting usage of flag in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18904/files - new: https://git.openjdk.org/jdk/pull/18904/files/2eeab209..ed6b0048 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18904&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18904&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18904.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18904/head:pull/18904 PR: https://git.openjdk.org/jdk/pull/18904 From szaldana at openjdk.org Wed Apr 24 20:17:45 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 24 Apr 2024 20:17:45 GMT Subject: RFR: 8327240: Obsolete Tier2CompileThreshold/Tier2BackEdgeThreshold product flags [v3] In-Reply-To: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> References: <86N-93rC4Q2Q1d_YQSARfjQAHNNCEMvCXMq0_fk5A48=.9c621bb8-b724-40fb-afd7-835773a0e942@github.com> Message-ID: <9GcIqBKgoA6aBHea2WAQYfmYxA8V1hPUmGwm8GW3OWk=.7bd916e4-c8a0-4a54-a29c-d7b4b5ac6579@github.com> > Hi all, > > This PR removes the unused options ```Tier2CompileThreshold``` and ```Tier2BackEdgeThreshold```. > > Testing: > - [x] Verified warning is issued as support was removed. > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - formatting - Merge master - Adding to obsolete list - Deleting usage of flag in test - 8327240: Remove unused Tier2CompileThreshold/Tier2BackEdgeThreshold product flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18904/files - new: https://git.openjdk.org/jdk/pull/18904/files/ed6b0048..ac2dc109 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18904&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18904&range=01-02 Stats: 9528 lines in 289 files changed: 5822 ins; 2693 del; 1013 mod Patch: https://git.openjdk.org/jdk/pull/18904.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18904/head:pull/18904 PR: https://git.openjdk.org/jdk/pull/18904 From kvn at openjdk.org Wed Apr 24 20:21:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 20:21:31 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded ThreadWXEnable Thank you, Dean ------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2075770753 From kvn at openjdk.org Wed Apr 24 20:25:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 20:25:30 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: <5fng1HCx5IVh5lMIZ5tlyJn1O_kP9Ysot8E-b7vkn4Y=.ab99aa6c-6c15-4d00-8b55-82169687f048@github.com> On Wed, 24 Apr 2024 18:31:41 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Put nodes dump under flag GHA failure in linux-x86 (failed compilation of test/langtools/tools/javac/lambda/TargetType69.java test) looks unrelated to my changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2075776880 From mbalao at openjdk.org Wed Apr 24 20:26:34 2024 From: mbalao at openjdk.org (Martin Balao) Date: Wed, 24 Apr 2024 20:26:34 GMT Subject: Integrated: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 00:04:41 GMT, Martin Balao wrote: > We would like to propose a fix for 8330611. > > To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available according to 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and the later discard of spurious data. > > While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. > > A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category. Additionally, we verified the fix manually with the debugger. > > This work is in collaboration with @franferrax . This pull request has now been integrated.
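As a rough illustration of the tail handling described in the message above, the following Java sketch processes a tail of fewer than 16 bytes in 8, 4, 2 and 1 byte batches, picking each batch by testing the corresponding bit of the remaining length. It is a simplified model, not the intrinsic: `TailXor`, `xorTail` and all parameter names are hypothetical, and the inner byte loop stands in for what would be a single 8, 4, 2 or 1 byte load in the generated code.

```java
public class TailXor {
    // XOR a tail of `len` bytes (assumed 0 <= len < 16) with the keystream block,
    // touching only input bytes that are known to exist.
    static void xorTail(byte[] in, int inOff, byte[] keystream,
                        byte[] out, int outOff, int len) {
        int pos = 0;
        // Test the length bits 8, 4, 2, 1 in turn; each set bit means a batch of that size remains.
        for (int chunk = 8; chunk >= 1; chunk >>= 1) {
            if ((len & chunk) != 0) {
                for (int i = 0; i < chunk; i++) {
                    out[outOff + pos + i] = (byte) (in[inOff + pos + i] ^ keystream[pos + i]);
                }
                pos += chunk;
            }
        }
    }

    public static void main(String[] args) {
        byte[] in = new byte[13];        // 13 = 8 + 4 + 1, so three batches are processed
        byte[] keystream = new byte[16];
        byte[] out = new byte[16];
        for (int i = 0; i < in.length; i++) in[i] = (byte) (i * 7 + 3);
        for (int i = 0; i < keystream.length; i++) keystream[i] = (byte) (0xA5 ^ i);
        xorTail(in, 0, keystream, out, 0, in.length);
        System.out.println(out[12] == (byte) (in[12] ^ keystream[12])); // prints true
    }
}
```

Because the input array here is exactly 13 bytes long, any read past the tail would raise an `ArrayIndexOutOfBoundsException`, which makes the "never read more than `len` bytes" property easy to check in this model.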
Changeset: 8a8d9288 Author: Martin Balao URL: https://git.openjdk.org/jdk/commit/8a8d9288980513db459f7d6b36554b65844951ca Stats: 23 lines in 3 files changed: 18 ins; 1 del; 4 mod 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) Co-authored-by: Francisco Ferrari Bihurriet Co-authored-by: Martin Balao Reviewed-by: aph, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/18849 From cslucas at openjdk.org Wed Apr 24 20:39:28 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 24 Apr 2024 20:39:28 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: <5fng1HCx5IVh5lMIZ5tlyJn1O_kP9Ysot8E-b7vkn4Y=.ab99aa6c-6c15-4d00-8b55-82169687f048@github.com> References: <5fng1HCx5IVh5lMIZ5tlyJn1O_kP9Ysot8E-b7vkn4Y=.ab99aa6c-6c15-4d00-8b55-82169687f048@github.com> Message-ID: <9RfwS6_CBSI8nVDyeipmxYwVyGS6gEWtR_zA8bjXlO0=.da32ef43-e59e-46d8-b247-e5000b61424a@github.com> On Wed, 24 Apr 2024 20:22:46 GMT, Vladimir Kozlov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Put nodes dump under flag > > GHA failure in linux-x86 (failed compilation of test/langtools/tools/javac/lambda/TargetType69.java test) looks unrelated to my changes. LGTM @vnkozlov . Thank you for fixing these! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2075797936 From cslucas at openjdk.org Wed Apr 24 20:45:33 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 24 Apr 2024 20:45:33 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 18:31:41 GMT, Vladimir Kozlov wrote: >> In Leyden testing CI we start hitting assert: >> >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 >> # assert(i < _max) failed: oob: i=2, _max=2 >> >> Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) >> V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) >> V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) >> >> >> [JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) >> which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. >> >> The fix is to add missing checks for If, Bool and Cmp nodes. >> >> The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. >> Running our regular tier1-3,stress,xcomp too. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Put nodes dump under flag Marked as reviewed by cslucas (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18916#pullrequestreview-2020900828 From kvn at openjdk.org Wed Apr 24 20:45:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 20:45:34 GMT Subject: RFR: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call [v3] In-Reply-To: <9RfwS6_CBSI8nVDyeipmxYwVyGS6gEWtR_zA8bjXlO0=.da32ef43-e59e-46d8-b247-e5000b61424a@github.com> References: <5fng1HCx5IVh5lMIZ5tlyJn1O_kP9Ysot8E-b7vkn4Y=.ab99aa6c-6c15-4d00-8b55-82169687f048@github.com> <9RfwS6_CBSI8nVDyeipmxYwVyGS6gEWtR_zA8bjXlO0=.da32ef43-e59e-46d8-b247-e5000b61424a@github.com> Message-ID: On Wed, 24 Apr 2024 20:36:26 GMT, Cesar Soares Lucas wrote: >> GHA failure in linux-x86 (failed compilation of test/langtools/tools/javac/lambda/TargetType69.java test) looks unrelated to my changes. > > LGTM @vnkozlov . Thank you for fixing these! Thank you, @JohnTortugo ------------- PR Comment: https://git.openjdk.org/jdk/pull/18916#issuecomment-2075806246 From kvn at openjdk.org Wed Apr 24 20:45:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 24 Apr 2024 20:45:34 GMT Subject: Integrated: 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 14:36:36 GMT, Vladimir Kozlov wrote: > In Leyden testing CI we start hitting assert: > > > # Internal Error (/workspace/open/src/hotspot/share/opto/node.hpp:407), pid=3216007, tid=3216030 > # assert(i < _max) failed: oob: i=2, _max=2 > > Stack: [0x00007f20b011b000,0x00007f20b021b000], sp=0x00007f20b02157e0, free space=1001k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xbf5bb4] Node::in(unsigned int) const [clone .part.0]+0x24 (node.hpp:407) > V [libjvm.so+0xbf69ea] ConnectionGraph::can_reduce_cmp(Node*, Node*) const+0x9a (node.hpp:407) > V [libjvm.so+0xbf74a6] ConnectionGraph::can_reduce_check_users(Node*, unsigned int) const+0x996 (escape.cpp:578) > > > 
[JDK-8316991](https://bugs.openjdk.org/browse/JDK-8316991) added new code which assumes that we always have sequence : IfNode->Bool->Cmp: [escape.cpp#L569](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L569) > which is not true in some cases. In failed cases there is additional Opaque4 node between If and Bool nodes. > > The fix is to add missing checks for If, Bool and Cmp nodes. > > The fix is verified in Leyden CI testing. The failure happens only with special C2 mode compilation in Leyden. > Running our regular tier1-3,stress,xcomp too. This pull request has now been integrated. Changeset: a44ac026 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/a44ac026c599df629305588e09fbbcff9be2a5c0 Stats: 21 lines in 1 file changed: 12 ins; 1 del; 8 mod 8330853: Add missing checks for ConnectionGraph::can_reduce_cmp() call Reviewed-by: iveresov, dlong, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/18916 From duke at openjdk.org Wed Apr 24 21:17:33 2024 From: duke at openjdk.org (Charles Connell) Date: Wed, 24 Apr 2024 21:17:33 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: References: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> Message-ID: <2DiChgWl3nabnGCtslaIbRC4g6ukkOh5E594HckgIOw=.7c5813e3-5ee6-4128-a5cf-00f66c670048@github.com> On Wed, 24 Apr 2024 00:36:45 GMT, Martin Balao wrote: >> Martin Balao has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid register conflict in Windows. >> >> Co-authored-by: Francisco Ferrari Bihurriet >> Co-authored-by: Martin Balao > > We changed to `r15` the register used for the tail, so we avoid conflicts in Windows. 
> > Code generated: > > 0x7fffe4730bb2: test $0x8,%r8b > 0x7fffe4730bb6: je 0x7fffe4730bd3 > 0x7fffe4730bbc: vpextrq $0x0,%xmm0,%r15 > 0x7fffe4730bc2: xor (%rdi,%r12,1),%r15 > 0x7fffe4730bc6: mov %r15,(%rsi,%r12,1) > 0x7fffe4730bca: vpsrldq $0x8,%xmm0,%xmm0 > 0x7fffe4730bcf: add $0x8,%r12d > ... Glad to see this merged! @martinuy would you consider backporting this to 21? That would be helpful to my organization because we use LTS versions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2075857500 From maurizio.cimadamore at oracle.com Wed Apr 24 22:28:09 2024 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Wed, 24 Apr 2024 23:28:09 +0100 Subject: Weird performance behavior involving VarHandles In-Reply-To: <144453100.12179950.1713940662054.JavaMail.zimbra@univ-eiffel.fr> References: <144453100.12179950.1713940662054.JavaMail.zimbra@univ-eiffel.fr> Message-ID: Cool benchmark/test case! I don't know off-hand where the difference could be coming from - but just curious: did you try accessing in a loop (e.g. to see if checks are hoisted as expected)? I seem to recall that the lambda forms for guards-with-test are rather complex, as they need to profile the various branches. I wonder if some "leftover" from the profiling code stays there and pollutes the benchmark? Maurizio On 24/04/2024 07:37, Remi Forax wrote: > I get > > Benchmark Mode Cnt Score Error Units > ReproducerBenchmarks.control avgt 5 1.250 ± 0.024 ns/op > ReproducerBenchmarks.gwt2_methodhandle avgt 5 1.852 ± 0.024 ns/op > > and I don't understand why there is a difference in performance because > for c2, the strings "x" and "y" are constant so the corresponding > VarHandles should be constant thus optimized the same way. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jkarthikeyan at openjdk.org Wed Apr 24 23:09:29 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 24 Apr 2024 23:09:29 GMT Subject: RFR: 8329194: Cleanup Type::cmp definition and usage [v2] In-Reply-To: <6yGJwtJYdyIbVW18O-TaxffcNlQHi0KxVwgrA7NNllk=.94102491-8fab-4b3b-a988-975ad0cb589b@github.com> References: <6yGJwtJYdyIbVW18O-TaxffcNlQHi0KxVwgrA7NNllk=.94102491-8fab-4b3b-a988-975ad0cb589b@github.com> Message-ID: On Wed, 24 Apr 2024 10:37:27 GMT, Christian Hagedorn wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename to Type::equals, changes from code review > > Looks good, thanks for the updates! Thanks for taking another look @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18533#issuecomment-2076007875 From mbalao at openjdk.org Thu Apr 25 03:55:41 2024 From: mbalao at openjdk.org (Martin Balao) Date: Thu, 25 Apr 2024 03:55:41 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: References: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> Message-ID: On Wed, 24 Apr 2024 00:36:45 GMT, Martin Balao wrote: >> Martin Balao has updated the pull request incrementally with one additional commit since the last revision: >> >> Avoid register conflict in Windows. >> >> Co-authored-by: Francisco Ferrari Bihurriet >> Co-authored-by: Martin Balao > > We changed to `r15` the register used for the tail, so we avoid conflicts in Windows. > > Code generated: > > 0x7fffe4730bb2: test $0x8,%r8b > 0x7fffe4730bb6: je 0x7fffe4730bd3 > 0x7fffe4730bbc: vpextrq $0x0,%xmm0,%r15 > 0x7fffe4730bc2: xor (%rdi,%r12,1),%r15 > 0x7fffe4730bc6: mov %r15,(%rsi,%r12,1) > 0x7fffe4730bca: vpsrldq $0x8,%xmm0,%xmm0 > 0x7fffe4730bcf: add $0x8,%r12d > ... > Glad to see this merged! @martinuy would you consider backporting this to 21? 
That would be helpful to my organization because we use LTS versions. Sure, I can try that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2076298220 From duke at openjdk.org Thu Apr 25 04:38:32 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 25 Apr 2024 04:38:32 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v4] In-Reply-To: <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> <3TlFNBaZuJn-5Iq9Ddw2V-eZSuWXtoMdi-eycSjG0YY=.408453bf-1aa3-4f4d-83d2-02b047c2443b@github.com> Message-ID: On Fri, 19 Apr 2024 17:16:24 GMT, Joshua Cao wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). >> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. 
Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passing hotspot tier1 locally on Linux machine. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Comment on not allowing macro nodes after we start expanding. Rename > dont_allow_macro_nodes to reset_allow_macro_nodes. Correction: The `Bool` does get processed, but does not get transformed. `BoolNode::Ideal` actually does not canonicalize bools. They are canonicalized in [IfNode::idealize_test](https://github.com/openjdk/jdk/blob/21480a7ae8dce67cf3a844d8caafb0b96c37ac0e/src/hotspot/share/opto/ifnode.cpp#L1836). ------------- PR Comment: https://git.openjdk.org/jdk/pull/18824#issuecomment-2076342027 From duke at openjdk.org Thu Apr 25 04:44:53 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 25 Apr 2024 04:44:53 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v5] In-Reply-To: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: > The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. > > This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. 
I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). > > The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. > > Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. > > Passing hotspot tier1 locally on Linux machine. Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into shen - Merge branch 'master' into shen - Comment on not allowing macro nodes after we start expanding. Rename dont_allow_macro_nodes to reset_allow_macro_nodes. - Update comment for MinL/MaxL based on renaming of allow_macro_nodes - Rename began_macro_expansion to allow_macro_nodes. Remove shenandoah flag from test. 
- Merge branch 'master' into shen - 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) ------------- Changes: https://git.openjdk.org/jdk/pull/18824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18824&range=04 Stats: 50 lines in 5 files changed: 45 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18824/head:pull/18824 PR: https://git.openjdk.org/jdk/pull/18824 From thartmann at openjdk.org Thu Apr 25 07:13:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Apr 2024 07:13:34 GMT Subject: RFR: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) [v5] In-Reply-To: References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: On Thu, 25 Apr 2024 04:44:53 GMT, Joshua Cao wrote: >> The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. >> >> This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). >> >> The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. >> >> Caveat: I created test cases for both `MaxL` and `MinL` for completeness. 
The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). The `MinL` is never created and there is no crash. >> >> Passing hotspot tier1 locally on Linux machine. > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge branch 'master' into shen > - Merge branch 'master' into shen > - Comment on not allowing macro nodes after we start expanding. Rename > dont_allow_macro_nodes to reset_allow_macro_nodes. > - Update comment for MinL/MaxL based on renaming of allow_macro_nodes > - Rename began_macro_expansion to allow_macro_nodes. Remove shenandoah > flag from test. > - Merge branch 'master' into shen > - 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) Thanks for the additional details and filing [JDK-8331090](https://bugs.openjdk.org/browse/JDK-8331090)! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18824#issuecomment-2076519498 From duke at openjdk.org Thu Apr 25 07:13:35 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 25 Apr 2024 07:13:35 GMT Subject: Integrated: 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) In-Reply-To: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> References: <9rc-dLX_yY3rogoMCTcyJkJlN7RiMqWgeEgdhiSTyMk=.a6935fa1-1ae1-49a4-869d-f3629ab1f609@github.com> Message-ID: On Wed, 17 Apr 2024 18:44:57 GMT, Joshua Cao wrote: > The bug occurs when [Shenandoah optimizations resets post_loop_opts](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp#L52), and we may [create a MaxL after macro expansion](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L198). `MaxL` does not have a matcher rule, and we run into an assertion failure. > > This PR guards the `MaxL` creation with a new `began_macro_expansion()` flag. I think there are many other instances in code that should use the new flag instead of `post_loop_opts()`, which can be explored in [JDK-8330531](https://bugs.openjdk.org/browse/JDK-8330531). > > The bug was originally found in [h2 Index::getCostRangeIndex()](https://github.com/h2database/h2database/blob/master/h2/src/main/org/h2/index/Index.java#L579) through Dacapo. Its easy to reproduce by creating a loop that includes a `ShenandoahLoadReferenceBarrier` (load any object) and a `MaxL`. > > Caveat: I created test cases for both `MaxL` and `MinL` for completeness. The `MinL` test case does not actually fail before this PR. Somehow the `CMove` condition is converted to non-canonical `>`, which is [not accepted by the Idealization](https://github.com/openjdk/jdk/blob/040c93565c0dff6270911eb9e58d78aa01bbb925/src/hotspot/share/opto/movenode.cpp#L219). 
The `MinL` is never created and there is no crash. > > Passing hotspot tier1 locally on Linux machine. This pull request has now been integrated. Changeset: d32f1092 Author: Joshua Cao Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/d32f10921985031505718ec29fb97a36f9ba24c0 Stats: 50 lines in 5 files changed: 45 ins; 2 del; 3 mod 8329797: Shenandoah: Default case invoked for: "MaxL" (bad AD file) Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18824 From dfenacci at openjdk.org Thu Apr 25 08:01:00 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 25 Apr 2024 08:01:00 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: > # Issue > When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. > > # Causes > On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. > While running GVN loops, we can get to the situation in which there are multiple loads from the same address. If successive loads are deemed to be identical to the first one, they might get folded, which is what happens in the problematic examples of this issue. The check for identity happens in > https://github.com/openjdk/jdk/blob/481c866df87c693a90a1da20e131e5654b084ddd/src/hotspot/share/opto/memnode.cpp#L1253 > This version of `Identity` is defined in `LoadNode` but it is also the one used by the subclasses `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked`. 
Although this definition of `Identity` is enough for `LoadVector` nodes it is not sufficient for `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` ones, as the value being loaded also depends on the offsets and mask (different offsets and/or masks load completely different values). This is the reason why these nodes get folded even if they shouldn't. > > # Solution > `LoadVectorGather`, `LoadVectorMasked` and `LoadVectorGatherMasked` need their own version of the `Identity` method, which specialize `LoadVector::Identity` by restricting the results to nodes that also have the same offsets and masks. > > The same issue exists for _StoreVector_ nodes (i.e. `StoreVectorScatter`, `StoreVectorMasked` and `StoreVectorScatterMasked`). So, `Identity` has to be redefined there as well. > > Regression tests for all versions of `Load/StoreVectorGather/Masked` have been added too. Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: - JDK-8325520: take advantage of store_Opcode to avoid more checks - JDK-8325520: remove check for offsets/mask equivalence if load->store or store->load - JDK-8325520: merge - JDK-8325520: fix missing semicolon - JDK-8325520: address PR comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18347/files - new: https://git.openjdk.org/jdk/pull/18347/files/b7e5fe02..d25bcacf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18347&range=02-03 Stats: 143 lines in 3 files changed: 15 ins; 126 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347 PR: https://git.openjdk.org/jdk/pull/18347 From dfenacci at openjdk.org Thu Apr 25 08:18:30 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 25 Apr 2024 08:18:30 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled In-Reply-To: 
References: Message-ID: On Tue, 23 Apr 2024 10:28:57 GMT, Emanuel Peter wrote: > The load does not succeed in `MemNode::can_see_stored_value`, because the load's `store_Opcode() == Op_StoreVector`, and not `Op_StoreVectorMasked`. But if you were to implement `LoadVectorMasked::store_Opcode() const { return Op_StoreVectorMasked; }`, then you have to be careful: A masked load does not necessarily return the same as the masked store's input value. That input value is not yet masked, but the loaded value needs to be masked. > > But it seems to me that you can actually never have a successful `MemNode::can_see_stored_value` case for masked operations, with your current code. It would always fail the `store_Opcode() == st->Opcode()` check. And that gives the correct result, but it is still a bit strange that we don't override the `store_Opcode` for the masked/offset vector stores. > > I don't know which way you want to go now. I see these options: > > * Keep disallowing masked load/store "look-throughs". > > * Do that by having the "incorrect" `store_Opcode` as now. The downside is that the "offset only" case does not manage to do the look-through, even though that would be correct. > * OR: have the correct `store_Opcode`, which allows the look-through for the "offset only" case. But then explicitly check for the masked cases, and disallow those. > * Implement a special look-through, where you apply the mask with some blend/select/masked operation on the store input value, which simulates the masked load (i.e. you need to put zeros where the mask is off). > > Not sure if this is all very clear, feel free to ask. Yep it is actually very clear, thanks a lot @eme64! Actually your example is with masks, but there is a similar problem with offsets as well. 
Think of what happens if the offset array is shorter than the length of the array or if there are duplicate indices in the offset array: the result of a load-store or a store-load sequence doesn't produce the same original array or vector. So actually to avoid folding in those 2 cases (load->store and store->load) for now I made specific `store_Opcode` implementations that are never the same as a Store opcode, so that the `store_Opcode() == st->Opcode()` and `val->as_Load()->store_Opcode() == Opcode()` is always false (I've used the opcode of the `LoadNode` itself but possibly there is a better solution). As you hint one could also think of a solution involving substituting load/store masked vector nodes with "simple" vector mask nodes and load/store gather/scatter vector nodes with "simple" vector shuffle nodes (if these exist) but I'm not sure how much this would bring and anyway for this issue I would keep the fix simple. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2076627767 From dfenacci at openjdk.org Thu Apr 25 08:18:31 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 25 Apr 2024 08:18:31 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: <5v86xc40euhW4fEetmIkt_XyP7B4Wdeh7WaMWiarg50=.5c2b42a9-96a0-498d-bdbd-7548e1753eff@github.com> On Mon, 15 Apr 2024 08:52:01 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 1173: >> >>> 1171: if (in_vt != out_vt) { >>> 1172: return nullptr; >>> 1173: } >> >> I see there is a vector type check here. Do we not need that for the code in `StoreNode::Identity`? "Normal" stores like `StoreB` and `StoreI` have the type implicit, but for vector nodes, this type is hidden in the `vect_type()`, so I suspect you need to check it. >> >> I imagine a scenario where we store a float-vector, and then read from the same address as int-vector. 
Is that ok, or would we need a ReinterpretCast node? I'm not sure, but it would be worth trying to create some tests to check that. > > Can you actually do that: store a float-vector to an int-array? Or is that maybe only possible with Unsafe somehow? Or maybe completely impossible? The API doesn't seem to allow this. I'm not sure of what one could do with Unsafe. Would we have to take potential Unsafe operations into account? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1579068650 From dfenacci at openjdk.org Thu Apr 25 08:18:32 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 25 Apr 2024 08:18:32 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 09:50:15 GMT, Emanuel Peter wrote: >> Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: >> >> - JDK-8325520: take advantage of store_Opcode to avoid more checks >> - JDK-8325520: remove check for offsets/mask equivalence if load->store or store->load >> - JDK-8325520: merge >> - JDK-8325520: fix missing semicolon >> - JDK-8325520: address PR comments > > src/hotspot/share/opto/memnode.cpp line 2830: > >> 2828: val->in(MemNode::Memory )->eqv_uncast(mem) && >> 2829: val->as_Load()->store_Opcode() == Opcode()) { >> 2830: // Handle StoreVector with offsets and masks > > Also: the indentation is not right: it should only be 2 spaces from the `if`. Yep, fixed, thanks. > src/hotspot/share/opto/memnode.cpp line 2858: > >> 2856: } else { >> 2857: result = mem; >> 2858: } > > You already have the condition `val->as_Load()->store_Opcode() == Opcode()`. > So once you have `is_StoreVectorScatter()`, I think `val->is_LoadVectorGather()` is implied, no? > > Oh wow. 
I think actually that this is another bug here: we only have > `virtual int store_Opcode() const { return Op_StoreVector; }` for `LoadVectorNode`, but not for all masked/gather/scatter vector nodes! I think that should be fixed. > > That would also simplify your code. I've updated the `store_Opcode` methods. Actually I've used them to always avoid being the same as a Store opcode (I've used the Loads' own opcodes (a bit hacky) but that can be changed). See the comment below. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1579069186 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1579068284 From dfenacci at openjdk.org Thu Apr 25 08:18:33 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 25 Apr 2024 08:18:33 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: <4VsAbkEXJ6UucDnbBsnti-FC59KtjkB5nutBU66mNKk=.a6d16ce2-c760-4f48-b7a7-e7d321b6b073@github.com> References: <4VsAbkEXJ6UucDnbBsnti-FC59KtjkB5nutBU66mNKk=.a6d16ce2-c760-4f48-b7a7-e7d321b6b073@github.com> Message-ID: On Mon, 22 Apr 2024 09:27:09 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: >> >> - JDK-8325520: take advantage of store_Opcode to avoid more checks >> - JDK-8325520: remove check for offsets/mask equivalence if load->store or store->load >> - JDK-8325520: merge >> - JDK-8325520: fix missing semicolon >> - JDK-8325520: address PR comments > > test/hotspot/jtreg/compiler/vectorapi/VectorGatherMaskFoldingTest.java line 151: > >> 149: >> 150: @Test >> 151: @Warmup(10000) > > Since all tests use the same warmup, I would suggest to set it once via `testFrameworkobject.setDefaultWarmup(10000)`, see https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md thanks for the suggestion! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1579069114 From enikitin at openjdk.org Thu Apr 25 08:35:54 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Thu, 25 Apr 2024 08:35:54 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main [v3] In-Reply-To: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: > The idea of the bug was to lower down the number of tests that use driver mode incorrectly. My investigation shows up that the vast majority of such tests start other processes and in fact are more of the wrappers around that processes. Most of the secondary processes are started using ProcessTools and therefore getting additional VM flags provided by user. > > I found only one test that seem to use driver mode incorrectly, this PR fixes it. > Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. 
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Fix header ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18854/files - new: https://git.openjdk.org/jdk/pull/18854/files/6f017899..05ef7f12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18854&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18854&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18854.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18854/head:pull/18854 PR: https://git.openjdk.org/jdk/pull/18854 From galder at openjdk.org Thu Apr 25 09:29:59 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 25 Apr 2024 09:29:59 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: Message-ID: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> > Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. > > The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64:
>
> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> Benchmark                 (size)  Mode  Cnt    Score    Error  Units
> ArrayClone.byteArraycopy       0  avgt   15    3.476 ±  0.018  ns/op
> ArrayClone.byteArraycopy      10  avgt   15    3.740 ±  0.017  ns/op
> ArrayClone.byteArraycopy     100  avgt   15    7.124 ±  0.010  ns/op
> ArrayClone.byteArraycopy    1000  avgt   15   39.301 ±  0.106  ns/op
> ArrayClone.byteClone           0  avgt   15    3.478 ±  0.008  ns/op
> ArrayClone.byteClone          10  avgt   15    3.562 ±  0.007  ns/op
> ArrayClone.byteClone         100  avgt   15    5.888 ±  0.206  ns/op
> ArrayClone.byteClone        1000  avgt   15   25.762 ±  0.203  ns/op
> ArrayClone.intArraycopy        0  avgt   15    3.199 ±  0.016  ns/op
> ArrayClone.intArraycopy       10  avgt   15    4.521 ±  0.008  ns/op
> ArrayClone.intArraycopy      100  avgt   15   17.429 ±  0.039  ns/op
> ArrayClone.intArraycopy     1000  avgt   15  178.432 ±  0.777  ns/op
> ArrayClone.intClone            0  avgt   15    3.406 ±  0.016  ns/op
> ArrayClone.intClone           10  avgt   15    4.272 ±  0.006  ns/op
> ArrayClone.intClone          100  avgt   15   13.110 ±  0.122  ns/op
> ArrayClone.intClone         1000  avgt   15  113.196 ± 13.400  ns/op
>
> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. > > I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g.
>
> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1"
> ...
> TEST                                       TOTAL  PASS  FAIL  ERROR
> jtreg:test/hotspot/jtreg:hotspot_compiler   1234  1234     0      0
>
> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? 
> > Thanks @rwestrel for his help shaping this up :) Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: Remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17667/files - new: https://git.openjdk.org/jdk/pull/17667/files/f1f6edd0..9376e9ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17667&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17667.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17667/head:pull/17667 PR: https://git.openjdk.org/jdk/pull/17667 From epeter at openjdk.org Thu Apr 25 12:24:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Apr 2024 12:24:30 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 08:01:00 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). 
Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... 
> > Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: > > - JDK-8325520: take advantage of store_Opcode to avoid more checks > - JDK-8325520: remove check for offsets/mask equivalence if load->store or store->load > - JDK-8325520: merge > - JDK-8325520: fix missing semicolon > - JDK-8325520: address PR comments Nice, ah you are right, there can be issues with mask-only cases as well! It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong. I'll look at your `store_Opcode` changes now... ------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2077053827 From epeter at openjdk.org Thu Apr 25 12:34:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Apr 2024 12:34:31 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 08:01:00 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. >> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). 
Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... 
> > Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: > > - JDK-8325520: take advantage of store_Opcode to avoid more checks > - JDK-8325520: remove check for offsets/mask equivalence if load->store or store->load > - JDK-8325520: merge > - JDK-8325520: fix missing semicolon > - JDK-8325520: address PR comments Changes requested by epeter (Reviewer). src/hotspot/share/opto/memnode.cpp line 3533: > 3531: const Node* offsets = stv->in(StoreVectorScatterMaskedNode::Offsets); > 3532: const Node* mask = stv->in(StoreVectorScatterMaskedNode::Mask); > 3533: if (mem->is_StoreVectorScatterMasked()) { This `if` will always be true, since we already check `mem->Opcode() == Opcode()`. The code would be simpler if you extracted the offsets and masks in parallel. src/hotspot/share/opto/vectornode.hpp line 916: > 914: virtual int store_Opcode() const { > 915: // Ensure it is different from any store opcode > 916: return Op_LoadVectorGather; I think you should take `-1`, which is what `MemNode::store_Opcode()` returns. It means "unknown". ------------- PR Review: https://git.openjdk.org/jdk/pull/18347#pullrequestreview-2022351285 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1579388706 PR Review Comment: https://git.openjdk.org/jdk/pull/18347#discussion_r1579382617 From epeter at openjdk.org Thu Apr 25 12:42:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Apr 2024 12:42:39 GMT Subject: RFR: 8325520: Vector loads with offsets incorrectly compiled [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 08:01:00 GMT, Damon Fenacci wrote: >> # Issue >> When loading multiple vectors using offsets or masks (e.g. `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0` or `LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)`) there is an error in the C2 compiled code that makes different vectors be treated as equal even though they are not. 
>> >> # Causes >> On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. `LoadVectorGather`, `LoadVectorMasked`, `LoadVectorGatherMasked`). Vector loads without mask or offsets are mapped as `LoadVector` nodes instead. >> The same is true for `StoreVector`s. >> When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store of the same address to be able to replace the Load with the input of the Store (in `LoadNode::Identity`). Here we call >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258 >> >> where we do an extra check for types if we deal with vectors but we don't make sure that neither masks nor offsets interfere. >> Similarly, in `StoreNode::Identity` we first check if there is a Load and then a Store: >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> but we don't make sure that there are no masks or offsets. >> A few lines below, we check if there are 2 stores for the same value in a row. We need to check for masks and offsets here too but in this case we can include these cases if the masks and offsets of the vector stores are equivalent. >> >> # Solution >> To avoid folding `Load`- and `StoreVector`s with masks and offsets we add a specific `store_Opcode` method to `LoadVectorGatherNode`, `LoadVectorMaskedNode` and `LoadVectorGatherMaskedNode` that doesn't return a store opcode but instead returns its own (to avoid ever being the same as a store node). 
In this way, the checks in `MemNode::can_see_stored_value` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166 >> >> and `StoreNode::Identity` >> >> https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515 >> >> will fail if masks or offsets are used. >> For 2 stores of the same value we instead check for mask and offset equality. >> >> Regression tests for... > > Damon Fenacci has updated the pull request incrementally with five additional commits since the last revision: > > - JDK-8325520: take advantage of store_Opcode to avoid more checks > - JDK-8325520: remove check for offsets/mask equivalence if load->store or store->load > - JDK-8325520: merge > - JDK-8325520: fix missing semicolon > - JDK-8325520: address PR comments About storing an `IntVector` to memory, and then loading as `FloatVector`: You can use a `MemorySegment`:
// Wrap an array into a MemorySegment
MemorySegment ms = MemorySegment.ofArray(new byte[10_000]);
// Create your favourite int vector
IntVector intVector = ...;
// Store that int vector to the memory segment (internally, it does checkIndex and unsafe store to the byte array)
intVector.intoMemorySegment(ms, offset, ByteOrder.nativeOrder());
// Load a float vector from the memory segment (internally, it does checkIndex and unsafe load from the byte array)
// floatSpecies is a placeholder: a FloatVector species with the same bit size as intVector
FloatVector floatVector = FloatVector.fromMemorySegment(floatSpecies, ms, offset, ByteOrder.nativeOrder());
I did not test this, but I think something like this should work.
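For readers who do not have the incubating Vector API on the module path, the same reinterpretation can be sketched at the scalar level. This is only an illustration under assumptions: it uses `java.nio.ByteBuffer` instead of `MemorySegment` and the Vector API, and the class `ReinterpretSketch` and its method `intsAsFloats` are hypothetical names. It writes int "lanes" in native byte order and reads the same bytes back as floats, which should agree with `Float.intBitsToFloat` lane by lane.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ReinterpretSketch {
    // Store each int into a byte buffer, then read the same bytes back as floats.
    // Hypothetical helper, a scalar stand-in for intoMemorySegment/fromMemorySegment.
    static float[] intsAsFloats(int[] lanes) {
        ByteBuffer buf = ByteBuffer.allocate(lanes.length * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < lanes.length; i++) {
            buf.putInt(i * Integer.BYTES, lanes[i]);   // absolute store, like a lane store
        }
        float[] out = new float[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            out[i] = buf.getFloat(i * Float.BYTES);    // absolute load of the same 4 bytes
        }
        return out;
    }

    public static void main(String[] args) {
        // IEEE 754 bit patterns for 1.0f, 2.0f and -3.0f, plus zero.
        int[] lanes = {0x3f800000, 0x40000000, 0xc0400000, 0};
        for (float v : intsAsFloats(lanes)) {
            System.out.println(v);
        }
    }
}
```

Because both the store and the load use the same byte order, no bits are rearranged; only the type used to interpret them changes.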
------------- PR Comment: https://git.openjdk.org/jdk/pull/18347#issuecomment-2077086009 From jkarthikeyan at openjdk.org Thu Apr 25 13:16:44 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 25 Apr 2024 13:16:44 GMT Subject: Integrated: 8329194: Cleanup Type::cmp definition and usage In-Reply-To: References: Message-ID: On Thu, 28 Mar 2024 15:13:48 GMT, Jasmine Karthikeyan wrote: > Hi all, this patch aims to cleanup `Type::cmp` by changing it from returning a `0` when types are equal and `1` when they are not, to it returning a boolean denoting equality. This makes its usages at various callsites more intuitive. However, as it is passed to the type dictionary as a comparator, a lambda is needed to map the boolean to a comparison value. > > I was also considering changing the name to `Type::equals` as it's not really returning a comparison value anymore, but I felt it would be too similar to `Type::eq`. If this would be preferred though, I can change it. > > Tier 1 testing passes on my machine. Reviews and thoughts would be appreciated! This pull request has now been integrated. Changeset: b9927aa3 Author: Jasmine Karthikeyan Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/b9927aa3a4c77812bfc53b14a6695ec436737661 Stats: 35 lines in 9 files changed: 6 ins; 0 del; 29 mod 8329194: Cleanup Type::cmp definition and usage Reviewed-by: dfenacci, chagedorn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/18533 From jorn.vernee at oracle.com Thu Apr 25 17:56:43 2024 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Thu, 25 Apr 2024 19:56:43 +0200 Subject: Weird performance behavior involving VarHandles In-Reply-To: References: <144453100.12179950.1713940662054.JavaMail.zimbra@univ-eiffel.fr> Message-ID: <6a4da03e-9662-4d76-94a5-8271b117f6f9@oracle.com> I can reproduce this locally:
Benchmark                               Mode  Cnt  Score   Error  Units
ReproducerBenchmarks.control            avgt    5  1.280 ± 0.015  ns/op
ReproducerBenchmarks.gwt2_methodhandle  avgt    5  1.690 ± 0.008  ns/op
ReproducerBenchmarks.gwt_methodhandle   avgt    5  1.305 ± 0.038  ns/op
Disabling tiered compilation 'fixes' the performance of gwt2:
Benchmark                               Mode  Cnt  Score   Error  Units
ReproducerBenchmarks.control            avgt    5  1.299 ± 0.016  ns/op
ReproducerBenchmarks.gwt2_methodhandle  avgt    5  1.312 ± 0.030  ns/op
ReproducerBenchmarks.gwt_methodhandle   avgt    5  1.303 ± 0.034  ns/op
In both cases the assembly looks identical though. So, this may just be up to a different code cache layout (or something like that). Jorn On 25/04/2024 00:28, Maurizio Cimadamore wrote: > > Cool benchmark/test case! > > I don't know off-hand where the difference could be coming from - but > just curious: did you try accessing in a loop (e.g. to see if checks > are hoisted as expected) ? > > I seem to recall that the lambda forms for guards-with-test are rather > complex, as they need to profile the various branches. I wonder if > some "leftover" from the profiling code stays there and pollutes the > benchmark? > > Maurizio > > On 24/04/2024 07:37, Remi Forax wrote: >> I get >>
>> Benchmark                               Mode  Cnt  Score   Error  Units
>> ReproducerBenchmarks.control            avgt    5  1.250 ± 0.024  ns/op
>> ReproducerBenchmarks.gwt2_methodhandle  avgt    5  1.852 ± 0.024  ns/op
>> >> and I don't understand why there is a difference in performance because >> for c2, the strings "x" and "y" are constant so the corresponding >> VarHandles should be constant thus optimized the same way. From mli at openjdk.org Thu Apr 25 18:02:43 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 25 Apr 2024 18:02:43 GMT Subject: RFR: 8331150: RISC-V: Fix "bad AD file" bug Message-ID: <8sGdIrrgF4MZxoHdLbBmQDSRbxchveo7VWWXWOHaF04=.9b4134fb-537f-40ca-915b-04c4aace93d4@github.com> Hi, Can you help to review this bug fix patch?
The issue was introduced by [JDK-8318650](https://bugs.openjdk.org/browse/JDK-8318650) Thanks ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/18960/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18960&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331150 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18960/head:pull/18960 PR: https://git.openjdk.org/jdk/pull/18960 From sviswanathan at openjdk.org Thu Apr 25 18:24:39 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 25 Apr 2024 18:24:39 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 21:51:45 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > bug fix in other ::prefix_rex2 src/hotspot/cpu/x86/assembler_x86.cpp line 648: > 646: } > 647: } else if ((base_enc & 0x7) == 4) { > 648: // rbp | r12 | r20 | r28 Comment should be: // rsp | r12 | r20 | r28 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1579943571 From burban at openjdk.org Thu Apr 25 21:01:02 2024 From: burban at openjdk.org (Bernhard Urban-Forster) Date: Thu, 25 Apr 2024 21:01:02 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 Message-ID: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> x86 bits are fine. ------------- Commit messages: - 8331159: VM build without C2 fails after JDK-8180450 Changes: https://git.openjdk.org/jdk/pull/18962/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18962&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331159 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18962.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18962/head:pull/18962 PR: https://git.openjdk.org/jdk/pull/18962 From jrose at openjdk.org Thu Apr 25 21:23:41 2024 From: jrose at openjdk.org (John R Rose) Date: Thu, 25 Apr 2024 21:23:41 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded ThreadWXEnable Reviewed. Good work getting mutable data out of code space. Let's keep chipping away at it, so someday the code cache is a cache filled mostly with ... code. Some post-review musings follow... I wrote some of that stuff 1/4 century ago; some of the math-geekery is mine. Those data structures and algorithms could probably use a fresh look at some point. Today I would do it differently than the me back then did it. My fingers itch slightly as I look at that code. As a top level goal, I hope some day soon we will get all the metadata out of code space, both mutable (as in this case) and immutable. By immutable metadata I mean all the oddly encoded non-code sections in the nmethod layout. In a world with Leyden, it is best to put immutable metadata in a read-only memory-mapped part of the CDS archive, rather than in either the malloc heap or inside the nmethod itself. There is an interesting question: Why did we mix metadata into nmethod blocks in the first place? Answer: There are two reasons, IIRC. First, putting everything into one block in the code cache, although ugly, minimizes the number of storage allocation transactions associated with that block. If we side-allocate stuff in metaspace or malloc heap, we have more moving parts to worry about. In the early days of HotSpot we were just learning how to write concurrent code, and having a concurrent insert and delete in the code cache that would also correctly insert and delete side data seemed uncomfortably complex. (At least, that is my memory.) Second, back in the day, we didn't really trust malloc to do jobs like this. Not all implementations of malloc were performant, nor were they all multithread safe. (Hey there, Solaris!) This also pushed us towards using our own stuff. Nowadays I think we are more willing to reach for malloc.
(But malloc still does not fully integrate with HotSpot's Native Memory Tracking, so that might be an issue.) If I were redesigning this now, I'd rigorously separate three kinds of storage: code, mutable memory (caches or link state), and immutable memory (debug info, PC descs, dependencies, etc.). Those items would be linked by pointers, not put in adjacent memory blocks as today. Over time CDS would learn how to adroitly manage the different components (code, mutable, immutable). As a further investment, I'd replace ad hoc compressed data (which is always hard to maintain) with uniformly compressed data, using Unsigned5 (from Pack200); that compressed format, rather than a better, fancier one, because it decompresses in registers at memory speeds. That is how about half of the immutable streams in HotSpot are already compressed, and there's no reason I know of(*) to do it another way. For an example of such investment, see https://github.com/rose00/jdk/tree/compress-zeroes which addresses BellSoft's observation that the compressed debug-info streams (camping out in huge swathes of code cache) could be compressed better. The PC desc mechanism requires random access, which might seem to require fixed-stride arrays (as today) but that is easily addressed by any one of several indexing tactics. Today everybody knows you can do random access into compressed streams of data, with a little extra care. (*Except for things like JAR files, where better compression is desirable, at the cost of slower decompression. Even then, as Pack200 taught us, a first pass with a fast/cheap compressor often synergizes with an optional post-pass of something really nice like zstd or deflate.) ------------- Marked as reviewed by jrose (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18895#pullrequestreview-2023615600 From kvn at openjdk.org Thu Apr 25 22:50:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 25 Apr 2024 22:50:41 GMT Subject: RFR: 8330181: Move PcDesc cache from nmethod header [v2] In-Reply-To: References: Message-ID: On Wed, 24 Apr 2024 15:33:42 GMT, Vladimir Kozlov wrote: >> Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. >> >> Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. >> >> Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. >> >> Tested tier1-4,stress,xcomp and performance. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Remove unneeded ThreadWXEnable Thank you, John, for the review and the history lesson. A few comments on your comments ;^) > As a top level goal, I hope some day soon we will get all the metadata out of code space, both mutable (as in this case) and immutable. The first step for that will be my next PR for [JDK-8331087](https://bugs.openjdk.org/browse/JDK-8331087) "Move read-only nmethod data from CodeCache". They account for 30% of the space in CodeCache. The next step will be converting Relocation Info data to immutable by moving all encoded pointers to oops, metadata and other sections. (I have not started yet.) I would keep mutable sections (oops, metadata) together with code for now because `oops_do()` and `metadata_do()` process them together with code.
And these sections are relatively very small (vs whole nmethod size):
relocation = 509520 (6.003523%)
constants = 288 (0.003393%)
main code = 4957240 (58.409695%)
stub code = 286832 (3.379657%)
oops = 20824 (0.245363%)
metadata = 126944 (1.495744%)
> (But malloc still does not fully integrate with HotSpot's Native Memory Tracking, so that might be an issue.) It is not an issue anymore because we are using our wrapper [os::malloc()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/os.cpp#L629) which does NMT. > As a further investment, I'd replace ad hoc compressed data (which is always hard to maintain) with uniformly compressed data, using Unsigned5 (from Pack200) Yes, it should be done to compress 0s in data we are already compressing (ScopesDesc). Compressing all data has an issue because some data (PcDesc) needs random access in a big array. We discussed the possibility of compressing such arrays in chunks to reduce access time. This needs careful investigation. Thanks again for the review, @rose00 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18895#issuecomment-2078291565 From kvn at openjdk.org Thu Apr 25 22:50:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 25 Apr 2024 22:50:42 GMT Subject: Integrated: 8330181: Move PcDesc cache from nmethod header In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 16:54:40 GMT, Vladimir Kozlov wrote: > Currently PcDescCache (32 bytes in 64-bit VM: PcDesc* _pc_descs[4]) is allocated in `nmethod` header. > > Moved PcDescContainer (which includes cache) to C heap similar to ExceptionCache to reduce size of `nmethod` header and to remove WXWrite transition when we update the cache in `PcDescCache::add_pc_desc()`. > > Removed `PcDescSearch` class which was leftover from `CompiledMethod` days. > > Tested tier1-4,stress,xcomp and performance. This pull request has now been integrated.
Changeset: b3bcc494 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/b3bcc49491b8f8ad337eb4c06201a5468e5c1159 Stats: 120 lines in 4 files changed: 42 ins; 36 del; 42 mod 8330181: Move PcDesc cache from nmethod header Reviewed-by: dlong, jrose ------------- PR: https://git.openjdk.org/jdk/pull/18895 From sviswanathan at openjdk.org Thu Apr 25 23:56:44 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 25 Apr 2024 23:56:44 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 21:51:45 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > bug fix in other ::prefix_rex2 src/hotspot/cpu/x86/assembler_x86.cpp line 670: > 668: } else { > 669: // [base + disp] > 670: // !(rbp | r12 | r20 | r28) were handled above comment should be: // (rsp | r12 | r20 | r28) were handled above src/hotspot/cpu/x86/assembler_x86.cpp line 3620: > 3618: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 3619: // swap src/dst to get correct prefix > 3620: int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes, true); Here the last argument to simd_prefix_and_encode shouldn't be true as src is not gpr? src/hotspot/cpu/x86/assembler_x86.cpp line 12897: > 12895: if (adr.index_needs_rex2()) { > 12896: assert(false, "prefix(Register dst, Address adr) does not support handling of an X"); > 12897: } this could be written as: assert(!adr.index_needs_rex2(), "prefix(Register dst, Address adr) does not support handling of an X"); src/hotspot/cpu/x86/assembler_x86.cpp line 13839: > 13837: void Assembler::movsbq(Register dst, Address src) { > 13838: InstructionMark im(this); > 13839: int prefix = get_prefixq(src, dst, true /* page1 */); We are not consistent in the comment is_map1, M0, page1 all refer to the same thing. Also some places there is no comment that the true is for is_map1. src/hotspot/cpu/x86/assembler_x86.cpp line 14479: > 14477: _input_size_in_bits = input_size_in_bits; > 14478: } > 14479: } New line missing at the end of file. 
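The comment corrections requested above (`rsp | r12 | r20 | r28`) can be checked with a small sketch. It assumes the usual x86-64 register numbering (rsp = 4, rbp = 5) and that the APX registers r16..r31 are numbered 16..31, so that a test like `(base_enc & 0x7) == 4` selects every register whose low three encoding bits match those of rsp. The class `BaseEncodingSketch` and its methods are hypothetical and are not part of `assembler_x86.cpp`.

```java
import java.util.ArrayList;
import java.util.List;

public class BaseEncodingSketch {
    // Conventional names for encodings 0..7; higher encodings are named r8..r31.
    static String name(int enc) {
        String[] legacy = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};
        return enc < 8 ? legacy[enc] : "r" + enc;
    }

    // All of the 32 general-purpose registers whose low three encoding bits equal low3.
    static List<String> withLowBits(int low3) {
        List<String> regs = new ArrayList<>();
        for (int enc = 0; enc < 32; enc++) {
            if ((enc & 0x7) == low3) {
                regs.add(name(enc));
            }
        }
        return regs;
    }

    public static void main(String[] args) {
        System.out.println(withLowBits(4)); // the group that shares rsp's low bits
        System.out.println(withLowBits(5)); // the group that shares rbp's low bits
    }
}
```

Under these assumptions the first group is rsp, r12, r20, r28 and the second is rbp, r13, r21, r29, which is why a comment naming rbp next to r12, r20 and r28 mixes the two groups.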
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1579945658 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1580219101 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1580246261 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1580258716 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1580260295 From fyang at openjdk.org Fri Apr 26 01:46:32 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 Apr 2024 01:46:32 GMT Subject: RFR: 8331150: RISC-V: Fix "bad AD file" bug In-Reply-To: <8sGdIrrgF4MZxoHdLbBmQDSRbxchveo7VWWXWOHaF04=.9b4134fb-537f-40ca-915b-04c4aace93d4@github.com> References: <8sGdIrrgF4MZxoHdLbBmQDSRbxchveo7VWWXWOHaF04=.9b4134fb-537f-40ca-915b-04c4aace93d4@github.com> Message-ID: On Thu, 25 Apr 2024 17:55:57 GMT, Hamlin Li wrote: > Hi, > Can you help to review this bug fix patch? > The issue was introduced by [JDK-8318650](https://bugs.openjdk.org/browse/JDK-8318650) > Thanks Looks fine. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18960#pullrequestreview-2023918766 From chagedorn at openjdk.org Fri Apr 26 06:50:59 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Apr 2024 06:50:59 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Message-ID: This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish from the two other uses of `Opaque4` nodes: 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left.
Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. I went through all the uses of `Opaque4` nodes and did the following:
- Could the `Opaque4` node be part of an Initialized Assertion Predicate?
  - No: Added an assert that we are not dealing with an Initialized Assertion Predicate.
  - Yes:
    - Yes **and only** for Initialized Assertion Predicates? Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate.
    - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case.
- Is this `Opaque4` node only used for Template Assertion Predicates?
  - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`.
- I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare).
This is another patch split off from the full fix for Assertion Predicates.
Thanks, Christian ------------- Commit messages: - Add more comments and asserts - Add more tests - 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode Changes: https://git.openjdk.org/jdk/pull/18951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18951&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330386 Stats: 550 lines in 15 files changed: 485 ins; 7 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/18951.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18951/head:pull/18951 PR: https://git.openjdk.org/jdk/pull/18951 From chagedorn at openjdk.org Fri Apr 26 06:51:01 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Apr 2024 06:51:01 GMT Subject: RFR: 8330386: Replace Opaque4Node of Initialized Assertion Predicate with new OpaqueInitializedAssertionPredicateNode In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 13:34:31 GMT, Christian Hagedorn wrote: > This patch replaces the `Opaque4Node` of the `If` for Initialized Assertion Predicates with a new `OpaqueInitializedAssertionPredicateNode`. This helps to simplify pattern matching for predicate code and to distinguish from the two other uses of `Opaque4` nodes: > 1. Template Assertion Predicate: The goal is to get rid of its `Opaque4Node` as well by using a dedicated `TemplateAssertionPredicateNode` for the `IfNode`. > 2. Non-null-checks with intrinsics and unsafe accesses: This will eventually be the only use left. Once we get there, we should rename the node accordingly to `OpaqueNonNullCheck` or something like that. > > I went through all the uses of `Opaque4` nodes and did the following: > - Could the `Opaque4` node be part of an Initialized Assertion Predicate? > - No: Added an assert that we are not dealing with an Initialized Assertion Predicate. > - Yes: > - Yes **and only** for Initialized Assertion Predicates?
Added an assert that we are only expecting an `OpaqueInitializedAssertionPredicateNode` if appropriate. > - Yes but could also be something else: Added case for `OpaqueInitializedAssertionPredicateNode` next to the `Opaque4` case. > - Is this `Opaque4` node only used for Template Assertion Predicates? > - Yes: Added assert with call to `assertion_predicate_has_loop_opaque_node()` to check that we find its `OpaqueLoop*Nodes`. > - I've added test cases where I was not sure about whether an `Opaque4` node could be part of a Template, an Initialized Assertion Predicate or a non-null-check. This was a little tricky but I think it was still worth it to prevent future bugs (even though most of these special cases are quite rare). > > This is another patch split off from the full fix for Assertion Predicates. > > Thanks, > Christian src/hotspot/share/opto/loopPredicate.cpp line 353: > 351: } > 352: Node* bol = iff->in(1); > 353: assert(!bol->is_OpaqueInitializedAssertionPredicate(), "should not find an Initialized Assertion Predicate"); Initialized Assertion Predicates have halt nodes and not regions. Thus, we bail out on L350. Same argument for `Opaque4` nodes of non-null-checks, thus we assert on L355 that it's only for Template Assertion Predicates. src/hotspot/share/opto/loopTransform.cpp line 1205: > 1203: Node *bol = iff->in(1); > 1204: if (bol->req() < 2) { > 1205: continue; // dead constant test Before: We bailed out for `Opaque4` nodes because they have 3 inputs. But the comment suggests that this is only for dead constant tests. I therefore updated this and added an assert below for `Opaque4` nodes. src/hotspot/share/opto/loopTransform.cpp line 1208: > 1206: } > 1207: if (!bol->is_Bool()) { > 1208: assert(bol->Opcode() == Op_Conv2B, "predicate check only"); Should have been removed before when we replaced `Opaque1->If` with `ParsePredicate`.
src/hotspot/share/opto/loopTransform.cpp line 1209: > 1207: if (!bol->is_Bool()) { > 1208: assert(bol->is_Opaque4() || bol->is_OpaqueInitializedAssertionPredicate(), > 1209: "Opaque node of non-null-check or of Initialized Assertion Predicate"); We need this for Initialized `OpaqueInitializedAssertionPredicate` and `Opaque4` nodes for non-null-checks. This is covered by `testPolicyRangeCheck()`. I did not try to come up with a test for Template Assertion Predicates. I think they cannot be found here since we eagerly remove them once they become useless. However, I'm planning to replace the `Opaque4` nodes for them anyways. src/hotspot/share/opto/loopTransform.cpp line 1992: > 1990: // Ignore Opaque4 from a non-null-check for an intrinsic or unsafe access. This could happen when we maximally > 1991: // unroll a non-main loop with such an If with an Opaque4 node directly above the loop entry. > 1992: assert(!loop_head->is_main_loop(), "Opaque4 node from a non-null check - should not be at main loop"); Covered by `testUnsafeAccess()`. src/hotspot/share/opto/loopopts.cpp line 1988: > 1986: } else { > 1987: assert(b->is_Bool() || b->is_Opaque4() || b->is_OpaqueInitializedAssertionPredicate(), > 1988: "bool, non-null check with Opaque4 node or Initialized Assertion Predicate with its Opaque node"); Added case for Initialized Assertion Predicate. Covered by `testOpaqueInsideIfOutsideLoop()`. src/hotspot/share/opto/loopopts.cpp line 2167: > 2165: // the AllocateArray node and its ValidLengthTest input that could cause > 2166: // split if to break. > 2167: if (use->is_If() || use->is_CMove() || use->is_Opaque4() || Added case for Initialized Assertion Predicate. Covered by `testOpaqueOutsideLoop()`. src/hotspot/share/opto/macro.cpp line 2430: > 2428: assert(n->Opcode() == Op_LoopLimit || > 2429: n->Opcode() == Op_Opaque3 || > 2430: n->is_Opaque4() || `OpaqueInitializedAssertionPredicate` are no macro nodes and are removed after loop opts. 
So, no changes required in this file. src/hotspot/share/opto/split_if.cpp line 325: > 323: if (bol->outcnt() == 1) { > 324: Node* use = bol->unique_out(); > 325: if (use->is_Opaque4() || use->is_OpaqueInitializedAssertionPredicate()) { Added case for Initialized Assertion Predicate. Covered by `test*CloneDown*()`. src/hotspot/share/opto/split_if.cpp line 355: > 353: Node* u = bol->out(j); > 354: // Uses are either IfNodes, CMoves, Opaque4, or OpaqueInitializedAssertionPredicates > 355: if (u->is_Opaque4() || u->is_OpaqueInitializedAssertionPredicate()) { Added case for Initialized Assertion Predicate. Covered by `test*CloneDown*()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579483749 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579561681 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579508034 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579564368 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579619546 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579613455 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579614823 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579622648 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579617065 PR Review Comment: https://git.openjdk.org/jdk/pull/18951#discussion_r1579617313 From epeter at openjdk.org Fri Apr 26 07:04:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Apr 2024 07:04:40 GMT Subject: RFR: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: References: <-LNkhYWrGIb3yTDFBA5GSMjSLSI2cQhEeYQkk11OrZY=.c108c65e-2fa9-419f-b7ff-74571415dfe7@github.com> Message-ID: On Tue, 23 Apr 2024 15:45:09 GMT, Roberto Castañeda Lozano wrote: >> Fair enough, I agree that, even if there was a solution for this specific case, we would
probably encounter other cases where GVN would not be powerful enough to detect the equivalence. I still wonder though what could be causing the `CastLL(invar + iv)` vs. `invar + iv` divergence in your example and whether anything could be done to get rid of it. Maybe worth filing a RFE for further investigation. > >> @robcasloz are you intending to review, or was that just a drive-by comment/question? > > I'm happy to review this, just give me a few days. @robcasloz I decided to drop this PR. Rather than making the current `VPointer` more complicated, I want to refactor it, and then allow for extensions like this to be much more integrated into the pattern matching. See: [JDK-8330991](https://bugs.openjdk.org/browse/JDK-8330991) C2 SuperWord: refactor VPointer ------------- PR Comment: https://git.openjdk.org/jdk/pull/18795#issuecomment-2078751516 From epeter at openjdk.org Fri Apr 26 07:04:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Apr 2024 07:04:41 GMT Subject: Withdrawn: 8330274: C2 SuperWord: VPointer invar: same sum with different addition order should be equal In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 12:00:51 GMT, Emanuel Peter wrote: > This is an enhancement for AutoVectorization. > > I want to improve the detection of `invar`s that are equivalent (guaranteed to compute the same value), but don't have the identical node (the computation is in a different order). > > Note: only about 100 lines are real changes, the rest is tests. These are the first tests that check vectorization for MemorySegments. > > **Solution Sketch: "canonicalize" the invar** > > - Extract all summands of the `invar`: make a list. > - Parse through `AddL`, `SubL`, `AddI`, `SubI`, to get summands. > - Bypass `CastLL` and `CastII` > - Recursively treat `ConvI2L`, `LShiftI` and `LShiftL`: i.e. canonicalize their input. > > - Sort all extracted summands by node idx. > - Add up all summands in new order. 
>
> If two `invar`s use the same summands, then we know that after canonicalization the new nodes representing the `invar`s must be the same.
>
> **Example**
>
>
> invar1 = b + c + d + a
> invar2 = d + b + a + c
>
> -> equivalent but not identical nodes
>
> Sort, and add up again:
>
> invar1 = a + b + c + d
> invar2 = a + b + c + d
>
> -> now the nodes are identical
>
> **Motivation: MemorySegment with invar**
>
> One might think that this is a bit of a special case: why would anybody write indices to an Array or MemorySegment where the invar has a different addition order for its summands?
>
> This example did not vectorize, even though it should:
> https://github.com/openjdk/jdk/blob/78e42d6e311c33548d16c6c74493388d9850238e/test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java#L425-L441
>
> Both the `get` and the `set` look like they have the same address, and the address increases by a byte in each iteration.
>
> Upon inspection, I saw that the `invar`s that `VPointer` produces for the two operations are not identical: the order of addition of the `invar`'s summands is different, and thus the `invar` nodes are different.
>
> The consequence: Only if we can prove that the two `invar`s are identical can we know that the addresses are identical, and that there is no aliasing for loop carried dependencies. Since we have different `invar`s, we don't know how the two addresses alias, and that prevents vectorization.
>
> Why does this happen? After parsing, the graph looks like this:
> ![image](https://github.com/openjdk/jdk/assets/32593061/f768d0b0-0b2f-48f0-bfdc-61e93e62bb4f)
>
> We already see that the two addresses are different only by a `CastLL`, with type `long:>=0`. Somehow, that was only deduced for the load, and not the store.
>
> load_adr = base + memory_segment_offs...

This pull request has been closed without being integrated.
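The canonicalization idea in the quoted solution sketch (extract summands, sort by node idx, compare) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the withdrawn patch: `Expr`, `leaf`, `add`, `sub`, `collect`, `canonicalize` and `equivalent_invar` are hypothetical names, the expression tree stands in for C2 nodes, and the model compares sorted summand lists instead of rebuilding nodes. It does not bypass `CastLL`/`CastII` or handle `ConvI2L` and shifts.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C2 node: a leaf with an index, or a binary Add/Sub.
struct Expr {
  enum Kind { Leaf, Add, Sub };
  Kind kind;
  int idx;                         // node index; only meaningful for leaves
  std::shared_ptr<Expr> lhs, rhs;  // operands; null for leaves
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(int idx) {
  return std::make_shared<Expr>(Expr{Expr::Leaf, idx, nullptr, nullptr});
}
ExprPtr add(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::Add, -1, std::move(a), std::move(b)});
}
ExprPtr sub(ExprPtr a, ExprPtr b) {
  return std::make_shared<Expr>(Expr{Expr::Sub, -1, std::move(a), std::move(b)});
}

// A summand is (leaf index, sign), where sign is +1 or -1.
using Summand = std::pair<int, int>;

// Walk through Add/Sub and record every leaf together with its sign.
void collect(const ExprPtr& e, int sign, std::vector<Summand>& out) {
  assert(e != nullptr);
  switch (e->kind) {
    case Expr::Leaf:
      out.emplace_back(e->idx, sign);
      break;
    case Expr::Add:
      collect(e->lhs, sign, out);
      collect(e->rhs, sign, out);
      break;
    case Expr::Sub:
      collect(e->lhs, sign, out);
      collect(e->rhs, -sign, out);  // the right operand of a subtraction flips sign
      break;
  }
}

// Canonical form: all summands, sorted by index (then by sign).
std::vector<Summand> canonicalize(const ExprPtr& invar) {
  std::vector<Summand> summands;
  collect(invar, +1, summands);
  std::sort(summands.begin(), summands.end());
  return summands;
}

// Two invariants are treated as equivalent if their canonical forms match.
bool equivalent_invar(const ExprPtr& a, const ExprPtr& b) {
  return canonicalize(a) == canonicalize(b);
}
```

With leaves `a..d`, the trees for `b + c + d + a` and `d + b + a + c` produce the same sorted list, which mirrors the quoted example where both invariants become `a + b + c + d`.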
------------- PR: https://git.openjdk.org/jdk/pull/18795 From mbaesken at openjdk.org Fri Apr 26 07:08:33 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 26 Apr 2024 07:08:33 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR In-Reply-To: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 19 Apr 2024 13:12:21 GMT, Thomas Stuefe wrote: > We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). > > Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. > > --- > > This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). > > ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) Please also check COPYRIGHT years, e.g. compileTask.cpp . Could you also check if there is an easy way to enhance some existing test for the JFR Compilation event. always good to have at least some basic testing . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2078758555 From thartmann at openjdk.org Fri Apr 26 07:29:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 26 Apr 2024 07:29:32 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18962#pullrequestreview-2024306067 From mli at openjdk.org Fri Apr 26 07:53:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Apr 2024 07:53:39 GMT Subject: RFR: 8331150: RISC-V: Fix "bad AD file" bug In-Reply-To: References: <8sGdIrrgF4MZxoHdLbBmQDSRbxchveo7VWWXWOHaF04=.9b4134fb-537f-40ca-915b-04c4aace93d4@github.com> Message-ID: On Fri, 26 Apr 2024 01:43:38 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this bug fix patch? >> The issue was introduced by [JDK-8318650](https://bugs.openjdk.org/browse/JDK-8318650) >> Thanks > > Looks fine. Thanks! Thanks @RealFYang for your reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18960#issuecomment-2078825167 From mli at openjdk.org Fri Apr 26 07:53:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Apr 2024 07:53:40 GMT Subject: Integrated: 8331150: RISC-V: Fix "bad AD file" bug In-Reply-To: <8sGdIrrgF4MZxoHdLbBmQDSRbxchveo7VWWXWOHaF04=.9b4134fb-537f-40ca-915b-04c4aace93d4@github.com> References: <8sGdIrrgF4MZxoHdLbBmQDSRbxchveo7VWWXWOHaF04=.9b4134fb-537f-40ca-915b-04c4aace93d4@github.com> Message-ID: On Thu, 25 Apr 2024 17:55:57 GMT, Hamlin Li wrote: > Hi, > Can you help to review this bug fix patch? 
> The issue was introduced by [JDK-8318650](https://bugs.openjdk.org/browse/JDK-8318650) > Thanks This pull request has now been integrated. Changeset: 006f090f Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/006f090f98135e0d3b0450c455d545272cfe6a38 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8331150: RISC-V: Fix "bad AD file" bug Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/18960 From mbaesken at openjdk.org Fri Apr 26 08:08:38 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 26 Apr 2024 08:08:38 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR In-Reply-To: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 19 Apr 2024 13:12:21 GMT, Thomas Stuefe wrote: > We have the (opt-in, disabled by default) Might be helpful to add to the Description part of https://bugs.openjdk.org/browse/JDK-8317683 another sentence that describes how to enable the feature. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2078851318 From chagedorn at openjdk.org Fri Apr 26 08:29:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Apr 2024 08:29:47 GMT Subject: RFR: 8305638: Renaming and small clean-ups around predicates [v5] In-Reply-To: References: Message-ID: On Fri, 1 Dec 2023 16:33:07 GMT, Roland Westrelin wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix useful Template Assertion Predicate marking > > Looks reasonable to me. @rwestrel Could you also have another look again at the now much simpler patch? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16877#issuecomment-2078894205 From rcastanedalo at openjdk.org Fri Apr 26 08:31:56 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 26 Apr 2024 08:31:56 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic Message-ID: This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The main changes are: - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. 
This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. #### Testing - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! ------------- Commit messages: - Include 'code/vmreg.hpp' in arm, ppc, and s390 - Share ZGC stub spilling logic across GCs Changes: https://git.openjdk.org/jdk/pull/18967/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18967&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330685 Stats: 1958 lines in 26 files changed: 1083 ins; 859 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/18967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18967/head:pull/18967 PR: https://git.openjdk.org/jdk/pull/18967 From eosterlund at openjdk.org Fri Apr 26 09:59:32 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 26 Apr 2024 09:59:32 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 08:12:46 GMT, Roberto Casta?eda Lozano wrote: > This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different 
collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). > > - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. > > - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. > > - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). 
> - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! This looks good. Thank you for separating out the code movement. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18967#pullrequestreview-2024638959 From mdoerr at openjdk.org Fri Apr 26 11:21:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 26 Apr 2024 11:21:38 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 08:12:46 GMT, Roberto Casta?eda Lozano wrote: > This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). > > - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. 
> > - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. > > - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! Thanks for taking care of all platforms! LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18967#pullrequestreview-2024808634

From thomas.stuefe at gmail.com Fri Apr 26 11:27:22 2024
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Fri, 26 Apr 2024 13:27:22 +0200
Subject: Enable compiler memory limits by default?
In-Reply-To:
References:
Message-ID:

Thank you, Vladimir.

Issue: https://bugs.openjdk.org/browse/JDK-8331185 and proposed PR: https://github.com/openjdk/jdk/pull/18969

I am curious if testing shakes any pre-existing bugs loose.

Cheers, Thomas

On Mon, Apr 22, 2024 at 7:35 PM Vladimir Kozlov wrote:

> Hi Thomas,
>
> I like option 1).
>
> I think 1Gb is a reasonable starting value - I know that C2 may easily eat
> 512Kb of memory.
> But before we set the exact value we need to test it in all our tiers.
> I don't want to create a lot of failures which we will not have time to
> fix fast. It should be a rare case, as you stated.
>
> We also need to decide how we fix/avoid such a failure:
>
> 1. Recompile with some optimizations off (I assume we can tell which
> optimization triggers big memory consumption and safely bail out from
> compilation)
> 2. Recompile with some inlining off
> 3. Mark method not compilable by corresponding compiler
> ...
>
> Please file RFE and PR. We will help with testing.
>
> Thanks,
> Vladimir K
>
> On 4/12/24 12:30 AM, Thomas Stüfe wrote:
> > Hi,
> >
> > Issues like https://bugs.openjdk.org/browse/JDK-8330103 show that compiler memory
> > consumption can be an issue.
> >
> > Since https://bugs.openjdk.org/browse/JDK-8318016, we have an optional
> > per-compilation memory limit. If we reach that limit, one of two things
> > (configurable) happens: we either assert or abort the compilation.
> >
> > These memory limits build on the compiler memory statistic added with
> > https://bugs.openjdk.org/browse/JDK-8317683. Enabling
> > memory limits also enables memory statistics.
> >
> > Some ideas:
> >
> > 1) We could enable a reasonable memory limit per default for debug
> > builds. Preferably combined with the assert option. That way, we run all
> > tests on a debug VM with memory limits enabled. If there are
> > pathological compilations during testing, we will notice them.
> >
> > (I don't know if we would notice them today; even if testers let JVMs
> > run with outside ulimits, these limits are typically very high to allow
> > for the total expected memory consumption of the test JVM).
> >
> > Such a memory limit could be set at whatever we feel is pathological,
> > e.g., several hundred MB. Even set at 1GB, we would hopefully see cases
> > like 8318016 in our tests.
> >
> > 2) If we don't want (1), we could at least enable memory statistics by
> > default for debug builds and print it out to hs-err files.
> >
> > 3) We could also enable memory limits in release builds and bail out of
> > the compilations. A small cost is involved, probably negligible: on
> > Arena enlargement, we increase several thread local counters.
> > Unfortunately, there is a small risk, too, in that bailout paths in C2
> > may be broken, leading to follow-up errors. We fixed them all, I think,
> > but there is a remaining risk. OTOH, using up excessive amounts of
> > memory is also not optimal.
> >
> > What do you think? Would this make sense? If (1) makes sense to you,
> > what limit would be reasonable?
> >
> > Cheers, Thomas

From stuefe at openjdk.org Fri Apr 26 11:35:41 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Apr 2024 11:35:41 GMT
Subject: RFR: 8331185: Enable compiler memory limits in debug builds
Message-ID:

See [1] for previous discussions.

We'd like to introduce a default memory limit for compilations in debug builds.
That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. Examples: This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` --- The patch: 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. 3) Adapted and extended tests I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
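The precedence described above (a per-method `MemLimit` directive overrules the debug-only default, and a value of 0 disables the limit) can be modeled in a few lines. This is a hedged sketch, not HotSpot code: `MethodMemLimit`, `kDefaultDebugLimit` and `effective_mem_limit` are hypothetical names, and treating 0 as "no limit" in release builds is an assumption drawn from the description of the default as debug-only.

```cpp
#include <cstddef>

// Hypothetical result of matching -XX:CompileCommand=MemLimit,... against one method.
struct MethodMemLimit {
  bool specified;     // true if a MemLimit directive matched the method
  std::size_t bytes;  // limit taken from the directive; 0 means "no limit"
};

// Assumed default for debug builds, as proposed in the message: 1 GB.
const std::size_t kDefaultDebugLimit = static_cast<std::size_t>(1) << 30;

// Returns the limit that applies to a compilation; 0 means "no limit".
std::size_t effective_mem_limit(const MethodMemLimit& method, bool debug_build) {
  if (method.specified) {
    return method.bytes;  // an explicit directive always wins, including 0
  }
  return debug_build ? kDefaultDebugLimit : 0;
}
```

Under this model, a method matched by `MemLimit,java.lang.String::*,100m` gets 100 MB, a method matched only by `MemLimit,*.*,0` gets no limit, and every other method in a debug build gets the 1 GB default.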
Tested: - manually on Mac m1 (debug and release) - GHAs are running - but Oracle will do more testing before this goes in [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html ------------- Commit messages: - adapt tests - fix printout for mem limit - also print limit when printing compilation mem histo - default limit Changes: https://git.openjdk.org/jdk/pull/18969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331185 Stats: 152 lines in 5 files changed: 104 ins; 12 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From rcastanedalo at openjdk.org Fri Apr 26 11:48:34 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 26 Apr 2024 11:48:34 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic In-Reply-To: References: Message-ID: <2QdvGP1E4hVKiPDeQS--zL5g_GuB_HBXmoxx4-IvJn0=.ba1d43de-e6d1-4e88-8f85-b8d70b98d0c3@github.com> On Fri, 26 Apr 2024 08:12:46 GMT, Roberto Casta?eda Lozano wrote: > This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). 
> > - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. > > - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. > > - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! Thanks for reviewing, Erik and Martin! 
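The class split described in the quoted changeset, where a GC-agnostic base class asks each collector whether a node needs liveness data, might look roughly like the following. This is a simplified sketch under assumptions: `MachNode` is reduced to a single hypothetical flag, `ExampleZState`, `NoLivenessState` and `count_liveness_candidates` are made-up names, and the `const` qualifier and pure-virtual declaration are illustration choices rather than facts taken from the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-in for C2's MachNode.
struct MachNode {
  bool is_barrier_candidate;  // made-up flag; real nodes carry far more state
};

// GC-agnostic state: each collector decides which nodes need liveness data.
class BarrierSetC2State {
 public:
  virtual ~BarrierSetC2State() = default;
  virtual bool needs_liveness_data(const MachNode* mach) const = 0;
};

// Illustrative collector-specific state that wants liveness for flagged nodes.
class ExampleZState : public BarrierSetC2State {
 public:
  bool needs_liveness_data(const MachNode* mach) const override {
    return mach->is_barrier_candidate;
  }
};

// Illustrative state for a collector without late barrier expansion.
class NoLivenessState : public BarrierSetC2State {
 public:
  bool needs_liveness_data(const MachNode*) const override { return false; }
};

// Stand-in for the shared liveness pass: it only consults the virtual hook,
// so it needs no knowledge of the concrete collector.
std::size_t count_liveness_candidates(const BarrierSetC2State& state,
                                      const std::vector<MachNode>& nodes) {
  std::size_t count = 0;
  for (const MachNode& node : nodes) {
    if (state.needs_liveness_data(&node)) {
      ++count;
    }
  }
  return count;
}
```

The point of the design is that the shared pass depends only on the base class, so another collector could plug in by providing its own derived state.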
-------------

PR Comment: https://git.openjdk.org/jdk/pull/18967#issuecomment-2079228472

From stuefe at openjdk.org Fri Apr 26 12:09:41 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Apr 2024 12:09:41 GMT
Subject: RFR: 8330625: Compilation memory statistic: prevent tearing of the final report [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 22 Apr 2024 18:01:15 GMT, Vladimir Kozlov wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>> print newlines around report
>
> Good.
>
> What kind of testing do you do for changes in this code?

Thanks @vnkozlov and @TobiHartmann !

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18866#issuecomment-2079256833

From stuefe at openjdk.org Fri Apr 26 12:09:42 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 26 Apr 2024 12:09:42 GMT
Subject: Integrated: 8330625: Compilation memory statistic: prevent tearing of the final report
In-Reply-To:
References:
Message-ID:

On Fri, 19 Apr 2024 13:50:02 GMT, Thomas Stuefe wrote:

> Somewhat trivial change to reduce the chance of tearing the final compilation cost history report. See JBS for details.
>
> ---
>
> The patch:
> - upon end of a compilation, we print the offending log line and account the cost in the compilation cost history table. For the latter we lock over NMTCompilationCostHistory_lock. The patch swaps these two actions such that we print after pulling the lock. That greatly reduces, although it does not completely remove, the chance of printing log lines into the final report. (I did not want to widen the scope of that lock to include the printout)
> - also moves the locking of NMTCompilationCostHistory_lock up to the start of the reporting function to include printing the report header into the locking

This pull request has now been integrated.
Changeset: 2b7176a5 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/2b7176a55ad0e5c6ba34abba3fe8fc1a411a5e2d Stats: 41 lines in 1 file changed: 14 ins; 13 del; 14 mod 8330625: Compilation memory statistic: prevent tearing of the final report Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/18866 From stuefe at openjdk.org Fri Apr 26 12:09:59 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Apr 2024 12:09:59 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive Message-ID: We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). ------------- Commit messages: - start Changes: https://git.openjdk.org/jdk/pull/18883/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18883&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8330813 Stats: 14 lines in 1 file changed: 10 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18883.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18883/head:pull/18883 PR: https://git.openjdk.org/jdk/pull/18883 From mbaesken at openjdk.org Fri Apr 26 12:12:56 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 26 Apr 2024 12:12:56 GMT Subject: RFR: 8331167: UBSan enabled build fails in adlc on macOS Message-ID: When configuring with '--enable-ubsan' (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) and doing a macOS x86_64 fastdebug build, I run into this build error after very short time : jdk/src/hotspot/share/adlc/adlparse.cpp:5228:36: runtime error: applying non-zero offset 1 to null pointer #0 0x103fa4b4b in ADLParser::skipws_common(bool) adlparse.cpp:5228 #1 0x103f76aed in ADLParser::skipws() adlparse.hpp:271 #2 0x103f763c6 in ADLParser::parse() adlparse.cpp:95 #3 0x10407054d in main main.cpp:178 #4 0x7fff2044ef3c in start+0x0 
(libdyld.dylib:x86_64+0x15f3c) So it seems that UBSan support is currently not working well on macOS because the build fails early. Seems we add 1 to a nullptr in the adlc code in some cases and UBSAN complains about it. ------------- Commit messages: - JDK-8331167 Changes: https://git.openjdk.org/jdk/pull/18976/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18976&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331167 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18976/head:pull/18976 PR: https://git.openjdk.org/jdk/pull/18976 From stefank at openjdk.org Fri Apr 26 12:32:33 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 26 Apr 2024 12:32:33 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:04:07 GMT, Thomas Stuefe wrote: > We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). Looks good to me. ------------- Marked as reviewed by stefank (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18883#pullrequestreview-2024929723 From stuefe at openjdk.org Fri Apr 26 12:32:34 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Apr 2024 12:32:34 GMT Subject: RFR: 8331167: UBSan enabled build fails in adlc on macOS In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:06:28 GMT, Matthias Baesken wrote: > When configuring with '--enable-ubsan' (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) and doing a macOS x86_64 fastdebug build, I run into this build error after very short time : > jdk/src/hotspot/share/adlc/adlparse.cpp:5228:36: runtime error: applying non-zero offset 1 to null pointer > #0 0x103fa4b4b in ADLParser::skipws_common(bool) adlparse.cpp:5228 > #1 0x103f76aed in ADLParser::skipws() adlparse.hpp:271 > #2 0x103f763c6 in ADLParser::parse() adlparse.cpp:95 > #3 0x10407054d in main main.cpp:178 > #4 0x7fff2044ef3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c) > > So it seems that UBSan support is currently not working well on macOS because the build fails early. Seems we add 1 to a nullptr in the adlc code in some cases and UBSAN complains about it. +1 ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18976#pullrequestreview-2024929595 From stuefe at openjdk.org Fri Apr 26 12:40:00 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Apr 2024 12:40:00 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: > We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). 
> > Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. > > --- > > This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). > > ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Adapt test - merge - JDK-8330677-Add-Per-Compilation-memory-usage-to-JFR ------------- Changes: https://git.openjdk.org/jdk/pull/18864/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18864&range=01 Stats: 30 lines in 8 files changed: 18 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18864/head:pull/18864 PR: https://git.openjdk.org/jdk/pull/18864 From stuefe at openjdk.org Fri Apr 26 12:40:00 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Apr 2024 12:40:00 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 08:06:14 GMT, Matthias Baesken wrote: >> We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). >> >> Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. >> >> --- >> >> This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. 
(The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). >> >> ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) > >> We have the (opt-in, disabled by default) > > Might be helpful to add to the Description part of https://bugs.openjdk.org/browse/JDK-8317683 another sentence that describes how to enable the feature. Thanks @MBaesken . Please check again to see if I addressed all your issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2079305078 From bkilambi at openjdk.org Fri Apr 26 12:52:15 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 26 Apr 2024 12:52:15 GMT Subject: RFR: 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction [v8] In-Reply-To: References: Message-ID: > Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. So that needs a node to represent non strictly-ordered add-reduction for floating-point type in C2. > > To avoid introducing new nodes, this patch adds a bool field in `AddReductionVF/D` to distinguish whether they require strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` in `ReductionNode`. Besides `AddReductionVF/D`, other reduction nodes' `requires_strict_order()` have a fixed value. > > With this patch, Vector API would always generate non strictly-ordered `AddReductionVF/D' on SVE machines with vector length <= 16B as it is more beneficial to generate non-strictly ordered instructions on such machines compared to strictly ordered ones. > > [AArch64] > On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. 
Auto-vectorization has already banned these nodes in JDK-8275275 [2]. > > This patch adds matching rules for non strictly-ordered `AddReductionVF/D`. > > No effects on other platforms. > > [Performance] > FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point type. With this patch, it improves ~3x on my SVE machine (128-bit). > > ADDLanes > > Benchmark Before After Unit > FloatMaxVector.ADDLanes 1789.513 5264.226 ops/ms > > > Final code is as below: > > Before: > ` fadda z17.s, p7/m, z17.s, z16.s > ` > After: > > faddp v17.4s, v21.4s, v21.4s > faddp s18, v17.2s > fadd s18, s18, s19 > > > > > [Test] > Full jtreg passed on AArch64 and x86. > > [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529 > [2] https://bugs.openjdk.org/browse/JDK-8275275 > [3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316 Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge master - Adjust format for the backend rules changed in previous commit - Address some more review comments - Revert to previous indentation - Add comments, revert to requires_strict_order and other minor changes - Naming changes: replace strict/non-strict with more technical terms - Addressed review comments for changes in backend rules and code style - 8320725: C2: Add "requires_strict_order" flag for floating-point add-reduction Floating-point addition is non-associative, that is adding floating-point elements in arbitrary order may get different value. Specially, Vector API does not define the order of reduction intentionally, which allows platforms to generate more efficient codes [1]. 
C2 therefore needs a node that represents a non strictly-ordered add-reduction for floating-point types.

To avoid introducing new nodes, this patch adds a bool field to `AddReductionVF/D` that records whether the node requires strict order. It also removes `UnorderedReductionNode` and adds a virtual function `bool requires_strict_order()` to `ReductionNode`. For every reduction node other than `AddReductionVF/D`, `requires_strict_order()` returns a fixed value.

With this patch, the Vector API always generates non strictly-ordered `AddReductionVF/D` on SVE machines with vector length <= 16B, because non strictly-ordered instructions are more beneficial than strictly-ordered ones on such machines.

[AArch64]
On Neon, non strictly-ordered `AddReductionVF/D` cannot be generated. Auto-vectorization has already banned these nodes in JDK-8275275 [2].

This patch adds matching rules for non strictly-ordered `AddReductionVF/D`.

No effects on other platforms.

[Performance]
FloatMaxVector.ADDLanes [3] measures the performance of add reduction for floating-point types. With this patch, it improves ~3x on my SVE machine (128-bit).

ADDLanes

Benchmark                  Before     After      Unit
FloatMaxVector.ADDLanes    1789.513   5264.226   ops/ms

Final code is as below:

```
Before:
fadda z17.s, p7/m, z17.s, z16.s

After:
faddp v17.4s, v21.4s, v21.4s
faddp s18, v17.2s
fadd s18, s18, s19
```

[Test]
Full jtreg passed on AArch64 and x86.
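The non-associativity that motivates the `requires_strict_order` flag can be reproduced in plain Java. The sketch below is only an illustration and is not taken from the patch: the class `FloatReductionOrder` and its two methods are hypothetical names, and the pairwise order is just one of many orders an unordered reduction might pick, not necessarily the one C2 emits.

```java
public class FloatReductionOrder {
    // Strictly ordered reduction: ((a[0] + a[1]) + a[2]) + ...
    public static float sumInOrder(float[] a) {
        float acc = 0.0f;
        for (float v : a) {
            acc += v;
        }
        return acc;
    }

    // Pairwise (tree-shaped) reduction over a[from, to): the two halves are
    // summed independently and then combined.
    public static float sumPairwise(float[] a, int from, int to) {
        int n = to - from;
        if (n == 0) return 0.0f;
        if (n == 1) return a[from];
        int mid = from + n / 2;
        return sumPairwise(a, from, mid) + sumPairwise(a, mid, to);
    }

    public static void main(String[] args) {
        // 1.0f is far below the rounding granularity of 1e20f, so it is lost
        // whenever it is added directly to +/-1e20f.
        float[] lanes = {1e20f, 1.0f, -1e20f, 1.0f};
        System.out.println(sumInOrder(lanes));                   // prints 1.0
        System.out.println(sumPairwise(lanes, 0, lanes.length)); // prints 0.0
    }
}
```

Both results are legitimate sums of the same four lanes, which is why a reduction that promises a fixed order and one that does not have to be distinguishable in the IR.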
[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/FloatVector.java#L2529
[2] https://bugs.openjdk.org/browse/JDK-8275275
[3] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L316

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/18034/files
 - new: https://git.openjdk.org/jdk/pull/18034/files/6d25d78f..bdd0fabf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18034&range=06-07

Stats: 552999 lines in 6080 files changed: 81790 ins; 132321 del; 338888 mod
Patch: https://git.openjdk.org/jdk/pull/18034.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18034/head:pull/18034

PR: https://git.openjdk.org/jdk/pull/18034

From aph-open at littlepinkcloud.com Fri Apr 26 12:57:55 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Fri, 26 Apr 2024 13:57:55 +0100 Subject: Weird performance behavior involving VarHandles In-Reply-To: References: <144453100.12179950.1713940662054.JavaMail.zimbra@univ-eiffel.fr> Message-ID: <049fa783-817c-46ff-ba74-6f42700b9296@littlepinkcloud.com>

On 4/24/24 23:28, Maurizio Cimadamore wrote:
> I seem to recall that the lambda forms for guards-with-test are rather complex, as they need to profile the various branches. I wonder if some "leftover" from the profiling code stays there and pollutes the benchmark?

It's definitely different inlining. On AArch64 I see

ReproducerBenchmarks.control            avgt    5  1.438 ± 0.005  ns/op
ReproducerBenchmarks.gwt2_methodhandle  avgt    5  2.112 ± 0.076  ns/op
ReproducerBenchmarks.gwt_methodhandle   avgt    5  1.440 ± 0.074  ns/op

and the important difference is here, see the "dmb ish" that is pinned:

? 0x0000fffefcb54a70: tbnz w14, #0x1f, #0xfffefcb54cd8 ? ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ?
; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; B29: # out( B42 B30 ) <- in( B28 ) Freq: 91235.6 ? 0x0000fffefcb54a74: ldr w11, [x11] ;*invokevirtual getIntUnaligned {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.misc.Unsafe::getIntUnaligned at 5 (line 3576) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 15 (line 1893) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; membar_release ? 0x0000fffefcb54a78: dmb ish ;*synchronization entry ? ; - java.lang.invoke.VarHandle::getMethodHandle at -1 (line 2203) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 59 (line 1001) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54a7c: ldr w14, [x13, #0x18] ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 
; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh

This is a release fence. It could be from a constructor with a final field. I think it's this:

    MethodHandle getMethodHandle(int mode) {
        MethodHandle[] mhTable = methodHandleTable;
        if (mhTable == null) {
            mhTable = methodHandleTable = new MethodHandle[AccessMode.COUNT];
        }
        MethodHandle mh = mhTable[mode];
        if (mh == null) {
            mh = mhTable[mode] = getMethodHandleUncached(mode);
        }
        return mh;
    }

If I had to guess, it's that a constructor here is being scalar replaced, but its fence is remaining, and it prevents code motion, so the fields scope and min are being reloaded rather than hoisted.

Even though a release barrier doesn't generate any code on x86 because x86 is TSO, it will still prevent code motion.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From aph-open at littlepinkcloud.com Fri Apr 26 13:03:44 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Fri, 26 Apr 2024 14:03:44 +0100 Subject: Weird performance behavior involving VarHandles In-Reply-To: <049fa783-817c-46ff-ba74-6f42700b9296@littlepinkcloud.com> References: <144453100.12179950.1713940662054.JavaMail.zimbra@univ-eiffel.fr> <049fa783-817c-46ff-ba74-6f42700b9296@littlepinkcloud.com> Message-ID:

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

-------------- next part --------------
....[Hottest Region 1].............................................................................. ....[Hottest Region 1]..............................................................................
c2, level 4, org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::control_avgt_jmhStub, version 5, compile id 1291 c2, level 4, org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmhTest::gwt2_methodhandle_avgt_jmhStub, version 5, compile id 1271 ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::contro ;; B20: # out( B21 ) <- in( B19 ) Freq: 0.999985 0x0000fffeb0b528a4: nop 0x0000fffefcb54a14: add x0, x16, #0x94 0x0000fffeb0b528a8: nop 0x0000fffefcb54a18: mov x1, #0x23190000 ; {metadata('jdk/internal/foreign/NativeMemorySegmentImpl')} 0x0000fffeb0b528ac: nop ;*aload_1 {reexecute=0 rethrow=0 return_oop=0} 0x0000fffefcb54a1c: movk x1, #0x6428 ;*aload_1 {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::contro ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmhTe ;; B16: # out( B44 B17 ) <- in( B15 B25 ) Loop( B16-B25 inner ) Freq: 133448 ;; B21: # out( B62 B22 ) <- in( B20 B35 ) Loop( B21-B35 inner ) Freq: 91236.1 ? 0x0000fffeb0b528b0: ldr w12, [x15, #0xc] ? 0x0000fffefcb54a20: ldr w11, [x10, #0xc] ? 0x0000fffeb0b528b4: lsl x10, x12, #3 ;*getfield segment {reexecute=0 rethrow=0 return_oop=0} ? 0x0000fffefcb54a24: lsl x17, x11, #3 ;*getfield segment {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 4 (line 92) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 9 (line 106) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh 11.47% ? 0x0000fffeb0b528b8: ldr w12, [x10, #8] ; implicit exception: dispatches to 0x0000fffeb0b52cdc 7.57% ? 0x0000fffefcb54a28: ldr w13, [x17, #8] ; implicit exception: dispatches to 0x0000fffefcb55058 ? ;; B17: # out( B43 B18 ) <- in( B16 ) Freq: 133448 ? 
;; B22: # out( B61 B23 ) <- in( B21 ) Freq: 91236 ? 0x0000fffeb0b528bc: cmp w12, w17 ? 0x0000fffefcb54a2c: cmp w13, w1 ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} 0.48% ? 0x0000fffeb0b528c0: b.ne #0xfffeb0b52ca8 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ;; B18: # out( B30 B19 ) <- in( B17 ) Freq: 133448 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? 0x0000fffeb0b528c4: mov x13, x10 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 1 (line 290) ? 0x0000fffefcb54a30: b.ne #0xfffefcb55024 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971c8c00::invokeStatic at 15 ? ;; B23: # out( B41 B24 ) <- in( B22 ) Freq: 91236 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cd800::invoke at 26 ? 0x0000fffefcb54a34: mov x13, x17 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cf400::guardWithCatch at 42 ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 1 (line 290) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 18 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231c8c00::invokeStatic at 15 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cd800::invoke at 26 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cf400::guardWithCatch at 42 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 18 ? 0x0000fffeb0b528c8: ldr w14, [x13, #0x18] ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 
; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? 0x0000fffefcb54a38: ldr x11, [x13, #0x20] ;*getfield min {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetOffset at 1 (line 82) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 1 (line 386) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ? 0x0000fffeb0b528cc: ldr x12, [x13, #0x20] ;*getfield min {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231c8c00::invokeStatic at 15 ? ; - jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetOffset at 1 (line 82) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cd800::invoke at 26 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 1 (line 386) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cf400::guardWithCatch at 42 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 18 ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 
; - java.lang.invoke.LambdaForm$DMH/0x00000000971c8c00::invokeStatic at 15 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cd800::invoke at 26 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cf400::guardWithCatch at 42 ? 0x0000fffefcb54a3c: ldr w15, [x13, #0x18] ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 18 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? 0x0000fffeb0b528d0: lsl x16, x14, #3 ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? 0x0000fffefcb54a40: and x18, x11, #3 ;*land {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 14 (line 386) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 
; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231c8c00::invokeStatic at 15 ? 0x0000fffeb0b528d4: and x14, x12, #3 ;*land {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cd800::invoke at 26 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 14 (line 386) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cf400::guardWithCatch at 42 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 18 ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971c8c00::invokeStatic at 15 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cd800::invoke at 26 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cf400::guardWithCatch at 42 7.39% ? 0x0000fffefcb54a44: ldr x14, [x13, #0x10] ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 18 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 
; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont 0.59% ? 0x0000fffefcb54a48: lsl x15, x15, #3 ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} 10.46% ? 0x0000fffeb0b528d8: ldr x0, [x13, #0x10] ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? 0x0000fffeb0b528dc: cbnz x14, #0xfffeb0b52a3c;*ifne {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 17 (line 386) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971c8c00::invokeStatic at 15 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cd800::invoke at 26 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971cf400::guardWithCatch at 42 ? 0x0000fffefcb54a4c: cbnz x18, #0xfffefcb54c1c;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 18 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 
; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ;; B24: # out( B37 B25 ) <- in( B23 ) Freq: 91236 ? ;; B19: # out( B27 B20 ) <- in( B18 ) Freq: 133447 5.99% ? 0x0000fffefcb54a50: sub x14, x14, #3 ;*ladd {reexecute=0 rethrow=0 return_oop=0} 11.09% ? 0x0000fffeb0b528e0: sub x14, x0, #3 ;*ladd {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds at 14 (line 404) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds at 14 (line 404) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess at 26 (line 364) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess at 26 (line 364) ? ; - java.lang.invoke.VarHandleSegmentAsInts::checkAddress at 15 (line 81) ? ; - java.lang.invoke.VarHandleSegmentAsInts::checkAddress at 15 (line 81) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 14 (line 107) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 14 (line 107) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? 0x0000fffefcb54a54: tbnz x14, #0x3f, #0xfffefcb54b44 ? 0x0000fffeb0b528e4: tbnz x14, #0x3f, #0xfffeb0b5299c ? ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0} ? 
;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds at 16 (line 404) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds at 16 (line 404) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess at 26 (line 364) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess at 26 (line 364) ? ; - java.lang.invoke.VarHandleSegmentAsInts::checkAddress at 15 (line 81) ? ; - java.lang.invoke.VarHandleSegmentAsInts::checkAddress at 15 (line 81) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 14 (line 107) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 14 (line 107) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ;; B25: # out( B38 B26 ) <- in( B24 ) Freq: 91235.9 ? ;; B20: # out( B28 B21 ) <- in( B19 ) Freq: 133447 8.83% ? 0x0000fffefcb54a58: cmp x14, #0 10.83% ? 0x0000fffeb0b528e8: cmp x14, #0 ? 0x0000fffefcb54a5c: b.ls #0xfffefcb54b78 ? 0x0000fffeb0b528ec: b.ls #0xfffeb0b529d0 ? ;; B26: # out( B63 B27 ) <- in( B25 ) Freq: 91235.8 ? ;; B21: # out( B45 B22 ) <- in( B20 ) Freq: 133447 6.48% ? 0x0000fffefcb54a60: ldr w18, [x15, #0x14] ; implicit exception: dispatches to 0x0000fffefcb5508c 9.86% ? 
0x0000fffeb0b528f0: ldr w10, [x16, #0x14] ; implicit exception: dispatches to 0x0000fffeb0b52d10 ? ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ;; B22: # out( B31 B23 ) <- in( B21 ) Freq: 133447 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 0x0000fffeb0b528f4: lsl x10, x10, #3 ;*getfield owner {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 1 (line 192) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ;; B27: # out( B43 B28 ) <- in( B26 ) Freq: 91235.7 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? 0x0000fffefcb54a64: lsl x14, x18, #3 ;*getfield owner {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 1 (line 192) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 0.30% ? 0x0000fffeb0b528f8: mov x0, x12 ;*invokevirtual getIntUnaligned {reexecute=0 rethrow=0 return_oop=0} ? 
; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - jdk.internal.misc.Unsafe::getIntUnaligned at 5 (line 3576) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 15 (line 1893) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? 0x0000fffefcb54a68: cbnz x14, #0xfffefcb54ca0;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ;; B28: # out( B44 B29 ) <- in( B27 ) Freq: 91235.6 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) 7.62% ? 0x0000fffefcb54a6c: ldr w14, [x15, #0xc] ;*getfield state {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 22 (line 195) ? 0x0000fffeb0b528fc: cbnz x10, #0xfffeb0b52a7c;*ifnull {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 4 (line 192) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? 
; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? 0x0000fffefcb54a70: tbnz w14, #0x1f, #0xfffefcb54cd8 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ;; B23: # out( B32 B24 ) <- in( B22 ) Freq: 133447 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) 10.48% ? 0x0000fffeb0b52900: ldr w10, [x16, #0xc] ;*getfield state {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 22 (line 195) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ;; B29: # out( B42 B30 ) <- in( B28 ) Freq: 91235.6 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? 0x0000fffefcb54a74: ldr w11, [x11] ;*invokevirtual getIntUnaligned {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? 
; - jdk.internal.misc.Unsafe::getIntUnaligned at 5 (line 3576) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 15 (line 1893) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? 0x0000fffeb0b52904: tbnz w10, #0x1f, #0xfffeb0b52ab4 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ;*ifge {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 13 (line 106) ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 25 (line 195) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ;; membar_release ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? 0x0000fffefcb54a78: dmb ish ;*synchronization entry ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.VarHandle::getMethodHandle at -1 (line 2203) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 59 (line 1001) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? 
; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? 0x0000fffefcb54a7c: ldr w14, [x13, #0x18] ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? ;; B24: # out( B29 B25 ) <- in( B23 ) Freq: 133447 ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) 10.31% ? 0x0000fffeb0b52908: ldr w10, [x0] ;*invokevirtual getIntUnaligned {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - jdk.internal.misc.Unsafe::getIntUnaligned at 5 (line 3576) ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 15 (line 1893) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000971d1c00::invokeStatic at 14 ? 0x0000fffefcb54a80: ldr x15, [x13, #0x20] ;*getfield min {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000971d3800::invoke at 53 ? ; - jdk.internal.foreign.NativeMemorySegmentImpl::unsafeGetOffset at 1 (line 82) ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 1 (line 386) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::control at 8 (line 92) ? 
; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ? 0x0000fffeb0b5290c: cmp x14, #4 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231c8c00::invokeStatic at 15 0.36% ? 0x0000fffeb0b52910: b.ls #0xfffeb0b52a04 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cd800::invoke at 26 ? ;; B25: # out( B16 B26 ) <- in( B24 ) Freq: 133447 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cf400::guardWithCatch at 42 ? 0x0000fffeb0b52914: ldarb w13, [x1] ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 18 ? 0x0000fffeb0b52918: ldr w12, [x0, #4] ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 0x0000fffeb0b5291c: add w12, w12, w10 ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::cont ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; membar_acquire (elided) ? 0x0000fffefcb54a84: lsl x2, x14, #3 ;*getfield scope {reexecute=0 rethrow=0 return_oop=0} ? 0x0000fffeb0b52920: ldr x12, [x28, #0x498] ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::sessionImpl at 1 (line 430) ? 0x0000fffeb0b52924: add x19, x19, #1 ; ImmutableOopMap {r11=Oop r15=Oop r18_tls=Oop c_rarg1=Derived_oop_r18_tls resp=Oop } ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 24 (line 108) ? ;*ifeq {reexecute=1 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - (reexecute) org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_j ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? 0x0000fffeb0b52928: ldr wzr, [x12] ; {poll} ? 
; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) 11.04% ? 0x0000fffeb0b5292c: ldrb w8, [x28, #0x4b0] ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) 0.41% ?? 0x0000fffeb0b52930: cbz x8, #0xfffeb0b52948 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ?? ;; 0xFFFEBF6BA090 ? 0x0000fffefcb54a88: and x14, x15, #3 ;*land {reexecute=0 rethrow=0 return_oop=0} ?? 0x0000fffeb0b52934: mov x8, #0xa090 ; {external_word} ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 14 (line 386) ?? 0x0000fffeb0b52938: movk x8, #0xbf6b, lsl #16 ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::isAlignedForElement at 8 (line 381) ?? 0x0000fffeb0b5293c: movk x8, #0xfffe, lsl #32 ? ; - jdk.internal.foreign.LayoutPath::checkAlign at 6 (line 290) ?? 0x0000fffeb0b52940: mov x0, x28 ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231c8c00::invokeStatic at 15 ?? 0x0000fffeb0b52944: blr x8 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cd800::invoke at 26 10.39% ?? 0x0000fffeb0b52948: cbz w13, #0xfffeb0b528b0;*aload_1 {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231cf400::guardWithCatch at 42 ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_control_jmhTest::contro ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 18 ;; B26: # out( N1 ) <- in( B25 B14 ) Freq: 0.13522 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) 0x0000fffeb0b5294c: adr x9, #0xfffeb0b52964 ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ;; 0xFFFEBFC842A0 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh 0x0000fffeb0b52950: mov x8, #0x42a0 ; {runtime_call os::javaTimeNanos()} 6.51% ? 0x0000fffefcb54a8c: ldr x18, [x13, #0x10] 0x0000fffeb0b52954: movk x8, #0xbfc8, lsl #16 ? 
0x0000fffefcb54a90: cbnz x14, #0xfffefcb54c5c;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} 0x0000fffeb0b52958: movk x8, #0xfffe, lsl #32 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) 0x0000fffeb0b5295c: stp xzr, x9, [sp, #-0x10]! ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) 0x0000fffeb0b52960: blr x8 ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh 0x0000fffeb0b52964: nop ; {other} ? ;; B30: # out( B39 B31 ) <- in( B29 ) Freq: 91235.6 .................................................................................................... 8.03% ? 0x0000fffefcb54a94: sub x14, x18, #3 ;*ladd {reexecute=0 rethrow=0 return_oop=0} 97.49% ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds at 14 (line 404) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess at 26 (line 364) ? ; - java.lang.invoke.VarHandleSegmentAsInts::checkAddress at 15 (line 81) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 14 (line 107) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54a98: tbnz x14, #0x3f, #0xfffefcb54bac ? ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkBounds at 16 (line 404) ? ; - jdk.internal.foreign.AbstractMemorySegmentImpl::checkAccess at 26 (line 364) ? ; - java.lang.invoke.VarHandleSegmentAsInts::checkAddress at 15 (line 81) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 14 (line 107) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? 
; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; B31: # out( B40 B32 ) <- in( B30 ) Freq: 91235.5 8.20% ? 0x0000fffefcb54a9c: cmp x14, #4 0.80% ? 0x0000fffefcb54aa0: b.ls #0xfffefcb54be4 ? ;; B32: # out( B64 B33 ) <- in( B31 ) Freq: 91235.4 6.32% ? 0x0000fffefcb54aa4: ldr w13, [x2, #0x14] ; implicit exception: dispatches to 0x0000fffefcb550bc ? ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; B33: # out( B45 B34 ) <- in( B32 ) Freq: 91235.3 ? 0x0000fffefcb54aa8: lsl x13, x13, #3 ;*getfield owner {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 1 (line 192) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54aac: cbnz x13, #0xfffefcb54d10;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? 
; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; B34: # out( B46 B35 ) <- in( B33 ) Freq: 91235.2 8.19% ? 0x0000fffefcb54ab0: ldr w13, [x2, #0xc] ;*getfield state {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.foreign.MemorySessionImpl::checkValidStateRaw at 22 (line 195) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 5 (line 1891) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54ab4: tbnz w13, #0x1f, #0xfffefcb54d4c ? ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; B35: # out( B21 B36 ) <- in( B34 ) Freq: 91235.2 ? 0x0000fffefcb54ab8: ldarb w17, [x0] ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54abc: mov x13, x15 ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 
0x0000fffefcb54ac0: ldr w13, [x13, #4] ;*invokevirtual getIntUnaligned {reexecute=0 rethrow=0 return_oop=0} ? ; - jdk.internal.misc.Unsafe::getIntUnaligned at 5 (line 3576) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnalignedInternal at 15 (line 1893) ? ; - jdk.internal.misc.ScopedMemoryAccess::getIntUnaligned at 6 (line 1881) ? ; - java.lang.invoke.VarHandleSegmentAsInts::get at 48 (line 108) ? ; - java.lang.invoke.LambdaForm$DMH/0x00000000231d1c00::invokeStatic at 14 ? ; - java.lang.invoke.LambdaForm$MH/0x00000000231d3800::invoke at 53 ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54ac4: add w13, w13, w11 ;*getfield isDone {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? ;; membar_acquire (elided) ? 0x0000fffefcb54ac8: ldr x11, [x28, #0x498] ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmh ? 0x0000fffefcb54acc: add x19, x19, #1 ; ImmutableOopMap {r10=Oop r12=Oop r16=Oop c_rarg0=Derived_oop_r16 resp=Oop } ? ;*ifeq {reexecute=1 rethrow=0 return_oop=0} ? ; - (reexecute) org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_meth ? 0x0000fffefcb54ad0: ldr wzr, [x11] ; {poll} 6.89% ? 0x0000fffefcb54ad4: ldrb w8, [x28, #0x4b0] ?? 0x0000fffefcb54ad8: cbz x8, #0xfffefcb54af0 ?? ;; 0xFFFF0CABA090 ?? 0x0000fffefcb54adc: mov x8, #0xa090 ; {external_word} ?? 0x0000fffefcb54ae0: movk x8, #0xcab, lsl #16 ?? 0x0000fffefcb54ae4: movk x8, #0xffff, lsl #32 ?? 0x0000fffefcb54ae8: mov x0, x28 ?? 
0x0000fffefcb54aec: blr x8 ;*invokevirtual invokeBasic {reexecute=0 rethrow=0 return_oop=0} ?? ; - java.lang.invoke.VarHandleGuards::guard_LJ_I at 80 (line 1002) ?? ; - org.openjdk.bench.vm.lang.ReproducerBenchmarks::gwt2_methodhandle at 30 (line 107) ?? ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_j 8.17% ?? 0x0000fffefcb54af0: cbz w17, #0xfffefcb54a20;*aload_1 {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.lang.jmh_generated.ReproducerBenchmarks_gwt2_methodhandle_jmhTe ;; B36: # out( N1 ) <- in( B35 B19 ) Freq: 0.0924477 0x0000fffefcb54af4: adr x9, #0xfffefcb54b0c ;; 0xFFFF0D0842A0 0x0000fffefcb54af8: mov x8, #0x42a0 ; {runtime_call os::javaTimeNanos()} 0x0000fffefcb54afc: movk x8, #0xd08, lsl #16 0x0000fffefcb54b00: movk x8, #0xffff, lsl #32 .................................................................................................... 97.59% From mbaesken at openjdk.org Fri Apr 26 13:33:26 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 26 Apr 2024 13:33:26 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 12:40:00 GMT, Thomas Stuefe wrote: >> We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). >> >> Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. >> >> --- >> >> This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). 
>> >> ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Adapt test > - merge > - JDK-8330677-Add-Per-Compilation-memory-usage-to-JFR Regarding src/hotspot/share/jfr/metadata/metadata.xml, could you label it 'Arena Peak Usage' or something like this? I would like to make it clearer that it is the peak usage. Did you check my nullptr-check-related question? Otherwise looks okay to me. It is a bit unfortunate that, by default, we now have an 'empty' added field in the JFR Compilation event. But it is what it is; is there any chance that the compiler memory statistics would be enabled by default in the future, or is this considered too costly? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2079356638 From rcastanedalo at openjdk.org Fri Apr 26 13:34:20 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 26 Apr 2024 13:34:20 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic [v2] In-Reply-To: References: Message-ID: > This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`).
> > - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. > > - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. > > - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Put back ZGC-specific trampoline stub state into ZBarrierSetC2State ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18967/files - new: https://git.openjdk.org/jdk/pull/18967/files/17fc99df..7132780f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18967&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18967&range=00-01 Stats: 45 lines in 2 files changed: 21 ins; 22 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18967/head:pull/18967 PR: https://git.openjdk.org/jdk/pull/18967 From rcastanedalo at openjdk.org Fri Apr 26 13:34:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 26 Apr 2024 13:34:22 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 08:12:46 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). > > - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`.
> > - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. > > - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! The trampoline stub state is ZGC-specific (as pointed out offline by @xmas92, thanks Axel!). Commit 7132780 moves it back into its original class (`ZBarrierSetC2State`). 
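For readers following the class split described in the quoted summary, here is a minimal C++ sketch of the idea. It is not HotSpot code: `MachNode` is reduced to a hypothetical two-field struct, the `has_late_barrier` flag and the `nodes_needing_liveness` helper are illustrative assumptions, and making `needs_liveness_data` pure virtual and `const` is a choice of this sketch rather than something the changeset states.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for C2's MachNode; the real class is far richer.
struct MachNode {
  std::string name;
  bool has_late_barrier;  // assumption: marks nodes that get a barrier stub
};

// GC-agnostic base class: shared code asks it which nodes need liveness data.
class BarrierSetC2State {
public:
  virtual ~BarrierSetC2State() = default;
  // Each collector decides whether liveness must be computed for `mach`.
  virtual bool needs_liveness_data(const MachNode* mach) const = 0;
};

// Collector-specific state: here, only barrier-carrying nodes need liveness.
class ZBarrierSetC2State : public BarrierSetC2State {
public:
  bool needs_liveness_data(const MachNode* mach) const override {
    return mach->has_late_barrier;
  }
};

// Shared, collector-independent pass: select the nodes for which register
// liveness would have to be computed (a placeholder for the real analysis).
std::vector<const MachNode*> nodes_needing_liveness(
    const BarrierSetC2State& state, const std::vector<MachNode>& nodes) {
  std::vector<const MachNode*> result;
  for (const MachNode& node : nodes) {
    if (state.needs_liveness_data(&node)) {
      result.push_back(&node);
    }
  }
  return result;
}
```

The point of the design is that the shared pass never names a concrete collector; a second collector would only need its own subclass of the state class.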
------------- PR Comment: https://git.openjdk.org/jdk/pull/18967#issuecomment-2079376515 From mli at openjdk.org Fri Apr 26 14:12:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Apr 2024 14:12:15 GMT Subject: RFR: 8321014: RISC-V: C2 VectorLoadShuffle [v2] In-Reply-To: References: <2BnwS9jgmNd3btm9dKj_ZbU6BCiBxnvKlO1EUAIxxDo=.fe4e612c-e219-4539-b9b8-7d011a63ac99@github.com> Message-ID: On Thu, 18 Apr 2024 14:16:34 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment > > Marked as reviewed by luhenry (Committer). Thanks @luhenry @RealFYang for reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18835#issuecomment-2079471316 From mli at openjdk.org Fri Apr 26 14:12:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Apr 2024 14:12:16 GMT Subject: Integrated: 8321014: RISC-V: C2 VectorLoadShuffle In-Reply-To: References: Message-ID: On Thu, 18 Apr 2024 11:09:16 GMT, Hamlin Li wrote: > Hi, > Can you help review the patch for the VectorLoadShuffle intrinsic? > > BTW, without this intrinsic, some other Vector API operations do not work as well (e.g. rearrange) on riscv. > > Thanks > > ## Test > test/jdk/jdk/incubator/vector/ > test/hotspot/jtreg/compiler/vectorapi This pull request has now been integrated.
Changeset: d13e5334 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/d13e53346f3cd50bf7a4241ba86d2e21d9081bbe Stats: 44 lines in 1 file changed: 44 ins; 0 del; 0 mod 8321014: RISC-V: C2 VectorLoadShuffle Reviewed-by: luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/18835 From mli at openjdk.org Fri Apr 26 14:25:57 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Apr 2024 14:25:57 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v12] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 09:46:08 GMT, Hamlin Li wrote: >> Hi, >> Can you have a look at this patch adding some tests for Math.round intrinsics? >> Thanks! >> >> ### FYI: >> During the development of RoundVF/RoundF, we faced issues that were only spotted by running tests exhaustively against the full 32/64-bit range of int/long. >> It's helpful to add these exhaustive tests to the JDK for possible future use, rather than building them every time they are needed. >> Of course, we need to put them in `manual` mode, so they are not run when the `-automatic` jtreg option is specified, which I guess is the mode CI uses; please correct me if I'm assuming incorrectly. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Add vectorized and scalar version Float tests checking full 32 bits range Hey, Is someone available to take a look at this pure test addition? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17753#issuecomment-2079498846 From mli at openjdk.org Fri Apr 26 14:32:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 26 Apr 2024 14:32:06 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v12] In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 09:46:08 GMT, Hamlin Li wrote: >> Hi, >> Can you have a look at this patch adding some tests for Math.round intrinsics? >> Thanks!
>> >> ### FYI: >> During the development of RoundVF/RoundF, we faced issues that were only spotted by running tests exhaustively against the full 32/64-bit range of int/long. >> It's helpful to add these exhaustive tests to the JDK for possible future use, rather than building them every time they are needed. >> Of course, we need to put them in `manual` mode, so they are not run when the `-automatic` jtreg option is specified, which I guess is the mode CI uses; please correct me if I'm assuming incorrectly. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Add vectorized and scalar version Float tests checking full 32 bits range Hey, Is someone available to have a look at this pure test addition? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17753#issuecomment-2079508769 From lucy at openjdk.org Fri Apr 26 15:23:51 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 26 Apr 2024 15:23:51 GMT Subject: RFR: 8331167: UBSan enabled build fails in adlc on macOS In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:06:28 GMT, Matthias Baesken wrote: > When configuring with '--enable-ubsan' (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) and doing a macOS x86_64 fastdebug build, I run into this build error after a very short time: > jdk/src/hotspot/share/adlc/adlparse.cpp:5228:36: runtime error: applying non-zero offset 1 to null pointer > #0 0x103fa4b4b in ADLParser::skipws_common(bool) adlparse.cpp:5228 > #1 0x103f76aed in ADLParser::skipws() adlparse.hpp:271 > #2 0x103f763c6 in ADLParser::parse() adlparse.cpp:95 > #3 0x10407054d in main main.cpp:178 > #4 0x7fff2044ef3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c) > > So it seems that UBSan support is currently not working well on macOS because the build fails early. Seems we add 1 to a nullptr in the adlc code in some cases and UBSan complains about it. Looks good. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18976#pullrequestreview-2025310735 From snazarki at openjdk.org Fri Apr 26 15:26:52 2024 From: snazarki at openjdk.org (Sergey Nazarkin) Date: Fri, 26 Apr 2024 15:26:52 GMT Subject: RFR: 8330806: test/hotspot/jtreg/compiler/c1/TestLargeMonitorOffset.java fails on ARM32 In-Reply-To: References: Message-ID: <0TiKLBlllAunug0vnrED5etz2Asg0faInPkxw2qebE8=.327bf508-f675-4b1a-8d65-866cae772234@github.com> On Mon, 22 Apr 2024 14:21:09 GMT, Aleksei Voitylov wrote: > TestLargeMonitorOffset was introduced by 8310844 with a fix for the AArch64 platform. The same issue needs to be fixed for ARM32. With this change, we add the large slot_offset handling to the ARM32 version of LIR_Assembler::osr_entry(). > > Testing: jtreg hotspot, jtreg jdk tier1-3. src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp line 156: > 154: int slot_offset = monitor_offset - (i * 2 * BytesPerWord); > 155: if (slot_offset >= 4096 - BytesPerWord) { > 156: __ add_slow(R2, OSR_buf, slot_offset); Can't we check this once before the loop? Or does such an optimization make no sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18891#discussion_r1581190935 From kvn at openjdk.org Fri Apr 26 15:47:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 15:47:52 GMT Subject: RFR: 8331159: VM build without C2 fails after JDK-8180450 In-Reply-To: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> References: <86NGpx5VVOK6KuR1qbhLRS27zau-DEwXW31EakcquYY=.4d56d214-1a07-4ff5-a1af-e18a545ad725@github.com> Message-ID: <7Sf-0fY8Zv0jxnDZk0mT13_LNTbVFjDcWTx_0kqS3CY=.5c0ec0dc-1114-4ef7-81af-1d129213b407@github.com> On Thu, 25 Apr 2024 20:54:23 GMT, Bernhard Urban-Forster wrote: > x86 bits are fine. I would like the author of this code to review this. @theRealAph please look. There is a difference between x64 and aarch64 in how this stub is generated.
For x64 it is generated as part of compiler stubs under `#ifdef COMPILER2`. For aarch64 it is in final stubs. Why is there a difference? If it is really a C2-specific stub, then why is the `UseSecondarySupersTable` flag global? `MacroAssembler::lookup_secondary_supers_table*()` methods could also be in `c2_MacroAssembler*` files. ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18962#pullrequestreview-2025359590 From stuefe at openjdk.org Fri Apr 26 16:41:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 26 Apr 2024 16:41:54 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 13:03:26 GMT, Matthias Baesken wrote: > Regarding src/hotspot/share/jfr/metadata/metadata.xml, could you label it 'Arena Peak Usage' or something like this? I would like to have it more clear that it is the peak usage. I already find "Arena Usage" to be too long, to be honest. Longer column labels make the JMC table less readable (ideally, one would have a "label," short and descriptive, and a "description," one being the column label, the other, e.g., a tooltip). And "peak" is really the only option that makes sense here. If you ask someone what they think "usage" means, they will assume it's the largest footprint accumulated during compilation over the time span of the compilation, aka peak. > Did you check my nullptr check related question? I read through the comments twice and did not find a nullptr related question. Which question? > Otherwise looks okay to me. It is a bit unfortunate that by default, we now have an 'empty' added field in the JFR Compilation event. But it is what it is; any chances that the compiler mem statistics would be enabled by default in the future, or is this considered too costly? One synchronization per compilation. So, not that costly, no. 
It will be enabled by default in debug builds with https://github.com/openjdk/jdk/pull/18969 since it is implied in memlimit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2079729123 From vlivanov at openjdk.org Fri Apr 26 16:55:12 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 26 Apr 2024 16:55:12 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value Message-ID: For MethodHandle linkers all arguments are casted to signature classes when target method is known. It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. Proposed fix avoids casts when signature class is unloaded. Testing: hs-tier1 - hs-tier4 ------------- Commit messages: - Remove UseNewCode - Test & fix Changes: https://git.openjdk.org/jdk/pull/18973/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18973&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322726 Stats: 181 lines in 5 files changed: 168 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/18973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18973/head:pull/18973 PR: https://git.openjdk.org/jdk/pull/18973 From kvn at openjdk.org Fri Apr 26 17:09:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 17:09:52 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are casted to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. 
> > Testing: hs-tier1 - hs-tier4 Looks reasonable and good refactoring. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18973#pullrequestreview-2025497058 From kvn at openjdk.org Fri Apr 26 17:23:53 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 17:23:53 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html I submitted our testing. 
Why did you not merge your latest [JDK-8330625](https://github.com/openjdk/jdk/commit/2b7176a55ad0e5c6ba34abba3fe8fc1a411a5e2d) change? Patch was applied with offsets. So I will test latest JDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2079787664 From szaldana at openjdk.org Fri Apr 26 17:44:04 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 26 Apr 2024 17:44:04 GMT Subject: RFR: 8319957: PhaseOutput::code_size is unused and should be removed Message-ID: Hi all, This PR removes the unused ```PhaseOutput::code_size / method_size```. These were moved over from ```src/hotspot/share/opto/compile.hpp``` in the refactor from [8240363](https://bugs.openjdk.org/browse/JDK-8240363). Here's the git link for reference https://github.com/openjdk/jdk/commit/21cd75cb98f658639df14632680e9c5e58f11faa. I also checked whether there were any usages prior to the refactor and couldn't find anything so I think it's safe to remove it. Thanks, Sonia ------------- Commit messages: - 8319957: PhaseOutput::code_size is unused and should be removed Changes: https://git.openjdk.org/jdk/pull/18981/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18981&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319957 Stats: 3 lines in 2 files changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18981.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18981/head:pull/18981 PR: https://git.openjdk.org/jdk/pull/18981 From matsaave at openjdk.org Fri Apr 26 17:54:03 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 26 Apr 2024 17:54:03 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v5] In-Reply-To: References: Message-ID: > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. 
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Removed empty line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18477/files - new: https://git.openjdk.org/jdk/pull/18477/files/c4789510..716a17cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18477/head:pull/18477 PR: https://git.openjdk.org/jdk/pull/18477 From matsaave at openjdk.org Fri Apr 26 17:54:04 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 26 Apr 2024 17:54:04 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com> Message-ID: On Tue, 23 Apr 2024 19:53:01 GMT, Dean Long wrote: >> My confusion is because @dean-long said >> >> _If I understand correctly, the order of writes must be: >> >> ResolvedFieldEntry fields, except _get_code and _put_code >> _get_code, _put_code >> patch_bytecode(fast_bytecode)_ >> >> therefore, if that ordering must be maintained, we'll need two store fences. And on the reading side, we'll need two load fences. If that total order is more than is necessary, OK. > >> And on the reading side, we'll need two load fences. If that total order is more than is necessary, OK. > > On the read side, I don't think we read _get_code or _put_code for the fast bytecode path, so that's why there is only one barrier needed. If I understand correctly, it seems like we agree on where the membar belongs, is this right @dean-long? 
The current placement of the LoadLoad barrier inside `load_field_entry` seems to be sufficient, and to reiterate information from the description, tier 1-5 test results look clean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2079827086 From sviswanathan at openjdk.org Fri Apr 26 18:10:55 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 26 Apr 2024 18:10:55 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: <_u8OUYZTsDfl7lzwoee3zewukw-yuFsn1_37Fn7iY5o=.2824d10d-30dd-4314-bae7-0beac0d79e2d@github.com> On Fri, 19 Apr 2024 21:51:45 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > bug fix in other ::prefix_rex2 src/hotspot/cpu/x86/assembler_x86.cpp line 13260: > 13258: } else { > 13259: emit_int24((prefix & 0xFF00) >> 8, prefix & 0x00FF, b1); > 13260: } We need a check for UseAPX > 0 here. 
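To illustrate the prefix handling these review comments are about, here is a minimal sketch of emitting a prefix that may occupy one or two bytes. It is not HotSpot code: `CodeBuffer`, `emit_prefixed_opcode`, and `encode` are hypothetical names, and the convention that 0 means "no prefix" is an assumption made only for this example. The byte split mirrors the `(prefix & 0xFF00) >> 8, prefix & 0x00FF` expression quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the assembler's code buffer; not HotSpot code.
using CodeBuffer = std::vector<uint8_t>;

// Appends an optional prefix followed by one opcode byte.
// Assumed convention for this sketch: 0 means "no prefix", a value up to
// 0xFF is a one-byte prefix, and anything larger is a two-byte prefix
// whose high byte must be emitted first.
void emit_prefixed_opcode(CodeBuffer& buf, uint16_t prefix, uint8_t opcode) {
  if (prefix > 0xFF) {
    buf.push_back(static_cast<uint8_t>((prefix & 0xFF00) >> 8));
    buf.push_back(static_cast<uint8_t>(prefix & 0x00FF));
  } else if (prefix != 0) {
    buf.push_back(static_cast<uint8_t>(prefix));
  }
  buf.push_back(opcode);
}

// Convenience wrapper that returns the bytes for a single instruction.
CodeBuffer encode(uint16_t prefix, uint8_t opcode) {
  CodeBuffer buf;
  emit_prefixed_opcode(buf, prefix, opcode);
  return buf;
}
```

The point of the sketch is that a caller which assumes the prefix is always a single byte would drop or misplace the high byte once a two-byte prefix is possible, which is the concern raised for the hand-written emit calls.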
src/hotspot/cpu/x86/assembler_x86.cpp line 14004: > 14002: int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true); > 14003: emit_opcode_prefix_and_encoding((unsigned char)0xB8, 0xC0, encode); > 14004: } void Assembler::popcntq(Register dst, Address src) also needs to be handled for rex2 generation. get_prefixq() will return a 16-bit entity, so calling emit_int32 directly is not correct: emit_int32((unsigned char)0xF3, get_prefixq(src, dst), 0x0F, (unsigned char)0xB8); Likewise void Assembler::cvttsd2siq(Register dst, Address src) also needs to be updated to handle extended gprs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581333389 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581329964 From kvn at openjdk.org Fri Apr 26 18:34:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 18:34:52 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. 
> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html test/hotspot/jtreg/compiler/print/CompileCommandPrintMemStat.java line 3: > 1: /* > 2: * Copyright (c) 2023, 2024 Red Hat, Inc. All rights reserved. > 3: * Copyright (c) 2023, 2024 Oracle and/or its affiliates. All rights reserved. Missing comma `,` after second year. Files header verifier failed for both test files. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18969#discussion_r1581394676 From dlong at openjdk.org Fri Apr 26 19:19:53 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Apr 2024 19:19:53 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are casted to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 This only detects TOP in the one place we know causes problems. This bug went undetected for a long time because we have no detection for TOP being used as an argument value in general. How do we know there aren't other places this could happen? Can't we detect this for Call nodes at least, or is the problem that the whole sub-tree might be dead code? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18973#issuecomment-2079988076 From duke at openjdk.org Fri Apr 26 19:31:50 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 26 Apr 2024 19:31:50 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 22:48:14 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> bug fix in other ::prefix_rex2 > > src/hotspot/cpu/x86/assembler_x86.cpp line 3620: > >> 3618: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 3619: // swap src/dst to get correct prefix >> 3620: int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes, true); > > Here the last argument to simd_prefix_and_encode shouldn't be true as src is not gpr? Because of the swap of src and dst args in this case, I think src is actually a gpr here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581455716 From dlong at openjdk.org Fri Apr 26 19:33:50 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Apr 2024 19:33:50 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com> Message-ID: On Fri, 26 Apr 2024 17:50:03 GMT, Matias Saavedra Silva wrote: >>> And on the reading side, we'll need two load fences. If that total order is more than is necessary, OK. >> >> On the read side, I don't think we read _get_code or _put_code for the fast bytecode path, so that's why there is only one barrier needed. 
> > If I understand correctly, it seems like we agree on where the membar belongs, is this right @dean-long? The current placement of the LoadLoad barrier inside `load_field_entry` seems to be sufficient, and to reiterate information from the description, tier 1-5 test results look clean. @matias9927, yes that's right. But I agree with earlier reviewer comments that the comment above the LoadLoad could be improved. To me it's helpful to think of this as a kind of self-modifying code. We had a slow bytecode that used the operands in a certain way. Then we changed the opcode to a fast bytecode that uses the operands to access newly initialized data outside the bytecode stream. The LoadLoad makes sure we don't read any of the newly initialized data before we read that the bytecode has been changed to the fast version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2080006453 From kvn at openjdk.org Fri Apr 26 19:51:49 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 19:51:49 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: <1JqBculLT6d4b-mEGOqoTBbs_lYW8Fqe-umvr0EGFkc=.878f501f-0069-4d81-87a6-52919425ef64@github.com> On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html `compiler/c2/TestFindNode.java` and `compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java` failed on `aarch64` when run with stress flags: `-XX:+UnlockDiagnosticVMOptions -XX:-TieredCompilation -XX:+StressArrayCopyMacroNode -XX:+StressLCM -XX:+StressGCM -XX:+StressIGVN -XX:+StressCCP -XX:+StressMacroExpansion -XX:+StressMethodHandleLinkerInlining -XX:+StressCompiledExceptionHandlers` # Internal Error (/workspace/open/src/hotspot/share/compiler/compilationMemoryStatistic.cpp:551), pid=912622, tid=912639 # fatal error: c2 compiler/c2/TestFindNode::test(()V): Hit MemLimit (limit: 1073741824 now: 1073824544) V [libjvm.so+0x97496c] report_fatal(VMErrorType, char const*, int, char const*, ...)+0x108 (debug.cpp:214) V [libjvm.so+0x8a9e2c] CompilationMemoryStatistic::on_arena_change(long, Arena const*)+0x5bc (compilationMemoryStatistic.cpp:551) V [libjvm.so+0x58ee70] Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x10c (arena.cpp:300) V [libjvm.so+0x13eb86c] PhaseChaitin::Split(unsigned int, ResourceArea*)+0x3ac (resourceArea.inline.hpp:35) V [libjvm.so+0x798a44] PhaseChaitin::Register_Allocate()+0x554 (chaitin.cpp:553) V 
[libjvm.so+0x8ced88] Compile::Code_Gen()+0x284 (compile.cpp:2988) V [libjvm.so+0x8d1128] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1588 (compile.cpp:896) V [libjvm.so+0x72d480] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x17c (c2compiler.cpp:142) Looks like Register Allocator eats memory. There also few closed tests which failed with `-XX:-TieredCompilation -XX:+AlwaysIncrementalInline` on `x64` and `aarch64`: V [libjvm.so+0xa90b95] report_fatal(VMErrorType, char const*, int, char const*, ...)+0x105 (debug.cpp:214) V [libjvm.so+0x9c06dd] CompilationMemoryStatistic::on_arena_change(long, Arena const*)+0x5cd (compilationMemoryStatistic.cpp:551) V [libjvm.so+0x5f45fc] Arena::grow(unsigned long, AllocFailStrategy::AllocFailEnum)+0x10c (arena.cpp:300) V [libjvm.so+0x14e4706] Parse::init_blocks()+0x46 (parse1.cpp:1290) V [libjvm.so+0x14f0aff] Parse::Parse(JVMState*, ciMethod*, float)+0x52f (parse1.cpp:565) V [libjvm.so+0x8412b9] ParseGenerator::generate(JVMState*)+0x169 (callGenerator.cpp:99) V [libjvm.so+0x84482a] PredictedCallGenerator::generate(JVMState*)+0x3aa (callGenerator.cpp:928) V [libjvm.so+0x8466de] CallGenerator::do_late_inline_helper()+0x90e (callGenerator.cpp:704) V [libjvm.so+0x9e2de4] Compile::inline_incrementally_one()+0xd4 (compile.cpp:2054) V [libjvm.so+0x9e3b32] Compile::inline_incrementally(PhaseIterGVN&)+0x292 (compile.cpp:2137) V [libjvm.so+0x9e5b50] Compile::Optimize()+0x340 (compile.cpp:2272) V [libjvm.so+0x9e9c60] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b50 (compile.cpp:863) V [libjvm.so+0x83ed15] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2080024827 PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2080030280 From matsaave at openjdk.org Fri Apr 26 20:22:50 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) 
Date: Fri, 26 Apr 2024 20:22:50 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v4] In-Reply-To: References: <5vm-ObonhSknfagWkqY1AROpb3LiDfUGj1B06MHt31E=.e5b59d78-6b8f-4c8c-83bc-fb19cb96c30e@github.com> Message-ID: On Tue, 23 Apr 2024 15:32:19 GMT, Andrew Haley wrote: >>> So, I guess the loadload fence being inserted here is the one we need between [2] and [3]. >> >> The way I would say it is we need a LoadLoad between [3] and [2] or between [3] and [1]. The code assumes that if it is a fast bytecode, then it can read [1] without checking [2] again. > > My confusion is because @dean-long said > > _If I understand correctly, the order of writes must be: > > ResolvedFieldEntry fields, except _get_code and _put_code > _get_code, _put_code > patch_bytecode(fast_bytecode)_ > > therefore, if that ordering must be maintained, we'll need two store fences. And on the reading side, we'll need two load fences. If that total order is more than is necessary, OK. 
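The ordering discussed in the quoted text can be sketched in portable C++ as a publish/consume pattern. This is only an analogy under stated assumptions, not the interpreter's implementation: `FieldEntry`, `Site`, and the two bytecode constants are hypothetical stand-ins, the release store plays the role of the store-side fence before the bytecode is patched, and the acquire load is a (somewhat stronger) stand-in for the LoadLoad barrier on the reading side.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-ins; the names are illustrative, not HotSpot's.
constexpr uint8_t kSlowBytecode = 0;
constexpr uint8_t kFastBytecode = 1;

struct FieldEntry {
  int field_offset = 0;  // data initialized during resolution
};

struct Site {
  FieldEntry entry;
  std::atomic<uint8_t> bytecode{kSlowBytecode};
};

// Writer: initialize the entry first, then patch the bytecode. The release
// store keeps the entry write from being reordered after the patch.
void resolve_and_patch(Site& site, int offset) {
  site.entry.field_offset = offset;
  site.bytecode.store(kFastBytecode, std::memory_order_release);
}

// Reader: once the fast bytecode is observed, the acquire load guarantees
// the entry read below sees the initialized data rather than a stale value.
int run_fast_path(Site& site) {
  while (site.bytecode.load(std::memory_order_acquire) != kFastBytecode) {
    std::this_thread::yield();
  }
  return site.entry.field_offset;
}

// Runs one writer thread against a reader on the calling thread.
int publish_and_read(int offset) {
  Site site;
  std::thread writer([&site, offset] { resolve_and_patch(site, offset); });
  int seen = run_fast_path(site);
  writer.join();
  return seen;
}
```

In this sketch only one barrier is needed on each side because the reader checks a single flag (the patched bytecode) before touching the data, which matches the argument that the fast path does not re-read the intermediate codes.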
@theRealAph @dean-long how about replacing the comment with this: `// Prevents stale data from being read by another thread after the bytecode is patched to the fast bytecode` ------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2080065219 From sviswanathan at openjdk.org Fri Apr 26 20:22:52 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 26 Apr 2024 20:22:52 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 19:29:04 GMT, Steve Dohrmann wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 3620: >> >>> 3618: InstructionAttr attributes(AVX_128bit, /* rex_w */ true, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >>> 3619: // swap src/dst to get correct prefix >>> 3620: int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes, true); >> >> Here the last argument to simd_prefix_and_encode shouldn't be true as src is not gpr? > > Because of the swap of src and dst args in this case, I think src is actually a gpr here. Yes, you are correct. Thanks for the clarification. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581494567 From kvn at openjdk.org Fri Apr 26 20:38:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 20:38:50 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. 
Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Closed test passed with `1200M` limit ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2080084265 From sviswanathan at openjdk.org Fri Apr 26 20:38:51 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 26 Apr 2024 20:38:51 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: On Fri, 19 Apr 2024 21:51:45 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > bug fix in other ::prefix_rex2 Should is_src_gpr be set to true for the following: void Assembler::movdl(Register dst, XMMRegister src) { NOT_LP64(assert(VM_Version::supports_sse2(), "")); InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); // swap src/dst to get correct prefix int encode = simd_prefix_and_encode(src, xnoreg, as_XMMRegister(dst->encoding()), VEX_SIMD_66, VEX_OPCODE_0F, &attributes); emit_int16(0x7E, (0xC0 | encode)); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2080084910 From duke at openjdk.org Fri Apr 26 20:44:03 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 26 Apr 2024 20:44:03 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v5] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. 
These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: from review comments: simplification, fix comments and white space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/eb246fd7..7bd4b885 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=03-04 Stats: 68 lines in 2 files changed: 0 ins; 2 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Fri Apr 26 20:44:04 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 26 Apr 2024 20:44:04 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: References: Message-ID: On Thu, 25 Apr 2024 18:21:28 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> bug fix in other ::prefix_rex2 > > src/hotspot/cpu/x86/assembler_x86.cpp line 648: > >> 646: } >> 647: } else if ((base_enc & 0x7) == 4) { >> 648: // rbp | r12 | r20 | r28 > > Comment should be: > // rsp | r12 | r20 | r28 Thanks, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 670: > >> 668: } else { >> 669: // [base + disp] >> 670: // !(rbp | r12 | r20 | r28) were handled above > > comment should be: > // (rsp | r12 | r20 | r28) were handled above Thanks, done. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 12897: > >> 12895: if (adr.index_needs_rex2()) { >> 12896: assert(false, "prefix(Register dst, Address adr) does not support handling of an X"); >> 12897: } > > this could be written as: > assert(!adr.index_needs_rex2(), "prefix(Register dst, Address adr) does not support handling of an X"); Very nice. Thank you, done. > src/hotspot/cpu/x86/assembler_x86.cpp line 13839: > >> 13837: void Assembler::movsbq(Register dst, Address src) { >> 13838: InstructionMark im(this); >> 13839: int prefix = get_prefixq(src, dst, true /* page1 */); > > We are not consistent in the comments: is_map1, M0, and page1 all refer to the same thing. Also, in some places there is no comment that the true is for is_map1. Thank you. Naming is now consistent with APX spec naming (is_map1 for the bool argument and M0 for the bit name). I added an inline comment /* is_map1 */ for calls that pass the boolean argument. > src/hotspot/cpu/x86/assembler_x86.cpp line 14479: > >> 14477: _input_size_in_bits = input_size_in_bits; >> 14478: } >> 14479: } > > New line missing at the end of file. Thanks, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581510510 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581511199 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581511569 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581511880 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581512106 From dlong at openjdk.org Fri Apr 26 21:06:52 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Apr 2024 21:06:52 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v5] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:54:03 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. 
This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed empty line OK, except for the "another thread" part. The reads are done in the current thread, so that's the thread the barrier is for. // Prevents stale data from being read after the bytecode is patched to the fast bytecode ------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2080117243 From matsaave at openjdk.org Fri Apr 26 21:10:00 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 26 Apr 2024 21:10:00 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v6] In-Reply-To: References: Message-ID: > A misplaced memory barrier causes a very intermittent crash on on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Improved comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18477/files - new: https://git.openjdk.org/jdk/pull/18477/files/716a17cf..a08af97f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18477&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18477/head:pull/18477 PR: https://git.openjdk.org/jdk/pull/18477 From kvn at openjdk.org Fri Apr 26 22:07:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Apr 2024 22:07:42 GMT Subject: RFR: 8326421: Add jtreg test for large arrayCopy disjoint case. 
[v2] In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 07:11:44 GMT, Swati Sharma wrote: >> Hi All, >> >> Added a new jtreg test case for large arrayCopy disjoint case. >> This will test byte array copy operation for aligned and non aligned cases with array length greater than 2.5MB. >> >> Please review and provide your feedback. >> >> Thanks, >> Swati >> Intel > > Swati Sharma has updated the pull request incrementally with one additional commit since the last revision: > > 8326421: Resolved review comments. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17962#pullrequestreview-2025928479 From vlivanov at openjdk.org Fri Apr 26 22:23:51 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 26 Apr 2024 22:23:51 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: <0M7fHztBonkFB1OWu4FmKFjpqo-JE4QRrdWl5pMWfp8=.8ed7567d-a108-4f9d-8aa0-395429be3ccd@github.com> On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are casted to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 This fix specifically focuses on the issue with MethodHandle linkers. It was reported to cause crashes in the field and has to be backported. First of all, MethodHandle linkers are special: no other call sites introduce casts to signature types on arguments. Ruling out Call nodes with TOP arguments is problematic because they may arise in paradoxical situations (e.g., in effectively dead code). 
But such Call nodes can be turned into Halt nodes to clearly signal the code can't be executed and be able to catch similar bugs at runtime. Also, I believe the handling of unloaded classes may be broken. There's no reliable way to tell apart classes from interfaces until they are loaded. But their effects in type system are different: while a meet with unloaded class may be TOP, it's not the case for interfaces (which are erased to Object during verification). Overall, all aforementioned points deserve follow-up RFEs/fixes, but the fix for 8322726 should be focused on the problem with MethodHandle linkers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18973#issuecomment-2080178661 From dlong at openjdk.org Fri Apr 26 22:27:26 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Apr 2024 22:27:26 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are casted to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/18973#pullrequestreview-2025945190 From sviswanathan at openjdk.org Fri Apr 26 23:10:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 26 Apr 2024 23:10:05 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v5] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 20:44:03 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
>
> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision:
>
>   from review comments: simplification, fix comments and white space

Should is_src_gpr be set to true for the additional following instructions as well:

  void Assembler::pextrd(Register dst, XMMRegister src, int imm8)
  void Assembler::pextrq(Register dst, XMMRegister src, int imm8)
  void Assembler::pextrb(Register dst, XMMRegister src, int imm8)
  void Assembler::extractps(Register dst, XMMRegister src, uint8_t imm8)
  void Assembler::pextl(Register dst, Register src1, Address src2)
  void Assembler::pdepl(Register dst, Register src1, Address src2)
  void Assembler::pextq(Register dst, Register src1, Address src2)
  void Assembler::pdepq(Register dst, Register src1, Address src2)
  void Assembler::movdq(Register dst, XMMRegister src)

Also the following instruction is not handled for egprs:

  void Assembler::popq(Register dst)

It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR:

  // Do we have this wrong src and dst reversed in simd_prefix_and_encode?
  void Assembler::pextrw(Register dst, XMMRegister src, int imm8) {
    assert(VM_Version::supports_sse2(), "");
    InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false);
    int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes);
    emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8);
  }

Once that PR is fixed, is_src_gpr should be set to true for this one as well.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2080217467 PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2080217893 PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2080219230 From duke at openjdk.org Fri Apr 26 23:23:23 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Fri, 26 Apr 2024 23:23:23 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v6] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix is_gpr arg on two functions with reversed src / dst operands ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/7bd4b885..21524eea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From sviswanathan at openjdk.org Fri Apr 26 23:41:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 26 Apr 2024 23:41:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v5] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 20:44:03 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
> > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > from review comments: simplification, fix comments and white space src/hotspot/cpu/x86/assembler_x86.cpp line 11717: > 11715: > 11716: void Assembler::vex_prefix(Address adr, int nds_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes) { > 11717: bool is_extended = adr.base_needs_rex2() || adr.index_needs_rex2() || nds_enc >= 16 || xreg_enc >= 16; We could add an assert here: if (adr.base_needs_rex2() || adr.index_needs_rex2()) { assert(UseAPX, "APX features not enabled"); } src/hotspot/cpu/x86/assembler_x86.cpp line 11769: > 11767: } > 11768: > 11769: int Assembler::vex_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool src_is_gpr) { We could add an assert here: if (src_is_gpr && src_enc >= 16) { assert(UseAPX, "APX features not enabled"); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581608225 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581609969 From duke at openjdk.org Sat Apr 27 00:07:21 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Sat, 27 Apr 2024 00:07:21 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v7] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. 
For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/21524eea..7f845511 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=05-06 Stats: 10 lines in 1 file changed: 6 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Sat Apr 27 00:07:22 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Sat, 27 Apr 2024 00:07:22 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v5] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 23:04:46 GMT, Sandhya Viswanathan wrote: > Should is_src_gpr be set to true for the additional following instructions as well: Thanks @sviswa7, yes some of these should also have src_is_gpr = true. 

Fixed:

  void Assembler::extractps(Register dst, XMMRegister src, uint8_t imm8)
  void Assembler::pextrd(Register dst, XMMRegister src, int imm8)
  void Assembler::pextrq(Register dst, XMMRegister src, int imm8)
  void Assembler::pextrb(Register dst, XMMRegister src, int imm8)
  void Assembler::movdq(Register dst, XMMRegister src)

Use a different prefix function that does use the src_is_gpr flag:

  void Assembler::pextl(Register dst, Register src1, Address src2)
  void Assembler::pdepl(Register dst, Register src1, Address src2)
  void Assembler::pextq(Register dst, Register src1, Address src2)
  void Assembler::pdepq(Register dst, Register src1, Address src2)

> src/hotspot/cpu/x86/assembler_x86.cpp line 11717:
>
>> 11715:
>> 11716: void Assembler::vex_prefix(Address adr, int nds_enc, int xreg_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes) {
>> 11717:   bool is_extended = adr.base_needs_rex2() || adr.index_needs_rex2() || nds_enc >= 16 || xreg_enc >= 16;
>
> We could add an assert here:
> if (adr.base_needs_rex2() || adr.index_needs_rex2()) {
>   assert(UseAPX, "APX features not enabled");
> }

Thanks, done.

> src/hotspot/cpu/x86/assembler_x86.cpp line 11769:
>
>> 11767: }
>> 11768:
>> 11769: int Assembler::vex_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, InstructionAttr *attributes, bool src_is_gpr) {
>
> We could add an assert here:
> if (src_is_gpr && src_enc >= 16) {
>   assert(UseAPX, "APX features not enabled");
> }

Thanks, done.
------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2080248834 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581630063 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1581630162 From stuefe at openjdk.org Sat Apr 27 05:20:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 27 Apr 2024 05:20:04 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 20:36:02 GMT, Vladimir Kozlov wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
>> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Closed test passed with `1200M` limit @vnkozlov How do you want to handle this? Are these memory usage numbers pathological or normal? Should I increase the general limit to 1200M? Alternatively, we can also just run failing the test with a higher memory limit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2080368042 From stuefe at openjdk.org Sat Apr 27 05:35:10 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 27 Apr 2024 05:35:10 GMT Subject: RFR: 8330813: Don't call methods from Compressed(Oops|Klass) if the associated mode is inactive In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 11:04:07 GMT, Thomas Stuefe wrote: > We should not call methods from CompressedOops if we run with -XX:-UseCompressedOops, and the same goes for CompressedKlass and -XX:-UseCompressedClassPointers. (the latter we do assert in Lilliput). @rkennke? Its trivial ------------- PR Comment: https://git.openjdk.org/jdk/pull/18883#issuecomment-2080372021 From aph at openjdk.org Sat Apr 27 09:06:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 27 Apr 2024 09:06:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v16] In-Reply-To: References: Message-ID: <9kNRyvW-MjLcO1WtStKf5du0TsRIBMqu8ROIUfVzU78=.2778473e-db27-497f-9a5c-40120779b453@github.com> On Fri, 19 Apr 2024 13:09:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. 
That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... 
>> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Emanuel Peter How are we doing here? I've just submitted the Scoped Values (Third Preview) JEP, and it would be helpful to get this work committed in time for the next JDK release. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2080423705 From kvn at openjdk.org Sat Apr 27 18:24:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Apr 2024 18:24:05 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 20:36:02 GMT, Vladimir Kozlov wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m`
>>
>> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0`
>>
>>
>> ---
>>
>> The patch:
>>
>> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert.
>> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record.
>> 3) Adapted and extended tests
>>
>> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of.
>>
>>
>> Tested:
>>
>> - manually on Mac m1 (debug and release)
>> - GHAs are running
>> - but Oracle will do more testing before this goes in
>>
>> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html
>
> Closed test passed with `1200M` limit

> @vnkozlov How do you want to handle this? Are these memory usage numbers pathological or normal?
>
> Should I increase the general limit to 1200M? Alternatively, we can also just run the failing test with a higher memory limit.

The failing closed test has a very long chain of lambda forms, and I don't think its memory consumption is pathological. Maybe something could be done to improve it (as an enhancement RFE), but it is not urgent. I will help you increase the limit for it.

I did not check which limit would allow the failing open tests to pass. I will ask you to investigate and increase the limit for them for now, and to file bugs to investigate the cause.

I also need to run later tiers (I ran only up to tier5) before integration. If I find more failing cases, we may consider increasing the default limit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2081120560

From duke at openjdk.org Sun Apr 28 03:04:10 2024
From: duke at openjdk.org (SUN Guoyun)
Date: Sun, 28 Apr 2024 03:04:10 GMT
Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 26 Apr 2024 21:10:00 GMT, Matias Saavedra Silva wrote:

>> A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   Improved comment

src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1787:

> 1785:   lea(cache, Address(cache, index));
> 1786:   // Prevents stale data from being read after the bytecode is patched to the fast bytecode
> 1787:   membar(MacroAssembler::LoadLoad);

If we put the LoadLoad here, it may be redundant for TemplateTable::patch_bytecode(..)

https://github.com/openjdk/jdk/pull/18477/files#diff-739c969c24180bfe592cd0a75940d3838503390d3f51c8362a56ec4903252b67R192

because there is a load-acquire at L199.
Can we change
`load_field_entry(Register cache, Register index, int bcp_offset = 1) `
to
`load_field_entry(Register cache, Register index, int bcp_offset = 1, bool needLoadLoad = 1) `?

Then change L192 like this:

@@ -189,7 +189,7 @@ void TemplateTable::patch_bytecode(Bytecodes::Code bc, Register bc_reg,
       // additional, required work.
       assert(byte_no == f1_byte || byte_no == f2_byte, "byte_no out of range");
       assert(load_bc_into_bc_reg, "we use bc_reg as temp");
-      __ load_field_entry(temp_reg, bc_reg);
+      __ load_field_entry(temp_reg, bc_reg, 1 /*bcp_offset*/, false /*needLoadLoad*/);
       if (byte_no == f1_byte) {
         __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::get_code_offset())));
       } else {
         __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::put_code_offset())));
       }
       // Load-acquire the bytecode to match store-release in ResolvedFieldEntry::fill_in()
       __ ldarb(temp_reg, temp_reg);
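
The load-acquire / store-release pairing named in the last comment of the hunk above can be sketched in portable C++ as follows. This is an illustration under stated assumptions, not the HotSpot implementation: `Entry`, `fill_in` and `try_read_offset` are hypothetical stand-ins for the resolved field entry and its accessors, and `std::atomic` takes the place of the `ldarb` instruction that the interpreter emits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a resolved field entry; not HotSpot code.
struct Entry {
  int field_offset = 0;      // plain data, written before publication
  std::atomic<int> code{0};  // 0 means "not resolved yet"
};

// Writer side: fill in the data first, then publish with a store-release.
// The release ordering keeps the write to field_offset from being
// reordered after the store to code.
inline void fill_in(Entry& e, int offset, int bytecode) {
  assert(bytecode != 0 && "0 is reserved for the unresolved state");
  e.field_offset = offset;
  e.code.store(bytecode, std::memory_order_release);
}

// Reader side: a load-acquire of code orders the later read of
// field_offset after it. Returns -1 if the entry is not resolved yet.
inline int try_read_offset(const Entry& e) {
  if (e.code.load(std::memory_order_acquire) == 0) {
    return -1;
  }
  return e.field_offset;
}
```

A reader that observes a non-zero `code` through the acquire load is guaranteed to also observe the `field_offset` written before the matching release store. The sketch covers only that pairing; whether an additional barrier is needed elsewhere depends on which other loads and stores have to be ordered.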
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1582002700 From dlong at openjdk.org Sun Apr 28 07:29:08 2024 From: dlong at openjdk.org (Dean Long) Date: Sun, 28 Apr 2024 07:29:08 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v6] In-Reply-To: References: Message-ID: On Sun, 28 Apr 2024 03:01:36 GMT, SUN Guoyun wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Improved comment > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1787: > >> 1785: lea(cache, Address(cache, index)); >> 1786: // Prevents stale data from being read after the bytecode is patched to the fast bytecode >> 1787: membar(MacroAssembler::LoadLoad); > > if we put LoadLoad in here, maybe it is redundant for TemplateTable::patch_bytecode(..) > > https://github.com/openjdk/jdk/blob/a08af97fda06e785d1c3a5a17e562e65150bbb07/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L192 > > because there has a Load-acquire in L199. > can we change > `load_field_entry(Register cache, Register index, int bcp_offset = 1) ` > to > `load_field_entry(Register cache, Register index, int bcp_offset = 1, bool needLoadLoad = 1) `? > > then change L192 like this >
> @@ -189,7 +189,7 @@ void TemplateTable::patch_bytecode(Bytecodes::Code bc, Register bc_reg,
>        // additional, required work.
>        assert(byte_no == f1_byte || byte_no == f2_byte, "byte_no out of range");
>        assert(load_bc_into_bc_reg, "we use bc_reg as temp");
> -      __ load_field_entry(temp_reg, bc_reg);
> +      __ load_field_entry(temp_reg, bc_reg, 1 /*bcp_offset*/, false /*needLoadLoad*/);
>        if (byte_no == f1_byte) {
>          __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::get_code_offset())));
>        } else {
>          __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::put_code_offset())));
>        }
>        // Load-acquire the bytecode to match store-release in ResolvedFieldEntry::fill_in()
>        __ ldarb(temp_reg, temp_reg);
> 

I believe patch_bytecode only needs a LoadStore between reading the cache bytecode and patching at bcp[0], so I would agree the LoadLoad in load_field_entry is not needed here. But the reason is not that we already have a load-acquire. The reason is that there are not two loads that need ordering. There is only a load and a store. We can't replace the load-acquire with a LoadLoad, for example.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1582048233

From aph at openjdk.org Sun Apr 28 11:37:09 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 28 Apr 2024 11:37:09 GMT
Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v12]
In-Reply-To:
References:
Message-ID:

On Tue, 23 Apr 2024 09:46:08 GMT, Hamlin Li wrote:

>> Hi,
>> Can you have a look at this patch adding some tests for Math.round intrinsics?
>> Thanks!
>>
>> ### FYI:
>> During the development of RoundVF/RoundF, we faced issues which were only spotted by running tests exhaustively against the 32/64 bits range of int/long.
>> It's helpful to add these exhaustive tests in jdk for possible future usage, rather than build them every time they are needed.
>> Of course, we need to put it in `manual` mode, so it's not run when the `-automatic` jtreg option is specified, which I guess is the mode CI uses; please correct me if I assume incorrectly.
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add vectorized and scalar version Float tests checking full 32 bits range

test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 99:

> 97:     System.out.println("Verification");
> 98:     int errn = 0;
> 99:     for (long l = Integer.MIN_VALUE; l <= Integer.MAX_VALUE; l+=ARRLEN) {

Can't you just do the obvious simple thing here?

test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 102:

> 100:       for (int i = 0; i < ARRLEN; i++) {
> 101:         input[i] = (int)(l+i);
> 102:       }

What is this array for? As far as I can tell it does nothing useful to batch the test results.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1582089522
PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1582089157

From fyang at openjdk.org Mon Apr 29 02:10:12 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 29 Apr 2024 02:10:12 GMT
Subject: RFR: 8330685: ZGC: share barrier spilling logic [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 26 Apr 2024 13:34:20 GMT, Roberto Castañeda Lozano wrote:

>> This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)).
>>
>> The main changes are:
>>
>> - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`).
>>
>> - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`.
>>
>> - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`.
>>
>> - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet.
>>
>> #### Testing
>>
>> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode).
>> - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call.
>> - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks!
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Put back ZGC-specific trampoline stub state into ZBarrierSetC2State

Tests are good on the linux-riscv64 platform too. LGTM. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18967#pullrequestreview-2027479205 From kvn at openjdk.org Mon Apr 29 03:22:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 03:22:08 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 10:04:28 GMT, Thomas Stuefe wrote: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html I did not find any other failures in tier6-9. So let's adjust limit and file bugs for failed tests I found before. And keep default limit 1Gb. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2081824930 From thartmann at openjdk.org Mon Apr 29 05:39:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Apr 2024 05:39:04 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are cast to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 Looks good to me. Please file a follow-up RFE for the FIXMEs. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18973#pullrequestreview-2027592195 From thartmann at openjdk.org Mon Apr 29 05:40:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Apr 2024 05:40:04 GMT Subject: RFR: 8319957: PhaseOutput::code_size is unused and should be removed In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:31:45 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused ```PhaseOutput::code_size / method_size```. > > These were moved over from ```src/hotspot/share/opto/compile.hpp``` in the refactor from [8240363](https://bugs.openjdk.org/browse/JDK-8240363). Here's the git link for reference https://github.com/openjdk/jdk/commit/21cd75cb98f658639df14632680e9c5e58f11faa. > > I also checked whether there were any usages prior to the refactor and couldn't find anything so I think it's safe to remove it. > > Thanks, > Sonia Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/18981#pullrequestreview-2027593634 From aboldtch at openjdk.org Mon Apr 29 06:37:11 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 29 Apr 2024 06:37:11 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic [v2] In-Reply-To: References: Message-ID: <6zMadq3gDLmH-pQUYEbc_J6G4MjVT2Z2J8tdIPZ48_Q=.cf95cea7-3b22-404b-ad5b-32d66b3c9f7a@github.com> On Fri, 26 Apr 2024 13:34:20 GMT, Roberto Casta?eda Lozano wrote: >> This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). >> >> - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. >> >> - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. >> >> - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. 
This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Put back ZGC-specific trampoline stub state into ZBarrierSetC2State Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18967#pullrequestreview-2027662044 From stuefe at openjdk.org Mon Apr 29 06:44:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 06:44:05 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:20:46 GMT, Vladimir Kozlov wrote: > I submitted our testing. > > Why you did not merge your latest [JDK-8330625](https://github.com/openjdk/jdk/commit/2b7176a55ad0e5c6ba34abba3fe8fc1a411a5e2d) change? Patch was applied with offsets. So I will test latest JDK. Ah forgot. Sorry. Thanks for testing with the latest head. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2081987641 From epeter at openjdk.org Mon Apr 29 07:17:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Apr 2024 07:17:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Tue, 16 Apr 2024 14:46:22 GMT, Roland Westrelin wrote: >> @rwestrel thanks for asking. About 10% seems to still be scheduled and have not completed, on `macosx-x64`. But the rest seems fine. I'll re-review next week :) > > @eme64 can you go over my replies above and let me know if they sound good to you? Thanks. I'm waiting for @rwestrel to respond to my last list of comments/questions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2082031126 From roland at openjdk.org Mon Apr 29 07:17:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Apr 2024 07:17:12 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v9] In-Reply-To: References: <5H6XV7Agl6ZNfGWT-bCbIPsimFTYM0pyIGiAHDQUUyA=.168e21cc-6cd8-42d8-ab59-d5e02e241ea2@github.com> <0RKnLUgc6UBtyxSyezCMWsSbP50hu6fQ6UJPHpGlgSU=.9fafa10f-62ee-4ec8-9093-4e204fcbe504@github.com> <5QbsVmYi0tYGlOvDL4LjJb1SjChIZtaWSMthFM9grMI=.0900e1c3-90b3-4726-a7c6-c2aff49d07ce@github.com> Message-ID: On Mon, 29 Apr 2024 07:12:55 GMT, Emanuel Peter wrote: >> @eme64 can you go over my replies above and let me know if they sound good to you? Thanks. > > I'm waiting for @rwestrel to respond to my last list of comments/questions. I'm working on @eme64 latest round of comments (I was out last week). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-2082032881 From rcastanedalo at openjdk.org Mon Apr 29 07:35:05 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 29 Apr 2024 07:35:05 GMT Subject: RFR: 8330685: ZGC: share barrier spilling logic [v2] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 13:34:20 GMT, Roberto Castañeda Lozano wrote: >> This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). >> >> The main changes are: >> >> - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). >> >> - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`. >> >> - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. >> >> - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function.
This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. >> >> #### Testing >> >> - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). >> - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. >> - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Put back ZGC-specific trampoline stub state into ZBarrierSetC2State Thanks for reviewing, Axel and Fei Yang! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18967#issuecomment-2082058575 From roland at openjdk.org Mon Apr 29 07:39:10 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Apr 2024 07:39:10 GMT Subject: RFR: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Mon, 22 Apr 2024 08:55:48 GMT, Aleksey Shipilev wrote: >> Thanks for reviewing this. >> >>> Have you tried running tests with #18751 applied? >> >> I only ran the particular test that you mentioned in the bug. > > @rwestrel, if you could integrate this, we can then go forward with #18751. Thanks! 
@shipilev @dean-long @vnkozlov @martinuy thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/18813#issuecomment-2082062974 From roland at openjdk.org Mon Apr 29 07:39:11 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Apr 2024 07:39:11 GMT Subject: Integrated: 8330158: C2: Loop strip mining uses ABS with min int In-Reply-To: References: Message-ID: On Wed, 17 Apr 2024 10:45:04 GMT, Roland Westrelin wrote: > This fixes 3 calls to ABS with a min int argument. I think all of them > are harmless: > > - in `PhaseIdealLoop::exact_limit()`, I removed the call to ABS. The > check is for a stride of 1 or -1. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, for the > computation of `scaled_iters_long`, the stride is passed to `ABS()` > and then implicitly cast to long. I now cast the stride to long > before `ABS()`. For a min int stride, `LoopStripMiningIter * stride` > overflows the int range for all values of `LoopStripMiningIter` > except 0 or 1. Those values are handled early on in that method. So > for a min int stride: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > is always true and the method returns early. > > - in `OuterStripMinedLoopNode::adjust_strip_mined_loop()`, the > computation of `short_scaled_iters` also calls `ABS()` with the > stride as argument. But the result of that computation is only used > if the test for: > ``` > (jlong)scaled_iters != scaled_iters_long > ``` > doesn't cause an early return of the method. I reordered statements > so the `ABS()` call happens after that test, which will cause an early > return if the stride is min int. This pull request has now been integrated.
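The min int problem described in the message above can be illustrated outside HotSpot. The following is a minimal Java sketch, not the C++ code from the changeset: the class `AbsMinIntSketch` and its methods are hypothetical names, and Java's `Math.abs` stands in for HotSpot's `ABS()` (in Java the int overflow is well defined and simply returns the same negative value). It shows why widening the stride to long before taking the absolute value matters, and how comparing the narrowed product with the long product detects that the scaled iteration count does not fit in an int.

```java
public class AbsMinIntSketch {
    // Absolute value computed in int: for Integer.MIN_VALUE the result
    // does not fit in an int, and Math.abs returns Integer.MIN_VALUE again.
    static int absInt(int stride) {
        return Math.abs(stride);
    }

    // Widen to long first, then take the absolute value: always non-negative.
    static long absWidened(int stride) {
        return Math.abs((long) stride);
    }

    // Hypothetical analogue of the overflow test quoted above: true when
    // iters * |stride| cannot be represented as an int.
    static boolean scaledItersOverflowInt(int iters, int stride) {
        long scaledLong = (long) iters * absWidened(stride);
        int scaledInt = (int) scaledLong;
        return (long) scaledInt != scaledLong;
    }

    public static void main(String[] args) {
        System.out.println(absInt(Integer.MIN_VALUE));      // still negative
        System.out.println(absWidened(Integer.MIN_VALUE));  // 2147483648
        System.out.println(scaledItersOverflowInt(1000, Integer.MIN_VALUE));
        System.out.println(scaledItersOverflowInt(1000, 1));
    }
}
```

In this sketch the overflow check is what would trigger the early return mentioned in the message; iteration counts of 0 and 1 are assumed to be handled before the check is reached, as the message states.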
Changeset: c615c18e Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/c615c18e9f92dc9fdc2db512fbd47fd255f7fe86 Stats: 14 lines in 1 file changed: 9 ins; 0 del; 5 mod 8330158: C2: Loop strip mining uses ABS with min int Reviewed-by: shade, kvn, dlong, mbalao ------------- PR: https://git.openjdk.org/jdk/pull/18813 From mbaesken at openjdk.org Mon Apr 29 07:42:05 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Apr 2024 07:42:05 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 16:39:15 GMT, Thomas Stuefe wrote: > I read through the comments twice and did not find a nullptr related question. Which question? See compilationMemoryStatistic.cpp . At some places we check the result for nullptr e.g. jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp Line 79 in b3bcc49 const CompileTask* const task = th->task(); ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2082068154 From mbaesken at openjdk.org Mon Apr 29 08:01:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Apr 2024 08:01:08 GMT Subject: RFR: 8331167: UBSan enabled build fails in adlc on macOS In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:06:28 GMT, Matthias Baesken wrote: > When configuring with '--enable-ubsan' (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) and doing a macOS x86_64 fastdebug build, I run into this build error after very short time : > jdk/src/hotspot/share/adlc/adlparse.cpp:5228:36: runtime error: applying non-zero offset 1 to null pointer > #0 0x103fa4b4b in ADLParser::skipws_common(bool) adlparse.cpp:5228 > #1 0x103f76aed in ADLParser::skipws() adlparse.hpp:271 > #2 0x103f763c6 in ADLParser::parse() adlparse.cpp:95 > #3 0x10407054d in main main.cpp:178 > #4 0x7fff2044ef3c in start+0x0 
(libdyld.dylib:x86_64+0x15f3c) > > So it seems that UBSan support is currently not working well on macOS because the build fails early. Seems we add 1 to a nullptr in the adlc code in some cases and UBSAN complains about it. Hi Thomas and Lutz, thanks for the reviews ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18976#issuecomment-2082096760 From mbaesken at openjdk.org Mon Apr 29 08:01:09 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Apr 2024 08:01:09 GMT Subject: Integrated: 8331167: UBSan enabled build fails in adlc on macOS In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 12:06:28 GMT, Matthias Baesken wrote: > When configuring with '--enable-ubsan' (https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) and doing a macOS x86_64 fastdebug build, I run into this build error after very short time : > jdk/src/hotspot/share/adlc/adlparse.cpp:5228:36: runtime error: applying non-zero offset 1 to null pointer > #0 0x103fa4b4b in ADLParser::skipws_common(bool) adlparse.cpp:5228 > #1 0x103f76aed in ADLParser::skipws() adlparse.hpp:271 > #2 0x103f763c6 in ADLParser::parse() adlparse.cpp:95 > #3 0x10407054d in main main.cpp:178 > #4 0x7fff2044ef3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c) > > So it seems that UBSan support is currently not working well on macOS because the build fails early. Seems we add 1 to a nullptr in the adlc code in some cases and UBSAN complains about it. This pull request has now been integrated. 
Changeset: 4edac349 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/4edac349a5d695ce7923344ad5ad0400842241eb Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8331167: UBSan enabled build fails in adlc on macOS Reviewed-by: stuefe, lucy ------------- PR: https://git.openjdk.org/jdk/pull/18976 From stuefe at openjdk.org Mon Apr 29 08:27:07 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 08:27:07 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 12:40:00 GMT, Thomas Stuefe wrote: >> We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). >> >> Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. >> >> --- >> >> This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). >> >> ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Adapt test > - merge > - JDK-8330677-Add-Per-Compilation-memory-usage-to-JFR > > I read through the comments twice and did not find a nullptr related question. Which question? > > See compilationMemoryStatistic.cpp . > > At some places we check the result for nullptr e.g. 
> > jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp > > Line 79 in [b3bcc49](https://github.com/openjdk/jdk/commit/b3bcc49491b8f8ad337eb4c06201a5468e5c1159) > > ``` > const CompileTask* const task = th->task(); > ``` Yes, it is inconsistent. Almost all code here (notably anything triggered from start- or end-compilation) is called from the compiler, so we run on a compiler thread and in the scope of a ciEnv. So most of the existing nullptr checks are probably not needed. We may want to make this consistent with subsequent RFEs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2082141621 From rcastanedalo at openjdk.org Mon Apr 29 08:45:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 29 Apr 2024 08:45:15 GMT Subject: Integrated: 8330685: ZGC: share barrier spilling logic In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 08:12:46 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to detect, represent, spill, and restore live registers around runtime calls in ZGC C2 barriers so that it can be shared by different collectors (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Create GC-agnostic `BarrierStubC2` and `BarrierSetC2State` classes from which `ZBarrierStubC2` and `ZBarrierSetC2State` derive. In order to make `BarrierSetC2State` GC-agnostic, define a virtual function `bool BarrierSetC2State::needs_liveness_data(const MachNode* mach)` that every derived class can instantiate to specify whether liveness data should be computed for a given C2 node (`mach`). > > - Move the late register liveness computation function `void ZBarrierSetC2::compute_liveness_at_stubs()` to its parent class `BarrierSetC2`.
> > - For platforms that support ZGC (x64, aarch64, riscv, ppc), make ZGC's spill and restore logic (`ZSaveLiveRegisters` class and supporting functions in `ZBarrierSetAssembler`) GC-agnostic by moving it into the corresponding `BarrierSetAssembler` file (e.g. from `src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp` to `src/hotspot/cpu/aarch64/gc/shared/barrierSetAssembler_aarch64.(h|c)pp`), and replacing all references to `ZBarrierStubC2` with its parent class `BarrierStubC2`. > > - For platforms that do not support ZGC (x86, arm, s390), define a minimal, unimplemented `BarrierSetAssembler::refine_register` function. This definition is expected by the GC-agnostic `BarrierSetC2::compute_liveness_at_stubs()` function, but it is currently never executed by these platforms since they do not support any late barrier expansion collector yet. > > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/68b924be2b84c7cb96e55a870b47952464ad96f3) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug, linux-x86-debug, linux-arm32-debug, linux-s390x-debug). @RealFYang, @TheRealMDoerr: could you please test the changeset on riscv and ppc? Thanks! This pull request has now been integrated. 
Changeset: 549bc6a0 Author: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/549bc6a0398906df3cc08679c751eb0c633ef0be Stats: 1915 lines in 26 files changed: 1061 ins; 838 del; 16 mod 8330685: ZGC: share barrier spilling logic Reviewed-by: eosterlund, mdoerr, fyang, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/18967 From stuefe at openjdk.org Mon Apr 29 09:41:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 09:41:05 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 03:19:27 GMT, Vladimir Kozlov wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of.
>> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > I did not find any other failures in tier6-9. So let's adjust limit and file bugs for failed tests I found before. And keep default limit 1Gb. @vnkozlov Opened a follow up issue for `compiler/c2/TestFindNode.java` : https://bugs.openjdk.org/browse/JDK-8331283 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2082285303 From stuefe at openjdk.org Mon Apr 29 10:14:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 10:14:04 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 03:19:27 GMT, Vladimir Kozlov wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. 
>> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > I did not find any other failures in tier6-9. So let's adjust limit and file bugs for failed tests I found before. And keep default limit 1Gb. @vnkozlov ... but I was unable to reproduce the problem with `compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java` on aarch64. Memory usage during compilation of the test method is ~170MB, which is fine. I tried to reproduce it on both MacOS m1, and on a Raspberry 4 with 64bit linux. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2082346861 From mbaesken at openjdk.org Mon Apr 29 10:46:08 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Apr 2024 10:46:08 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 12:40:00 GMT, Thomas Stuefe wrote: >> We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). >> >> Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. >> >> --- >> >> This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). 
>> >> ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Adapt test > - merge > - JDK-8330677-Add-Per-Compilation-memory-usage-to-JFR Marked as reviewed by mbaesken (Reviewer). src/hotspot/share/compiler/compilationMemoryStatistic.cpp line 397: > 395: CompilerThread* const th = Thread::current()->as_Compiler_thread(); > 396: ArenaStatCounter* const arena_stat = th->arena_stat(); > 397: CompileTask* const task = th->task(); At some places we check the result for nullptr e.g. https://github.com/openjdk/jdk/blob/b3bcc49491b8f8ad337eb4c06201a5468e5c1159/src/hotspot/share/compiler/compilationMemoryStatistic.cpp#L79 Is that over cautious or should it better be done ? src/hotspot/share/compiler/compileTask.hpp line 117: > 115: // Specifies if _failure_reason is on the C heap. > 116: bool _failure_reason_on_C_heap; > 117: size_t _arena_bytes; // peak size of temporary memory during compilation (e.g. node arenas) Is there a good reason not to name it _peak_arena_bytes when it is always the peak as stated ? src/hotspot/share/jfr/metadata/metadata.xml line 611: > 609: > 610: > 611: Maybe say Peak arena usage, if this is the case as stated above ? And some info that it is optional / must be enabled to see this data would probably help too. 
------------- PR Review: https://git.openjdk.org/jdk/pull/18864#pullrequestreview-2024243499 PR Review Comment: https://git.openjdk.org/jdk/pull/18864#discussion_r1580553526 PR Review Comment: https://git.openjdk.org/jdk/pull/18864#discussion_r1580563005 PR Review Comment: https://git.openjdk.org/jdk/pull/18864#discussion_r1580566235 From mbaesken at openjdk.org Mon Apr 29 10:46:09 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 29 Apr 2024 10:46:09 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Mon, 29 Apr 2024 08:24:29 GMT, Thomas Stuefe wrote: > We may want to make this consistent with subsequent RFEs. I agree, this can be a follow up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2082398044 From stuefe at openjdk.org Mon Apr 29 11:01:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 11:01:11 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Mon, 29 Apr 2024 10:42:34 GMT, Matthias Baesken wrote: >>> > I read through the comments twice and did not find a nullptr related question. Which question? >>> >>> See compilationMemoryStatistic.cpp . >>> >>> At some places we check the result for nullptr e.g. >>> >>> jdk/src/hotspot/share/compiler/compilationMemoryStatistic.cpp >>> >>> Line 79 in [b3bcc49](https://github.com/openjdk/jdk/commit/b3bcc49491b8f8ad337eb4c06201a5468e5c1159) >>> >>> ``` >>> const CompileTask* const task = th->task(); >>> ``` >> >> Yes, it is inconsistent. Almost all code here (notably anything triggered from start- or end-compilation) is called from the compiler, so we run on a compiler thread and in the scope of a ciEnv.
So most of the existing nullptr checks are probably not needed. >> >> We may want to make this consistent with subsequent RFEs. > >> We may want to make this consistent with subsequent RFEs. > > I agree, this can be a follow up. Thanks @MBaesken . For some reason your remarks only got posted now, but I think we covered all points already. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18864#issuecomment-2082423162 From stuefe at openjdk.org Mon Apr 29 11:01:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 11:01:14 GMT Subject: RFR: 8330677: Add Per-Compilation memory usage to JFR [v2] In-Reply-To: References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 26 Apr 2024 07:03:19 GMT, Matthias Baesken wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Adapt test >> - merge >> - JDK-8330677-Add-Per-Compilation-memory-usage-to-JFR > > src/hotspot/share/jfr/metadata/metadata.xml line 611: > >> 609: >> 610: >> 611: > > Maybe say Peak arena usage, if this is the case as stated above ? > And some info that it is optional / must be enabled to see this data would probably help too. Again, there is no space for this. "label" gets used as column header. No space to add a descriptive text. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18864#discussion_r1582865694 From stuefe at openjdk.org Mon Apr 29 11:01:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 11:01:15 GMT Subject: Integrated: 8330677: Add Per-Compilation memory usage to JFR In-Reply-To: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> References: <30ZOBcQtn7hbUWoZ_p034H38c6meZmFQoymtSP0L7oM=.e0d297a8-3a53-4d9b-b26e-2eb4d93549e0@github.com> Message-ID: On Fri, 19 Apr 2024 13:12:21 GMT, Thomas Stuefe wrote: > We have the (opt-in, disabled by default) compiler memory statistics introduced with [JDK-8317683](https://bugs.openjdk.org/browse/JDK-8317683). > > Since temporary memory usage by compilers can significantly affect process footprint, it would make sense to expose at least the total peak usage per compilation via JFR. > > --- > > This patch adds "Arena Usage" to CompilationEvent. We now see in JMC how costly a compilation had been. (The cost can get very high, as we have seen just recently again with [JDK-8327247](https://bugs.openjdk.org/browse/JDK-8327247) ). > > ![jmc-memstat](https://github.com/openjdk/jdk/assets/6041414/8cac366a-2a8f-45ca-be40-d419712f81a7) This pull request has now been integrated. Changeset: 151ef5d4 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/151ef5d4d261c9fc740d3ccd64a70d3b9ccc1ab5 Stats: 30 lines in 8 files changed: 18 ins; 0 del; 12 mod 8330677: Add Per-Compilation memory usage to JFR Reviewed-by: kvn, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/18864 From stuefe at openjdk.org Mon Apr 29 11:25:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 11:25:21 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. 
That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into compiler-default-limit - Disable memory limit for compiler/c2/TestFindNode.java until JDK-8331283 is fixed - Merge branch 'master' into compiler-default-limit - adapt tests - fix printout for mem limit - also print limit when printing compilation mem histo - default limit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/b9d9d5eb..eb547f60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=00-01 Stats: 4568 lines in 157 files changed: 2089 ins; 1993 del; 486 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Mon Apr 29 11:36:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 11:36:05 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:25:21 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. 
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into compiler-default-limit > - Disable memory limit for compiler/c2/TestFindNode.java until JDK-8331283 is fixed > - Merge branch 'master' into compiler-default-limit > - adapt tests > - fix printout for mem limit > - also print limit when printing compilation mem histo > - default limit Opened a bug to track the memory limit break in compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java. 
Since I cannot reproduce it, someone at Oracle should look at this: https://bugs.openjdk.org/browse/JDK-8331295 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2082482653 From mli at openjdk.org Mon Apr 29 11:38:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Apr 2024 11:38:27 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v13] In-Reply-To: References: Message-ID: > Hi, > Can you have a look at this patch adding some tests for Math.round intrinsics? > Thanks! > > ### FYI: > During the development of RoundVF/RoundF, we faced issues that were only spotted by running tests exhaustively against the full 32/64-bit range of int/long. > It's helpful to add these exhaustive tests to the jdk for possible future use, rather than rebuilding them every time they are needed. > Of course, we need to put them in `manual` mode, so they are not run when the `-automatic` jtreg option is specified, which I guess is the mode CI uses; please correct me if I assume incorrectly. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix issues; modify vm options to make sure test the expected behaviors.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17753/files - new: https://git.openjdk.org/jdk/pull/17753/files/02d7600f..b5207436 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17753&range=11-12 Stats: 12 lines in 2 files changed: 0 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17753/head:pull/17753 PR: https://git.openjdk.org/jdk/pull/17753 From mli at openjdk.org Mon Apr 29 11:38:27 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Apr 2024 11:38:27 GMT Subject: RFR: 8325438: Add exhaustive tests for Math.round intrinsics [v12] In-Reply-To: References: Message-ID: On Sun, 28 Apr 2024 11:34:57 GMT, Andrew Haley wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Add vectorized and scalar version Float tests checking full 32 bits range > > test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 99: > >> 97: System.out.println("Verification"); >> 98: int errn = 0; >> 99: for (long l = Integer.MIN_VALUE; l <= Integer.MAX_VALUE; l+=ARRLEN) { > > Can't you just do the obvious simple thing here? Not sure if I understand you correctly. Do you mean just using a while loop? It seems that would only test the scalar version. > test/hotspot/jtreg/compiler/vectorization/TestRoundVectorFloatAll.java line 102: > >> 100: for (int i = 0; i < ARRLEN; i++) { >> 101: input[i] = (int)(l+i); >> 102: } > > What is this array for? As far as I can tell it does nothing useful to batch the test results. Sorry, it's a bug. I also fixed some other issues; e.g. the newly added tests were in fact not run previously, they still triggered TestRoundVectorFloatRandom.
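The batching pattern discussed in this review can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the class name `RoundBatchSketch`, the helpers `reference`, `fill`, `round` and `verify`, and the batch size are all hypothetical. It walks a range of int bit patterns in batches with a `long` counter (an `int` counter with the condition `l <= Integer.MAX_VALUE` would wrap around and never terminate), reinterprets each pattern as a float, and compares `Math.round` against a scalar reference.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
class RoundBatchSketch {
    static final int ARRLEN = 1024; // assumed batch size

    // Scalar reference: floor(x + 0.5) evaluated in double, then a saturating cast to int.
    static int reference(float f) {
        return (int) Math.floor((double) f + 0.5);
    }

    // Fills one batch with consecutive float bit patterns starting at 'start'.
    static void fill(float[] input, long start) {
        for (int i = 0; i < input.length; i++) {
            input[i] = Float.intBitsToFloat((int) (start + i));
        }
    }

    // Plain loop over arrays, the kind of shape a JIT may auto-vectorize.
    static void round(float[] input, int[] output) {
        for (int i = 0; i < input.length; i++) {
            output[i] = Math.round(input[i]);
        }
    }

    // Checks every bit pattern in [from, to] and returns the number of mismatches.
    static long verify(long from, long to) {
        float[] input = new float[ARRLEN];
        int[] output = new int[ARRLEN];
        long errors = 0;
        for (long l = from; l <= to; l += ARRLEN) {
            fill(input, l);
            round(input, output);
            // The last batch may run past 'to'; only compare patterns inside the range.
            for (int i = 0; i < ARRLEN && l + i <= to; i++) {
                if (output[i] != reference(input[i])) {
                    errors++;
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // A small slice around the bit pattern of 0.5f; the exhaustive run would be
        // verify(Integer.MIN_VALUE, Integer.MAX_VALUE).
        long center = Float.floatToIntBits(0.5f);
        System.out.println("mismatches: " + verify(center - 100_000, center + 100_000));
    }
}
```

Batching through arrays, rather than calling `Math.round` on one value at a time, is what gives the vectorized intrinsic a chance to be exercised at all; whether it actually is depends on the VM flags and platform, which the sketch does not control.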
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1582907258 PR Review Comment: https://git.openjdk.org/jdk/pull/17753#discussion_r1582907013 From stuefe at openjdk.org Mon Apr 29 11:47:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 11:47:16 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v3] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
> > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/eb547f60..2728f2f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From fyang at openjdk.org Mon Apr 29 13:28:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 29 Apr 2024 13:28:05 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: References: Message-ID: <7YDEEcs3UmD2HYnyTq3qpF9xk_OGzx-2Qho8c7q6Phk=.7e94787a-521b-4c6c-af6a-d48b5b24bb1e@github.com> On Tue, 23 Apr 2024 15:02:10 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > > The motivation is to implement `MulAddVS2VI`. > But to enable `MulAddVS2VI`, `MulAddS2I` is prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have an specific instruction of muladd on riscv. > So, this patch implement both `MulAddVS2VI` and `MulAddS2I`. > > Thanks src/hotspot/cpu/riscv/riscv_v.ad line 898: > 896: > 897: __ vmul_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), as_VectorRegister($tmp2$$reg)); > 898: __ vmacc_vv(as_VectorRegister($dst$$reg), as_VectorRegister($tmp1$$reg), as_VectorRegister($tmp3$$reg)); Hmm ... This doesn't look like a simple/straightforward sequence, isn't it? 
It's hard to tell whether we will benefit from this change without JMH testing on real RVV hardware, especially when VLEN is not large (No big difference in terms of the number of instructions executed when VLEN=128 bits). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583085528 From mli at openjdk.org Mon Apr 29 13:34:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Apr 2024 13:34:04 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: <7YDEEcs3UmD2HYnyTq3qpF9xk_OGzx-2Qho8c7q6Phk=.7e94787a-521b-4c6c-af6a-d48b5b24bb1e@github.com> References: <7YDEEcs3UmD2HYnyTq3qpF9xk_OGzx-2Qho8c7q6Phk=.7e94787a-521b-4c6c-af6a-d48b5b24bb1e@github.com> Message-ID: On Mon, 29 Apr 2024 13:24:29 GMT, Fei Yang wrote: >> Hi, >> Can you help to review the patch? >> >> The motivation is to implement `MulAddVS2VI`. >> But to enable `MulAddVS2VI`, `MulAddS2I` is prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have an specific instruction of muladd on riscv. >> So, this patch implement both `MulAddVS2VI` and `MulAddS2I`. >> >> Thanks > > src/hotspot/cpu/riscv/riscv_v.ad line 898: > >> 896: >> 897: __ vmul_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), as_VectorRegister($tmp2$$reg)); >> 898: __ vmacc_vv(as_VectorRegister($dst$$reg), as_VectorRegister($tmp1$$reg), as_VectorRegister($tmp3$$reg)); > > Hmm ... This doesn't look like a simple/straightforward sequence, does it? It's hard to tell whether we will benefit from this change without JMH testing on real RVV hardware, especially when VLEN is not large (At least no big difference in terms of the number of instructions executed when VLEN=128 bits). You're right. I'm waiting for my board to test it. I'll update when I get the data.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583096349 From fyang at openjdk.org Mon Apr 29 13:42:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 29 Apr 2024 13:42:08 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: References: Message-ID: On Tue, 23 Apr 2024 15:02:10 GMT, Hamlin Li wrote: > Hi, > Can you help to review the patch? > > The motivation is to implement `MulAddVS2VI`. > But to enable `MulAddVS2VI`, `MulAddS2I` is prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have an specific instruction of muladd on riscv. > So, this patch implement both `MulAddVS2VI` and `MulAddS2I`. > > Thanks src/hotspot/cpu/riscv/riscv.ad line 6614: > 6612: ins_encode %{ > 6613: __ mul(t0, as_Register($src1$$reg), as_Register($src2$$reg)); > 6614: __ mul(t1, as_Register($src3$$reg), as_Register($src4$$reg)); Note that it's risky to use `t1` here as it's the flags register for C2 on riscv. So you might want to reserve another temporary register to replace it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583112081 From mli at openjdk.org Mon Apr 29 13:48:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Apr 2024 13:48:07 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: References: Message-ID: <5jhjXdKqwbZX4WYhwJyouI1zW4YhaUTDMUhOxflTbHc=.356a199a-0dea-4415-b92f-426c717d1ede@github.com> On Mon, 29 Apr 2024 13:38:42 GMT, Fei Yang wrote: >> Hi, >> Can you help to review the patch? >> >> The motivation is to implement `MulAddVS2VI`. >> But to enable `MulAddVS2VI`, `MulAddS2I` is prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have an specific instruction of muladd on riscv. >> So, this patch implement both `MulAddVS2VI` and `MulAddS2I`. 
>> >> Thanks > > src/hotspot/cpu/riscv/riscv.ad line 6614: > >> 6612: ins_encode %{ >> 6613: __ mul(t0, as_Register($src1$$reg), as_Register($src2$$reg)); >> 6614: __ mul(t1, as_Register($src3$$reg), as_Register($src4$$reg)); > > Note that it's risky to use `t1` here as it's the flags register for C2 on riscv. So you might want to reserve another temporary register to replace it. You're right. I'm not quite familiar with this part. Just a question about the flag register (t1) in riscv, do we already use t1 as flag register in any code? I've worked on something related to it, but seems it's not performant (https://bugs.openjdk.org/browse/JDK-8320989) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583122507 From fyang at openjdk.org Mon Apr 29 13:58:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 29 Apr 2024 13:58:04 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI In-Reply-To: <5jhjXdKqwbZX4WYhwJyouI1zW4YhaUTDMUhOxflTbHc=.356a199a-0dea-4415-b92f-426c717d1ede@github.com> References: <5jhjXdKqwbZX4WYhwJyouI1zW4YhaUTDMUhOxflTbHc=.356a199a-0dea-4415-b92f-426c717d1ede@github.com> Message-ID: On Mon, 29 Apr 2024 13:45:32 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 6614: >> >>> 6612: ins_encode %{ >>> 6613: __ mul(t0, as_Register($src1$$reg), as_Register($src2$$reg)); >>> 6614: __ mul(t1, as_Register($src3$$reg), as_Register($src4$$reg)); >> >> Note that it's risky to use `t1` here as it's the flags register for C2 on riscv. So you might want to reserve another temporary register to replace it. > > You're right. > > I'm not quite familiar with this part. Just a question about the flag register (t1) in riscv, do we already use t1 as flag register in any code? 
I've worked on something related to it, but seems it's not performant (https://bugs.openjdk.org/browse/JDK-8320989) Yes and you will find some usages if you grep "Set cr" in file riscv.ad: match(Set cr (CmpP (PartialSubtypeCheck sub super) zero)); match(Set cr (FastLock object box)); match(Set cr (FastUnlock object box)); match(Set cr (FastLock object box)); match(Set cr (FastUnlock object box)); These C2 nodes are expecting a return value in the flag register (aka `rFlagsReg cr`): 3814 // Flags register, used as output of compare logic 3815 operand rFlagsReg() 3816 %{ 3817 constraint(ALLOC_IN_RC(reg_flags)); 3818 match(RegFlags); 3819 3820 op_cost(0); 3821 format %{ "RFLAGS" %} 3822 interface(REG_INTER); 3823 %} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583136900 From rrich at openjdk.org Mon Apr 29 14:21:20 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 29 Apr 2024 14:21:20 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 07:05:29 GMT, Emanuel Peter wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. 
We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve RC comment for Vladimir The case where only stores of constant values are merged wouldn't be difficult to get working also on big endian platforms I think. https://github.com/openjdk/jdk/pull/15990 seems to be a use of this optimization and it only makes use of this case, doesn't it? Do you have an idea how important the second pattern in the JBS issue is? a[1] = (byte)v; a[2] = (byte)(v >> 8 ); a[3] = (byte)(v >> 16); a[4] = (byte)(v >> 24); ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2082880150 From mli at openjdk.org Mon Apr 29 14:21:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Apr 2024 14:21:20 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review the patch? > > The motivation is to implement `MulAddVS2VI`. > But to enable `MulAddVS2VI`, `MulAddS2I` is prerequisite, although `MulAddS2I` does not bring extra benefit on riscv as we don't have an specific instruction of muladd on riscv. > So, this patch implement both `MulAddVS2VI` and `MulAddS2I`. 
> > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix t1 usage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18919/files - new: https://git.openjdk.org/jdk/pull/18919/files/1a20e9f0..7a34097a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18919&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18919&range=00-01 Stats: 6 lines in 1 file changed: 2 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18919/head:pull/18919 PR: https://git.openjdk.org/jdk/pull/18919 From mli at openjdk.org Mon Apr 29 14:21:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 29 Apr 2024 14:21:20 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI [v2] In-Reply-To: References: <5jhjXdKqwbZX4WYhwJyouI1zW4YhaUTDMUhOxflTbHc=.356a199a-0dea-4415-b92f-426c717d1ede@github.com> Message-ID: On Mon, 29 Apr 2024 13:55:21 GMT, Fei Yang wrote: >> You're right. >> >> I'm not quite familiar with this part. Just a question about the flag register (t1) in riscv, do we already use t1 as flag register in any code? 
I've worked on something related to it, but seems it's not performant (https://bugs.openjdk.org/browse/JDK-8320989) > > Yes and you will find some usages if you grep "Set cr" in file riscv.ad: > > match(Set cr (CmpP (PartialSubtypeCheck sub super) zero)); > match(Set cr (FastLock object box)); > match(Set cr (FastUnlock object box)); > match(Set cr (FastLock object box)); > match(Set cr (FastUnlock object box)); > > > These C2 nodes are expecting a result in the flag register (aka `rFlagsReg cr`): > > 3814 // Flags register, used as output of compare logic > 3815 operand rFlagsReg() > 3816 %{ > 3817 constraint(ALLOC_IN_RC(reg_flags)); > 3818 match(RegFlags); > 3819 > 3820 op_cost(0); > 3821 format %{ "RFLAGS" %} > 3822 interface(REG_INTER); > 3823 %} > > > And the result will be further used by compare flags and branch instructions like: > > match(If cmp cr); Thanks, I've updated it to use a passed in tmp reg. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583170683 From epeter at openjdk.org Mon Apr 29 14:26:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Apr 2024 14:26:20 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: <5fEEmFcudR2iZBlEJmQpjQRXFsfVNc-8BEz3WEAanDM=.17794dbf-611e-484c-a5b9-7562cabe8e50@github.com> On Mon, 29 Apr 2024 14:18:46 GMT, Richard Reingruber wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> improve RC comment for Vladimir > > The case where only stores of constant values are merged wouldn't be difficult to get working also on big endian platforms I think. > https://github.com/openjdk/jdk/pull/15990 seems to be a use of this optimization and it only makes use of this case, doesn't it? > Do you have an idea how important the second pattern in the JBS issue is? 
> > a[1] = (byte)v; > a[2] = (byte)(v >> 8 ); > a[3] = (byte)(v >> 16); > a[4] = (byte)(v >> 24); @reinrich feel free to implement and test the big-endian version. I just wanted to limit the scope of the PR, and I don't really have a big-endian machine to test on. I'm currently tracking down a follow-up bug or two from this patch, so I have my hands full. I think one could surely get both the constant and the variable case implemented, in analogy to what I did. But maybe it would require some refactoring, to make sure the two versions live together nicely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2082890730 From rrich at openjdk.org Mon Apr 29 14:34:21 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 29 Apr 2024 14:34:21 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v35] In-Reply-To: References: Message-ID: On Tue, 16 Apr 2024 07:05:29 GMT, Emanuel Peter wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve RC comment for Vladimir Thanks for the quick answer. I might play a little bit with the version that stores constants. I expect the effort to be small there. 
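For readers following the thread, the "variable that was split (using shifts)" pattern can be sketched in plain Java as below. This is only an illustration, not code from the PR or from TestMergeStores.java: the class name `MergeStoresSketch` and both helper methods are hypothetical. The first method is the shape of four adjacent byte stores discussed above; the second writes the same value with a single little-endian `ByteBuffer.putInt`, which is what a merged store would have to be equivalent to on a little-endian platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    // Four adjacent byte stores of one int that was split with shifts.
    // (Hypothetical helper; mirrors the a[1]..a[4] pattern quoted in the thread.)
    static void storeSplit(byte[] a, int offset, int v) {
        a[offset]     = (byte) v;
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
    }

    // Reference: the same bytes written as one little-endian int store.
    static void storeMerged(byte[] a, int offset, int v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putInt(offset, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[8];
        byte[] y = new byte[8];
        storeSplit(x, 1, 0xCAFEBABE);
        storeMerged(y, 1, 0xCAFEBABE);
        // Both arrays should hold identical contents.
        System.out.println(Arrays.equals(x, y));
    }
}
```

The byte order is the reason the big-endian case needs separate handling: there, the same four stores correspond to a byte-reversed int, so the merged value would not simply be `v`.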
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-2082907736 From roland at openjdk.org Mon Apr 29 14:37:15 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Apr 2024 14:37:15 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: On Thu, 25 Apr 2024 09:29:59 GMT, Galder Zamarreño wrote: >> Adding C1 intrinsic for primitive array clone invocations for aarch64 and x86 architectures. >> >> The intrinsic includes a change to avoid zeroing the newly allocated array because its contents are copied over within the same intrinsic with arraycopy. This means that the performance of primitive array clone exceeds that of primitive array copy. As an example, here are the microbenchmark results on darwin/aarch64: >> >> >> $ make test TEST="micro:java.lang.ArrayClone" MICRO="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> Benchmark (size) Mode Cnt Score Error Units >> ArrayClone.byteArraycopy 0 avgt 15 3.476 ± 0.018 ns/op >> ArrayClone.byteArraycopy 10 avgt 15 3.740 ± 0.017 ns/op >> ArrayClone.byteArraycopy 100 avgt 15 7.124 ± 0.010 ns/op >> ArrayClone.byteArraycopy 1000 avgt 15 39.301 ± 0.106 ns/op >> ArrayClone.byteClone 0 avgt 15 3.478 ± 0.008 ns/op >> ArrayClone.byteClone 10 avgt 15 3.562 ± 0.007 ns/op >> ArrayClone.byteClone 100 avgt 15 5.888 ± 0.206 ns/op >> ArrayClone.byteClone 1000 avgt 15 25.762 ± 0.203 ns/op >> ArrayClone.intArraycopy 0 avgt 15 3.199 ± 0.016 ns/op >> ArrayClone.intArraycopy 10 avgt 15 4.521 ± 0.008 ns/op >> ArrayClone.intArraycopy 100 avgt 15 17.429 ± 0.039 ns/op >> ArrayClone.intArraycopy 1000 avgt 15 178.432 ± 0.777 ns/op >> ArrayClone.intClone 0 avgt 15 3.406 ± 0.016 ns/op >> ArrayClone.intClone 10 avgt 15 4.272 ±
0.006 ns/op >> ArrayClone.intClone 100 avgt 15 13.110 ± 0.122 ns/op >> ArrayClone.intClone 1000 avgt 15 113.196 ± 13.400 ns/op >> >> >> It also includes an optimization to avoid instantiating the array copy stub in scenarios like this. >> >> I run hotspot compiler tests successfully limiting them to C1 compilation darwin/aarch64, linux/x86_64 and linux/686. E.g. >> >> >> $ make test TEST="hotspot_compiler" JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=1" >> ... >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:hotspot_compiler 1234 1234 0 0 >> >> >> One question I had is what to do about non-primitive object arrays, see my [question](https://bugs.openjdk.org/browse/JDK-8302850?focusedId=14634879&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14634879) on the issue. @cl4es any thoughts? >> >>... > > Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: > > Remove whitespace Looks good overall otherwise. src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2508: > 2506: } > 2507: > 2508: if (stub != nullptr) { Isn't that leftover code from your previous implementation? Can `stub` be null? src/hotspot/share/c1/c1_GraphBuilder.cpp line 2030: > 2028: receiver = state()->stack_at(index); > 2029: ciType* type = receiver->exact_type(); > 2030: if (type != nullptr && type->is_loaded()) { Is it the case that we can't see an interface here? Or that we think it's ok if we see an interface here?
------------- PR Review: https://git.openjdk.org/jdk/pull/17667#pullrequestreview-2028695703 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1583194942 PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1583193309 From luhenry at openjdk.org Mon Apr 29 15:28:09 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 29 Apr 2024 15:28:09 GMT Subject: RFR: 8321008: RISC-V: C2 MulAddVS2VI [v2] In-Reply-To: References: <7YDEEcs3UmD2HYnyTq3qpF9xk_OGzx-2Qho8c7q6Phk=.7e94787a-521b-4c6c-af6a-d48b5b24bb1e@github.com> Message-ID: On Mon, 29 Apr 2024 13:31:18 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv_v.ad line 898: >> >>> 896: >>> 897: __ vmul_vv(as_VectorRegister($dst$$reg), as_VectorRegister($dst$$reg), as_VectorRegister($tmp2$$reg)); >>> 898: __ vmacc_vv(as_VectorRegister($dst$$reg), as_VectorRegister($tmp1$$reg), as_VectorRegister($tmp3$$reg)); >> >> Hmm ... This doesn't look like a simple/straightforward sequence, does it? It's hard to tell whether we will benefit from this change without JMH testing on real RVV hardware, especially when VLEN is not large (at least there is no big difference with respect to the number of instructions executed when VLEN=128-bits). > > You're right. > I'm waiting for my board to test it. I'll update when I get the data. The general advantage over no intrinsic is that we avoid the spilling from the vector register to the heap in case it falls back to whatever the C2 compiler would compile to. That avoids hits to the L1 or even the load/store CPU pipe. I agree, we'll need runs with JMH on actual hardware to verify that it's indeed a win.
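As background for the discussion above, here is a scalar Java sketch of the kind of loop a "multiply-add short to int" node stands for: each `int` output accumulates the sum of two adjacent `short` products. The thread does not show the Java-level shape, so treat the loop below as an assumption about what gets vectorized rather than as the test from the PR; the class name `MulAddS2ISketch` and the method `mulAdd` are made up for illustration.

```java
public class MulAddS2ISketch {
    // Scalar form: out[i] accumulates the sum of two adjacent short*short products.
    // (Assumed loop shape; not taken from the PR under review.)
    static void mulAdd(short[] a, short[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] += (a[2 * i] * b[2 * i]) + (a[2 * i + 1] * b[2 * i + 1]);
        }
    }

    public static void main(String[] args) {
        short[] a = {1, 2, 3, 4};
        short[] b = {5, 6, 7, 8};
        int[] out = new int[2];
        mulAdd(a, b, out);
        // 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Whether a vectorized version of this loop beats the fallback is exactly the open question in the thread; the sketch only fixes what is being computed, not how fast.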
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18919#discussion_r1583276523 From epeter at openjdk.org Mon Apr 29 16:09:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Apr 2024 16:09:12 GMT Subject: RFR: 8331252: C2: MergeStores: handle negative shift values Message-ID: Somehow, I have not thought of negative shift constants, and there was no regression test for it. The fuzzer now found a case. **I convert the assert into a condition.** ------------- Commit messages: - the fix itself - 8331252 Changes: https://git.openjdk.org/jdk/pull/19001/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19001&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331252 Stats: 26 lines in 2 files changed: 23 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19001.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19001/head:pull/19001 PR: https://git.openjdk.org/jdk/pull/19001 From kvn at openjdk.org Mon Apr 29 16:35:11 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 16:35:11 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:33:29 GMT, Thomas Stuefe wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into compiler-default-limit >> - Disable memory limit for compiler/c2/TestFindNode.java until JDK-8331283 is fixed >> - Merge branch 'master' into compiler-default-limit >> - adapt tests >> - fix printout for mem limit >> - also print limit when printing compilation mem histo >> - default limit > > Opened a bug to track the memory limit break in compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java. 
Since I cannot reproduce it, someone at Oracle should look at this: https://bugs.openjdk.org/browse/JDK-8331295 Thank you, @tstuefe, for filing these bugs. One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2083178344 From kvn at openjdk.org Mon Apr 29 16:46:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 16:46:06 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v3] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:47:16 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
>> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed I filed bug for our closed test failure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2083196733 From kvn at openjdk.org Mon Apr 29 16:49:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 16:49:15 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v3] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:47:16 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. 
>> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. >> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Disable memory limit for compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java until JDK-8331295 is fixed test/hotspot/jtreg/compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java line 36: > 34: package compiler.loopopts; > 35: > 36: // Note; we disable the implicit memory limit of 1G in debug JVMs until JDK-8331283 is fixed Different bug for this test: JDK-8331295 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18969#discussion_r1583400429 From kvn at openjdk.org Mon Apr 29 16:50:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 16:50:04 GMT Subject: RFR: 8331252: C2: MergeStores: handle negative shift values In-Reply-To: References: Message-ID: <8RcZ5sfJYBUPdA24EO2ut5mMFSCJFtFGNdQoTCmr84w=.650750f9-8312-4842-91a7-cb508325803f@github.com> On Mon, 29 Apr 2024 15:43:06 GMT, Emanuel Peter wrote: > Somehow, I have not thought of negative shift constants, and there was no regression test for it. > The fuzzer now found a case. > > **I convert the assert into a condition.** Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
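A short Java sketch of why a negative shift constant is legal and still meaningful, for context on 8331252. For `int` operands Java uses only the low five bits of the shift count, so `v >> -8` behaves like `v >> 24`. Whether this is the exact shape the fuzzer produced is not stated in the thread; the class name `NegativeShiftSketch` and its helper are hypothetical.

```java
public class NegativeShiftSketch {
    // Shift by a negative constant: for int, the count is masked to its low 5 bits,
    // so -8 is treated as 24.
    static byte topByteViaNegativeShift(int v) {
        return (byte) (v >> -8);
    }

    public static void main(String[] args) {
        int v = 0x12345678;
        System.out.println((v >> -8) == (v >> 24));     // same result after masking
        System.out.println(topByteViaNegativeShift(v)); // 0x12, i.e. 18
    }
}
```

So a store such as `a[i] = (byte)(v >> -8)` is valid input for the compiler, and an optimization that inspects shift constants has to either normalize them or bail out instead of asserting that they are non-negative.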
PR Review: https://git.openjdk.org/jdk/pull/19001#pullrequestreview-2029038240 From shade at openjdk.org Mon Apr 29 17:36:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 29 Apr 2024 17:36:06 GMT Subject: RFR: 8331252: C2: MergeStores: handle negative shift values In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 15:43:06 GMT, Emanuel Peter wrote: > Somehow, I have not thought of negative shift constants, and there was no regression test for it. > The fuzzer now found a case. > > **I convert the assert into a condition.** Ah, just caught it in my testing. Fix looks good. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19001#pullrequestreview-2029137160 From stuefe at openjdk.org Mon Apr 29 17:49:38 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 17:49:38 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v4] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. 
> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: fix jdk note number in test comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/2728f2f0..42f36401 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From szaldana at openjdk.org Mon Apr 29 18:24:24 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 29 Apr 2024 18:24:24 GMT Subject: RFR: 8331088: Incorrect TraceLoopPredicate output Message-ID: Hi all, This PR addresses [8331088](https://bugs.openjdk.org/browse/JDK-8331088) fixing the incorrect print output. 
Thanks, Sonia ------------- Commit messages: - Incorrect TraceLoopPredicate output Changes: https://git.openjdk.org/jdk/pull/19004/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19004&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331088 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19004.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19004/head:pull/19004 PR: https://git.openjdk.org/jdk/pull/19004 From stuefe at openjdk.org Mon Apr 29 18:33:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 18:33:05 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 11:33:29 GMT, Thomas Stuefe wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into compiler-default-limit >> - Disable memory limit for compiler/c2/TestFindNode.java until JDK-8331283 is fixed >> - Merge branch 'master' into compiler-default-limit >> - adapt tests >> - fix printout for mem limit >> - also print limit when printing compilation mem histo >> - default limit > > Opened a bug to track the memory limit break in compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java. Since I cannot reproduce it, someone at Oracle should look at this: https://bugs.openjdk.org/browse/JDK-8331295 > Thank you, @tstuefe, for filing these bugs. > > One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. 
We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2083390670 From kvn at openjdk.org Mon Apr 29 18:33:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 18:33:07 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v4] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 17:49:38 GMT, Thomas Stuefe wrote: >> See [1] for previous discussions. >> >> We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. >> >> The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. >> >> Examples: >> >> This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` >> >> This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` >> >> >> --- >> >> The patch: >> >> 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. >> 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. >> 3) Adapted and extended tests >> >> I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
>> >> >> Tested: >> >> - manually on Mac m1 (debug and release) >> - GHAs are running >> - but Oracle will do more testing before this goes in >> >> [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > fix jdk note number in test comment test/hotspot/jtreg/compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java line 30: > 28: * @summary Test which causes a stack overflow segmentation fault with -XX:VerifyIterativeGVN=1 due to a too deep recursion in Node::verify_recur(). > 29: * > 30: * @run main/othervm/timeout=600 -Xcomp -XX:VerifyIterativeGVN=1 -XX:CompileCommand=compileonly,compiler.loopopts.TestDeepGraphVerifyIterativeGVN::* -XX:CompileCommand=memlimit,TestFindNode::test,0 Why do you list TestFindNode here? It is a mistake in the JDK-8331295 description, which references TestFindNode failure output. I added a comment in JBS with the correct TestDeepGraphVerifyIterativeGVN failure output. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18969#discussion_r1583551705 From stuefe at openjdk.org Mon Apr 29 18:48:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 18:48:16 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v5] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM.
Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. > > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - another fix - fix accidental slip in of another test name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/42f36401..d06406ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From stuefe at openjdk.org Mon Apr 29 18:52:29 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 29 Apr 2024 18:52:29 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v6] In-Reply-To: References: Message-ID: > See [1] for previous discussions. > > We'd like to introduce a default memory limit for compilations in debug builds. 
That way, we can catch pathological compiler errors that have an unreasonably high per-compilation memory footprint early during testing. > > The default limit affects all compilations, unless the method is subject to a memory limit set from command line. Meaning, `-XX:CompileCommand=MemLimit,...` overrules the default. > > Examples: > > This lowers the memlimit for j.l.String methods - all methods will have the default 1GB limit in a debug JVM. Only j.l.String will run with a 100M limit: `-XX:CompileCommand=MemLimit,java.lang.String::*,100m` > > This disables the default memlimit globally: `-XX:CompileCommand=MemLimit,*.*,0` > > > --- > > The patch: > > 1) adds a debug-only default memory limit of **1GB** (as proposed by @vnkozlov). The limit action is "crash", meaning we will assert. > 2) To test the mechanics, we now print out the memory limit for each compilation in the compilation cost record. > 3) Adapted and extended tests > > I also fixed up some copyrights that I overlooked last year when adding the compiler memory statistics this patch builds atop of. 
> > > Tested: > > - manually on Mac m1 (debug and release) > - GHAs are running > - but Oracle will do more testing before this goes in > > [1] https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - fix copyrights - fix copyrights ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18969/files - new: https://git.openjdk.org/jdk/pull/18969/files/d06406ec..5a460a1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18969&range=04-05 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18969/head:pull/18969 PR: https://git.openjdk.org/jdk/pull/18969 From kvn at openjdk.org Mon Apr 29 19:00:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 19:00:06 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:30:30 GMT, Thomas Stuefe wrote: > > Thank you, @tstuefe, for filing these bugs. > > One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? > > Yes, its https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. > > The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. Good. I think we need to push it before this PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2083444525 From cslucas at openjdk.org Mon Apr 29 20:22:23 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 29 Apr 2024 20:22:23 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required [v2] In-Reply-To: References: Message-ID: > The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. > > The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. > > Tested on Linux x64 tiers1-3. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Add test case with non-exact object Allocation. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18851/files - new: https://git.openjdk.org/jdk/pull/18851/files/b1398003..350ca6b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18851&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18851&range=00-01 Stats: 73 lines in 1 file changed: 73 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18851/head:pull/18851 PR: https://git.openjdk.org/jdk/pull/18851 From cslucas at openjdk.org Mon Apr 29 20:22:23 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 29 Apr 2024 20:22:23 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required [v2] In-Reply-To: References: Message-ID: <7Ou3simvVB0PA1QO9N9Q6wFJ6JrepWrWhsmBDayw0Lo=.2fff7c66-6d56-4185-a6a8-5822bd9298d8@github.com> On Fri, 19 Apr 2024 01:00:11 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with
one additional commit since the last revision: >> >> Add test case with non-exact object Allocation. > > Good. > > Did you run CTW test from bug report? Is it possible to extract simple reproducer from it and add it to this PR? @vnkozlov - I just pushed a test case to trigger the issue. When you have time please run your tests on it. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18851#issuecomment-2083591128 From duke at openjdk.org Mon Apr 29 20:28:12 2024 From: duke at openjdk.org (Charles Connell) Date: Mon, 29 Apr 2024 20:28:12 GMT Subject: RFR: 8330611: AES-CTR vector intrinsic may read out of bounds (x86_64, AVX-512) [v2] In-Reply-To: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> References: <3Cfqil9P4ui_ye-tqv9jSmYmbMQCmmXWcokYO96jLEY=.fb4c05cf-1309-45f6-b2b4-53897e9ae6d1@github.com> Message-ID: <0GUIgJjTdvyuQoVzPxKdfOFXRQnEjiT-SuMeZzULx1A=.2d3fc6e8-240f-4fd3-853e-1ceadcf2f063@github.com> On Wed, 24 Apr 2024 00:21:40 GMT, Martin Balao wrote: >> We would like to propose a fix for 8330611. >> >> To avoid an out of bounds memory read when the input's size is not a multiple of the block size, we read the plaintext/ciphertext tail in 8, 4, 2 and 1 byte batches depending on what is guaranteed to be available by 'len_reg'. This behavior replaces the read of 16 bytes of input upfront and later discard of spurious data. >> >> While we add 3 extra instructions + 3 extra memory reads in the worst case (probably to the same cache line), the performance impact of this fix should be low because it only occurs at the end of the input and when its length is not a multiple of the block size. >> >> A reliable test case for this bug is hard to develop because we would need accurate heap allocation. The fact that spuriously read data is silently discarded most of the time makes this bug harder to observe. No regressions have been observed in the compiler/codegen/aes jtreg category.
Additionally, we verified the fix manually with the debugger. >> >> This work is in collaboration with @franferrax . > > Martin Balao has updated the pull request incrementally with one additional commit since the last revision: > > Avoid register conflict in Windows. > > Co-authored-by: Francisco Ferrari Bihurriet > Co-authored-by: Martin Balao I am planning a blog post about discovering and reporting this bug that will appear on my company's website. I plan to credit everyone involved with the names I see on your profiles here on Github and/or on the OpenJDK JIRA instance. I also may link to github profiles or personal websites if I come across them. If you would like to have your name or link changed or omitted, please let me know. My email address is cconnell at hubspot.com if you would like to contact me there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18849#issuecomment-2083601585 From coleenp at openjdk.org Mon Apr 29 21:16:07 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 29 Apr 2024 21:16:07 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v6] In-Reply-To: References: Message-ID: <7JpSztSwTJ1l0-NhVPd8PtOuLCa3mMZq_1r2E-pqKyk=.65a5c7b7-2628-4b50-8cc1-3a816e43b863@github.com> On Sun, 28 Apr 2024 07:26:48 GMT, Dean Long wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1787: >> >>> 1785: lea(cache, Address(cache, index)); >>> 1786: // Prevents stale data from being read after the bytecode is patched to the fast bytecode >>> 1787: membar(MacroAssembler::LoadLoad); >> >> if we put LoadLoad in here, maybe it is redundant for TemplateTable::patch_bytecode(..) >> >> https://github.com/openjdk/jdk/blob/a08af97fda06e785d1c3a5a17e562e65150bbb07/src/hotspot/cpu/aarch64/templateTable_aarch64.cpp#L192 >> >> because there has a Load-acquire in L199. 
>> can we change >> `load_field_entry(Register cache, Register index, int bcp_offset = 1) ` >> to >> `load_field_entry(Register cache, Register index, int bcp_offset = 1, bool needLoadLoad = 1) `? >> >> then change L192 like this >>
>> @@ -189,7 +189,7 @@ void TemplateTable::patch_bytecode(Bytecodes::Code bc, Register bc_reg,
>>        // additional, required work.
>>        assert(byte_no == f1_byte || byte_no == f2_byte, "byte_no out of range");
>>        assert(load_bc_into_bc_reg, "we use bc_reg as temp");
>> -      __ load_field_entry(temp_reg, bc_reg);
>> +      __ load_field_entry(temp_reg, bc_reg, 1 /*bcp_offset*/, false /*needLoadLoad*/);
>>        if (byte_no == f1_byte) {
>>          __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::get_code_offset())));
>>        } else {
>>          __ lea(temp_reg, Address(temp_reg, in_bytes(ResolvedFieldEntry::put_code_offset())));
>>        }
>>        // Load-acquire the bytecode to match store-release in ResolvedFieldEntry::fill_in()
>>        __ ldarb(temp_reg, temp_reg);
>> 
> > I believe patch_bytecode only needs a LoadStore between reading the cache bytecode and patching at bcp[0], so I would agree the LoadLoad in load_field_entry is not needed here. But the reason is not because we already have a load-acquire. The reason is that there are not two loads that need ordering. There is only a load and a store. We can't replace the load-acquire with a LoadLoad, for example. The LoadLoad in load_field_entry is needed for the second thread that (t2-1) loads the patched bytecode, and then (t2-2) loads the pointer to the cpCache entry for the field, then (t2-3) loads the values from the cpCache entry pointer. The LoadLoad is between t2-2 and t2-3. The ldarb in patch_bytecode does make the LoadLoad in load_field_entry redundant as pointed out by @sunny868 but seems harmless in terms of performance and correctness, i.e. not needing a special parameter. The reason that it's in load_field_entry is that there are other callers who do not load-acquire the get_code and put_code fields. So it's safer, and less error prone, to keep it in load_field_entry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18477#discussion_r1583783084 From cslucas at openjdk.org Mon Apr 29 21:19:33 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 29 Apr 2024 21:19:33 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required [v3] In-Reply-To: References: Message-ID: > The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. > > The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. > > Tested on Linux x64 tiers1-3. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix formatting. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18851/files - new: https://git.openjdk.org/jdk/pull/18851/files/350ca6b0..425e0d36 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18851&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18851&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18851/head:pull/18851 PR: https://git.openjdk.org/jdk/pull/18851 From duke at openjdk.org Mon Apr 29 21:55:31 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Mon, 29 Apr 2024 21:55:31 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v8] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/7f845511..41398bba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=06-07 Stats: 10 lines in 1 file changed: 4 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From duke at openjdk.org Mon Apr 29 21:55:31 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Mon, 29 Apr 2024 21:55:31 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v5] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 23:07:44 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> from review comments: simplification, fix comments and white space > > It looks to me that the source and dest are reversed in the following instruction in call to simd_prefix_and_encode, perhaps that should be a separate PR: > // Do we have this wrong src and dst reversed in simd_prefix_and_encode? > void Assembler::pextrw(Register dst, XMMRegister src, int imm8) { > assert(VM_Version::supports_sse2(), ""); > InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ false); > int encode = simd_prefix_and_encode(as_XMMRegister(dst->encoding()), xnoreg, src, VEX_SIMD_66, VEX_OPCODE_0F, &attributes); > emit_int24((unsigned char)0xC5, (0xC0 | encode), imm8); > } > Once that PR is fixed, is_src_gpr should be set to true for this one as well. 
@sviswa7 wrote > Also the following instruction is not handled for egprs: void Assembler::popq(Register dst) Thank you. Updated popq(Register) for egpr support. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2083747132 From duke at openjdk.org Mon Apr 29 21:55:32 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Mon, 29 Apr 2024 21:55:32 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v4] In-Reply-To: <_u8OUYZTsDfl7lzwoee3zewukw-yuFsn1_37Fn7iY5o=.2824d10d-30dd-4314-bae7-0beac0d79e2d@github.com> References: <_u8OUYZTsDfl7lzwoee3zewukw-yuFsn1_37Fn7iY5o=.2824d10d-30dd-4314-bae7-0beac0d79e2d@github.com> Message-ID: On Fri, 26 Apr 2024 17:37:05 GMT, Sandhya Viswanathan wrote: >> Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: >> >> bug fix in other ::prefix_rex2 > > src/hotspot/cpu/x86/assembler_x86.cpp line 13260: > >> 13258: } else { >> 13259: emit_int24((prefix & 0xFF00) >> 8, prefix & 0x00FF, b1); >> 13260: } > > We need a check for UseAPX > 0 here. Thank you. Added check. > src/hotspot/cpu/x86/assembler_x86.cpp line 14004: > >> 14002: int encode = prefixq_and_encode(dst->encoding(), src->encoding(), true); >> 14003: emit_opcode_prefix_and_encoding((unsigned char)0xB8, 0xC0, encode); >> 14004: } > > void Assembler::popcntq(Register dst, Address src) also need to be handled for rex2 generation. get_prefixq() will return a 16 bit entity and so call to emit_int32 directly is not correct. > emit_int32((unsigned char)0xF3, > get_prefixq(src, dst), > 0x0F, > (unsigned char)0xB8); > > Likewise void Assembler::cvttsd2siq(Register dst, Address src) also needs to be updated to handle extended gprs. Thank you, missed these. Updated popcntq(Register dst, Address src) and cvttsd2siq(Register dst, Address src) for egpr support. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1583824704 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1583824212 From jrose at openjdk.org Mon Apr 29 22:04:05 2024 From: jrose at openjdk.org (John R Rose) Date: Mon, 29 Apr 2024 22:04:05 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 11:35:25 GMT, Vladimir Ivanov wrote: > For MethodHandle linkers all arguments are casted to signature classes when target method is known. > > It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. > > Proposed fix avoids casts when signature class is unloaded. > > Testing: hs-tier1 - hs-tier4 When a parameter happens to be typed as an unloaded class, the call site can be compiled and survive a long time in optimized form, as long as only nulls are passed. This is because (a) methods do not necessarily issue casts to their arguments, and (b) even if a `checkcast` instruction is issued, it short-circuits on null, without even trying to resolve the `checkcast` class. C2 supports this corner case, in part, using the `assert_null` IR generation option, which says "as long as this value is null, keep it, otherwise recompile". If we want to emulate this for MHs, we need to ensure that null short-circuits, even if non-null values must somehow cope with the unloaded class. (It's not too hard, but requires some change to configuration, since the non-null value is a witness that the class has been loaded, so the existing code shape, which assumes there is no resolution for the class, is out of date. BTW a return type can also be an unloaded class, with similar considerations.) 
So, what do we do in the case of a null value of an unloaded class parameter (or return value) when a call instruction is being emulated by a MH? And, does this fix change that policy? ------------- PR Comment: https://git.openjdk.org/jdk/pull/18973#issuecomment-2083758935 From sviswanathan at openjdk.org Mon Apr 29 22:09:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 29 Apr 2024 22:09:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v7] In-Reply-To: References: Message-ID: On Sat, 27 Apr 2024 00:07:21 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fix 4 more src_is_gpr = true cases, add asserts to check for UseAPX src/hotspot/cpu/x86/assembler_x86.cpp line 2632: > 2630: prefix(src, true /* is_map1 */); > 2631: emit_int8((unsigned char)0xAE); > 2632: emit_operand(as_Register(2), src, 0); Even when UseAVX > 0, if the src address uses higher bank registers, ldmxcsr/stmxcsr should be encoded using REX2, i.e. the else path. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1583835712 From kvn at openjdk.org Mon Apr 29 23:04:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Apr 2024 23:04:04 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required [v3] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 21:19:33 GMT, Cesar Soares Lucas wrote: >> The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand::can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. >> >> The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. >> >> Tested on Linux x64 tiers1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix formatting. I submitted testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18851#issuecomment-2083828678 From dlong at openjdk.org Mon Apr 29 23:05:10 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 29 Apr 2024 23:05:10 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: On Mon, 29 Apr 2024 14:33:07 GMT, Roland Westrelin wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove whitespace > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 2030: > >> 2028: receiver = state()->stack_at(index); >> 2029: ciType* type = receiver->exact_type(); >> 2030: if (type != nullptr && type->is_loaded()) { > > Is it the case that we can't see an interface here? Or that we think it's ok if we see an interface here? 
We can't see an interface here because it will get rejected by `ciInstanceKlass::exact_klass`, so we could even assert for that here if we wanted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1583883023 From sviswanathan at openjdk.org Mon Apr 29 23:31:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 29 Apr 2024 23:31:07 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v8] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 21:55:31 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > add egpr support for popcntq(R,A), cvttsd2siq(R,A), popq(R) src/hotspot/cpu/x86/assembler_x86.cpp line 1941: > 1939: if (needs_rex2(crc, v)) { > 1940: InstructionAttr attributes(AVX_128bit, /* rex_w */ sizeInBytes == 8, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 1941: int encode = vex_prefix_and_encode(crc->encoding(), 0, v->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, true); The PP bits are VEX_SIMD_66 if sizeInBytes=2. 
src/hotspot/cpu/x86/assembler_x86.cpp line 1989: > 1987: InstructionAttr attributes(AVX_128bit, /* vex_w */ sizeInBytes == 8, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); > 1988: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_64bit); > 1989: vex_prefix(adr, 0, crc->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes); The PP bits are VEX_SIMD_66 if sizeInBytes=2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1583888619 PR Review Comment: https://git.openjdk.org/jdk/pull/18476#discussion_r1583889127 From duke at openjdk.org Mon Apr 29 23:54:19 2024 From: duke at openjdk.org (Steve Dohrmann) Date: Mon, 29 Apr 2024 23:54:19 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: > Add instruction encoding support for Intel APX extended general-purpose registers: > > Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. > > By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. 
Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: fixes: pp bits in crc32, REX2 branch in ldmxcsr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18476/files - new: https://git.openjdk.org/jdk/pull/18476/files/41398bba..01241d48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18476&range=07-08 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18476.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18476/head:pull/18476 PR: https://git.openjdk.org/jdk/pull/18476 From dlong at openjdk.org Tue Apr 30 01:16:14 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 30 Apr 2024 01:16:14 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v6] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 21:10:00 GMT, Matias Saavedra Silva wrote: >> A misplaced memory barrier causes a very intermittent crash on on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Improved comment Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18477#pullrequestreview-2030002885 From chagedorn at openjdk.org Tue Apr 30 06:58:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 Apr 2024 06:58:04 GMT Subject: RFR: 8331088: Incorrect TraceLoopPredicate output In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:11:51 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8331088](https://bugs.openjdk.org/browse/JDK-8331088) fixing the incorrect print output. 
> > Thanks, > Sonia This would have been fixed with https://github.com/openjdk/jdk/pull/16877. But I don't mind fixing it separately. Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19004#pullrequestreview-2030352054 From galder at openjdk.org Tue Apr 30 08:43:10 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 30 Apr 2024 08:43:10 GMT Subject: RFR: 8302850: Implement C1 clone intrinsic that reuses arraycopy code for primitive arrays [v13] In-Reply-To: References: <9Eoh8hOSSVvAtf9iVQ6hflQyceUtt4dpZdqm61zg5XI=.358a4d79-70d9-4b54-85d5-37c6817f0fae@github.com> Message-ID: On Mon, 29 Apr 2024 14:34:11 GMT, Roland Westrelin wrote: >> Galder Zamarreño has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove whitespace > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2508: > >> 2506: } >> 2507: >> 2508: if (stub != nullptr) { > > Isn't that left over code from your previous implementation? Can `stub` be null? I think it can be null, see https://github.com/openjdk/jdk/pull/17667/files#diff-e6f3ae4492965efd0d73c3f31073ec8b77e020740b009f92312658bac1e5f978R358 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17667#discussion_r1584371713 From stuefe at openjdk.org Tue Apr 30 09:04:28 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 30 Apr 2024 09:04:28 GMT Subject: RFR: 8331344: No compiler replay file with CompilerCommand MemLimit Message-ID: When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. 
If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times. This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. Also provided regression test. Tested: - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. - GHAs ------------- Commit messages: - add regression test - JDK-8331344-Hitting-compiler-memory-limit-may-not-produce-a-replay-file Changes: https://git.openjdk.org/jdk/pull/19005/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19005&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331344 Stats: 29 lines in 3 files changed: 27 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19005/head:pull/19005 PR: https://git.openjdk.org/jdk/pull/19005 From mli at openjdk.org Tue Apr 30 09:54:16 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 30 Apr 2024 09:54:16 GMT Subject: RFR: 8321003: RISC-V: C2 MulReductionVI Message-ID: Hi, Can you help to review this patch to implement MulReductionVI/MulReductionVL/MulReductionVF/MulReductionVD? On riscv, there are no straightforward instructions to do it, but we can do it with a reduction tree, which could reduce the time complexity to lg(N). 
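The reduction-tree idea mentioned above can be illustrated in scalar Java. This is only a sketch of the general technique, not the RISC-V code in the patch: the class name `MulReductionTree` and the method `mulReduce` are hypothetical, and the sketch assumes the number of lanes is a power of two. Each pass multiplies the upper half of the active lanes into the lower half, so N lanes need lg(N) passes rather than N-1 dependent multiplies.

```java
public class MulReductionTree {
    // Hypothetical helper: multiplies all lanes together using a reduction tree.
    // Assumes lanes.length is a power of two (an assumption of this sketch).
    static int mulReduce(int[] lanes) {
        int[] v = lanes.clone(); // work on a copy so the input stays untouched
        // Each pass halves the number of active lanes: lg(N) passes in total.
        for (int width = v.length / 2; width >= 1; width /= 2) {
            // On vector hardware the multiplies within one pass are independent
            // and could be done by a single vector multiply.
            for (int i = 0; i < width; i++) {
                v[i] *= v[i + width];
            }
        }
        return v[0];
    }

    public static void main(String[] args) {
        int[] lanes = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(mulReduce(lanes)); // 8! = 40320
    }
}
```

With eight lanes the passes produce {5, 12, 21, 32}, then {105, 384}, then 40320, i.e. three passes for eight elements.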
Thanks ## Performance TBD ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/19015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19015&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321003 Stats: 150 lines in 9 files changed: 149 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19015/head:pull/19015 PR: https://git.openjdk.org/jdk/pull/19015 From asotona at openjdk.org Tue Apr 30 11:56:20 2024 From: asotona at openjdk.org (Adam Sotona) Date: Tue, 30 Apr 2024 11:56:20 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations Message-ID: Hi, During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
Thank you, Adam ------------- Commit messages: - added impl comment - removed list of predefined attributes - move mappers implementations to AbstractAttributeMapper - 8331291: java.lang.classfile.Attributes class performs a lot of static initializations Changes: https://git.openjdk.org/jdk/pull/19006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331291 Stats: 2029 lines in 47 files changed: 904 ins; 619 del; 506 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From liach at openjdk.org Tue Apr 30 11:56:20 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 30 Apr 2024 11:56:20 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:48:53 GMT, Adam Sotona wrote: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Nice changes! Remarks 1. The `INSTANCE` fields should be declared `final`, still safe for lazy initialization. 2. 
Not from this patch, but the `AttributeStability stability()` overrides can be moved to `AbstractAttributeMapper`, stored in a field, to decrease source code size. 3. `AbstractAttributeMapper` might become sealed now that its implementations are no longer anonymous, might move it to be a private nested class of `Attributes` to omit the long list of permits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2083832111 From asotona at openjdk.org Tue Apr 30 11:56:20 2024 From: asotona at openjdk.org (Adam Sotona) Date: Tue, 30 Apr 2024 11:56:20 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations In-Reply-To: References: Message-ID: <9NS-fjr36lj9hA4htcRwnIEeN3qZHTXSOAWBeyBdAs4=.d2b88f96-f9f6-4f66-887c-1bfcd709e7d9@github.com> On Mon, 29 Apr 2024 23:04:38 GMT, Chen Liang wrote: > Nice changes! Remarks > > 1. The `INSTANCE` fields should be declared `final`, still safe for lazy initialization. > 2. Not from this patch, but the `AttributeStability stability()` overrides can be moved to `AbstractAttributeMapper`, stored in a field, to decrease source code size. > 3. `AbstractAttributeMapper` might become sealed now that its implementations are no longer anonymous, might move it to be a private nested class of `Attributes` to omit the long list of permits. 
Yes, that is part of the plan for today :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2084446739 From liach at openjdk.org Tue Apr 30 12:12:06 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 30 Apr 2024 12:12:06 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations In-Reply-To: References: Message-ID: <6tf20Z4MT9nZkowA_NZmPnCJSJBqAvv2oVx55QzlGA0=.3b8bc56a-1fd0-4fb8-a81d-97e505c242d8@github.com> On Mon, 29 Apr 2024 18:48:53 GMT, Adam Sotona wrote: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Also the `final` modifier addition should be included in the CSR, as it changes the access modifiers even if it has no real impact. src/java.base/share/classes/jdk/internal/classfile/impl/BoundAttribute.java line 996: > 994: public static AttributeMapper standardAttribute(Utf8Entry name) { > 995: // critical bootstrap path, so no lambdas nor method handles here > 996: return switch (name.hashCode()) { I think we can safely switch over strings, as they are compiled to hashCode switch like what you explicitly have right now. Isn't that the case? test/jdk/jdk/classfile/AttributesTest.java line 26: > 24: /* > 25: * @test > 26: * @summary Testing Attributes API. 
Can add a line `@bug 8331291` ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2085160221 PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1584686905 PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1584687500 From asotona at openjdk.org Tue Apr 30 12:23:36 2024 From: asotona at openjdk.org (Adam Sotona) Date: Tue, 30 Apr 2024 12:23:36 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v2] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. 
> > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: added bug number ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/b7b35c5d..27238368 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From asotona at openjdk.org Tue Apr 30 12:23:36 2024 From: asotona at openjdk.org (Adam Sotona) Date: Tue, 30 Apr 2024 12:23:36 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations In-Reply-To: <6tf20Z4MT9nZkowA_NZmPnCJSJBqAvv2oVx55QzlGA0=.3b8bc56a-1fd0-4fb8-a81d-97e505c242d8@github.com> References: <6tf20Z4MT9nZkowA_NZmPnCJSJBqAvv2oVx55QzlGA0=.3b8bc56a-1fd0-4fb8-a81d-97e505c242d8@github.com> Message-ID: On Tue, 30 Apr 2024 12:09:05 GMT, Chen Liang wrote: > Also the `final` modifier addition should be included in the CSR, as it changes the access modifiers even if it has no real impact. The class already had only private constructor, so there was no way to extend it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2085173489 From asotona at openjdk.org Tue Apr 30 12:23:37 2024 From: asotona at openjdk.org (Adam Sotona) Date: Tue, 30 Apr 2024 12:23:37 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v2] In-Reply-To: <6tf20Z4MT9nZkowA_NZmPnCJSJBqAvv2oVx55QzlGA0=.3b8bc56a-1fd0-4fb8-a81d-97e505c242d8@github.com> References: <6tf20Z4MT9nZkowA_NZmPnCJSJBqAvv2oVx55QzlGA0=.3b8bc56a-1fd0-4fb8-a81d-97e505c242d8@github.com> Message-ID: On Tue, 30 Apr 2024 12:05:46 GMT, Chen Liang wrote: >> Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: >> >> added bug number > > src/java.base/share/classes/jdk/internal/classfile/impl/BoundAttribute.java line 996: > >> 994: public static AttributeMapper standardAttribute(Utf8Entry name) { >> 995: // critical bootstrap path, so no lambdas nor method handles here >> 996: return switch (name.hashCode()) { > > I think we can safely switch over strings, as they are compiled to hashCode switch like what you explicitly have right now. Isn't that the case? Freshly parsed Utf8Entries conversion to String is expensive and unnecessary. We should be very careful when to ask for the conversion as it significantly affects some benchmarks. > test/jdk/jdk/classfile/AttributesTest.java line 26: > >> 24: /* >> 25: * @test >> 26: * @summary Testing Attributes API. > > Can add a line `@bug 8331291` Added, thank you. 
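A minimal sketch of the idea in the exchange above: dispatching on a cached hash of the raw name bytes and confirming with a byte comparison, so no String is ever built. `RawName`, `MapperLookup` and the returned mapper names are hypothetical stand-ins, not the JDK's `Utf8Entry` or `AttributeMapper`, and the hash function here is only assumed to match `String.hashCode()` for ASCII names; the real `Utf8Entry` hash may well differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a constant-pool UTF-8 entry; NOT the JDK's Utf8Entry.
final class RawName {
    private final byte[] bytes;
    private final int hash;   // computed once from the raw bytes, no String needed

    RawName(byte[] bytes) {
        this.bytes = bytes.clone();
        int h = 0;
        for (byte b : this.bytes) {
            h = 31 * h + (b & 0xff);   // same recurrence as String.hashCode() for ASCII
        }
        this.hash = h;
    }

    static RawName ofAscii(String s) {
        return new RawName(s.getBytes(StandardCharsets.US_ASCII));
    }

    @Override
    public int hashCode() {
        return hash;
    }

    // Compares against an ASCII literal byte by byte, without allocating.
    boolean equalsAscii(String s) {
        if (s.length() != bytes.length) {
            return false;
        }
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] != (byte) s.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical lookup; returns a mapper name instead of a real AttributeMapper.
final class MapperLookup {
    static String mapperFor(RawName name) {
        return switch (name.hashCode()) {
            // 2105869 is "Code".hashCode(); the equality check guards against collisions
            case 2105869 -> name.equalsAscii("Code") ? "CodeMapper" : null;
            // -1851041679 is "Record".hashCode()
            case -1851041679 -> name.equalsAscii("Record") ? "RecordMapper" : null;
            default -> null;
        };
    }
}
```

A plain `switch` over strings would compile to a similar hash-then-equals shape, but it would need the String first, which is the conversion the reply above wants to avoid for freshly parsed entries.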
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1584696556 PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1584707230 From liach at openjdk.org Tue Apr 30 12:31:05 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 30 Apr 2024 12:31:05 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v2] In-Reply-To: References: <6tf20Z4MT9nZkowA_NZmPnCJSJBqAvv2oVx55QzlGA0=.3b8bc56a-1fd0-4fb8-a81d-97e505c242d8@github.com> Message-ID: On Tue, 30 Apr 2024 12:13:59 GMT, Adam Sotona wrote: >> src/java.base/share/classes/jdk/internal/classfile/impl/BoundAttribute.java line 996: >> >>> 994: public static AttributeMapper standardAttribute(Utf8Entry name) { >>> 995: // critical bootstrap path, so no lambdas nor method handles here >>> 996: return switch (name.hashCode()) { >> >> I think we can safely switch over strings, as they are compiled to hashCode switch like what you explicitly have right now. Isn't that the case? > > Freshly parsed Utf8Entries conversion to String is expensive and unnecessary. We should be very careful when to ask for the conversion as it significantly affects some benchmarks. You are right, I forgot these are Utf8Entry instead of Strings. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19006#discussion_r1584719802 From liach at openjdk.org Tue Apr 30 12:41:06 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 30 Apr 2024 12:41:06 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v2] In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 12:23:36 GMT, Adam Sotona wrote: >> Hi, >> During performance optimization work on Class-File API as JDK lambda generator we found some static initialization killers. 
>> One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. >> >> I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. >> >> Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. >> >> Thank you, >> Adam > > Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: > > added bug number Yeah, it has no real usage impact but such changes do require CSRs, like https://bugs.openjdk.org/browse/JDK-8305158 for a `final` on `Arrays` class, so you should include the `final` modifier change in your CSR's specdiff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19006#issuecomment-2085221255 From gcao at openjdk.org Tue Apr 30 13:03:16 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Apr 2024 13:03:16 GMT Subject: RFR: 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions Message-ID: Hi, We want to support vector-scalar and vector-immediate bitwise logic instructions, It was implemented by referring to RVV v1.0 [1]. please take a look and have some reviews. Thanks a lot. We can use the Int256VectorTests.java[2] to print the compilation log, verify and observe the generation of nodes. 
For example, we can use the following command to print the compilation log of a jtreg test case: /home/zifeihan/jdk-tools/jtreg/bin/jtreg \ -v:default \ -concurrency:16 -timeout:50 \ -javaoption:-XX:+UnlockExperimentalVMOptions \ -javaoption:-XX:+UseRVV \ -javaoption:-XX:+PrintOptoAssembly \ -javaoption:-XX:LogFile=/home/zifeihan/jdk/Int256VectorTests_PrintOptoAssembly.log \ -jdk:/home/zifeihan/jdk/build/linux-riscv64-server-fastdebug/jdk \ /home/zifeihan/jdk/test/jdk/jdk/incubator/vector/Int256VectorTests.java we can observe the specified compilation log `Int256VectorTests_PrintOptoAssembly.log`, which contains the vector-scalar and vector-immediate bitwise logic node for the PR implementation. vand_immI Node 0b4 vloadcon V3 # generate iota indices 0bc vmla V2, V2, V3, V1 0c4 vand_immI V2, V2, #7 0cc addi R7, R30, #16 # ptr, #@addP_reg_imm 0d0 storeV [R7], V2 # vector (rvv) vor_regI Node 180 vor_regI V1, V1, R30 188 add R31, R14, R31 # ptr, #@addP_reg_reg 18a addi R31, R31, #16 # ptr, #@addP_reg_imm 18c storeV [R31], V1 # vector (rvv) 194 addiw R11, R11, #8 #@addI_reg_imm 196 blt R11, R13, B17 #@cmpI_loop P=0.500000 C=30564.000000 vxor_regI Node 198 vxor_regI V1, V1, R30 1a0 add R14, R16, R14 # ptr, #@addP_reg_reg 1a2 addi R14, R14, #16 # ptr, #@addP_reg_imm 1a4 storeV [R14], V1 # vector (rvv) 1ac addiw R11, R11, #8 #@addI_reg_imm 1ae blt R11, R13, B21 #@cmpI_loop P=0.500000 C=30564.000000 vand_regI_masked Node 234 B31: # out( B40 B32 ) <- in( B30 ) Freq: 78.5481 234 loadV V2, [R15] # vector (rvv) 23c vand_regI_masked V2, V2, R11 244 storeV [R9], V2 # vector (rvv) 24c mv R10, #8 # int, #@loadConI 24e ble R7, R10, B40 #@cmpI_branch P=0.000001 C=-1.000000 vor_regI_masked Node 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 75.8475 1ee loadV V1, [R11] # vector (rvv) 1f6 vor_regI_masked V1, V1, R31 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000 vxor_regI_masked Node 1ee B32: # out( B38 B33 ) <- in( B31 ) Freq: 
75.8475 1ee loadV V1, [R11] # vector (rvv) 1f6 vxor_regI_masked V1, V1, R31 1fe addi R11, R13, #32 # ptr, #@addP_reg_imm 202 bgeu R29, R10, B38 #@cmpU_branch P=0.000001 C=-1.000000 vnotI Node 13c B23: # out( B52 B24 ) <- in( B22 ) Freq: 75.1106 13c loadV V2, [R16] # vector (rvv) 144 vnotI V2, V2 14c vand V1, V1, V2 154 bgeu R9, R12, B52 #@cmpU_branch P=0.000001 C=-1.000000 vnotI_masked Node 14a B19: # out( B22 ) <- in( B18 ) Freq: 0.99999 14a replicate_imm5 V1, #-3 152 vnotI_masked V1, V1, V0 15a -- // R23=Thread::current(), empty, #@tlsLoadP 15a mv R31, #0 # int, #@loadConI 15c j B22 #@branch We can test test/jdk/jdk/incubator/vector/Long256VectorTests.java in the same way, and looking at the Opto logs, we will see nodes similar to vand_regL, vor_regL, vxor_regL, vnotL. vand_regL Node 180 vand_regL V1, V1, R22 188 add R30, R17, R30 # ptr, #@addP_reg_reg 18a addi R30, R30, #16 # ptr, #@addP_reg_imm 18c storeV [R30], V1 # vector (rvv) 194 addiw R20, R20, #2 #@addI_reg_imm 196 blt R20, R15, B17 #@cmpI_loop P=0.500000 C=30564.000000 vor_regL Node 178 loadV V1, [R12] # vector (rvv) 180 vor_regL V1, V1, R22 188 add R30, R17, R30 # ptr, #@addP_reg_reg 18a addi R30, R30, #16 # ptr, #@addP_reg_imm 18c storeV [R30], V1 # vector (rvv) 194 addiw R20, R20, #2 #@addI_reg_imm 196 blt R20, R15, B17 #@cmpI_loop P=0.500000 C=30564.000000 vxor_regL Node 178 loadV V1, [R12] # vector (rvv) 180 vxor_regL V1, V1, R22 188 add R30, R17, R30 # ptr, #@addP_reg_reg 18a addi R30, R30, #16 # ptr, #@addP_reg_imm 18c storeV [R30], V1 # vector (rvv) 194 addiw R20, R20, #2 #@addI_reg_imm 196 blt R20, R15, B17 #@cmpI_loop P=0.500000 C=30564.000000 vand_regL_masked Node 1da B31: # out( B37 B32 ) <- in( B30 ) Freq: 75.8503 1da loadV V1, [R31] # vector (rvv) 1e2 vand_regL_masked V1, V1, R11 1ea addi R31, R10, #32 # ptr, #@addP_reg_imm 1ee bgeu R30, R29, B37 #@cmpU_branch P=0.000001 C=-1.000000 vor_regL_masked Node 1da B31: # out( B37 B32 ) <- in( B30 ) Freq: 75.8503 1da loadV V1, [R31] # vector (rvv) 1e2 
vor_regL_masked V1, V1, R11 1ea addi R31, R10, #32 # ptr, #@addP_reg_imm 1ee bgeu R30, R29, B37 #@cmpU_branch P=0.000001 C=-1.000000 vxor_regL_masked Node 1da B31: # out( B37 B32 ) <- in( B30 ) Freq: 75.8503 1da loadV V1, [R31] # vector (rvv) 1e2 vxor_regL_masked V1, V1, R11 1ea addi R31, R10, #32 # ptr, #@addP_reg_imm 1ee bgeu R30, R29, B37 #@cmpU_branch P=0.000001 C=-1.000000 vnotL Node 0f4 B17: # out( B38 B18 ) <- in( B16 ) Freq: 76.238 0f4 # castII of R19, #@castII 0f4 addw R30, R19, zr #@convI2L_reg_reg 0f8 slli R30, R30, (#3 & 0x3f) #@lShiftL_reg_imm 0fa add R13, R31, R30 # ptr, #@addP_reg_reg 0fe addi R13, R13, #16 # ptr, #@addP_reg_imm 100 loadV V1, [R13] # vector (rvv) 108 vnotL V1, V1 110 bgeu R19, R12, B38 #@cmpU_branch P=0.000001 C=-1.000000 [1] https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc [2] https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/Int256VectorTests.java ### Testing - [x] Run tier1-3 tests on SOPHON SG2042 (release) - [x] test/jdk/jdk/incubator/vector (fastdebug) qemu 8.1.50 with UseRVV ------------- Commit messages: - Polishing Code comment - Add vand/vor/vxor predicated Node - Polishing Code Comment - 8331281: RISC-V: C2: Support vector-scalar and vector-immediate bitwise logic instructions Changes: https://git.openjdk.org/jdk/pull/18999/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18999&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331281 Stats: 471 lines in 2 files changed: 469 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18999/head:pull/18999 PR: https://git.openjdk.org/jdk/pull/18999 From asotona at openjdk.org Tue Apr 30 13:26:19 2024 From: asotona at openjdk.org (Adam Sotona) Date: Tue, 30 Apr 2024 13:26:19 GMT Subject: RFR: 8331291: java.lang.classfile.Attributes class performs a lot of static initializations [v3] In-Reply-To: References: Message-ID: > Hi, > During performance optimization work on 
Class-File API as JDK lambda generator we found some static initialization killers. > One of them is `java.lang.classfile.Attributes` with tens of static fields initialized with individual attribute mappers, and common set of all mappers, and static map from attribute names to the mappers. > > I propose to turn all the static fields into lazy-initialized static methods and remove `PREDEFINED_ATTRIBUTES` and `standardAttribute(Utf8Entry name)` static mapping method from the `Attributes` API class. > > Please let me know your comments or objections and please review the [PR](https://github.com/openjdk/jdk/pull/19006) and [CSR](https://bugs.openjdk.org/browse/JDK-8331414), so we can make it into 23. > > Thank you, > Adam Adam Sotona has updated the pull request incrementally with one additional commit since the last revision: changed order in allowed modules attributes check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19006/files - new: https://git.openjdk.org/jdk/pull/19006/files/27238368..f0d9174e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19006&range=01-02 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19006/head:pull/19006 PR: https://git.openjdk.org/jdk/pull/19006 From mdoerr at openjdk.org Tue Apr 30 14:01:15 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 30 Apr 2024 14:01:15 GMT Subject: RFR: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address Message-ID: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. This can easily be avoided by skipping addition and subtraction of `first()` which is a NOP. 
------------- Commit messages: - 8331421: ubsan: vmreg.cpp checking error member call on misaligned address Changes: https://git.openjdk.org/jdk/pull/19022/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19022&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331421 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/19022.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19022/head:pull/19022 PR: https://git.openjdk.org/jdk/pull/19022 From mbaesken at openjdk.org Tue Apr 30 14:51:03 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 30 Apr 2024 14:51:03 GMT Subject: RFR: 8331421: ubsan: vmreg.cpp checking error member call on misaligned address In-Reply-To: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> References: <-0CT3e78TSiBvMrwImLSJDFJkQ7k7BwcMhfoW5tKklA=.aab953cb-9fb3-4b89-acf4-ae6967276c0b@github.com> Message-ID: <1yCwAi2Co7p0IZ8Z5-5rDm5VcQxV6jxI39EobHs8x_M=.36d441ed-a869-4ff1-8fbb-541957963cab@github.com> On Tue, 30 Apr 2024 13:56:07 GMT, Martin Doerr wrote: > As shown in the JBS issue, the Undefined Behavior Sanitizer complains about `VMRegImpl::stack_0()->value()`. This can easily be avoided by skipping the more complicated way which includes addition and subtraction of `first()`. looks good and fixes the error reported by ubsan. ------------- Marked as reviewed by mbaesken (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19022#pullrequestreview-2031613469 From chagedorn at openjdk.org Tue Apr 30 15:43:13 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 Apr 2024 15:43:13 GMT Subject: RFR: 8331404: IGV: Show line numbers for callees in properties Message-ID: IGV shows the `bci` for a node in the callee, followed by the bci in the caller method and so on until we reach the root method. 
For the `line` property, we currently only show the line number found in the root method (`first()` is the root method being compiled and `second()` and `third()` are inlined): Example program: ![image](https://github.com/openjdk/jdk/assets/17833009/579fe9eb-4bd8-42d8-9d03-875f25bd97ae) Properties of the store to `fFld`: ![image](https://github.com/openjdk/jdk/assets/17833009/3763cccf-c1ba-4d7f-a986-eae8bf0654b0) One could read the line number from the `jvms` property above. But you would need to expand that property with the button on the right side which opens a window. But then you cannot click anything else anymore in IGV until you close the window again. A simpler and easier to read solution is to add the line number information to match the bci numbers (they are printed in callee->root method order which I think is okay - especially if there are a lot of inlinees, it could be easier to have the really interesting numbers at the start on the left side). This would look something like that: ![image](https://github.com/openjdk/jdk/assets/17833009/fcab3af6-69ac-43ae-89be-19fc4476d12f) If there is no line number information for a bci, I simply emit a `_`. Testing: - Manual testing in IGV - Sanity testing by running `java -Xcomp -XX:+PrintIdealGraph -XX:PrintIdealGraphLevel=4 -XX:PrintIdealGraphFile=graph.xml HelloWorld.java`. 
Thanks, Christian ------------- Commit messages: - 8331404: IGV: Show line numbers for callees in properties Changes: https://git.openjdk.org/jdk/pull/19025/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19025&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331404 Stats: 51 lines in 2 files changed: 31 ins; 16 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19025/head:pull/19025 PR: https://git.openjdk.org/jdk/pull/19025 From chagedorn at openjdk.org Tue Apr 30 15:47:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 Apr 2024 15:47:04 GMT Subject: RFR: 8319957: PhaseOutput::code_size is unused and should be removed In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 17:31:45 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR removes the unused ```PhaseOutput::code_size / method_size```. > > These were moved over from ```src/hotspot/share/opto/compile.hpp``` in the refactor from [8240363](https://bugs.openjdk.org/browse/JDK-8240363). Here's the git link for reference https://github.com/openjdk/jdk/commit/21cd75cb98f658639df14632680e9c5e58f11faa. > > I also checked whether there were any usages prior to the refactor and couldn't find anything so I think it's safe to remove it. > > Thanks, > Sonia Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/18981#pullrequestreview-2031824508 From chagedorn at openjdk.org Tue Apr 30 15:51:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 30 Apr 2024 15:51:12 GMT Subject: RFR: 8326742: Change compiler tests without additional VM flags from @run driver to @run main [v3] In-Reply-To: References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: On Thu, 25 Apr 2024 08:35:54 GMT, Evgeny Nikitin wrote: >> The idea of the bug was to lower down the number of tests that use driver mode incorrectly. My investigation shows up that the vast majority of such tests start other processes and in fact are more of the wrappers around that processes. Most of the secondary processes are started using ProcessTools and therefore getting additional VM flags provided by user. >> >> I found only one test that seem to use driver mode incorrectly, this PR fixes it. >> Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Fix header Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18854#pullrequestreview-2031828449 From enikitin at openjdk.org Tue Apr 30 15:51:13 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 30 Apr 2024 15:51:13 GMT Subject: Integrated: 8326742: Change compiler tests without additional VM flags from @run driver to @run main In-Reply-To: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> References: <8EFM-vRCaordYQnZbG7ehiAwvt42FGPJjRrspfyq4Vw=.1a62c7a4-460b-4681-817c-f8e47a558841@github.com> Message-ID: On Fri, 19 Apr 2024 06:11:15 GMT, Evgeny Nikitin wrote: > The idea of the bug was to lower down the number of tests that use driver mode incorrectly. 
My investigation shows up that the vast majority of such tests start other processes and in fact are more of the wrappers around that processes. Most of the secondary processes are started using ProcessTools and therefore getting additional VM flags provided by user. > > I found only one test that seem to use driver mode incorrectly, this PR fixes it. > Testing: tiers1-5, linux-x64/aarch64, macosx-x64/aarch64, windows-x64, all in debug flavours. This pull request has now been integrated. Changeset: 130f71ca Author: Evgeny Nikitin Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/130f71cadca5b46d9bf589708dcea03ad51e8de0 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8326742: Change compiler tests without additional VM flags from @run driver to @run main Reviewed-by: kvn, thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/18854 From matsaave at openjdk.org Tue Apr 30 16:06:12 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 30 Apr 2024 16:06:12 GMT Subject: RFR: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow [v5] In-Reply-To: References: Message-ID: On Fri, 26 Apr 2024 21:04:39 GMT, Dean Long wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed empty line > > OK, except for the "another thread" part. The reads are done in the current thread, so that's the thread the barrier is for. > > // Prevents stale data from being read after the bytecode is patched to the fast bytecode Thank you for the reviews and detailed discussion @dean-long @coleenp @sunny868 @RealFYang! I think it is best to move forward with the current design. If we do find that the unneeded LoadLoad negatively impacts performance, I will follow up with an appropriate fix. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/18477#issuecomment-2085778675 From matsaave at openjdk.org Tue Apr 30 16:06:13 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 30 Apr 2024 16:06:13 GMT Subject: Integrated: 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow In-Reply-To: References: Message-ID: On Mon, 25 Mar 2024 19:41:02 GMT, Matias Saavedra Silva wrote: > A misplaced memory barrier causes a very intermittent crash on some aarch64 systems. This patch adds an appropriate LoadLoad barrier after a constant pool cache field entry is loaded. Verified with tier 1-5 tests. This pull request has now been integrated. Changeset: 9ce21d13 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/9ce21d1382a4f5ad601a7ee610bab64a9c575302 Stats: 29 lines in 5 files changed: 8 ins; 14 del; 7 mod 8327647: Occasional SIGSEGV in markWord::displaced_mark_helper() for SPECjvm2008 sunflow Reviewed-by: coleenp, fyang, dlong ------------- PR: https://git.openjdk.org/jdk/pull/18477 From stuefe at openjdk.org Tue Apr 30 16:16:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 30 Apr 2024 16:16:09 GMT Subject: RFR: 8331185: Enable compiler memory limits in debug builds [v2] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:30:30 GMT, Thomas Stuefe wrote: >> Opened a bug to track the memory limit break in compiler/loopopts/TestDeepGraphVerifyIterativeGVN.java. Since I cannot reproduce it, someone at Oracle should look at this: https://bugs.openjdk.org/browse/JDK-8331295 > >> Thank you, @tstuefe, for filing these bugs. >> >> One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? > > Yes, it's https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. > > The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. 
We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. > > > Thank you, @tstuefe, for filing these bugs. > > > One additional thing I noticed is that we don't produce compilation replay file (its size is 0) for such failures. Can you look why is that? > > > > > > Yes, it's https://bugs.openjdk.org/browse/JDK-8331344 . I'll post a PR shortly. > > The problem behind this is more generic, namely that producing replay files needs resource area, and it shouldn't. We should not allocate resource area or heap in fatal error handling. But for now, I'll fix this locally by avoiding the recursion. > > Good. I think we need to push it before this PR. @vnkozlov Opened PR: https://github.com/openjdk/jdk/pull/19005 ------------- PR Comment: https://git.openjdk.org/jdk/pull/18969#issuecomment-2085824934 From epeter at openjdk.org Tue Apr 30 16:18:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 Apr 2024 16:18:09 GMT Subject: RFR: 8331252: C2: MergeStores: handle negative shift values In-Reply-To: <8RcZ5sfJYBUPdA24EO2ut5mMFSCJFtFGNdQoTCmr84w=.650750f9-8312-4842-91a7-cb508325803f@github.com> References: <8RcZ5sfJYBUPdA24EO2ut5mMFSCJFtFGNdQoTCmr84w=.650750f9-8312-4842-91a7-cb508325803f@github.com> Message-ID: On Mon, 29 Apr 2024 16:47:03 GMT, Vladimir Kozlov wrote: >> Somehow, I have not thought of negative shift constants, and there was no regression test for it. >> The fuzzer now found a case. >> >> **I convert the assert into a condition.** > > Looks good. Thanks for the reviews @vnkozlov @shipilev ! 
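As background for the negative-shift fix quoted above, a minimal Java sketch (not taken from the patch) of why a negative shift constant is legal source code at all. The Java Language Specification uses only the low 5 bits of the distance for an `int` shift and the low 6 bits for a `long` shift, so `v >> -8` behaves like `v >> 24`. The class and method names are made up for illustration; how C2's MergeStores handles such constants internally is not shown here.

```java
// Illustrative only: names are hypothetical, not from the JDK patch.
final class NegativeShiftDemo {
    // Shift distance Java actually applies for an int operand (low 5 bits).
    static int effectiveIntShift(int distance) {
        return distance & 31;
    }

    // Shift distance Java actually applies for a long operand (low 6 bits).
    static int effectiveLongShift(int distance) {
        return distance & 63;
    }

    public static void main(String[] args) {
        int v = 0x12345678;
        long l = 0x1122334455667788L;
        // A negative constant distance is masked, not rejected.
        System.out.println((v >> -8) == (v >> effectiveIntShift(-8)));
        System.out.println((l >> -8) == (l >> effectiveLongShift(-8)));
        System.out.println(effectiveIntShift(-8) + " " + effectiveLongShift(-8));
    }
}
```

A store pattern such as `a[1] = (byte) (v >> -8)` is therefore valid input for the compiler, which is the kind of case a fuzzer can produce even when hand-written tests do not.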
------------- PR Comment: https://git.openjdk.org/jdk/pull/19001#issuecomment-2085825255 From epeter at openjdk.org Tue Apr 30 16:18:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 Apr 2024 16:18:10 GMT Subject: Integrated: 8331252: C2: MergeStores: handle negative shift values In-Reply-To: References: Message-ID: <2miV2eZFPj_H5p_-VVhqDfwON-f6PfVSwfK4WNJzyug=.a0166c9f-e5d2-4c42-b9d2-1d8f600a3b22@github.com> On Mon, 29 Apr 2024 15:43:06 GMT, Emanuel Peter wrote: > Somehow, I have not thought of negative shift constants, and there was no regression test for it. > The fuzzer now found a case. > > **I convert the assert into a condition.** This pull request has now been integrated. Changeset: 3d11692b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3d11692bf369af951867209962e8bf5886d1655f Stats: 26 lines in 2 files changed: 23 ins; 0 del; 3 mod 8331252: C2: MergeStores: handle negative shift values Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/19001 From kvn at openjdk.org Tue Apr 30 16:25:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 30 Apr 2024 16:25:06 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required [v3] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 21:19:33 GMT, Cesar Soares Lucas wrote: >> The logic in reduce allocation merges (RAM) makes use of `PhaseMacroExpand:;can_eliminate_allocation` to check whether an allocation can be scalar replaced. However, we can only SR allocations of exact types - due to rematerialization logic. >> >> The scalar replacement logic not related to RAM has this check in `split_unique_types` so there is no performance regression by adding this check here. >> >> Tested on Linux x64 tiers1-3. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix formatting. My testing passed. 
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18851#pullrequestreview-2031942526 From kvn at openjdk.org Tue Apr 30 17:10:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 30 Apr 2024 17:10:04 GMT Subject: RFR: 8331344: No compiler replay file with CompilerCommand MemLimit In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:39:33 GMT, Thomas Stuefe wrote: > When using the compiler memory limit with the crash suboption (e.g. `-XX:CompileCommand=MemLimit,*.*,1g~crash`), the JVM asserts but may fail to produce a replay file. We also may see partly corrupted hs-err files. > > This happens if the memory limit hit was caused by growing ResourceAreas, not the C2 node arena. We also use ResourceArea when producing the replay file. > > If those RA usages cause another Arena chunk to be allocated, we re-enter `CompilationMemoryStatistic::on_arena_change` recursively, possibly multiple times. This will at least prevent replay file generation, but also may abort error handling altogether if a stack overflow happens. > > The patch prevents that recursion. It would be better to prevent replay file generation from using RA altogether, but this would be a larger patch and difficult to keep from bitrotting. > > Also provided regression test. > > Tested: > > - manually on Linux x64 and MacOS m1, with and without an artificially inflated resource area usage that reliably triggers the error. With the patch, the error is gone. > - GHAs Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19005#pullrequestreview-2032042263 From cslucas at openjdk.org Tue Apr 30 17:31:05 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 30 Apr 2024 17:31:05 GMT Subject: RFR: 8330247: C2: CTW fail with assert(adr_t->is_known_instance_field()) failed: instance required [v3] In-Reply-To: References: Message-ID: <2wEKGHLX1GXVrGDKHsGeF-3e6CUs9szc25F181BrDDA=.ae9ba2ca-cbce-4117-84ed-a6bc3d5faeaa@github.com> On Mon, 29 Apr 2024 23:01:03 GMT, Vladimir Kozlov wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix formatting. > > I submitted testing. Thank you @vnkozlov ------------- PR Comment: https://git.openjdk.org/jdk/pull/18851#issuecomment-2086133602 From rcastanedalo at openjdk.org Tue Apr 30 18:55:04 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 30 Apr 2024 18:55:04 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic Message-ID: This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). The main changes are: - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. 
- Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. #### Testing - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! ------------- Commit messages: - Generalize barrier stubs Changes: https://git.openjdk.org/jdk/pull/19026/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19026&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331418 Stats: 118 lines in 9 files changed: 66 ins; 37 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19026.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19026/head:pull/19026 PR: https://git.openjdk.org/jdk/pull/19026 From sviswanathan at openjdk.org Tue Apr 30 20:24:54 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 30 Apr 2024 20:24:54 GMT Subject: RFR: 8328998: Encoding support for Intel APX extended general-purpose registers [v9] In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 23:54:19 GMT, Steve Dohrmann wrote: >> Add instruction encoding support for Intel APX extended general-purpose registers: >> >> Intel Advanced Performance Extensions (APX) doubles the number of general-purpose registers, from 16 to 32. 
For more information about APX, see https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html. >> >> By specification, instruction encoding remains unchanged for instructions using only the lower 16 GPRs. For cases where one or more instruction operands reference extended GPRs (Egprs), encoding targets either REX2, an extension of REX encoding, or an extended version of EVEX encoding. These new encoding schemes extend or modify existing instruction prefixes only when Egprs are used. > > Steve Dohrmann has updated the pull request incrementally with one additional commit since the last revision: > > fixes: pp bits in crc32, REX2 branch in ldmxcsr SHA instructions (sha1rnds4, sha1nexte, sha1msg1, sha1msg2, sha256rnds2, sha256msg1, sha256msg2) need to be encoded using EVEX encoding when Egprs are in use. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18476#issuecomment-2087093721 From vlivanov at openjdk.org Tue Apr 30 21:14:53 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 30 Apr 2024 21:14:53 GMT Subject: RFR: 8322726: C2: Unloaded signature class kills argument value In-Reply-To: References: Message-ID: <5ZCK0UQ2KzQmExuEtF28fnhM0QQUQERPVG34i3moM-0=.ef2f6651-9eb8-4432-8486-cc7ece273a6f@github.com> On Mon, 29 Apr 2024 22:01:37 GMT, John R Rose wrote: >> For MethodHandle linkers all arguments are cast to signature classes when target method is known. >> >> It causes problems when target method signature contains unloaded classes: when loaded class meets unloaded class it turns into a TOP. It effectively kills argument values which correspond to unloaded signature types. >> >> Proposed fix avoids casts when signature class is unloaded. >> >> Testing: hs-tier1 - hs-tier4 > > When a parameter happens to be typed as an unloaded class, the call site can be compiled and survive a long time in optimized form, as long as only nulls are passed. 
This is because (a) methods do not necessarily issue casts to their arguments, and (b) even if a `checkcast` instruction is issued, it short-circuits on null, without even trying to resolve the `checkcast` class. C2 supports this corner case, in part, using the `assert_null` IR generation option, which says "as long as this value is null, keep it, otherwise recompile". > > If we want to emulate this for MHs, we need to ensure that null short-circuits, even if non-null values must somehow cope with the unloaded class. > > (It's not too hard, but requires some change to configuration, since the non-null value is a witness that the class has been loaded, so the existing code shape, which assumes there is no resolution for the class, is out of date. BTW a return type can also be an unloaded class, with similar considerations.) > > So, what do we do in the case of a null value of an unloaded class parameter (or return value) when a call instruction is being emulated by a MH? And, does this fix change that policy? @rose00 in this particular case, the problem arises for locally not-yet-loaded classes. Moreover, "truly" unloaded signature classes are not possible for MethodHandles, because their signatures are reified as MethodTypes. So, asserting null is too strong. The fix aligns the behavior with bytecode invoke instructions where an unloaded signature class blocks inlining, but no additional checks are issued. I have a follow-up enhancement to improve behavior for MethodHandles and completely eliminate the case of locally unloaded classes. For bytecode invoke instructions, JVM installs loader constraints for signature classes. We don't do that for MethodHandles. Instead, signature classes are eagerly loaded. But it is omitted for `java.*` classes (as an optimization) and those are the only cases which cause the problem. 
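For illustration only, the two language-level facts this exchange relies on can be sketched in plain Java: a cast applied to `null` always succeeds, and a `MethodHandle` signature is described by a `MethodType` built from `Class` objects. This is a minimal sketch, not code from the PR; the class `NullCastSketch` and the method `describe` are hypothetical names, and the sketch does not reproduce the locally-unloaded-class situation or any C2 behavior.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class NullCastSketch {
    // Hypothetical target method, used only for this illustration.
    static String describe(CharSequence cs) {
        return cs == null ? "null" : cs.getClass().getSimpleName();
    }

    // A cast applied to null succeeds: checkcast short-circuits on null.
    static String castNull() {
        Object o = null;
        return (String) o;
    }

    // MethodType.methodType takes Class objects, so the signature classes
    // are available as Class instances when the handle is looked up.
    static String callWithNull() {
        try {
            MethodType type = MethodType.methodType(String.class, CharSequence.class);
            MethodHandle mh = MethodHandles.lookup()
                    .findStatic(NullCastSketch.class, "describe", type);
            // The call site type (CharSequence)String matches the handle type exactly.
            return (String) mh.invokeExact((CharSequence) null);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(castNull());
        System.out.println(callWithNull());
    }
}
```

Passing `null` through the handle needs no knowledge of the argument's dynamic class, which is the short-circuit property the question above asks to preserve.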
------------- PR Comment: https://git.openjdk.org/jdk/pull/18973#issuecomment-2087339051 From mbalao at openjdk.org Tue Apr 30 21:22:53 2024 From: mbalao at openjdk.org (Martin Balao) Date: Tue, 30 Apr 2024 21:22:53 GMT Subject: RFR: 8327381: Refactor type-improving transformations in BoolNode::Ideal to BoolNode::Value [v4] In-Reply-To: References: Message-ID: On Mon, 18 Mar 2024 04:36:18 GMT, Quan Anh Mai wrote: > `(x & m) u< m + 1` is false for `m = -1`, right? > This bug should be handled separately. I'll do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18198#issuecomment-2087363670 From mdoerr at openjdk.org Tue Apr 30 21:37:57 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 30 Apr 2024 21:37:57 GMT Subject: RFR: 8331418: ZGC: generalize barrier liveness logic In-Reply-To: References: Message-ID: On Tue, 30 Apr 2024 18:43:03 GMT, Roberto Castañeda Lozano wrote: > This changeset generalizes the logic to analyze, declare, and communicate which registers are live at a C2 barrier stub so that it can be used by other collectors than ZGC adopting the late barrier expansion model (including G1 in the near future, see [JEP 475](https://openjdk.org/jeps/475)). > > The main changes are: > > - Make it possible to compute register liveness information before (live-in) or after (live-out) each barrier, and let the collector choose by implementing `BarrierSetC2State::needs_livein_data()`. > > - Generalize the interface with which collectors declare which registers must be additionally preserved across barrier runtime calls, adding the methods `BarrierStubC2::preserve(Register r)` and `BarrierStubC2::dont_preserve(Register r)`. > > - Simplify the interface with which platform-specific logic computes which registers to preserve across barrier runtime calls, replacing the calls to `BarrierStubC2::result()` and `BarrierStubC2::live()` with a single call to `BarrierStubC2::preserve_set()`. 
> > #### Testing > > - tier1-5 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64; release and debug mode). > - tier1-7 (linux-x64, linux-aarch64; release and debug mode; ZGC tests only) with [an additional patch](https://github.com/openjdk/jdk/commit/4d4e743d8f4cddd5288cee1d69c70fe2b9bea066) that exercises the spilling and restoring logic by forcing ZGC read barriers to always take the slow path and clearing all general-purpose save-on-call registers upon the slow path's runtime call. > - Build with `make hotspot` (linux-riscv64-debug, linux-ppc64le-debug). @RealFYang, @TheRealMDoerr: could you please test and review the riscv and ppc changes? Thanks! src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 127: > 125: while (OptoReg::is_reg(reg)) { > 126: const VMReg vm_reg = OptoReg::as_VMReg(reg); > 127: if (!(vm_reg->is_Register()) || vm_reg->as_Register() != r) { This doesn't work on PPC64: We run into "assert(is_Register() && is_even(value())) failed: even-aligned GPR name" (vmreg_ppc.hpp:54). Calling `as_Register()` is only supported for the even ones. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19026#discussion_r1585549387 From dlong at openjdk.org Tue Apr 30 22:52:51 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 30 Apr 2024 22:52:51 GMT Subject: RFR: 8331088: Incorrect TraceLoopPredicate output In-Reply-To: References: Message-ID: On Mon, 29 Apr 2024 18:11:51 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8331088](https://bugs.openjdk.org/browse/JDK-8331088) fixing the incorrect print output. > > Thanks, > Sonia Marked as reviewed by dlong (Reviewer). Looks trivial and OK to push with 1 review. ------------- PR Review: https://git.openjdk.org/jdk/pull/19004#pullrequestreview-2032750892 PR Comment: https://git.openjdk.org/jdk/pull/19004#issuecomment-2087655936